[Biopython] Biopython local blastn query
Ara Kooser
ghashsnaga at gmail.com
Tue Jul 30 16:10:20 UTC 2013
Peter,
Thanks for catching that! I missed that one. I also needed to upgrade to
biopython 1.62b which I did. I still get one short sequence coming through.
*General question*
Hopefully one last question from me on this project. Can I query multiple
blast databased in a single command? I have all the nt.xx downloaded and
need to query each one to look for all my sequences.
Thanks!
ara
Here is the current code. Once I get this cleaned up I will push it over to
a github repo in case anyone wants it.
########################################################################################
#I want to read in one sequence at a time from a fasta file and blast it
against a local
#blast db.
from Bio.Blast.Applications import NcbiblastnCommandline
from Bio.Blast import NCBIXML
from Bio import SeqIO
from Bio import Seq
from Bio.SeqRecord import SeqRecord
nt = "/Users/arakooser/blast/db/nt.00"
#Where the database is located at
file_out = open("metadata_genus.level.csv","w+")
#Contains all the data my boss wants on the sequences
file_in = open("clear_genus_level.fasta")
#The main fasta file that needs to be blasted
fas_rec = SeqIO.parse(file_in,"fasta")
#Parses the main fasta file
for first_seq in fas_rec:
#Hopefully grabs the first sequence
#Takes that sequence from standard in and sumbits it to the blast
commandline and spits
#out an xml
result = NcbiblastnCommandline(task="megablast",query="-", db=nt,
evalue=0.001,
outfmt=5,
perc_identity=100,out="temp.xml",
max_hsps_per_subject=1, num_alignments=1)
stdout, stderr = result(stdin=first_seq.format("fasta"))
# print result
#Reading in the xml file.
#
record = open("temp.xml")
blast_record = NCBIXML.read(record)
record.close()
#print blast_record
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
title_element = alignment.title.split()
print title_element[1]+" "+title_element[2]+","+"
"+alignment.accession\
+","+" "+str(alignment.length)
file_out.write(title_element[1]+" "+title_element[2]+","+"
"\
+alignment.accession+","+"
"+str(alignment.length)+","+\
" "+hsp.sbjct+"\n")
On Tue, Jul 30, 2013 at 9:36 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:
> On Tue, Jul 30, 2013 at 4:32 PM, Ara Kooser <ghashsnaga at gmail.com> wrote:
> > Ivan,
> >
> > Thanks! I found the blastn documentation!! This looks like what I want.
> >
> > I am running blast 2.2.26. I am getting an error with those parameters.
> >
> > I entered the parameters as:
> > max_hsps_per_subject=1, num_alignments=1 in the NcbiblastnCommandline
> line
> >
> >
> > Error:
> > Aras-MacBook-Air:CEM Genus arakooser$ python meta_data_local.py
> > File "meta_data_local.py", line 30
> > -out="temp.xml", max_hsps_per_subject=1, num_alignments=1)
> > SyntaxError: keyword can't be an expression
> >
> > I think this means I am not using the correct keyword.
> >
> > ara
>
> Python function argument names can't have minus signs in them,
> check the -out bit which should probably just be out.
>
> Peter
>
--
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
sub cardine glacialis ursae.
Geoscience website: http://www.tattooedscience.org/
More information about the Biopython
mailing list