[BioPython] blast parsing errors

Peter biopython at maubp.freeserve.co.uk
Mon Mar 5 15:12:25 UTC 2007


Julius Lucks wrote:
> Hi all,
> 
> I am trying to parse a bunch of blast results that I gather via
> NCBIWWW.qblast().  I have the following code snippet:

I am wondering if your trivial example triggered some "unusual" error 
page from the NCBI...
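
One way to test that suspicion is to look at the raw reply before handing
it to the parser. The sketch below is only a rough heuristic and not part
of Biopython: the function name looks_like_blast_xml is made up, and all
it checks is that the text starts like an XML document and mentions the
BlastOutput root element.

```python
def looks_like_blast_xml(text):
    """Heuristic: True if text resembles a BLAST XML report.

    Hypothetical helper, not a Biopython function.
    """
    # Only inspect the start of the reply; an HTML error page will
    # normally fail one of these two checks.
    head = text.lstrip()[:500]
    return head.startswith("<?xml") and "BlastOutput" in head
```

With a handle from qblast(), that would mean reading the handle into a
string first, running the check, and only then wrapping the string in a
StringIO object for the parser. If the check fails, saving the text to a
file and opening it in a browser should show what the NCBI actually sent.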

I would suggest you update to the CVS version, as we have made a lot of 
changes to the Blast XML support.  You would probably be safe just 
updating the following Bio.Blast files, located here on your machine:

/sw/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIWWW.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py
/sw/lib/python2.5/site-packages/Bio/Blast/Record.py

If you don't know how to use CVS, then just back up the originals and 
replace them with the new files, downloaded one by one from here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython
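
If it helps, the backup step can be scripted. This is a minimal sketch
using only the standard library; the function name backup_files and the
".orig" suffix are my own choices, and you may need write permission on
the site-packages directory for it to work.

```python
import os
import shutil

def backup_files(paths, suffix=".orig"):
    """Copy each existing file to path + suffix; return the backups made."""
    backups = []
    for path in paths:
        if os.path.isfile(path):
            backup = path + suffix
            shutil.copy2(path, backup)  # copy2 also keeps the timestamps
            backups.append(backup)
    return backups

# The four files named above; paths that do not exist are skipped.
blast_dir = "/sw/lib/python2.5/site-packages/Bio/Blast"
blast_files = [os.path.join(blast_dir, name) for name in
               ("NCBIStandalone.py", "NCBIWWW.py", "NCBIXML.py", "Record.py")]
print(backup_files(blast_files))
```

Once the list it prints shows all four backups, the downloaded files can
be copied over the originals.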

----------------------------------------------------------------------

This works for me using the CVS version of BioPython.  I have just used 
a plain string for the query, rather than messing about with a FASTA 
record object, to keep the code short:

#Protein example, BLASTP
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

#BLAST cutoff
cutoff = 1e-4

fasta_rec = ">GI:121308427\nrslgmevmhernahnfpldlaavevpsing"

b_parser = NCBIXML.BlastParser()
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1,
                                expect=cutoff, format_type="XML",
                                entrez_query="Viruses [ORGN]")

#This returns a record iterator, changed after release of BioPython 1.42
b_records = b_parser.parse(result_handle)

for b_record in b_records:
    print "%s found %i results" % (b_record.query,
                                   len(b_record.alignments))
    for alignment in b_record.alignments:
        titles = alignment.title.split('>')
        print titles
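
The split on '>' in the loop above is there because, as far as I
understand it, a hit in nr that stands for several identical sequences
carries all of their descriptions in one title, joined by '>'. Here is a
small stand-alone sketch of that step which also drops empty pieces and
stray spaces. The helper name split_titles and the example title are
made up purely for illustration.

```python
def split_titles(title):
    """Split a merged BLAST hit title into its individual descriptions."""
    return [part.strip() for part in title.split(">") if part.strip()]

# Invented title, for illustration only (not real database entries)
example = ("gi|1|ref|EXAMPLE_1| hypothetical protein [Virus A] "
           ">gi|2|gb|EXAMPLE_2| hypothetical protein [Virus B]")
for description in split_titles(example):
    print(description)
```

The empty-piece filter means it should not matter whether the title
itself starts with a '>' or not.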


Or, if you wanted to do a nucleotide BLASTN search, try:

fasta_rec = '>GI:121308427\nttagccatttatagatggaacttcaacagcagctaagtc' \
           + 'tagagggaaattgtgagcattacgctcgtgcatgacctccataccaagagatct'

and replace 'blastp' with 'blastn' in the call to qblast().
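
Both query strings above are ordinary FASTA text: a '>' header line
followed by the sequence. If you would rather not type them by hand, a
helper along these lines would build one. The name make_fasta and the
60 column wrapping are my own assumptions, not anything qblast()
requires.

```python
def make_fasta(identifier, sequence, width=60):
    """Return a FASTA-format string: '>' header, then wrapped sequence."""
    lines = [">" + identifier]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines)

# Rebuilds the protein query used in the BLASTP example
fasta_rec = make_fasta("GI:121308427", "rslgmevmhernahnfpldlaavevpsing")
```

The resulting fasta_rec can be passed to qblast() exactly as before.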

Peter


