[BioPython] blast parsing errors
    Peter 
    biopython at maubp.freeserve.co.uk
       
    Mon Mar  5 15:12:25 UTC 2007
    
    
  
Julius Lucks wrote:
> Hi all,
> 
> I am trying to parse a bunch of blast results that I gather via
> NCBIWWW.qblast().  I have the following code snippet:
I am wondering if your trivial example triggered some "unusual" error 
page from the NCBI...
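One way to check for an unexpected page is to look at the start of the text that qblast() returned before handing it to the parser. The following is only a sketch: the helper name looks_like_blast_xml is made up here and is not part of BioPython, and it assumes that a genuine BLAST XML report opens with an XML declaration and mentions BlastOutput near the top.

```python
def looks_like_blast_xml(text):
    """Return True if text plausibly starts a BLAST XML report.

    Assumption: a real report begins with an XML declaration followed
    by a BlastOutput DOCTYPE or element, while an error page from the
    server is HTML or plain text instead.
    """
    head = text.lstrip()[:300].lower()
    return head.startswith("<?xml") and "blastoutput" in head


# Hypothetical usage: read the whole result once, inspect it, and only
# then pass it on to the parser wrapped in a fresh file-like object.
# text = result_handle.read()
# if not looks_like_blast_xml(text):
#     print(text[:500])  # show the start of the unexpected page
```

If the check fails, printing the first few hundred characters usually shows whether the server sent back an error message rather than results.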
I would suggest you update to the CVS version, as we have made a lot of
changes to the Blast XML support.  You would probably be safe just
updating the following Bio.Blast files, located here on your machine:
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIStandalone.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIWWW.py
/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py
/sw/lib/python2.5/site-packages/Bio/Blast/Record.py
If you don't know how to use CVS, then just back up the originals and
replace them with the new files, downloaded one by one from here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython
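The backup step can be scripted. This is a minimal sketch using only the standard library; the function name backup_files and the ".orig" suffix are arbitrary choices made here, and the directory is the one quoted in this message, so adjust it for your own install.

```python
import os
import shutil


def backup_files(paths, suffix=".orig"):
    """Copy each existing file to <name><suffix>; return the backup paths."""
    backups = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip files that are not present in this install
        backup = path + suffix
        if not os.path.exists(backup):  # never overwrite an earlier backup
            shutil.copy2(path, backup)
        backups.append(backup)
    return backups


# Directory taken from the message above; adjust for your own install.
blast_dir = "/sw/lib/python2.5/site-packages/Bio/Blast"
names = ["NCBIStandalone.py", "NCBIWWW.py", "NCBIXML.py", "Record.py"]
backup_files([os.path.join(blast_dir, name) for name in names])
```

After the backups exist, the downloaded files can be copied over the originals, and the ".orig" copies restored if anything goes wrong.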
----------------------------------------------------------------------
This works for me using the CVS version of BioPython.  I have just used
a string for the query rather than messing about with a fasta record
object, to keep the code short:
#Protein example, BLASTP
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
#BLAST cutoff
cutoff = 1e-4
fasta_rec = ">GI:121308427\nrslgmevmhernahnfpldlaavevpsing"
b_parser = NCBIXML.BlastParser()
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1,
                                expect=cutoff, format_type="XML",
                                entrez_query="Viruses [ORGN]")
#This returns a record iterator, changed after release of BioPython 1.42
b_records = b_parser.parse(result_handle)
for b_record in b_records:
    print "%s found %i results" % (b_record.query,
                                   len(b_record.alignments))
    for alignment in b_record.alignments:
        titles = alignment.title.split('>')
        print titles
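The split on '>' in the loop above is there because a single hit can carry several descriptions joined by '>' when identical sequences have been merged in the database. A small sketch of that step on its own, with the surrounding whitespace tidied up, might look like this. The helper name split_titles and the example title are made up for illustration and are not real BLAST output.

```python
def split_titles(title):
    """Split a merged BLAST hit title into its individual descriptions."""
    parts = [part.strip() for part in title.split(">")]
    return [part for part in parts if part]  # drop empty pieces


# Made-up title for illustration only, not real BLAST output.
example = "gi|1|ref|X1| protein A [Virus one] >gi|2|ref|X2| protein A [Virus two]"
for entry in split_titles(example):
    print(entry)
```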
Or, if you wanted to do a nucleotide BLASTN search, try:
fasta_rec = '>GI:121308427\nttagccatttatagatggaacttcaacagcagctaagtc' \
           + 'tagagggaaattgtgagcattacgctcgtgcatgacctccataccaagagatct'
and replace 'blastp' with 'blastn' in the call to qblast().
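If you switch between protein and nucleotide queries a lot, the program name could be picked from the sequence itself. This is only a rough sketch: guess_program is a hypothetical helper, not a BioPython function, and the rule it uses (only a, c, g, t, u and n means nucleotide) is an assumption that will misjudge very short or unusual sequences.

```python
def guess_program(fasta_text):
    """Guess 'blastn' or 'blastp' from the letters in a FASTA string.

    Heuristic (an assumption, not NCBI's rule): a sequence made up only
    of a, c, g, t, u and n is treated as nucleotide.
    """
    lines = fasta_text.strip().splitlines()
    sequence = "".join(line.strip() for line in lines
                       if not line.startswith(">"))
    if not sequence:
        raise ValueError("no sequence found in FASTA text")
    if set(sequence.lower()) <= set("acgtun"):
        return "blastn"
    return "blastp"


protein = ">GI:121308427\nrslgmevmhernahnfpldlaavevpsing"
nucleotide = ">GI:121308427\nttagccatttatagatggaacttcaacagcagctaagtc"
print(guess_program(protein))      # blastp
print(guess_program(nucleotide))   # blastn
```

The returned string could then be passed as the first argument to qblast() in place of the hard-coded program name.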
Peter
    
    