[BioPython] blast parsing errors

Julius Lucks lucks at fas.harvard.edu
Mon Mar 5 14:13:02 UTC 2007


Hi all,

I am trying to parse a bunch of BLAST results that I retrieve via
NCBIWWW.qblast(). I have the following code snippet:

-----------
from Bio import Fasta
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import StringIO
import re

#BLAST cutoff
cutoff = 1e-4

#Create a fasta record: title and seq are given

title = 'test'
seq = 'ATCG'

fasta_rec = Fasta.Record()
	
#Sanitize title - blast does not like single quotes or \n in titles
title = re.sub("'","prime",title)
title = re.sub("\n","",title)
fasta_rec.title = title
fasta_rec.sequence = seq


b_parser = NCBIXML.BlastParser()

result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1, expect=cutoff,
                               format_type="XML", entrez_query="Viruses [ORGN]")
blast_results = result_handle.read()
			
blast_handle = StringIO.StringIO(blast_results)
b_record = b_parser.parse(blast_handle)

for alignment in b_record.alignments:
     titles = alignment.title.split('>')
     print titles

-------------


The problem is that the BLAST parser sometimes chokes, with tracebacks like:

   File "./src/create_annotations.py", line 96, in get_blast_annotations
     b_record = b_parser.parse(blast_handle)
   File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
     self._parser.parse(handler)
   File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
     xmlreader.IncrementalParser.parse(self, source)
   File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
     self.feed(buffer)
   File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
     self._err_handler.fatalError(exc)
   File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
     raise exception
   xml.sax._exceptions.SAXParseException: <unknown>:7:70: not well-formed (invalid token)
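The "<unknown>:7:70" in the last line is the line and column within the XML text that the expat parser rejected, so it points into the raw BLAST output rather than at a particular alignment. Since the script already keeps that text in blast_results, one way to see the offending line is a small helper like the one below. This is a sketch using only the standard library xml.sax module (written for Python 3); show_parse_error is a hypothetical name, not part of Biopython.

```python
import xml.sax


def show_parse_error(xml_text):
    """Try to parse xml_text; on failure return (line_no, col_no, line).

    Returns None when the document is well-formed. The parse uses a
    do-nothing ContentHandler because only the error location matters here.
    """
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return None
    except xml.sax.SAXParseException as exc:
        line_no = exc.getLineNumber()    # 1-based line number
        col_no = exc.getColumnNumber()   # column offset reported by expat
        lines = xml_text.splitlines()
        # Errors such as "no element found" can point past the last line.
        line = lines[line_no - 1] if line_no <= len(lines) else ""
        return line_no, col_no, line
```

Calling show_parse_error(blast_results) on a failing result would return the seventh line of the XML, which is where to look for the character that expat considers an invalid token.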

I am not sure which alignment it choked on, but I would like to
recover from the error with a try/except block if possible. However, it
seems to me that if I did something like

from xml.sax import SAXParseException

try:
     b_record = b_parser.parse(blast_handle)
except SAXParseException:
     pass  # ... handle the error here

then I would not get anything in b_record if an error were raised
during parsing. Instead, I would like b_record to hold whatever was
parsed successfully up to the point of the error.

Is there any way to do this via the Biopython API, or do I have to
dig into the Python XML parsing code?
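One way to keep partial results without touching Biopython internals is to run a plain SAX handler from the standard library over the same XML text. SAX fires callbacks while it reads, so anything collected before the bad token is still there when the exception is raised. A minimal sketch, assuming you only need the Hit_def text of each hit (the element NCBI's BLAST XML uses for hit descriptions); HitDefCollector and parse_partial are hypothetical names, and this does not build Biopython Blast record objects.

```python
import xml.sax


class HitDefCollector(xml.sax.ContentHandler):
    """Collect the text of every <Hit_def> element seen so far."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.hit_defs = []
        self._in_hit_def = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Hit_def":
            self._in_hit_def = True
            self._buffer = []

    def characters(self, content):
        # Character data can arrive in several chunks; buffer it.
        if self._in_hit_def:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Hit_def":
            self.hit_defs.append("".join(self._buffer))
            self._in_hit_def = False


def parse_partial(xml_bytes):
    """Return (hit_defs, error); error is None if the whole document parsed."""
    handler = HitDefCollector()
    try:
        xml.sax.parseString(xml_bytes, handler)
        return handler.hit_defs, None
    except xml.sax.SAXParseException as exc:
        # Hits whose closing tag came before the bad token are kept.
        return handler.hit_defs, exc
```

A hit whose Hit_def contains the malformed token is lost, because its closing tag is never reached; everything before it is returned along with the exception, so you can still report where parsing stopped.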

Also, if anyone has a better idea of how to structure this code, I  
would be very appreciative.
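One structural option is to write each raw qblast result to disk before parsing it, so a parse failure never costs you the download and the offending line can be inspected afterwards. A sketch under that assumption; fetch_save_parse is a hypothetical helper, and fetch and parse stand for whatever wraps NCBIWWW.qblast and the parser in your own script.

```python
import os


def fetch_save_parse(name, fetch, parse, out_dir):
    """Fetch raw BLAST XML, save it to out_dir/<name>.xml, then parse it.

    fetch: callable returning the raw XML text.
    parse: callable turning that text into a record; it may raise.
    Returns (record, error, path); record is None when parsing failed.
    """
    raw = fetch()
    path = os.path.join(out_dir, name + ".xml")
    # Save first, so the raw output survives even if parsing fails.
    with open(path, "w") as handle:
        handle.write(raw)
    try:
        return parse(raw), None, path
    except Exception as exc:  # narrow to the parser's exception type in real code
        return None, exc, path
```

In the script above, fetch could wrap NCBIWWW.qblast(...).read() and parse could wrap b_parser.parse on a StringIO of the text; failed queries then leave their XML behind for a second look.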

Cheers,

Julius

-----------------------------------------------------
http://openwetware.org/wiki/User:Lucks
-----------------------------------------------------





