[BioPython] blast parsing errors
Julius Lucks
lucks at fas.harvard.edu
Mon Mar 5 14:13:02 UTC 2007
Hi all,
I am trying to parse a batch of BLAST results that I retrieve via
NCBIWWW.qblast(). I have the following code snippet:
-----------
from Bio import Fasta
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
import StringIO
import re
#BLAST cutoff
cutoff = 1e-4
#Create a fasta record: title and seq are given
title = 'test'
seq = 'ATCG'
fasta_rec = Fasta.Record()
#Sanitize title - blast does not like single quotes or \n in titles
title = re.sub("'","prime",title)
title = re.sub("\n","",title)
fasta_rec.title = title
fasta_rec.sequence = seq
b_parser = NCBIXML.BlastParser()
result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1,
                               expect=cutoff, format_type="XML",
                               entrez_query="Viruses [ORGN]")
blast_results = result_handle.read()
blast_handle = StringIO.StringIO(blast_results)
b_record = b_parser.parse(blast_handle)
for alignment in b_record.alignments:
    titles = alignment.title.split('>')
    print titles
-------------
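The title clean-up in the snippet is two chained substitutions of literal strings, so it can live in one small function. This is only a sketch: `sanitize_title` is a hypothetical name, it uses Python 3 syntax rather than the Python 2.5 of the snippet, and it swaps `re.sub` for plain `str.replace` because neither pattern needs a regular expression.

```python
def sanitize_title(title):
    """Make a FASTA title safe for BLAST, mirroring the two re.sub calls:
    single quotes become 'prime' and newlines are removed."""
    return title.replace("'", "prime").replace("\n", "")


print(sanitize_title("5'UTR\nregion"))
```

Keeping the rule in one place means any further characters that BLAST rejects only need to be added once.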
The problem is that the BLAST parser sometimes chokes with tracebacks like:
  File "./src/create_annotations.py", line 96, in get_blast_annotations
    b_record = b_parser.parse(blast_handle)
  File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
    self._parser.parse(handler)
  File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:7:70: not well-formed (invalid token)
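The `7:70` in the message is a line and column position in the XML text, and the raw text is still available in `blast_results`, so the offending record can be located directly. A minimal sketch using only the standard-library `xml.sax` module (not a BioPython API; `locate_parse_error` is a hypothetical helper, written in Python 3) that re-parses the text with a do-nothing handler and returns the reported position plus the line it points at:

```python
import xml.sax


def locate_parse_error(xml_text):
    """Return (line, column, offending_line) for the first well-formedness
    error in xml_text, or None if it parses cleanly."""
    try:
        # A bare ContentHandler ignores every event; we only want the error.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        line_no = exc.getLineNumber()      # 1-based line number
        col_no = exc.getColumnNumber()     # column as reported by expat
        offending = xml_text.split("\n")[line_no - 1]
        return line_no, col_no, offending
    return None
```

Calling it on `blast_results` and printing the third element should show the line containing the bad token, which identifies the alignment involved.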
I am not sure which alignment it choked on, but I would like to
recover from the error with a try/except block if possible. However,
it seems to me that if I did something like
try:
    b_record = b_parser.parse(blast_handle)
except:
    ...
then I would get nothing in b_record if an error were raised during
parsing. Instead, I would like b_record to hold whatever was parsed
successfully up to the point of the error.
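One way to keep what was parsed before the error, assuming it is acceptable to bypass NCBIXML and pull fields straight from the XML, is a custom SAX handler that stores its results on itself: when the parser raises, the handler object, and everything it collected, is still in scope. The sketch below (Python 3, standard library only) collects the text of each `<Hit_def>` element, assuming the standard NCBI BLAST XML element names; `PartialHitCollector` and `parse_partial` are hypothetical names, not part of BioPython.

```python
import xml.sax


class PartialHitCollector(xml.sax.ContentHandler):
    """Collect the text of every <Hit_def> element seen so far.

    Results are stored on the handler, so they survive a
    SAXParseException raised later in the document.
    """

    def __init__(self):
        super().__init__()
        self.hit_defs = []        # completed Hit_def strings, in document order
        self._in_hit_def = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Hit_def":
            self._in_hit_def = True
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._in_hit_def:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Hit_def":
            self.hit_defs.append("".join(self._buffer))
            self._in_hit_def = False


def parse_partial(xml_text):
    """Return (hit_defs, error); error is None if the XML was well-formed."""
    handler = PartialHitCollector()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except xml.sax.SAXParseException as exc:
        return handler.hit_defs, exc
    return handler.hit_defs, None
```

Only fully closed `<Hit_def>` elements are kept, so a hit whose description contains the bad token is dropped rather than returned half-read.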
Is there any way to do this via the BioPython API, or do I have to
dig into the Python XML parsing code?
Also, if anyone has a better idea of how to structure this code, I
would appreciate it.
Cheers,
Julius
-----------------------------------------------------
http://openwetware.org/wiki/User:Lucks
-----------------------------------------------------