[BioPython] blast parsing errors
Christof Winter
winter at biotec.tu-dresden.de
Mon Mar 5 15:07:00 UTC 2007
Running your example, I get:
>>> ## working on region in file /tmp/python-18415Uda.py...
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/tmp/python-18415Uda.py", line 25, in ?
result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]")
File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1091, in qblast
rid, rtoe = _parse_qblast_ref_page(handle)
File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1133, in _parse_qblast_ref_page
return rid, int(rtoe)
ValueError: invalid literal for int(): >
<head>
<title>NCBI Blast</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css">
<link rel="styl
>>>
I think I'm running the newest 1.44 version of NCBIWWW.py
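
The ValueError above looks like int() being handed a chunk of an HTML page instead of a number. Here is a minimal sketch of a stricter check that fails with a readable message. It is not the NCBIWWW code: parse_rid_rtoe is a hypothetical helper, the sample page and its RID value are made up, and the "RID = ..." / "RTOE = ..." layout is my assumption about what a good reply contains.

```python
import re


def parse_rid_rtoe(page):
    """Pull RID and RTOE out of a reference page, or fail with a clear message."""
    rid = re.search(r"^\s*RID = (\S+)\s*$", page, re.MULTILINE)
    rtoe = re.search(r"^\s*RTOE = (\d+)\s*$", page, re.MULTILINE)
    if rid is None or rtoe is None:
        # Show the start of the page so the real server message is visible.
        raise ValueError("no RID/RTOE found; page starts with: %r" % page[:80])
    return rid.group(1), int(rtoe.group(1))


# Made-up example of a reply in the assumed layout.
good_page = "<!--QBlastInfoBegin\n    RID = ABC123\n    RTOE = 11\nQBlastInfoEnd\n-->"
rid, rtoe = parse_rid_rtoe(good_page)
```

With a check like this, an HTML error page would be reported as such instead of surfacing as a failed int() conversion.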
Cheers,
Christof
Julius Lucks wrote:
> Hi all,
>
> I am trying to parse a bunch of blast results that I gather via
> NCBIWWW.qblast(). I have the following code snippet:
>
> -----------
> from Bio import Fasta
> from Bio.Blast import NCBIWWW
> from Bio.Blast import NCBIXML
> import StringIO
> import re
>
> #BLAST cutoff
> cutoff = 1e-4
>
> #Create a fasta record: title and seq are given
>
> title = 'test'
> seq = 'ATCG'
>
> fasta_rec = Fasta.Record()
>
> #Sanitize title - blast does not like single quotes or \n in titles
> title = re.sub("'","prime",title)
> title = re.sub("\n","",title)
> fasta_rec.title = title
> fasta_rec.sequence = seq
>
>
> b_parser = NCBIXML.BlastParser()
>
> result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1,
>                                expect=cutoff, format_type="XML",
>                                entrez_query="Viruses [ORGN]")
> blast_results = result_handle.read()
>
> blast_handle = StringIO.StringIO(blast_results)
> b_record = b_parser.parse(blast_handle)
>
> for alignment in b_record.alignments:
>     titles = alignment.title.split('>')
>     print titles
>
> -------------
>
>
> The issue is sometimes the blast parser chokes with tracebacks like:
>
> File "./src/create_annotations.py", line 96, in get_blast_annotations
>     b_record = b_parser.parse(blast_handle)
> File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
>     self._parser.parse(handler)
> File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
>     xmlreader.IncrementalParser.parse(self, source)
> File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
>     self.feed(buffer)
> File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
>     self._err_handler.fatalError(exc)
> File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException: <unknown>:7:70: not well-formed (invalid token)
>
> I am not sure which alignment it choked on, but I would like to
> rescue it with a try/except block if possible. But it seems to me
> that if I did something like
>
> try:
>     b_record = b_parser.parse(blast_handle)
> except:
>     ...
>
> Then I would not get anything in b_record if an error is raised in the
> parsing. Rather, I would like to have whatever has been successful
> up to the point of the error stored in b_record.
>
> Is there any way to do this via the BioPython API, or do I have to
> dig into the python xml parsing code?
>
> Also, if anyone has a better idea of how to structure this code, I
> would be very appreciative.
>
> Cheers,
>
> Julius
>
> -----------------------------------------------------
> http://openwetware.org/wiki/User:Lucks
> -----------------------------------------------------
>
>
>
> _______________________________________________
> BioPython mailing list - BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
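
On keeping whatever parsed successfully before the error: a minimal sketch using only the standard library's xml.sax, not the Bio.Blast.NCBIXML API. The idea is that a SAX handler keeps its state when the parser raises, so results collected up to the bad token are still there afterwards. HitCollector and parse_what_you_can are hypothetical names, the input is made up, and I am assuming Hit_def is the element you care about.

```python
import io
import xml.sax


class HitCollector(xml.sax.ContentHandler):
    """Collect the text of every <Hit_def> element seen so far."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.hit_defs = []      # completed hits survive a later parse error
        self._chunks = None     # text pieces of the element being read

    def startElement(self, name, attrs):
        if name == "Hit_def":
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "Hit_def":
            self.hit_defs.append("".join(self._chunks))
            self._chunks = None


def parse_what_you_can(xml_bytes):
    """Return (hit_defs, error); error is None if the input parsed cleanly."""
    collector = HitCollector()
    error = None
    try:
        xml.sax.parse(io.BytesIO(xml_bytes), collector)
    except xml.sax.SAXParseException as exc:
        error = exc
    return collector.hit_defs, error


# Made-up input: two good hits, then a bare '&', which is not well-formed XML.
broken = (b"<BlastOutput>"
          b"<Hit_def>first hit</Hit_def>"
          b"<Hit_def>second hit</Hit_def>"
          b"<Hit_def>bad & hit</Hit_def>"
          b"</BlastOutput>")
hits, err = parse_what_you_can(broken)
```

Here hits holds the two definitions read before the bad token, and err holds the SAXParseException with its line and column, which should help locate the offending record.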