[BioPython] blast parsing errors

Christof Winter winter at biotec.tu-dresden.de
Mon Mar 5 15:07:00 UTC 2007


Running your example, I get:

 >>> ## working on region in file /tmp/python-18415Uda.py...
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/tmp/python-18415Uda.py", line 25, in ?
     result_handle = NCBIWWW.qblast('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entrez_query="Viruses [ORGN]")
   File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1091, in qblast
     rid, rtoe = _parse_qblast_ref_page(handle)
   File "/var/lib/python-support/python2.4/Bio/Blast/NCBIWWW.py", line 1133, in _parse_qblast_ref_page
     return rid, int(rtoe)
ValueError: invalid literal for int(): >
<head>
<title>NCBI Blast</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://www.ncbi.nlm.nih.gov/corehtml/ncbi.css">
<link rel="styl
 >>>

I think I'm running the newest version of NCBIWWW.py, 1.44.
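
The ValueError above comes from int(rtoe) being handed a chunk of an HTML page ("> <head> <title>NCBI Blast</title> ...") instead of a number. In other words, NCBI apparently answered the query submission with something other than the expected request ID (RID) and estimated wait (RTOE). Below is a minimal sketch of a more defensive parse. It assumes the reply carries "RID = ..." and "RTOE = ..." lines, as in NCBI's QBlastInfoBegin/QBlastInfoEnd comment block. parse_rid_rtoe is a hypothetical helper, not part of NCBIWWW:

```python
import re

# Assumption: a good reply contains lines like "    RID = ABC123" and
# "    RTOE = 17" (NCBI's QBlastInfoBegin ... QBlastInfoEnd block).
_RID_RE = re.compile(r"^\s*RID = (\S+)\s*$", re.MULTILINE)
_RTOE_RE = re.compile(r"^\s*RTOE = (\d+)\s*$", re.MULTILINE)


def parse_rid_rtoe(page):
    """Hypothetical helper: return (rid, rtoe_seconds) from a QBlast reply.

    Raises ValueError that quotes the start of the page when either value
    is missing, instead of failing inside int() on arbitrary HTML.
    """
    rid_match = _RID_RE.search(page)
    rtoe_match = _RTOE_RE.search(page)
    if rid_match is None or rtoe_match is None:
        excerpt = page[:200].replace("\n", " ")
        raise ValueError("No RID/RTOE in NCBI reply; page starts: %r" % excerpt)
    return rid_match.group(1), int(rtoe_match.group(1))


if __name__ == "__main__":
    reply = "<!--QBlastInfoBegin\n    RID = ABC123\n    RTOE = 17\nQBlastInfoEnd\n-->\n"
    print(parse_rid_rtoe(reply))  # ('ABC123', 17)
```

Quoting the first part of the page in the error at least shows what NCBI sent back. That is easier to act on than a bare int() failure.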

Cheers,
Christof

Julius Lucks wrote:
> Hi all,
> 
> I am trying to parse a bunch of BLAST results that I gather via
> NCBIWWW.qblast().  I have the following code snippet:
> 
> -----------
> from Bio import Fasta
> from Bio.Blast import NCBIWWW
> from Bio.Blast import NCBIXML
> import StringIO
> import re
> 
> #BLAST cutoff
> cutoff = 1e-4
> 
> #Create a fasta record: title and seq are given
> 
> title = 'test'
> seq = 'ATCG'
> 
> fasta_rec = Fasta.Record()
> 
> #Sanitize title - blast does not like single quotes or \n in titles
> title = re.sub("'","prime",title)
> title = re.sub("\n","",title)
> fasta_rec.title = title
> fasta_rec.sequence = seq
> 
> 
> b_parser = NCBIXML.BlastParser()
> 
> #Submit the query to NCBI and read the whole XML report into memory
> result_handle = NCBIWWW.qblast('blastp', 'nr', fasta_rec, ncbi_gi=1,
>                                expect=cutoff, format_type="XML",
>                                entrez_query="Viruses [ORGN]")
> blast_results = result_handle.read()
> 
> #Parse from an in-memory copy so the raw XML stays available
> blast_handle = StringIO.StringIO(blast_results)
> b_record = b_parser.parse(blast_handle)
> 
> for alignment in b_record.alignments:
>     titles = alignment.title.split('>')
>     print titles
> 
> -------------
> 
> 
> The issue is that the BLAST parser sometimes chokes with tracebacks like:
> 
>    File "./src/create_annotations.py", line 96, in get_blast_annotations
>      b_record = b_parser.parse(blast_handle)
>    File "/sw/lib/python2.5/site-packages/Bio/Blast/NCBIXML.py", line 112, in parse
>      self._parser.parse(handler)
>    File "/sw/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
>      xmlreader.IncrementalParser.parse(self, source)
>    File "/sw/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
>      self.feed(buffer)
>    File "/sw/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
>      self._err_handler.fatalError(exc)
>    File "/sw/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
>      raise exception
>    xml.sax._exceptions.SAXParseException: <unknown>:7:70: not well-formed (invalid token)
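
The position in the SAXParseException (<unknown>:7:70) means line 7, column 70 of the XML text that was fed to the parser. The snippet above keeps the raw report in blast_results, so the offending line can be pulled out and inspected directly. Below is a minimal standard-library sketch. error_context is a hypothetical helper, and it assumes the exact text that was parsed is passed back in:

```python
import xml.sax


def error_context(xml_text, exc):
    """Hypothetical helper: show the line named in a SAXParseException.

    `xml_text` must be the same text that was handed to the parser.
    Lines are counted from 1; expat reports the column as a 0-based offset.
    """
    line_no = exc.getLineNumber()
    col = exc.getColumnNumber()
    lines = xml_text.splitlines()
    if not 1 <= line_no <= len(lines):
        return "(line %d is not in the input)" % line_no
    # Print the line, then a caret under the reported column.
    return lines[line_no - 1] + "\n" + " " * max(col, 0) + "^"


if __name__ == "__main__":
    raw = "<a>\n<b>x & y</b>\n</a>\n"  # a bare '&' is not well-formed XML
    try:
        xml.sax.parseString(raw.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        print(exc)
        print(error_context(raw, exc))
```

Line 7 falls near the top of a report, where BLAST XML normally holds header elements rather than hits. The culprit may therefore not be an alignment at all, and it is worth checking whether the text is XML in the first place (compare the HTML page in the traceback at the top of this message).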
> 
> I am not sure which alignment it choked on, but I would like to  
> rescue it with a try/except block if possible.  But it seems to me  
> that if I did something like
> 
> try:
>      b_record = b_parser.parse(blast_handle)
> except:
>      ...
> 
> Then I would not get anything in b_record if an error were raised during
> parsing.  Rather, I would like to have whatever was parsed successfully
> up to the point of the error stored in b_record.
> 
> Is there any way to do this via the BioPython API, or do I have to
> dig into the Python XML parsing code?
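
NCBIXML's BlastParser drives xml.sax (see the traceback above). That means the events for everything before the bad token have already been delivered by the time SAXParseException is raised. What survives depends on what the content handler stored. Below is a minimal standard-library sketch of the idea, using a hypothetical handler that saves each <Hit_def> (the hit description element in NCBI's BLAST XML) as soon as it closes. The titles read before the error are then still available afterwards. This is not Biopython's BlastParser, whose handler builds full records, and the sample input is a toy:

```python
import xml.sax


class HitTitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: stores the text of each <Hit_def> as soon as it closes."""

    def __init__(self):
        super().__init__()
        self.titles = []      # completed hit descriptions, in document order
        self._buffer = None   # None while outside a <Hit_def> element

    def startElement(self, name, attrs):
        if name == "Hit_def":
            self._buffer = []

    def characters(self, content):
        # SAX may split one text node across several calls, so collect pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Hit_def" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def parse_hit_titles(xml_bytes):
    """Return (titles, error); error is None when the whole document parsed."""
    handler = HitTitleCollector()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except xml.sax.SAXParseException as exc:
        # Events before the bad token were already delivered to the handler.
        return handler.titles, exc
    return handler.titles, None


# Toy input: the third hit contains a bare '&', which is not well-formed.
sample = b"""<?xml version="1.0"?>
<BlastOutput>
  <Hit><Hit_def>first hit</Hit_def></Hit>
  <Hit><Hit_def>second hit</Hit_def></Hit>
  <Hit><Hit_def>broken & hit</Hit_def></Hit>
</BlastOutput>
"""

if __name__ == "__main__":
    titles, error = parse_hit_titles(sample)
    print(titles)  # ['first hit', 'second hit']
    print(error)
```

Catching the exception this way returns the partial list rather than nothing. Whether the BlastParser of this era exposes a partially built record in the same way is not something this sketch shows.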
> 
> Also, if anyone has a better idea of how to structure this code, I  
> would be very appreciative.
> 
> Cheers,
> 
> Julius
> 
> -----------------------------------------------------
> http://openwetware.org/wiki/User:Lucks
> -----------------------------------------------------
> 
> 
> 
> _______________________________________________
> BioPython mailing list  -  BioPython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


