[Biopython] About BLAST parser

Manu Tamminen mavata at gmail.com
Thu Oct 22 10:06:47 UTC 2009


Hi Peter! Thanks for your prompt reply! I ran the BLAST analysis on
a supercomputer cluster, saved the results to an XML file and then
transferred the output file to my computer. I then ran the script on
my computer to parse the results into a tab-separated file. The
current dataset has 1115 sequences of around 500 bp each.
Manu
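A minimal sketch of the parse-to-table step described above, written for Python 3 and assuming the records come from Biopython's `NCBIXML.parse`, whose records expose `query`, `alignments`, `title`, `hsps`, `expect`, `identities` and `align_length`. It writes one row per HSP. If the parser raises `ExpatError` partway through, the sketch stops cleanly and reports how many query records were written, so the rows already produced are not lost. It cannot resume past the error: once the underlying expat parser fails, the record generator is finished. The function name `write_hits_tsv` and the choice of columns are hypothetical.

```python
import csv
from xml.parsers.expat import ExpatError


def write_hits_tsv(records, out_handle):
    """Write one tab-separated row per HSP; return (records_written, error).

    `records` is any iterable of BLAST records, e.g. NCBIXML.parse(handle).
    If iteration raises ExpatError (damaged or truncated XML), the rows
    already written are kept and the error is returned instead of raised.
    """
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "hit", "e_value", "identities", "align_length"])
    written = 0
    try:
        for record in records:
            for alignment in record.alignments:
                for hsp in alignment.hsps:
                    writer.writerow([record.query, alignment.title, hsp.expect,
                                     hsp.identities, hsp.align_length])
            written += 1  # count query records fully written
    except ExpatError as err:
        # The XML parser cannot recover after this point, so stop here.
        return written, err
    return written, None
```

With Biopython installed, the call would look like `write_hits_tsv(NCBIXML.parse(xml_in), tsv_out)` after `from Bio.Blast import NCBIXML`, where `xml_in` is the XML file opened for reading and `tsv_out` is the output file opened with `newline=""`. Returning the error instead of raising it lets the script report where it stopped.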

On Oct 22, 2009, at 12:56 PM, Peter wrote:

> On Thu, Oct 22, 2009 at 10:45 AM, Manu Tamminen <mavata at gmail.com> wrote:
>> I have a problem with the Biopython BLAST parser. I'm using the parser to
>> extract relevant information from an XML result file into a tab-separated
>> table. It seems the XML file occasionally contains errors that cause the
>> script to abort. This is especially common and annoying with sequence
>> alignments that contain thousands of sequences.
>>
>> Is it possible to write the script so that when an error occurs, the script
>> would skip to the next sequence rather than abort completely? I have
>> included below an example of such an error. This error is about a
>> mismatched tag; sometimes the error has also been about a missing tag.
>>
>>     for blast_record in blast_records:
>>   File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Blast/NCBIXML.py", line 660, in parse
>>     expat_parser.Parse(text, True) # End of XML record
>> xml.parsers.expat.ExpatError: mismatched tag: line 82921, column 4
>
> XML is a strict file format: a tag like <item> must have a closing
> tag </item>. If the XML file is truncated or otherwise damaged, you
> can have mismatched tags (e.g. an <item> without an </item>), which
> means the XML file is invalid. This is basically what that error
> message is about.
>
> I can make some suggestions that may help, but first: are you
> running BLAST locally or online? Are you saving the results to
> a file, or parsing directly from the handle? How many query
> sequences do you have?
>
> Peter
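Following Peter's explanation, one way to locate the damage and keep the undamaged parts is to check the file with Python's own expat parser before handing it to Biopython. This Python 3 sketch assumes the file holds several complete XML documents concatenated one after another, each starting with an `<?xml` declaration (as legacy `blastall` XML output for multiple queries often did). That layout is an assumption to check against your own file: output that keeps all queries inside a single document cannot be split this way. Each document is parsed on its own; a damaged one is skipped and reported with its line number in the whole file and expat's error message. The function name `split_xml_documents` is hypothetical.

```python
import re
import xml.parsers.expat
from xml.parsers.expat import ExpatError

# Assumed layout: each XML declaration starts a new, independent document.
_DECLARATION = re.compile(r"<\?xml\b")


def split_xml_documents(text):
    """Split concatenated XML documents; return (good_docs, problems).

    good_docs: document strings that parsed without error.
    problems: (file_line, message) for each document that failed, where
    file_line is the 1-based line number within the whole text.
    """
    starts = [m.start() for m in _DECLARATION.finditer(text)] or [0]
    bounds = zip(starts, starts[1:] + [len(text)])
    good_docs, problems = [], []
    for start, end in bounds:
        doc = text[start:end]
        parser = xml.parsers.expat.ParserCreate()  # fresh parser per document
        try:
            parser.Parse(doc, True)
        except ExpatError as err:
            # err.lineno counts from the start of this document only.
            line_offset = text.count("\n", 0, start)
            message = xml.parsers.expat.ErrorString(err.code)
            problems.append((line_offset + err.lineno, message))
        else:
            good_docs.append(doc)
    return good_docs, problems
```

Under the same assumption, each document in `good_docs` could then be wrapped in an in-memory handle such as `io.StringIO` and passed to `NCBIXML.parse`, so a damaged query costs only its own results; check this against your Biopython version.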


---
Manu Tamminen, M.Sc.
University of Helsinki
Department of Applied Chemistry and Microbiology, Division of Microbiology
P.O. Box 56
00014 HELSINKI
FINLAND

tel: +358 (0)9191 57585
fax:  +358 (0)9191 59322
e-mail: manu.tamminen at helsinki.fi
home: http://www.mm.helsinki.fi/~mvtammin/

