[Biopython] Problems parsing with PSIBlastParser

Chris Fields cjfields at illinois.edu
Tue Nov 3 08:40:53 EST 2009


On Nov 3, 2009, at 7:32 AM, Peter wrote:

> On Tue, Nov 3, 2009 at 1:16 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> We had the same problem w/ the BioPerl XML parser and ended up
>> preprocessing the data into separate XML files, carrying over the
>> relevant information into each file (yes, there is a better way, but
>> it essentially involves a redesign of the XML parser and related
>> objects).
>>
>> BTW, the same thing happens if one runs multiple queries in the same
>> file. All the individual XML reports are in one single XML file, and
>> information relevant to all reports is found only in the first
>> report. I think this has been known for a while. I've repeatedly
>> tried contacting NCBI but haven't had a response re: this problem.
>>
>> chris
>
> Hi Chris,
>
> Old versions of blastall (also) used to produce concatenated XML files
> for multiple queries, but from about 2.2.14 they started (ab)using the
> iteration fields originally for PSI-BLAST output to hold multiple
> queries (there was some discussion of this on Biopython Bugs 1933 and
> 1970 - Biopython *should* cope with either).
>
> Apparently pgpblast was left producing concatenated XML files.
> The upshot of this is multi-query BLASTP etc XML files look just like
> single query multi-round PSI-BLAST XML files. This means having a
> single BLAST XML parser that automatically treats the two differently
> is tricky.
>
> Does that fit with your experience?
>
> Peter

Yes, pretty much.  Ours now handles both report types w/o problems.
We have a pluggable XML parser that is switched out based on whether
one expects normal BLAST XML (the default) or PSI-BLAST XML (has to be
indicated).  With text reports we can determine this on the fly b/c
the blast type should indicate whether it is PSI-BLAST or not, but
IIRC this wasn't the case with XML.  I haven't checked to see if this
has been fixed yet on NCBI's end, but I'm assuming it hasn't.
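
To make the idea concrete, here is a minimal Python sketch of that kind
of switch. It is not the BioPerl or Biopython code; the function names,
the "blast"/"psiblast" mode strings and the returned structures are all
made up for illustration. It assumes each concatenated report starts
with its own <?xml ...?> declaration and uses the standard NCBI BLAST
XML element names.

```python
import re
import xml.etree.ElementTree as ET


def split_concatenated_xml(text):
    """Split a string holding several XML documents back to back.

    Assumption: every document begins with its own <?xml ...?> declaration.
    """
    starts = [m.start() for m in re.finditer(r"<\?xml\b", text)]
    if not starts:
        # No declaration found: treat the whole input as one document.
        return [text.strip()] if text.strip() else []
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def as_multi_query(document):
    """Plain BLAST XML: read every <Iteration> as a separate query."""
    root = ET.fromstring(document)
    return [it.findtext("Iteration_query-def") for it in root.iter("Iteration")]


def as_psiblast(document):
    """PSI-BLAST XML: read every <Iteration> as a round of one query."""
    root = ET.fromstring(document)
    rounds = [int(it.findtext("Iteration_iter-num"))
              for it in root.iter("Iteration")]
    return {"query": root.findtext("BlastOutput_query-def"), "rounds": rounds}


# Hypothetical registry: the caller has to say which report type to expect,
# because the XML itself looks the same in both cases.
HANDLERS = {"blast": as_multi_query, "psiblast": as_psiblast}


def parse_reports(text, mode="blast"):
    """Split concatenated documents, then apply the handler for `mode`."""
    handler = HANDLERS[mode]
    return [handler(doc) for doc in split_concatenated_xml(text)]
```

The same two-iteration document comes back as two queries under the
default mode and as two rounds of one query when "psiblast" is
requested, which is why the report type has to be indicated up front.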

chris

