[Biopython] Problems parsing with PSIBlastParser
Chris Fields
cjfields at illinois.edu
Tue Nov 3 08:40:53 EST 2009
On Nov 3, 2009, at 7:32 AM, Peter wrote:
> On Tue, Nov 3, 2009 at 1:16 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
>>
>> We had the same problem w/ the BioPerl XML parser and ended up
>> preprocessing the data into separate XML files, carrying over the
>> relevant information into each file (yes, there is a better way, but
>> it essentially involves a redesign of the XML parser and related
>> objects).
>>
>> BTW, the same thing happens if one runs multiple queries in the same
>> file. All the individual report XML documents end up in one single
>> XML file, and information relevant to all reports is only found in
>> the first report. I think this has been known for a while. I've
>> repeatedly tried contacting NCBI but haven't had a response re: this
>> problem.
>>
>> chris
>
> Hi Chris,
>
> Old versions of blastall (also) used to produce concatenated XML files
> for multiple queries, but from about 2.2.14 they started (ab)using the
> iteration fields originally for PSI-BLAST output to hold multiple
> queries (there was some discussion of this on Biopython Bugs 1933 and
> 1970 - Biopython *should* cope with either).
>
> Apparently pgpblast was left producing concatenated XML files.
> The upshot of this is multi-query BLASTP etc XML files look just like
> single query multi-round PSI-BLAST XML files. This means having a
> single BLAST XML parser that automatically treats the two differently
> is tricky.
>
> Does that fit with your experience?
>
> Peter
Yes, pretty much. Ours now handles both report types w/o problems.
We have a pluggable XML parser that is switched out based on whether
one expects normal BLAST XML (the default) or PSI-BLAST XML (has to be
indicated). With text reports we can determine this on the fly b/c
the BLAST type should indicate whether it is PSI-BLAST or not, but
IIRC this wasn't the case with XML. I haven't checked to see if this
has been fixed yet on NCBI's end, but I'm assuming it hasn't.
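
To make the "pluggable" idea concrete, here is a rough Python sketch (not
the BioPerl code, and not Biopython's API). The function names, the
report_type argument and the SAMPLE document are all made up for
illustration; the element names are assumed to follow the usual NCBI
BLAST XML layout. The only point is that the caller, not the file, says
how the <Iteration> elements should be read.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BLAST XML with two <Iteration> elements.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_query-def>seqA</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>seqA</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>seqB</Iteration_query-def>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def parse_as_multi_query(root):
    """Normal BLAST: every <Iteration> is an independent query."""
    return [
        {
            "query": it.findtext("Iteration_query-def"),
            "rounds": [it.findtext("Iteration_iter-num")],
        }
        for it in root.iter("Iteration")
    ]


def parse_as_psiblast(root):
    """PSI-BLAST: every <Iteration> is one round of a single query."""
    rounds = [it.findtext("Iteration_iter-num") for it in root.iter("Iteration")]
    return [{"query": root.findtext("BlastOutput_query-def"), "rounds": rounds}]


# The "plug" table: the caller picks the interpretation explicitly.
PARSERS = {"blast": parse_as_multi_query, "psiblast": parse_as_psiblast}


def parse_blast_xml(text, report_type="blast"):
    """Parse BLAST XML text using the parser chosen by report_type."""
    if report_type not in PARSERS:
        raise ValueError("unknown report type: %r" % (report_type,))
    return PARSERS[report_type](ET.fromstring(text))
```

With the same input, parse_blast_xml(SAMPLE) gives two one-round records,
while parse_blast_xml(SAMPLE, "psiblast") gives one record with two
rounds, which is exactly the ambiguity that has to be resolved by hand.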
chris
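
For the preprocessing workaround quoted at the top of this message
(splitting a concatenated XML file into one document per report), a
minimal Python sketch might look like the following. It assumes every
report starts with its own <?xml ...?> declaration, and the function
name is made up. It does not carry the shared information from the first
report over into the later ones, since which fields those are is not
spelled out in this thread.

```python
import re

# Assumption: each concatenated report begins with an XML declaration.
XML_DECL = re.compile(r"<\?xml\s")


def split_concatenated_xml(text):
    """Return a list of standalone XML document strings.

    Anything before the first XML declaration is dropped. If no
    declaration is found, the whole (non-empty) text is returned as a
    single document.
    """
    starts = [m.start() for m in XML_DECL.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```

Each returned string can then be written to its own file or handed to an
XML parser one at a time.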