[Biopython] Problems parsing with PSIBlastParser

Peter biopython at maubp.freeserve.co.uk
Tue Nov 3 13:32:55 UTC 2009


On Tue, Nov 3, 2009 at 1:16 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> We had the same problem w/ the BioPerl XML parser and ended up preprocessing
> the data into separate XML files, carrying over the relevant information
> into each file (yes, there is a better way, but it essentially involves a
> redesign of the XML parser and related objects).
>
> BTW, the same thing happens if one runs multiple queries in the same file.
> All the individual XML reports are in one single XML file, and information
> relevant to all reports is only found in the first report.  I think this
> has been known for a while.  I've repeatedly tried contacting NCBI but
> haven't had a response re: this problem.
>
> chris

Hi Chris,

Old versions of blastall (also) used to produce concatenated XML files for
multiple queries, but from about version 2.2.14 it started (ab)using the
iteration fields, originally intended for PSI-BLAST output, to hold multiple
queries (there was some discussion of this on Biopython Bugs 1933 and 1970 -
Biopython *should* cope with either).
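
For the concatenated case, here is a rough sketch of the kind of
preprocessing step that splits such a file back into separate documents.
It uses only the Python standard library, not Biopython's own parser. The
function name split_concatenated_xml and the trimmed sample data are made up
for illustration, and it assumes every report starts with its own
<?xml ...?> declaration.

```python
import re
import xml.etree.ElementTree as ET


def split_concatenated_xml(text):
    """Yield each XML document found in a string of concatenated documents."""
    # Assumption: every document starts with its own <?xml ...?> declaration.
    starts = [m.start() for m in re.finditer(r"<\?xml\s", text)]
    if not starts:
        # No declaration found: treat any non-blank input as one document.
        if text.strip():
            yield text
        return
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        yield text[begin:end]


# Made-up, heavily trimmed stand-in for two reports in one file.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>\n"
)

# Each chunk is now well-formed on its own and can be parsed separately.
roots = [ET.fromstring(doc) for doc in split_concatenated_xml(SAMPLE)]
print(len(roots), [root.tag for root in roots])
```

Information that only appears in the first report would still have to be
copied into the later ones by hand; this sketch does not attempt that.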

Apparently blastpgp was left producing concatenated XML files.
The upshot of this is that multi-query BLASTP etc. XML files look just like
single-query multi-round PSI-BLAST XML files. This means having a
single BLAST XML parser that automatically treats the two differently
is tricky.
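
To make the ambiguity concrete, here is one possible heuristic, sketched with
the standard library only: look at whether the iterations in a file name
different queries. This is an assumption on my part, not what Biopython does.
It relies on the Iteration_query-ID and Iteration_query-def fields being
present and being repeated unchanged for every PSI-BLAST round, which would
need checking against real output. The function name classify_iterations and
the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_iterations(xml_text):
    """Guess whether the iterations are separate queries or PSI-BLAST rounds."""
    root = ET.fromstring(xml_text)
    iterations = root.findall("./BlastOutput_iterations/Iteration")
    if len(iterations) <= 1:
        return "single"
    # Heuristic (an assumption): different query fields mean different queries.
    queries = {
        (it.findtext("Iteration_query-ID"), it.findtext("Iteration_query-def"))
        for it in iterations
    }
    return "multi-query" if len(queries) > 1 else "multi-round"


def make_report(queries):
    """Build a made-up, heavily trimmed report from (query ID, query def) pairs."""
    body = "".join(
        "<Iteration>"
        f"<Iteration_iter-num>{num}</Iteration_iter-num>"
        f"<Iteration_query-ID>{qid}</Iteration_query-ID>"
        f"<Iteration_query-def>{qdef}</Iteration_query-def>"
        "</Iteration>"
        for num, (qid, qdef) in enumerate(queries, start=1)
    )
    return (
        "<BlastOutput><BlastOutput_iterations>"
        + body
        + "</BlastOutput_iterations></BlastOutput>"
    )


MULTI_QUERY = make_report([("q1", "seqA"), ("q2", "seqB")])
MULTI_ROUND = make_report([("q1", "seqA"), ("q1", "seqA")])

print(classify_iterations(MULTI_QUERY))
print(classify_iterations(MULTI_ROUND))
```

Files without those query fields would all come out as "multi-round" here,
as would two queries with identical IDs and descriptions, which is part of
why doing this automatically is awkward.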

Does that fit with your experience?

Peter



