[Biopython] Problems parsing with PSIBlastParser

Peter biopython at maubp.freeserve.co.uk
Tue Nov 3 13:52:20 UTC 2009


On Tue, Nov 3, 2009 at 1:40 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> On Nov 3, 2009, at 7:32 AM, Peter wrote:
>> ...
>> The upshot of this is multi-query BLASTP etc XML files look just like
>> single query multi-round PSI-BLAST XML files. This means having a
>> single BLAST XML parser that automatically treats the two differently
>> is tricky.
>>
>> Does that fit with your experience?
>>
>> Peter
>
> Yes, pretty much.  Ours now handles both report types w/o problems.  We have
> a pluggable XML parser that is switched out based on whether one expects
> normal BLAST XML (the default) or PSI-BLAST XML (has to be indicated).  With
> text reports we can determine this on the fly b/c the blast type should
> indicate whether it is PSI BLAST or not, but IIRC this wasn't the case with
> XML.  I haven't checked to see if this has been fixed yet on NCBI's end, but
> I'm assuming it hasn't.

Certainly with 2.2.18 (where I have an example handy), the XML from
blastpgp is practically identical to that from blastall. You *may* be able
to infer which kind of report you have by looking at the complete file
(e.g. any iteration messages). Having the user specify whether they are
expecting PSI-BLAST output (as you do in BioPerl) seems like the best option.

We might do this via an optional argument to the existing Bio.Blast.NCBIXML
parser, or add a second PSI-BLAST-specific parser. The latter might be best
for dealing with multi-query PSI-BLAST XML files, and would let us reuse the
same PSI-BLAST-specific objects as the old plain text parser.
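
As a sketch of the optional-argument idea (not the Bio.Blast.NCBIXML API; the
function parse_blast_xml and its psiblast flag are hypothetical names for this
example), the caller states what they expect and the parser groups iterations
accordingly. It assumes each <Iteration> has an <Iteration_query-def> and that
the rounds for one query are adjacent in the file.

```python
import io
import xml.etree.ElementTree as ET
from itertools import groupby


def parse_blast_xml(handle, psiblast=False):
    """Yield one list of <Iteration> elements per query.

    psiblast=False: each iteration is treated as a separate query, so every
    yielded list has exactly one element.
    psiblast=True: consecutive iterations sharing a query definition are
    grouped together as the rounds of a single PSI-BLAST search.
    """
    iterations = ET.parse(handle).getroot().iter("Iteration")
    if not psiblast:
        for iteration in iterations:
            yield [iteration]
        return

    def query_of(iteration):
        return iteration.findtext("Iteration_query-def")

    for _, rounds in groupby(iterations, key=query_of):
        yield list(rounds)


# Toy input: two rounds for seqA followed by one round for seqB.
SAMPLE = (
    "<BlastOutput><BlastOutput_iterations>"
    "<Iteration><Iteration_query-def>seqA</Iteration_query-def></Iteration>"
    "<Iteration><Iteration_query-def>seqA</Iteration_query-def></Iteration>"
    "<Iteration><Iteration_query-def>seqB</Iteration_query-def></Iteration>"
    "</BlastOutput_iterations></BlastOutput>"
).encode("ascii")

for psiblast in (False, True):
    groups = parse_blast_xml(io.BytesIO(SAMPLE), psiblast=psiblast)
    print(psiblast, [len(group) for group in groups])
```

The same file gives three single-iteration records by default and two
records (two rounds, then one round) when the caller asks for PSI-BLAST
handling, which is the ambiguity a flag resolves.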

For plain text output, the Biopython user must already explicitly choose our
PSI-BLAST parser over the default parser.

Peter



