[Biopython] Problems parsing with PSIBlastParser
Peter
biopython at maubp.freeserve.co.uk
Tue Nov 3 08:52:20 EST 2009
On Tue, Nov 3, 2009 at 1:40 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> On Nov 3, 2009, at 7:32 AM, Peter wrote:
>> ...
>> The upshot of this is multi-query BLASTP etc XML files look just like
>> single query multi-round PSI-BLAST XML files. This means having a
>> single BLAST XML parser that automatically treats the two differently
>> is tricky.
>>
>> Does that fit with your experience?
>>
>> Peter
>
> Yes, pretty much. Ours now handles both report types w/o problems. We have
> a pluggable XML parser that is switched out based on whether one expects
> normal BLAST XML (the default) or PSI-BLAST XML (has to be indicated). With
> text reports we can determine this on the fly b/c the blast type should
> indicate whether it is PSI BLAST or not, but IIRC this wasn't the case with
> XML. I haven't checked to see if this has been fixed yet on NCBI's end, but
> I'm assuming it hasn't.
Certainly with 2.2.18 (where I have an example handy), the XML from
blastpgp is practically identical to that from blastall. You *may* be able
to infer this from looking at the complete file (e.g. any iteration messages).
Having the user specify if they are expecting PSI-BLAST output (as you
do in BioPerl) seems like the best option.
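As a rough illustration of "looking at the complete file", here is a
minimal sketch that just collects any iteration messages so they can be
inspected. It uses only the standard library rather than Bio.Blast; the
function name iteration_messages is made up for this example, and it
assumes the file uses the Iteration, Iteration_iter-num and
Iteration_message elements. What a given message implies about the
program that wrote the file is left to the reader, so treat this as a
hint at best.

```python
import xml.etree.ElementTree as ET


def iteration_messages(xml_text):
    """Return (iteration number, message) pairs for iterations with a message.

    Hypothetical helper, not part of Biopython. Iterations that have no
    Iteration_message element (or an empty one) are skipped.
    """
    root = ET.fromstring(xml_text)
    found = []
    for iteration in root.iter("Iteration"):
        message = iteration.findtext("Iteration_message")
        if message and message.strip():
            number = iteration.findtext("Iteration_iter-num")
            found.append((number, message.strip()))
    return found
```

An empty result says nothing either way, which is why an explicit choice
by the user still looks safer than guessing.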
We might do this via an optional argument to the existing Bio.Blast.NCBIXML
parser, or add a second PSI-BLAST-specific parser. The latter might be best
for dealing with multi-query PSI-BLAST XML files, and for using the same
PSI-BLAST-specific objects as the old plain text parser.
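To make the "optional argument" idea concrete, here is a minimal sketch
using only the standard library. It is not the Bio.Blast.NCBIXML API: the
function name group_iterations and the psiblast flag are made up for this
example, and it assumes each Iteration element carries
Iteration_iter-num and Iteration_query-def (older files may not). With
the flag off, every iteration is its own record; with it on, consecutive
iterations for the same query are treated as rounds of one PSI-BLAST
search.

```python
import xml.etree.ElementTree as ET


def group_iterations(xml_text, psiblast=False):
    """Group Iteration elements into records (hypothetical helper).

    psiblast=False: one record per iteration (multi-query BLAST).
    psiblast=True: consecutive iterations sharing the same query
    definition are collected as rounds of a single record.
    """
    root = ET.fromstring(xml_text)
    records = []
    previous_query = None
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        number = iteration.findtext("Iteration_iter-num")
        if psiblast and records and query == previous_query:
            # Same query as the previous iteration: another round.
            records[-1]["rounds"].append(number)
        else:
            # New query, or the caller asked for plain BLAST handling.
            records.append({"query": query, "rounds": [number]})
        previous_query = query
    return records
```

Because the caller states what kind of file to expect, the code never has
to guess from the XML itself, which is the point of the flag.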
For plain text output, the Biopython user must already explicitly choose our
PSI-BLAST parser over the default parser.
Peter