[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 21 13:22:48 UTC 2012


On Sat, Sep 15, 2012 at 2:22 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi guys,
>
>> > 2) If we add a function to Biopython that generates Blast plain-text
>> > output (or something close to it) from Blast XML output, then a user can
>> > generate the Blast output in XML format, parse it with Biopython,
>> > optionally filter it, and then generate the corresponding plain-text
>> > output;
>>
>> The new 'SearchIO' results objects str/repr should be familiar to
>> anyone who has looked at the plain text BLAST output - but
>> not identical. We could apply some of these improvements
>> to the current BLAST parsers, but I favour aiming to simply
>> deprecate them in favour of 'SearchIO' (namespace to be
>> decided).
>>
>> However, we certainly could try and offer a plain-text BLAST
>> output format from 'SearchIO', although IIRC Bow has not tried
>> that yet. It shouldn't be too complicated - unless you aim for
>> 100% agreement with the latest BLAST output (moving target).
>
> Yes, this has not been attempted, mostly because I feel that the
> BLAST plain text is indeed a moving target. But if we are in favor of
> choosing one format from one BLAST version and always sticking to it,
> it sounds more reasonable.
>
> There is one missing detail that is only present in the plain text
> format, though: the hit-level e-values. If we do decide to write a
> plain text writer, we either have to demand the user supply these
> values, or we omit the entire hit-level e-value table, or we fill it
> with something else.

Bow and I have just been over the BLAST+ source code,
and confirmed that the 'hit level e-value' shown in the plain text
description table before the alignments is in fact just the
e-value of the best HSP, i.e. the minimum e-value.

So that isn't a problem after all.
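To illustrate, here is a minimal Python sketch (not Biopython code) of how a plain-text writer could fill the hit-level e-value column using only data available from the XML output. The `Hit` and `HSP` classes and the `hit_level_evalue` function are hypothetical stand-ins. Biopython's SearchIO objects are assumed to expose a similar per-hit `hsps` list and per-HSP `evalue`, but check the SearchIO documentation before relying on that.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HSP:
    # Hypothetical minimal stand-in for a parsed HSP; only the e-value matters here.
    evalue: float


@dataclass
class Hit:
    # Hypothetical minimal stand-in for a parsed hit holding its HSPs.
    hit_id: str
    hsps: List[HSP] = field(default_factory=list)


def hit_level_evalue(hit: Hit) -> float:
    """Return the e-value for a hit in the plain-text description table.

    Based on the BLAST+ source check described in this thread, this is the
    e-value of the best HSP, i.e. the smallest e-value among the hit's HSPs.
    """
    if not hit.hsps:
        raise ValueError(f"hit {hit.hit_id!r} has no HSPs")
    return min(hsp.evalue for hsp in hit.hsps)


if __name__ == "__main__":
    hit = Hit("gi|12345", [HSP(2e-10), HSP(3e-45), HSP(0.004)])
    print(hit_level_evalue(hit))  # 3e-45
```

Raising on a hit with no HSPs is a design choice for the sketch; a real writer might instead skip such hits.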

Peter
