[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Peter Cock p.j.a.cock at googlemail.com
Fri Sep 14 10:00:33 UTC 2012


On Fri, Sep 14, 2012 at 10:52 AM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
> Hi Peter,
>
> Peter Cock wrote:
>> On Fri, Sep 14, 2012 at 9:12 AM, Martin Mokrejs
>> <mmokrejs at fold.natur.cuni.cz> wrote:
>>> Hi all,
>>>   as a long-term subscriber to this list, and to bioperl in the past as well, I do know
>>> that the plaintext output is being changed silently and that it is a hassle for
>>> maintainers. On the other hand, the XML tags and syntax are way too verbose.
>>> That in turn means lots of disk & memory I/O, long parsing times and of course large files.
>>> If only the XML tags were scrambled into shorter strings. ;-)
>>> Umm, I also hit a bug in legacy blastn XML output, still no answer from NCBI:
>>> https://redmine.open-bio.org/issues/3354
>>
>> Earlier this week the NCBI released BLAST 2.2.27+ which might
>> fix this...
>>
> ...
>>
>> I find the BLAST+ tabular output very useful - you can control which
>> columns you get if the default 12 are not enough - and trivial to parse.
>> This is also supported in Bow's SearchIO branch.
>
> Based on the 2.2.27 number you seem to talk about old/legacy blast ...
> but the plus means the new blast from NCBI?

The NCBI call version 2.2.27 of the new C++ rewrite "BLAST v2.2.27+"
(personally I'd have called it "BLAST+ v2.2.27" instead). So yes, the
trailing plus means the new BLAST+ from the NCBI, not legacy BLAST.

The NCBI have now stopped updating legacy BLAST.

> I don't like the new blast, it just gives different=bad results and I
> just don't have time to make up a good bug report with testcases. :((

You are not alone in having problems/regressions with BLAST+
compared to legacy BLAST. I can think of several people still
using 'blastall' for this reason.

> Will have a look at Wibowo's code. Well, the XML result is the same,
> I think, from both programs.

I think it is practically the same.
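
As an aside on the tabular output quoted above: here is a minimal sketch
of parsing it with nothing but the Python standard library, to show why
I call it trivial. It assumes the default 12 columns of BLAST+
"-outfmt 6"; the function name parse_blast_tabular and the sample hit
line are made up for illustration, and this is not the SearchIO code.

```python
import io

# Column names of the default BLAST+ tabular output (-outfmt 6).
DEFAULT_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)
INT_FIELDS = {"length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_blast_tabular(handle, columns=DEFAULT_COLUMNS):
    """Yield one dict per hit line (hypothetical helper, not Biopython API).

    Blank lines and '#' comment lines (as written by -outfmt 7) are skipped.
    Pass your own column names if you requested a custom column set.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(columns):
            raise ValueError("expected %d columns, got %d: %r"
                             % (len(columns), len(values), line))
        hit = dict(zip(columns, values))
        # Convert the numeric columns; everything else stays a string.
        for name in hit:
            if name in INT_FIELDS:
                hit[name] = int(hit[name])
            elif name in FLOAT_FIELDS:
                hit[name] = float(hit[name])
        yield hit


# Invented example line, only to show the shape of the data.
sample = "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t101\t300\t1e-100\t365\n"
for hit in parse_blast_tabular(io.StringIO(sample)):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

Since the parser reads line by line, memory use stays flat however large
the results file is, which is a good part of the appeal compared to the
XML.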

Peter


