[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Martin Mokrejs mmokrejs at fold.natur.cuni.cz
Fri Sep 14 05:52:24 EDT 2012


Hi Peter,

Peter Cock wrote:
> On Fri, Sep 14, 2012 at 9:12 AM, Martin Mokrejs
> <mmokrejs at fold.natur.cuni.cz> wrote:
>> Hi all,
>>   as a long-term subscriber to this list, and to bioperl in the past, I do know
>> that the plaintext output gets changed silently and that this is a hassle for
>> maintainers. On the other hand, the XML tags and syntax are way too verbose.
>> That in turn means lots of disk and memory I/O, long parsing times and of course large files.
>> If only the XML tags were at least shortened to briefer strings. ;-)
>> Umm, I also hit a bug in legacy blastn XML output, still no answer from NCBI:
>> https://redmine.open-bio.org/issues/3354
> 
> Earlier this week the NCBI released BLAST 2.2.27+ which might
> fix this...
> 
...
> 
> I find the BLAST+ tabular output very useful - you can control which
> columns you get if the default 12 are not enough - and trivial to parse.
> This is also supported in Bow's SearchIO branch.

Based on the 2.2.27 version number you seem to be talking about old/legacy blast ...
but the plus means the new blast (BLAST+) from NCBI? I don't like the new blast: it just
gives different (= bad) results, and I just don't have time to put together a good bug report
with test cases. :((

I will see what Wibowo's code does. Anyway, I think the XML output is the same from both programs.
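
For reference, the "trivial to parse" tabular output Peter mentions could be read roughly as below. This is only a minimal sketch using the Python standard library, not the SearchIO code: it assumes the default 12 columns of BLAST+ tabular output (-outfmt 6, or -outfmt 7 with '#' comment lines). The names Hit and parse_tabular and the sample row are made up for illustration.

```python
import io
from collections import namedtuple

# Default 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()

# Hypothetical record type, introduced only for this sketch.
Hit = namedtuple("Hit", FIELDS)


def parse_tabular(handle):
    """Yield one Hit per data line, skipping blank and '#' comment lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(cols)))
        yield Hit(
            cols[0],                          # qseqid
            cols[1],                          # sseqid
            float(cols[2]),                   # pident
            *(int(c) for c in cols[3:10]),    # length .. send (7 integers)
            float(cols[10]),                  # evalue
            float(cols[11]),                  # bitscore
        )


# Invented sample data, not real BLAST output.
sample = (
    "# Fields: query id, subject id, ...\n"
    "query1\tsubj1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t355\n"
)
hits = list(parse_tabular(io.StringIO(sample)))
```

If custom columns are requested on the command line, FIELDS would have to be changed to match, and the type conversions adjusted accordingly.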

Martin


