[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Fields, Christopher J cjfields at illinois.edu
Fri Sep 14 01:32:19 UTC 2012


On Sep 13, 2012, at 7:37 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> --- On Thu, 9/13/12, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:
>> P.S.: And yes, I would love to parse blastn plaintext output
>> or some other more compact one, the XML is really an overkill.
> 
> What exactly is the advantage of plain text parsing compared to XML? File size?
> 
> Best,
> -Michiel.

There isn't any.  In fact, NCBI has consistently stated that one should never rely on parsing BLAST text output, primarily because they reserve the right to change that output at any point, whereas the XML output should remain stable.  As someone who has maintained legacy BLAST parsing code (in BioPerl) for a number of years, I can say that is fairly close to the truth; the caveat is that they have made changes that broke some XML parsing, but they do try to fix them.  BLAST XML has simply been much easier than text to deal with when it comes to fixing issues.
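
To illustrate why the XML is the more dependable target, here is a minimal sketch of streaming through a BLAST XML report with Python's standard library. It is not Biopython's own parser, just an illustration under stated assumptions: the element names (Iteration, Iteration_query-def, Hit, Hit_id, Hsp_evalue) follow the legacy BLAST XML layout, the sample report is hand-written with made-up values, and the helper name iter_hits is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written stand-in for a BLAST XML report (illustrative values only).
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>subject1</Hit_id>
          <Hit_len>1200</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def iter_hits(handle):
    """Yield (query, hit_id, best_evalue) tuples, one query block at a time."""
    # iterparse fires an "end" event once an element is fully read, so each
    # Iteration (one query) can be processed and discarded before the next.
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        for hit in elem.iter("Hit"):
            evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
            # default=None guards against a hit with no HSP e-values.
            yield query, hit.findtext("Hit_id"), min(evalues, default=None)
        # Drop the finished Iteration's children so large reports do not
        # accumulate in memory.
        elem.clear()


hits = list(iter_hits(io.BytesIO(SAMPLE)))
for query, hit_id, evalue in hits:
    print(query, hit_id, evalue)
```

Because the fields are looked up by tag name rather than by column position or line layout, cosmetic changes to the report do not break this kind of code, which is the practical difference from scraping the text output.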

chris




