[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Michiel de Hoon mjldehoon at yahoo.com
Fri Sep 14 09:27:50 UTC 2012


Hi Martin,

--- On Fri, 9/14/12, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:
> Legacy blastn search using 59 queries through dataset
> that takes 17 minutes and yields XML with 3957MB
> in size. Parsing the XML file through biopython takes 56
> minutes to convert the results into my own CSV file

How does this compare to parsing the human-readable plain-text output? Is the plain-text parser significantly faster than the XML parser?

> With plaintext I actually meant more some tabular
> output format which would be enough for my purposes
> (match and query coordinates, scores, gaps, identities).

Maintaining the tabular BLAST output parser has not been a problem, and I expect that it will continue to be supported in Biopython. On the other hand, maintaining the human-readable plain-text parser has been a recurring headache. If Biopython can parse tabular BLAST output, do you still need the human-readable plain-text parser?
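
For illustration, here is a minimal sketch of reading tabular output and writing it to CSV using only the Python standard library. It is not Biopython's own parser. It assumes the 12 standard tab-separated columns that legacy blastall writes with -m 8 (or -m 9, which adds '#' comment lines); the field labels and the function names iter_hits and write_csv are hypothetical, chosen only for this example. Note that the tabular format reports the number of gap openings rather than the total number of gapped positions.

```python
import csv

# Labels for the 12 standard columns of tabular BLAST output.
# The labels are illustrative; the file itself carries no header row.
FIELDS = (
    "query", "subject", "pct_identity", "aln_length", "mismatches",
    "gap_opens", "q_start", "q_end", "s_start", "s_end",
    "evalue", "bit_score",
)
INT_FIELDS = ("aln_length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")


def iter_hits(handle):
    """Yield one dict per hit, skipping blank lines and '#' comment lines.

    Lines are processed one at a time, so memory use stays flat
    regardless of how large the input file is.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError("expected %d columns, got %d: %r"
                             % (len(FIELDS), len(values), line))
        hit = dict(zip(FIELDS, values))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        yield hit


def write_csv(hits, out_handle):
    """Write hits to out_handle as CSV with a header row."""
    writer = csv.DictWriter(out_handle, fieldnames=FIELDS)
    writer.writeheader()
    for hit in hits:
        writer.writerow(hit)
```

With a file produced by blastall -m 8, this could be used as write_csv(iter_hits(open("hits.tab")), open("hits.csv", "w", newline="")), where both file names are placeholders.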

Best,
-Michiel.




