[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?
Michiel de Hoon
mjldehoon at yahoo.com
Fri Sep 14 05:27:50 EDT 2012
Hi Martin,
--- On Fri, 9/14/12, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:
> A legacy blastn search with 59 queries against the dataset
> takes 17 minutes and yields an XML file 3957 MB
> in size. Parsing that XML file with Biopython takes 56
> minutes to convert the results into my own CSV file.
How does this compare to parsing human-readable plain text output? Is it significantly faster than the XML parser?
> By plaintext I actually meant a tabular
> output format, which would be enough for my purposes
> (match and query coordinates, scores, gaps, identities).
Maintaining the tabular Blast output parser has not been a problem, and I expect that it will continue to be supported in Biopython. On the other hand, maintaining the human-readable plain text parser has been a recurring headache. If Biopython can parse tabular Blast output, then do you still need the human-readable plain text parser?
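For comparison, here is a minimal sketch of converting tabular output to CSV with only the Python standard library. It is not the Biopython parser. It assumes the default 12-column tabular layout (legacy blastall -m 8, or -m 9 with '#' comment lines); the column names in FIELDS and the function names iter_hits and tabular_to_csv are my own labels for illustration.

```python
import csv

# Assumed layout: the 12 default columns of tabular BLAST output
# (legacy blastall -m 8 / -m 9). The names are illustrative labels,
# not identifiers taken from Biopython.
FIELDS = [
    "query", "subject", "pct_identity", "aln_length",
    "mismatches", "gap_opens", "q_start", "q_end",
    "s_start", "s_end", "evalue", "bit_score",
]


def iter_hits(handle):
    """Yield one dict per hit line, skipping blank and '#' comment lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            raise ValueError("expected %d columns, got %d: %r"
                             % (len(FIELDS), len(values), line))
        yield dict(zip(FIELDS, values))


def tabular_to_csv(in_handle, out_handle):
    """Stream hits from in_handle to out_handle as CSV; return the hit count."""
    writer = csv.DictWriter(out_handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for hit in iter_hits(in_handle):
        writer.writerow(hit)
        count += 1
    return count
```

Because the file is read one line at a time, memory use stays flat regardless of the size of the output file. Values are kept as strings; convert them to int or float only if you need to filter on them.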
Best,
-Michiel.