[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?
Michiel de Hoon
mjldehoon at yahoo.com
Fri Sep 14 09:27:50 UTC 2012

Hi Martin,
--- On Fri, 9/14/12, Martin Mokrejs <mmokrejs at fold.natur.cuni.cz> wrote:
> Legacy blastn search using 59 queries through dataset
> that takes 17 minutes and yields XML with 3957MB
> in size. Parsing the XML file through biopython takes 56
> minutes to convert the results into my own CSV file
How does this compare to parsing human-readable plain text output? Is it significantly faster than the XML parser?
> With plaintext I actually meant more some tabular
> output format which would be enough for my purposes
> (match and query coordinates, scores, gaps, identities).
Maintaining the tabular Blast output parser has not been a problem, and I expect that it will continue to be supported in Biopython. On the other hand, maintaining the human-readable plain text parser has been a recurring headache. If Biopython can parse tabular Blast output, then do you still need the human-readable plain text parser?
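
To make the tabular route concrete, here is a minimal sketch of converting tabular output to CSV using only the Python standard library. It assumes the standard 12-column layout produced by legacy blastall with -m 8 (or -m 9, which adds '#' comment lines). The names FIELDS, iter_hits and tabular_to_csv are hypothetical helpers for illustration and are not part of Biopython.

```python
import csv
import io

# Assumed column order of legacy BLAST tabular output (blastall -m 8 / -m 9).
FIELDS = [
    "query", "subject", "pct_identity", "length", "mismatches", "gap_opens",
    "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score",
]


def iter_hits(handle):
    """Yield one dict per hit line, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(parts)))
        yield dict(zip(FIELDS, parts))


def tabular_to_csv(in_handle, out_handle):
    """Stream hits from in_handle into CSV on out_handle; return the hit count."""
    writer = csv.DictWriter(out_handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for hit in iter_hits(in_handle):
        writer.writerow(hit)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample line, only to show the call pattern.
    sample = "# BLASTN\nq1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t365\n"
    out = io.StringIO()
    tabular_to_csv(io.StringIO(sample), out)
    print(out.getvalue(), end="")
```

Because the file is read line by line, memory use stays flat regardless of the size of the output file. Values are kept as strings here; convert them to int or float only where needed.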
Best,
-Michiel.
    
    