[BioRuby] Plugins, Biogem and Christmas 2010
Pjotr Prins
pjotr.public14 at thebird.nl
Mon Feb 14 14:46:39 UTC 2011
Yet another BioRuby plugin.
I just released a fast BLAST XML file parser for big data (i.e. it
does not necessarily load everything in memory). It is based on
Nokogiri+libxml2. A quick test shows it is 50x faster than the REXML
parser that comes with BioRuby.
Install with
gem install bio-blastxmlparser
It comes with a utility to produce tabular output
blastxmlparser --help
Docs at
https://github.com/pjotrp/blastxmlparser
(you may need to install libxml2-dev first, to build the native
extension).
There is a choice of two parsers: one loads the whole DOM in memory,
the other splits the XML file into smaller sections first.
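To give an idea of what splitting means, here is a minimal sketch in
plain Ruby. It is not the code in the gem. The method name
each_iteration_chunk is made up for illustration, and it assumes the
<Iteration> and </Iteration> tags each sit on their own line, which is
a simplification and not a general XML splitter.

```ruby
require 'stringio'

# Hypothetical helper: yield every <Iteration>...</Iteration> block of a
# BLAST XML stream as a string, reading line by line so that only one
# block is held in memory at a time.
# Assumption: the opening and closing Iteration tags are on separate lines.
def each_iteration_chunk(io)
  chunk = nil
  io.each_line do |line|
    # '<Iteration>' does not match child tags such as <Iteration_iter-num>
    chunk = +'' if line.include?('<Iteration>')
    chunk << line if chunk
    if chunk && line.include?('</Iteration>')
      yield chunk
      chunk = nil
    end
  end
end

# Small made-up input, just to exercise the helper
SAMPLE = <<~XML
  <?xml version="1.0"?>
  <BlastOutput>
    <BlastOutput_iterations>
      <Iteration>
        <Iteration_iter-num>1</Iteration_iter-num>
        <Iteration_query-def>query_A</Iteration_query-def>
      </Iteration>
      <Iteration>
        <Iteration_iter-num>2</Iteration_iter-num>
        <Iteration_query-def>query_B</Iteration_query-def>
      </Iteration>
    </BlastOutput_iterations>
  </BlastOutput>
XML

each_iteration_chunk(StringIO.new(SAMPLE)) do |chunk|
  puts "chunk of #{chunk.bytesize} bytes"
end
```

Each chunk is a small, self-contained piece of XML that can be handed
to a DOM parser on its own, so memory use depends on the largest
iteration rather than on the whole file.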
I ran quite a few tests to see what type of parsing would give the best
results. Currently I parse the DOM, walk the low level nodes, and use
(lazy) XPath for the values. There is probably still room for
improvement.
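A rough sketch of the lazy XPath idea follows. It is not the code in
the gem: the LazyHit class and its method names are made up, and it
uses REXML only so that it runs without extra gems, whereas the
released parser is built on Nokogiri. The point is just that a value
is fetched with XPath the first time it is asked for and cached after
that.

```ruby
require 'rexml/document'

# Hypothetical wrapper around one <Hit> element. Nothing is extracted
# at construction time; each field is looked up on first access.
class LazyHit
  def initialize(node)
    @node = node
  end

  def hit_id
    @hit_id ||= field('Hit_id')
  end

  def length
    @length ||= field('Hit_len').to_i
  end

  # E-values of all HSPs under this hit, fetched only when requested
  def evalues
    @evalues ||= REXML::XPath.match(@node, 'Hit_hsps/Hsp/Hsp_evalue')
                             .map { |e| e.text.to_f }
  end

  private

  # XPath relative to this hit's node
  def field(name)
    REXML::XPath.first(@node, name).text
  end
end

# Small made-up fragment using BLAST XML element names
xml = <<~XML
  <Iteration_hits>
    <Hit>
      <Hit_id>hit_1</Hit_id>
      <Hit_len>120</Hit_len>
      <Hit_hsps>
        <Hsp><Hsp_evalue>1e-5</Hsp_evalue></Hsp>
        <Hsp><Hsp_evalue>0.02</Hsp_evalue></Hsp>
      </Hit_hsps>
    </Hit>
  </Iteration_hits>
XML

doc = REXML::Document.new(xml)
hits = REXML::XPath.match(doc, '//Hit').map { |node| LazyHit.new(node) }
hits.each { |h| puts [h.hit_id, h.length, h.evalues.min].join("\t") }
```

If a report only needs the hit ids, the HSP nodes are never touched,
which is where the saving comes from on large result files.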
One thing I will still try, when I have time, is parallelized parsing
on JRuby. With that it should be one of the fastest BLAST parsers on
the planet.
Enjoy,
Pj.
On Fri, Dec 24, 2010 at 12:08:04PM +0100, Raoul Bonnal wrote:
> BioRuby plugin system was firstly announced at [BOSC 2010] and will be implemented by the Christmas 2010. Hopefully. :) -- Yes, we made it! Check out the BiogemInstallation and BiogemDevelopment sections.