[BioRuby] Parsing large Blast xml files - a new bioruby plugin
Pjotr Prins
pjotr.public14 at thebird.nl
Wed Jun 1 07:30:16 UTC 2011
Hi Rob,
Why did you not start from my lazy, fast, big-data XML parser for
BLAST?
https://github.com/pjotrp/blastxmlparser
I hear it is being used in the NGS plugin. It would be good to run some
performance tests when you introduce something new.
I have a feeling you were simply not aware of it.
Pj.
On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote:
> I've written a quick bioruby plugin to help parse blast results that
> are too large to fit into memory.
>
> Install: gem install bio-lazyblastxml
> Code: github.com/robsyme/bioruby-lazyblastxml
> Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/
>
> The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
> objects when required.
> The interface is as close to Bio::Blast::Report as I could keep it,
> but there are a few changes:
> Iteration.hits, hit.hsps etc. do not return arrays. Instead, Report
> is an enumerable that yields iterations, Iteration is an enumerable
> that yields hits, Hits are enumerables that yield hsps, etc.
>
> This is my first attempt at real shared code, and all comments and
> criticism are very welcome.
>
> -r
>
> Rob Syme
> PhD Candidate
> Curtin University
> Western Australia
>
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
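
The enumerable-style interface described in the quoted message (a report
that yields iterations, each of which yields hits, instead of returning
arrays) can be sketched in plain Ruby. This is only an illustration of the
pattern: SketchReport and SketchIteration are hypothetical names, the
records are generated on the fly rather than read with LibXML::Reader, and
none of this is taken from bio-lazyblastxml or blastxmlparser.

```ruby
# Hypothetical sketch of nested enumerables; NOT the bio-lazyblastxml API.
# Records are synthesized on demand to stand in for nodes read from a file.

class SketchIteration
  include Enumerable
  attr_reader :num

  def initialize(num, n_hits)
    @num = num
    @n_hits = n_hits
  end

  # Yield hits one at a time instead of building an array up front.
  def each
    return enum_for(:each) unless block_given?
    @n_hits.times { |j| yield "hit_#{@num}_#{j + 1}" }
  end
end

class SketchReport
  include Enumerable
  attr_reader :produced # how many iterations have actually been created

  def initialize(n_iterations, hits_per_iteration)
    @n_iterations = n_iterations
    @hits_per_iteration = hits_per_iteration
    @produced = 0
  end

  # Each iteration object is created only when the consumer asks for it.
  def each
    return enum_for(:each) unless block_given?
    @n_iterations.times do |i|
      @produced += 1
      yield SketchIteration.new(i + 1, @hits_per_iteration)
    end
  end
end

report = SketchReport.new(1_000_000, 3)

# Enumerable#first stops after one element, so only one iteration is built.
first_iteration = report.first
first_hits = first_iteration.to_a # materialize explicitly when an array is needed
```

Because both classes include Enumerable, callers still get map, select,
first and lazy, but anything that needs random access has to call to_a
and accept the memory cost.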