[BioRuby] Bio-gff3 plugin 0.8.6
Pjotr Prins
pjotr.public14 at thebird.nl
Mon Jan 17 10:08:12 UTC 2011
I have released version 0.8.6 of the bio-gff3 parser plugin on rubygems.
It can be used from the command line, e.g.
gem install bio-gff3
gff3-fetch --help
This release introduces an LRU cache, replaces the BioRuby GFF line
parser and adds lazy parsing, all with significant speedups compared to
the original (no cache, BioRuby parser, non-lazy).
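To illustrate what lazy parsing means here, the following is a minimal
sketch and not the bio-gff3 code itself. The class name LazyRecord and
its methods are made up for this example. The record keeps the raw GFF3
line and only splits the columns, and the attributes in column 9, when
they are first asked for. URL-unescaping of attribute values is left out.

```ruby
# Hypothetical lazily parsed GFF3 record (illustration only, not bio-gff3).
class LazyRecord
  attr_reader :raw

  def initialize(line)
    @raw = line.chomp   # store the raw line; nothing is parsed yet
    @fields = nil
    @attributes = nil
  end

  # Split the nine tab-separated GFF3 columns on first use only.
  def fields
    @fields ||= @raw.split("\t", 9)
  end

  def seqid
    fields[0]
  end

  def start
    fields[3].to_i
  end

  def stop
    fields[4].to_i
  end

  # Parse column 9 (key=value pairs separated by ';') on first use only.
  def attributes
    @attributes ||= (fields[8] || "").split(";").each_with_object({}) do |pair, h|
      key, value = pair.split("=", 2)
      h[key] = value unless key.nil? || key.empty?
    end
  end

  # True once the columns have been split.
  def parsed?
    !@fields.nil?
  end
end

rec = LazyRecord.new("ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN")
rec.parsed?          # false, only the raw string is held
rec.attributes["ID"] # "gene00001", parsing happens here
```

Records that are never inspected cost no more than holding one string.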
The LRU version has bounded RAM use for data of any size (730MB in the
benchmark below), and currently runs about 6 times slower than the full
memory version (75m versus 11m51).
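For readers unfamiliar with the idea, here is a minimal sketch of an LRU
(least recently used) cache in Ruby. It is not the implementation used in
bio-gff3. The class name LRUCache and its interface are assumptions for
this example, and it relies on Ruby 1.9 hashes keeping insertion order.

```ruby
# Hypothetical LRU cache (illustration only, not the bio-gff3 code).
class LRUCache
  def initialize(max_size)
    raise ArgumentError, "max_size must be positive" unless max_size > 0
    @max_size = max_size
    @store = {}   # insertion order = recency order, oldest first
  end

  # Return the cached value or nil. A hit moves the key to the
  # most recently used position.
  def [](key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  # Store a value and evict the least recently used entry when full.
  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.delete(@store.first[0]) if @store.size > @max_size
  end

  def key?(key)
    @store.key?(key)
  end

  def size
    @store.size
  end
end

cache = LRUCache.new(2)
cache[:gene1] = "record 1"
cache[:gene2] = "record 2"
cache[:gene1]               # touch gene1, so gene2 is now the oldest
cache[:gene3] = "record 3"  # evicts gene2
```

Memory use is capped by the number of entries, at the cost of having to
fetch evicted records again when they are needed later.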
Digesting parser:
Cache            real    user    sys     version         RAM
------------------------------------------------------------
full,bioruby     12m41   12m28   0m09    (0.8.0)
full,line        12m13   12m06   0m07    (0.8.5)
full,line,lazy   11m51   11m43   0m07    (0.8.6)         6,600M
none,bioruby     504m    477m    26m50   (0.8.0)
none,line        297m    267m    28m36   (0.8.5)
none,line,lazy   132m    106m    26m01   (0.8.6)         650M
lru,bioruby      533m    510m    22m47   (0.8.5)
lru,line         353m    326m    26m44   (0.8.5)  1K
lru,line         305m    281m    22m30   (0.8.5)  10K
lru,line,lazy    182m    161m    21m10   (0.8.6)  10K
lru,line,lazy    75m     75m     0m17    (0.8.6)  50K    730M
------------------------------------------------------------
where the input files are
52M m_hapla.WS217.dna.fa
456M m_hapla.WS217.gff3
and the tests ran with
ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-linux]
on a 64-bit 2.6 GHz CPU (6MB cache) machine with 16 GB RAM.
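The ratios between the runs can be checked with a few lines of Ruby. This
is only a convenience sketch. The helper to_seconds is made up for this
example, and the timings are copied from the 'real' column of the table.

```ruby
# Convert a timing as written in the table ("11m51", "504m") to seconds.
def to_seconds(timing)
  minutes, seconds = timing.split("m")
  minutes.to_i * 60 + seconds.to_i   # a missing seconds part counts as 0
end

full_lazy = to_seconds("11m51")  # full,line,lazy (0.8.6)
none_lazy = to_seconds("132m")   # none,line,lazy (0.8.6)
lru_lazy  = to_seconds("75m")    # lru,line,lazy (0.8.6), last row

puts (lru_lazy.to_f / full_lazy).round(1)   # prints 6.3
puts (none_lazy.to_f / full_lazy).round(1)  # prints 11.1
```

So the last LRU run takes about 6 times as long as the full memory run,
and the no-cache run about 11 times as long.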
Note bio-gff3 0.8.6 is a fully digesting parser, with scope for full
validation of the GFF3 relations. The next step, a limited
'optimistic' digestion, will speed things up.
Note also that bio-gff3 exploits the bio-logger plugin - it is a good
example of how to use it.
Pj.