[BioRuby] Benchmarking FASTA file parsing
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Fri Aug 13 14:47:35 UTC 2010
Hi,
On Fri, 13 Aug 2010 14:25:46 +0200
Martin Asser Hansen <mail at maasha.dk> wrote:
> io1 = StringIO.new(data)
> io2 = StringIO.new(data)
> fasta1 = Fasta.new(io1)
> fasta2 = Bio::FastaFormat.open(io2)
>
> Benchmark.bm(5) do |timer|
> timer.report('Hack') { 10_000_000.times { fasta1.each { |entry1| } } }
> timer.report('Bio') { 10_000_000.times { fasta2.each { |entry2| } } }
> end
The IO (StringIO or Bio::FlatFile object) needs to be rewound
after each read during the benchmark. Otherwise only the first
iteration parses any data and the remaining iterations start at EOF.
#(snip)
Benchmark.bm(5) do |timer|
  timer.report('Hack') { 10_000_000.times {
    fasta1.each { |entry1| }; io1.rewind } }
  timer.report('Bio') { 10_000_000.times {
    fasta2.each { |entry2| }; fasta2.rewind } }
end
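As a minimal illustration of why the rewind matters, the sketch below uses only StringIO from the standard library (no Fasta or Bio::FastaFormat parser is involved, and the two-record data string is made up for the example). It counts the lines seen in three consecutive passes, first without and then with a rewind after each pass.

```ruby
require 'stringio'

# Made-up two-record FASTA string, for illustration only.
data = ">seq1\nACGT\n>seq2\nTTGA\n"
io = StringIO.new(data)

# Three passes without rewinding: after the first pass the IO is at EOF.
without_rewind = Array.new(3) do
  count = 0
  io.each_line { count += 1 }
  count
end

# Three passes, rewinding after each one: every pass sees all lines.
io.rewind
with_rewind = Array.new(3) do
  count = 0
  io.each_line { count += 1 }
  io.rewind
  count
end

puts "without rewind: #{without_rewind.inspect}"
puts "with rewind:    #{with_rewind.inspect}"
```

Without the rewind, only the first pass reads anything, so a benchmark loop written that way mostly measures iterating over an exhausted IO.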
I use "fasta2.rewind" instead of "io2.rewind" because "fasta2"
is an instance of Bio::FlatFile, the IO wrapper used in BioRuby.
To keep the information inside the wrapper consistent, using
fasta2.rewind rather than io2.rewind is recommended.
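The general reason for rewinding through a wrapper can be sketched with a toy class. PeekingReader below is hypothetical and is not how Bio::FlatFile is implemented; it only assumes that a wrapper may hold some buffered state (here, one line of lookahead) that the underlying IO knows nothing about.

```ruby
require 'stringio'

# Hypothetical wrapper for illustration only (NOT Bio::FlatFile):
# it keeps at most one line of lookahead in @lookahead.
class PeekingReader
  def initialize(io)
    @io = io
    @lookahead = nil
  end

  # Return the next line without consuming it.
  def peek
    @lookahead ||= @io.gets
  end

  # Return the next line, serving the buffered lookahead first.
  def gets
    line = @lookahead || @io.gets
    @lookahead = nil
    line
  end

  # Rewind the underlying IO and discard the buffered lookahead.
  def rewind
    @lookahead = nil
    @io.rewind
  end
end

# Read one line, buffer the next one, then rewind in the given way
# and return whatever the wrapper hands out next.
def next_line_after_rewind(via_wrapper)
  io = StringIO.new("first\nsecond\nthird\n")
  reader = PeekingReader.new(io)
  reader.gets                      # consumes "first\n"
  reader.peek                      # buffers "second\n"
  via_wrapper ? reader.rewind : io.rewind
  reader.gets
end

stale = next_line_after_rewind(false)  # underlying IO rewound directly
fresh = next_line_after_rewind(true)   # rewound through the wrapper

puts "io.rewind     -> #{stale.inspect}"
puts "reader.rewind -> #{fresh.inspect}"
```

When only the underlying IO is rewound, the toy wrapper still returns its stale buffered line. Rewinding through the wrapper clears that state first.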
I applied the above changes, reduced the iteration count to
100,000, and got results showing the same tendency.
(ruby 1.8.7-p299 (debian Squeeze 1.8.7.299-1))
           user     system      total        real
Hack   7.240000   0.160000   7.400000 (  7.390807)
Bio   23.250000   0.850000  24.100000 ( 24.100267)
(ruby 1.9.1-p243 with env LANG=C)
           user     system      total        real
Hack   5.600000   0.010000   5.610000 (  5.605175)
Bio   15.920000   0.000000  15.920000 ( 15.917899)
With E. coli genome ORF data, the difference becomes smaller,
especially in Ruby 1.9.1.
(snip)
# ftp://ftp.ncbi.nih.gov:/genbank/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655/U00096.ffn
io1 = File.open('U00096.ffn')
io2 = File.open('U00096.ffn')
fasta1 = Fasta.new(io1)
fasta2 = Bio::FastaFormat.open(io2)
Benchmark.bm(5) do |timer|
  timer.report('Hack') { 100.times {
    fasta1.each { |entry1| }; io1.rewind } }
  timer.report('Bio') { 100.times {
    fasta2.each { |entry2| }; fasta2.rewind } }
end
(ruby 1.8.7-p299)
           user     system      total        real
Hack   8.340000   0.140000   8.480000 (  8.492107)
Bio   13.480000   0.520000  14.000000 ( 13.998213)
(Ruby 1.9.1-p243 with env LANG=C)
           user     system      total        real
Hack   9.130000   0.140000   9.270000 (  9.270361)
Bio    9.380000   0.180000   9.560000 (  9.565899)
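To put the four runs side by side, the Bio/Hack ratio of the "real" times can be computed as in the sketch below. The numbers are copied from the tables in this message; the hash keys are only labels chosen for this example.

```ruby
# Wall-clock ("real") seconds copied from the benchmark tables above.
# The keys are illustrative labels, not part of the original output.
timings = {
  'StringIO, ruby 1.8.7'   => { hack: 7.390807, bio: 24.100267 },
  'StringIO, ruby 1.9.1'   => { hack: 5.605175, bio: 15.917899 },
  'U00096.ffn, ruby 1.8.7' => { hack: 8.492107, bio: 13.998213 },
  'U00096.ffn, ruby 1.9.1' => { hack: 9.270361, bio: 9.565899 }
}

# Ratio of Bio::FastaFormat time to "Hack" time for each run.
ratios = timings.transform_values { |t| t[:bio] / t[:hack] }

ratios.each do |label, ratio|
  puts format('%-24s Bio/Hack = %.2f', label, ratio)
end
```

The ratio is roughly 3 for the StringIO runs, and drops to about 1.6 (ruby 1.8.7) and close to 1.0 (ruby 1.9.1) for the E. coli ORF file.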
--
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org