[BioRuby] EMBL parsing
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Sat May 5 06:57:28 UTC 2007
Hi,
On Thu, 3 May 2007 12:48:03 +0100
Anthony Underwood <email2ants at gmail.com> wrote:
> Hi Mitsiteru,
>
> Any of the embl files downloaded from the ebi site have this problem.
>
> for example http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&style=raw&id=CP000360
>
> Ruby takes all of the cpu power :(
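It seems to be caused by thousands of iterations of str1 += str2. Each += creates a brand-new String object and copies everything accumulated so far into it, so the total amount of copying grows quadratically with the number of lines in the entry.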
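To make the difference concrete, here is a small Ruby sketch (not part of the patch) that grows a String both ways and checks object identity. The sample lines are made up; the point is only that += ends up with a different object, leaving the original untouched, while concat keeps extending the same one.

```ruby
# Minimal sketch contrasting two ways of growing a String.
# str1 += str2 is shorthand for str1 = str1 + str2: String#+ allocates a
# new String and copies both operands, then the variable is rebound.
# String#concat appends to the receiver in place.

# Made-up sample lines, only for illustration.
lines = ["ID   CP000360\n", "AC   CP000360;\n", "DE   example\n"]

plus_acc = ''.dup
plus_start = plus_acc              # keep a reference to the first object
lines.each { |line| plus_acc += line }

concat_acc = ''.dup
concat_start = concat_acc
lines.each { |line| concat_acc.concat(line) }

p plus_acc.equal?(plus_start)      # false: += ended with a different object
p concat_acc.equal?(concat_start)  # true: concat grew the original object
p plus_acc == concat_acc           # true: same contents either way
```

With n lines, the += loop copies the growing buffer n times, which is where the quadratic cost comes from; concat only copies each new line once (plus occasional buffer growth).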
A patch is attached (it requires Ruby 1.8.0 or newer).
--- lib/bio/db.rb 5 Apr 2007 23:35:39 -0000 0.37
+++ lib/bio/db.rb 5 May 2007 06:08:39 -0000
@@ -313,12 +313,12 @@
   # Returns the contents of the entry as a Hash.
   def entry2hash(entry)
-    hash = Hash.new('')
+    hash = Hash.new { |h, k| h[k] = '' }
     entry.each_line do |line|
       tag = tag_get(line)
       next if tag == 'XX'
       tag = 'R' if tag =~ /^R./ # Reference lines
-      hash[tag] += line
+      hash[tag].concat line
     end
     return hash
   end
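For anyone wondering why the patch changes Hash.new as well as the += line, the sketch below restates the patched method as standalone Ruby. It is an illustration under stated assumptions, not BioRuby code: tag_of is a hypothetical stand-in for tag_get that simply takes the first two characters of a line, the entry text is made up, and ''.dup is used instead of '' only so the sketch also works under frozen string literals. The last few lines show what would go wrong if concat were combined with the old Hash.new('').

```ruby
# ASSUMPTION: tag_of is a hypothetical stand-in for BioRuby's tag_get;
# here it just returns the first two characters of an EMBL line.
def tag_of(line)
  line[0, 2]
end

def entry2hash_sketch(entry)
  # Block form: each missing key gets its own fresh, mutable String,
  # which is stored in the hash the first time the key is read.
  hash = Hash.new { |h, k| h[k] = ''.dup }
  entry.each_line do |line|
    tag = tag_of(line)
    next if tag == 'XX'            # skip spacer lines
    tag = 'R' if tag =~ /^R./      # fold RN, RA, RT, ... into one key
    hash[tag].concat(line)         # append in place, no new String
  end
  hash
end

# Made-up sample entry, only for illustration.
entry = "ID   CP000360\nXX\nRN   [1]\nRA   Author A.;\nXX\n"
h = entry2hash_sketch(entry)
p h.keys     # ["ID", "R"]
p h['R']     # "RN   [1]\nRA   Author A.;\n"

# With a plain default value, every missing key returns the SAME String
# object and the key is never stored, so concat mutates the shared default.
broken = Hash.new(''.dup)
broken['ID'].concat("ID   CP000360\n")
p broken.keys   # []
p broken['DE']  # "ID   CP000360\n"
```

Hash.new('') was harmless with += because hash[tag] += line assigns a new String to the key; with an in-place concat, every key would instead share, and silently modify, the one default String.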
Naohisa Goto
ng at bioruby.org