[BioRuby] problem while handling large fasta files

Fri Sep 5 01:47:21 UTC 2008

On Thu, 4 Sep 2008 15:32:27 +0200 (CEST)
"K. Patil" <kpatil at science.uva.nl> wrote:

> Oops, sorry for the incomplete information. Here it is:
> 
> Ruby: 1.8
> Bioruby: 1.0.0
> OS/CPU: 2.6.24.2.1.amd64-smp #1 SMP Mon Feb 11 12:43:21 UTC 2008 x86_64 GNU/Linux

BioRuby 1.0.0 is too old!

All I can say is that the problem may not occur
in the latest version of BioRuby, i.e. 1.2.1 or later.

> Also I cannot upgrade Ruby/Bioruby easily as I don't have appropriate
> permissions (all packages are installed by the administrator on request).

BioRuby (and also Ruby) can be installed in your home directory,
without root (administrator) permission.

The simplest way is:

 % cd somewhere
 % wget http://bioruby.open-bio.org/archive/bioruby-1.2.1.tar.gz
 % tar zxvf bioruby-1.2.1.tar.gz

And then, when running your script,

 % ruby -I /full/path/to/somewhere/bioruby-1.2.1/lib example.rb
 (Here "/full/path/to/somewhere" is the directory where you
  extracted the BioRuby archive.)

If you want to use irb,

 % irb -I /full/path/to/somewhere/bioruby-1.2.1/lib -r bio
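Whichever way is used, it may be worth checking which copy of bio.rb a plain require 'bio' will pick up: the extracted 1.2.1 tree or the old system-wide copy (the traceback in this thread points at /usr/lib/ruby/1.8). A minimal sketch, assuming a hypothetical helper name bio_rb_location (not part of BioRuby), that mimics the in-order search of the load path:

```ruby
# Hypothetical helper (not part of BioRuby): return the first directory
# in the load path that contains bio.rb, i.e. the copy that a plain
# `require 'bio'` would normally load. Returns nil if none is found.
# Simplification: RubyGems and compiled extensions are ignored.
def bio_rb_location(load_path = $LOAD_PATH)
  load_path.find { |dir| File.file?(File.join(dir.to_s, 'bio.rb')) }
end

# Usage, e.g. inside irb:
#   bio_rb_location
# A result ending in bioruby-1.2.1/lib means the locally extracted copy
# wins; /usr/lib/ruby/1.8 means the old system-wide copy is still used.
```

Since -I directories are placed at the front of $LOAD_PATH, the local directory should come first when the commands above are used.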

Alternatively, put

 $LOAD_PATH.unshift("/full/path/to/somewhere/bioruby-1.2.1/lib")

before the require 'bio' line in your script.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

> 
> thanks and regards,
> kaustubh
> 
> 
> > Hi,
> >
> > Please show which BioRuby version, Ruby version, OS,
> > architecture (type of CPU) you are using.
> >
> > Are your Ruby and/or BioRuby versions old?
> >
> > Naohisa Goto
> > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> >
> > On Thu, 4 Sep 2008 14:02:19 +0200 (CEST)
> > "K. Patil" <kpatil at science.uva.nl> wrote:
> >
> >> Hi,
> >>
> >> I am trying to do some simple processing on fasta files. It works fine
> >> for small files (up to several MB), but as soon as I move to very large
> >> files (e.g. 2.2 GB) the program crashes. Any help or suggestions would
> >> be highly appreciated.
> >>
> >> Best regards,
> >> Kaustubh Patil
> >>
> >> I am pasting a very simple example below (the file is 2.2 GB):
> >>
> >> irb(main):021:0> fasta = Bio::FastaFormat.open("9606.2.fna")
> >> => #<Bio::FlatFile:0x2b2484e9c4a0
> >> @splitter=#<Bio::FlatFile::Splitter::Default:0x2b2484e9a420
> >> @stream=#<Bio::FlatFile::BufferedInputStream:0x2b2484e9c158
> >> @io=#<Bio::FlatFile::BufferedInputStream:0x2b2484e9c3b0
> >> @io=#<File:9606.2.fna>, @buffer="", @path="9606.2.fna">,
> >> @buffer=">9606.2.fna\ntaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaac\n",
> >> @path="9606.2.fna">, @header=nil, @delimiter="\n>",
> >> @delimiter_overrun=1>,
> >> @firsttime_flag=true,
> >> @stream=#<Bio::FlatFile::BufferedInputStream:0x2b2484e9c158
> >> @io=#<Bio::FlatFile::BufferedInputStream:0x2b2484e9c3b0
> >> @io=#<File:9606.2.fna>, @buffer="", @path="9606.2.fna">,
> >> @buffer=">9606.2.fna\ntaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaac\n",
> >> @path="9606.2.fna">, @skip_leader_mode=:firsttime, @raw=false,
> >> @dbclass=Bio::FastaFormat>
> >> irb(main):022:0> fasta.each do |seq|
> >> irb(main):023:1* print seq.data
> >> irb(main):024:1> end
> >> NoMethodError: private method `sub' called for nil:NilClass
> >>         from /usr/lib/ruby/1.8/bio/db/fasta.rb:156:in `initialize'
> >>         from /usr/lib/ruby/1.8/bio/io/flatfile.rb:579:in `new'
> >>         from /usr/lib/ruby/1.8/bio/io/flatfile.rb:579:in `next_entry'
> >>         from /usr/lib/ruby/1.8/bio/io/flatfile.rb:609:in `each'
> >>         from (irb):22
> >>
> >>
> >> _______________________________________________
> >> BioRuby mailing list
> >> BioRuby at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioruby
> >
> >
> >
> 
>
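If upgrading is not possible at all, one workaround for the original example, which only prints the sequence data, is to bypass Bio::FlatFile and read the file line by line in plain Ruby. This is a hypothetical sketch (each_fasta_line is not a BioRuby method), and it does not diagnose the cause of the NoMethodError above; it only avoids holding a whole entry in memory, which matters for multi-gigabyte input.

```ruby
# Hypothetical workaround (not part of BioRuby): stream FASTA input line
# by line, yielding each sequence line together with the definition line
# (without the leading '>') of the entry it belongs to. Entries are never
# accumulated, so memory use does not grow with entry size.
# Sequence lines that appear before the first '>' get a nil definition.
def each_fasta_line(io)
  definition = nil
  io.each_line do |line|
    line = line.chomp
    if line[0, 1] == '>'
      definition = line[1..-1]
    elsif !line.empty?
      yield definition, line
    end
  end
end

# Usage, roughly equivalent to the irb example in this thread:
#   File.open('9606.2.fna') do |f|
#     each_fasta_line(f) { |_definition, seq_line| puts seq_line }
#   end
```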