[BioRuby] Ruby EMBOSS mapping (using Biolib)

Thu Nov 26 13:08:30 UTC 2009

Hi all,

For the past year I have been working on mapping C libraries to Ruby. A
comparison of Bioruby against Biolib/EMBOSS on a six-frame translation of a
C. elegans dataset shows that the Ruby with EMBOSS version is about 30x
faster (28x by wall-clock time, 33x by user CPU time). On my (outdated)
machine:

Bioruby version:

  22929 records 137574 times translated!
   real    9m30.952s
   user    8m42.877s
   sys     0m32.878s

Biolib version:

  22929 records 137574 times translated!
   real    0m20.306s
   user    0m15.997s
   sys     0m1.344s

These timings include I/O, which is handled by Ruby in both versions.
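
As a quick check on the numbers above, the speedup and the average cost per
translation can be worked out from the "real" times. This is only a sketch of
the arithmetic; the variable names are mine and not part of either script.

```ruby
# Wall-clock ("real") times copied from the two runs above, in seconds.
bioruby_real = 9 * 60 + 30.952   # 9m30.952s
biolib_real  = 0 * 60 + 20.306   # 0m20.306s

# Each of the 22929 records is translated in six frames.
translations = 22929 * 6

speedup    = bioruby_real / biolib_real
bioruby_ms = bioruby_real * 1000 / translations
biolib_ms  = biolib_real * 1000 / translations

puts format("translations: %d", translations)
puts format("speedup:      %.1fx", speedup)
puts format("Bioruby:      %.2f ms per translation", bioruby_ms)
puts format("Biolib:       %.2f ms per translation", biolib_ms)
```

Since both runs share the same Ruby I/O, the gap in translation time alone is
presumably somewhat larger than the overall ratio suggests.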

The Bioruby code reads:

  nt = FastaReader.new(fn)
  nt.each { | rec |
      seq = Bio::Sequence::NA.new(rec.seq)
      [-3,-2,-1,1,2,3].each do | frame |
        print "> ",rec.id," ",frame.to_s,"\n"
        print seq.translate(frame),"\n"
      end
  }
  # iter is not defined in this excerpt; the counts above imply iter = 1
  $stderr.print nt.size," records ",nt.size*6*iter," times translated!"

The Biolib code reads:

  nt = FastaReader.new(fn)
  trnTable = Biolib::Emboss.ajTrnNewI(1)
  nt.each { | rec |
      ajpseq   = Biolib::Emboss.ajSeqNewNameC(rec.seq,"Test sequence")
      [-3,-2,-1,1,2,3].each do | frame |
        ajpseqt  = Biolib::Emboss.ajTrnSeqOrig(trnTable,ajpseq,frame)
        aa       = Biolib::Emboss.ajSeqGetSeqCopyC(ajpseqt)
        print "> ",rec.id," ",frame.to_s,"\n"
        print aa,"\n"
      end
  }
  # iter is not defined in this excerpt; the counts above imply iter = 1
  $stderr.print nt.size," records ",nt.size*6*iter," times translated!"
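
For readers unfamiliar with the frame numbers used in both loops, here is a
minimal plain-Ruby sketch of one common convention: frames 1 to 3 read the
forward strand starting at offsets 0 to 2, and frames -1 to -3 do the same on
the reverse complement. The helpers reverse_complement and codons_for_frame
are hypothetical names of my own, not Bioruby or Biolib calls, and the exact
negative-frame convention of a given library may differ, so treat this as an
illustration only. It stops at splitting codons and does no translation.

```ruby
# Hypothetical helpers for illustration; not part of Bioruby or Biolib.
COMPLEMENT = { "a" => "t", "c" => "g", "g" => "c", "t" => "a" }.freeze

# Reverse the sequence and complement each base; unknown bases become "n".
def reverse_complement(seq)
  seq.downcase.reverse.each_char.map { |base| COMPLEMENT.fetch(base, "n") }.join
end

# Return the complete codons read in the given frame (-3..-1 or 1..3).
def codons_for_frame(seq, frame)
  strand = frame > 0 ? seq.downcase : reverse_complement(seq)
  offset = frame.abs - 1
  # Drop the leading offset bases, then keep only full triplets.
  strand[offset..-1].to_s.scan(/.../)
end

seq = "atgaaatag"
[-3, -2, -1, 1, 2, 3].each do |frame|
  puts "#{frame}: #{codons_for_frame(seq, frame).join(' ')}"
end
```

Each codon list is what a translation table would then map to amino acids,
once per frame, which is where the six translations per record come from.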

A write-up of the mapping effort is at:

  http://biolib.open-bio.org/wiki/Mapping_EMBOSS