[BioRuby] Bio::Faster plugin

Wed Jan 4 15:05:00 UTC 2012

Hi Francesco,
It's very cool!

You can also access the seq object/array in this way:
Bio::Faster.parse(File.join(TEST_DATA, "sample.fastq")) do |id, comments, sequence, quality|
  puts "#{id} #{comments} #{sequence} #{quality}"
end

Obviously I like it more than using the raw array :-)
I suppose that when there is no quality value you get a nil object.
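
To illustrate why the block form works, here is a minimal sketch in plain Ruby. It does not call Bio::Faster: each_record and the sample records are hypothetical stand-ins for a parser that yields one plain array per sequence. When an array is yielded to a block with several parameters, Ruby destructures it, and any parameter without a matching element is set to nil. Whether Bio::Faster itself returns nil for a missing quality is still an assumption.

```ruby
# Hypothetical stand-in for a parser that yields one plain array per record.
# This is NOT the Bio::Faster API, only an illustration of block destructuring.
def each_record(records)
  records.each { |fields| yield fields }
end

records = [
  ["read1", "sample comment", "ACGT", "IIII"], # FastQ-like record: four fields
  ["seq1", "another comment", "GGCC"]          # FastA-like record: no quality
]

lines = []
each_record(records) do |id, comments, sequence, quality|
  # The yielded array is spread over the block parameters;
  # quality is nil when the array has only three elements.
  lines << "#{id} #{comments} #{sequence} #{quality.inspect}"
end

puts lines
```

With a raw array you would write fields[2] for the sequence; the named block parameters say the same thing more readably.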

+1

On 04/01/12 10.50, "Francesco Strozzi" <francesco.strozzi at gmail.com> wrote:

> Hi guys,
> 
> I have created a BioRuby plugin called bio-faster, that implements a fast
> and simple parser for FastA and FastQ files. It's based on the C library
> Kseq written by Heng Li (author of Samtools and BWA). Compared to
> Bio::FastQ it is actually 4-5 times faster in parsing large FastQ files.
> The code will not create a Bio object for each sequence but it will return
> a simple array with sequence data and quality values for FastQ (it supports
> Sanger/Phred format only).
> Bio::Faster could be a good choice when you just need to parse huge files,
> for example to extract information or to store sequence data in a database,
> and you don't need to create an object for each sequence but you only want
> to parse the dataset easily and quickly.
> 
> Here is the code: https://github.com/fstrozzi/bioruby-faster
> Here is the wiki for more details:
> https://github.com/fstrozzi/bioruby-faster/wiki
> To get the gem: gem install bio-faster
> 
> Tested with Ruby 1.9 only.
> 
> Any comment or feedback is much appreciated!
> 
> Cheers