[BioRuby] Bio::Faster plugin

Fields, Christopher J cjfields at illinois.edu
Sat Mar 3 13:44:40 UTC 2012


I have a Perl binding to the same C library (Heng Li's readfq). I haven't benchmarked it against our low-level pure-Perl version, but it does seem to be significantly faster. Not to mention that it covers both FASTQ and FASTA.

I'm not sure how it handles the FASTQ test suite we came up with; that might be worth checking.

chris

On Mar 3, 2012, at 1:38 AM, Francesco Strozzi wrote:

> Hi,
> I haven't done this comparison, but if you want to run it I will be happy
> to put the outputs on the wiki.
> 
> Cheers
> 
> On Friday, March 2, 2012, Mic <mictadlo at gmail.com> wrote:
>> Did you also compare the speed of bio-faster against reading 4 lines
>> (FASTQ) / 2 lines (FASTA) at a time?
>> 
>> Cheers,
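
The line-based baseline asked about here could look roughly like the following. This is only a minimal sketch, assuming every FASTQ record occupies exactly four unwrapped lines; each_fastq_record is a hypothetical helper and not part of bio-faster or BioRuby.

```ruby
require "stringio"

# Hypothetical helper (not part of bio-faster): yields id, comment,
# sequence and quality for every group of four lines.
# Assumption: records are never wrapped across extra lines.
def each_fastq_record(io)
  return enum_for(:each_fastq_record, io) unless block_given?

  io.each_line.each_slice(4) do |header, sequence, _plus, quality|
    break if quality.nil? # ignore a truncated trailing record

    # Drop the leading "@" and split the id from the optional comment.
    id, comment = header.chomp[1..-1].split(" ", 2)
    yield id, comment, sequence.chomp, quality.chomp
  end
end

sample = StringIO.new("@read1 first read\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
records = each_fastq_record(sample).to_a
records.each { |id, comment, seq, qual| puts "#{id} #{comment} #{seq} #{qual}" }
```

Wrapping a call like this in Benchmark.realtime (from Ruby's standard benchmark library) on the same large file would give a number to set against the C-backed parser. A FASTA variant would slice by 2 instead of 4, under the same no-wrapping assumption.
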
>> 
>> On Thu, Jan 5, 2012 at 1:20 AM, George Githinji <georgkam at gmail.com> wrote:
>> 
>>> ++1 Sounds cool!
>>> 
>>> 
>>> On Wed, Jan 4, 2012 at 6:05 PM, Raoul Bonnal <bonnal at ingm.org> wrote:
>>>> Hi Francesco,
>>>> It's very cool!
>>>> 
>>>> And you can also access the seq object/array this way:
>>>> Bio::Faster.parse(File.join(TEST_DATA,"sample.fastq")) do |id, comments, sequence, quality|
>>>>   puts "#{id} #{comments} #{sequence} #{quality}"
>>>> end
>>>> 
>>>> Obviously I like it more than using the raw array :-)
>>>> I suppose that when there is no quality value you get a nil object.
>>>> 
>>>> 
>>>> +1
>>>> 
>>>> 
>>>> On 04/01/12 10.50, "Francesco Strozzi" <francesco.strozzi at gmail.com> wrote:
>>>> 
>>>>> Hi guys,
>>>>> 
>>>>> I have created a BioRuby plugin called bio-faster that implements a fast
>>>>> and simple parser for FastA and FastQ files. It's based on the C library
>>>>> Kseq written by Heng Li (author of Samtools and BWA). Compared to
>>>>> Bio::Fastq it is actually 4-5 times faster in parsing large FastQ files.
>>>>> The code will not create a Bio object for each sequence but it will return
>>>>> a simple array with sequence data and quality values for FastQ (it supports
>>>>> Sanger/Phred format only).
>>>>> Bio::Faster could be a good choice when you just need to parse huge files,
>>>>> for example to extract information or to store sequence data in a database,
>>>>> and you don't need to create an object for each sequence but you only want
>>>>> to parse the dataset easily and quickly.
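
On the Sanger/Phred point: in Sanger-style FASTQ each quality character encodes a Phred score as its ASCII code minus 33. If the quality arrives as a raw string (an assumption; the parser may already hand back decoded values), it could be converted with a small helper like this. sanger_to_phred is a hypothetical name, not part of bio-faster.

```ruby
# Sanger FASTQ stores each Phred score as one ASCII character, offset by 33.
SANGER_OFFSET = 33

# Hypothetical helper (not part of bio-faster): converts a Sanger-encoded
# quality string into an array of integer Phred scores.
def sanger_to_phred(quality)
  quality.each_byte.map { |byte| byte - SANGER_OFFSET }
end

scores = sanger_to_phred("!+5?I")
puts scores.inspect # prints [0, 10, 20, 30, 40]
```
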
>>>>> 
>>>>> Here is the code: https://github.com/fstrozzi/bioruby-faster
>>>>> Here is the wiki for more details:
>>>>> https://github.com/fstrozzi/bioruby-faster/wiki
>>>>> To get the gem: gem install bio-faster
>>>>> 
>>>>> Tested with Ruby 1.9 only.
>>>>> 
>>>>> Any comment or feedback is much appreciated!
>>>>> 
>>>>> Cheers
>>>> 
>>>> 
>>>> _______________________________________________
>>>> BioRuby Project - http://www.bioruby.org/
>>>> BioRuby mailing list
>>>> BioRuby at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>> 
>>> 
>>> 
>>> --
>>> ---------------
>>> Sincerely
>>> George
>>> Skype: george_g2
>>> Blog: http://biorelated.wordpress.com/
>>> Twitter: http://twitter.com/#!/george_l
>>> 
> 
> -- 
> 
> Francesco