[Bioperl-l] Next-gen modules

Wed Jun 17 18:20:00 UTC 2009

Tristan Lefebure wrote:
> Hello,
> Regarding next-gen sequences and BioPerl, in my 
> experience another issue is BioPerl speed. For example, if 
> you want to trim bad-quality bases at the ends of 1E6 Solexa 
> reads using Bio::SeqIO::fastq and some methods in 
> Bio::Seq::Quality, well, you've got to be patient (but 
> maybe I missed some shortcuts...).

This is my concern as well. Or, rather, is there actually a significant 
set of users out there who are dealing with next-gen sequencing and 
would consider using BioPerl for their work?

I'm working with all the 1000-genomes data at the Sanger, and we at 
least are probably never going to use BioPerl for the work.

> A pure Perl solution will be between 100 and 1000x faster... 
> Would it be possible to have an ultra-light quality object 
> with few simple methods for next-gen reads?

The fastq parser itself already seems pretty fast. The way to get the 
speedup is to not create any Bio::Seq* objects but just return the data 
directly. At that point it's not taking much advantage of BioPerl. But 
certainly it could be done...
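
To make the "just return the data directly" idea concrete, here is a rough 
sketch of what such a reader might look like in plain Perl. It is not 
BioPerl code and not how Bio::SeqIO::fastq works internally; the names 
next_fastq_record and trim_low_quality_ends are made up for illustration. 
It assumes unwrapped four-line FASTQ records and takes the quality offset 
as a parameter (33 for Sanger-style files; Solexa/Illumina files of this 
era typically use 64, and early Solexa scores are on a slightly different 
scale, which this sketch ignores).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one 4-line FASTQ record and return plain
# strings (id, sequence, quality). Returns an empty list at end of file.
# Assumes records are not line-wrapped.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "Bad FASTQ header: $header\n"  unless $header =~ s/^@//;
    die "Bad FASTQ separator: $plus\n" unless $plus =~ /^\+/;
    die "Length mismatch in $header\n" unless length($seq) == length($qual);
    return ($header, $seq, $qual);
}

# Hypothetical helper: strip bases below $min_q from both ends of a read.
# $offset is the ASCII offset of the quality encoding (an assumption the
# caller must supply). Returns the trimmed sequence and quality strings.
sub trim_low_quality_ends {
    my ($seq, $qual, $min_q, $offset) = @_;
    my @q = map { $_ - $offset } unpack('C*', $qual);
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;
    my $len = $end - $start + 1;
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Demo on an in-memory record; a real script would open a FASTQ file.
my $fastq = "\@read1\nACGTACGT\n+\n!!IIII!#\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory FASTQ: $!";
while (my ($id, $seq, $qual) = next_fastq_record($fh)) {
    my ($tseq, $tqual) = trim_low_quality_ends($seq, $qual, 20, 33);
    print "\@$id\n$tseq\n+\n$tqual\n";
}
close $fh;
```

With the demo record above and a threshold of Q20, the read is trimmed to 
GTAC with qualities IIII. No objects are created per read, which is the 
trade-off under discussion: it avoids the Bio::Seq* overhead, but it also 
gives up everything those objects provide.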