[Bioperl-l] Next-gen modules

Wed Jun 17 17:09:54 UTC 2009

On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:

> Hello,
> Regarding next-gen sequences and bioperl, in my
> experience another issue is bioperl speed. For example, if
> you want to trim bad-quality bases at the ends of 1E6 Solexa
> reads using Bio::SeqIO::fastq and some methods in
> Bio::Seq::Quality, well, you've got to be patient (but
> maybe I missed some shortcuts...).

The key issues affecting speed in bioperl are instantiation of
contained objects and inheritance. Of the two, inheritance costs much
more, since it affects the contained objects as well as the
container.

http://www.bioperl.org/wiki/Why_BioPerl_is_slow

Moose/Perl6 roles/traits are one way around that issue, but we are a
ways off from getting that running.  I think getting that to work
decently would be a from-the-ground-up endeavor (see my past posts on
biomoose/bioperl6).

> A pure perl solution will be 100 to 1000x faster...
> Would it be possible to have an ultra-light quality object
> with few simple methods for next-gen reads?
>
> I can contribute some tests if that sounds like an important
> point.
>
> -Tristan
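
For reference, the kind of pure-perl trimming described above might
look like the minimal sketch below. The trim_read name, the Phred+64
quality offset, and the default cutoff of 20 are all assumptions made
for illustration; none of this is existing bioperl code.

```perl
use strict;
use warnings;

# Hypothetical helper: trim low-quality bases from both ends of a read.
# Assumes ASCII-encoded qualities with a Phred+64 offset; the offset and
# the default cutoff of 20 are illustrative assumptions.
sub trim_read {
    my ($seq, $qual, $min_q, $offset) = @_;
    $min_q  = 20 unless defined $min_q;
    $offset = 64 unless defined $offset;

    # Decode the quality string into numeric scores.
    my @q = map { ord($_) - $offset } split //, $qual;

    # Walk inward from each end while the score is below the cutoff.
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min_q;
    $end--   while $end >= $start && $q[$end]   < $min_q;

    my $len = $end - $start + 1;
    return ('', '') if $len <= 0;    # nothing left after trimming
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}
```

Called as trim_read($seq, $qual), it returns the trimmed sequence and
the matching quality string without creating any objects.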

I don't think the quality objects themselves are that heavy; I think
the main impediment is inheritance.  One could get around that a bit
by using a direct_new method to create a blessed hash directly, then
reimplementing methods to lazily create any contained objects on the fly.
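
A rough sketch of that idea follows. The My::LightRead class, its
field names, and the Phred+64 offset are hypothetical and only meant
to show the pattern, not an existing bioperl interface: direct_new
blesses a hash without walking any constructor chain, and the decoded
quality array is only built the first time it is asked for.

```perl
package My::LightRead;    # hypothetical class, not part of bioperl
use strict;
use warnings;

# direct_new: bless a plain hash directly, bypassing inherited constructors.
sub direct_new {
    my ($class, %args) = @_;
    return bless {
        id       => $args{id},
        seq      => $args{seq},
        raw_qual => $args{qual},    # keep the raw ASCII quality string
        offset   => defined $args{offset} ? $args{offset} : 64,  # assumed Phred+64
    }, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Decode the quality scores on first access only, then cache the result.
sub qual {
    my $self = shift;
    $self->{qual} ||= [
        map { ord($_) - $self->{offset} } split //, $self->{raw_qual}
    ];
    return $self->{qual};
}

package main;

my $read = My::LightRead->direct_new(id => 'r1', seq => 'ACGT', qual => 'hhhB');
print join(' ', @{ $read->qual }), "\n";    # prints "40 40 40 2"
```

Reads that are never inspected for quality never pay for the decoding,
which is the point of creating contained data lazily.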

chris