[Bioperl-l] Next-gen modules

Tristan Lefebure tristan.lefebure at gmail.com
Wed Jun 17 14:09:42 EDT 2009


Thanks to you both for shedding some light on this.

That probably means that bioperl's role in handling raw
next-gen sequencing data (i.e. reads) will be very limited,
right? (at least until bioperl6). A single Solexa GA2 lane
generates about 9 million reads, and I would really not
call that a big project...

BTW, is there a simple way to see which objects are
instantiated and inherited from, and how much time is spent
in each, when one calls next_seq() (or any other method)?
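A minimal sketch of one way to look at this, assuming Perl 5.10 or later. For real per-subroutine timings of a next_seq() loop, the CPAN profiler Devel::NYTProf can be run as `perl -d:NYTProf yourscript.pl` followed by `nytprofhtml` to browse the report (yourscript.pl is a placeholder). For the inheritance side, the core function mro::get_linear_isa lists the classes Perl searches when resolving a method. The self-contained script uses a hypothetical three-level chain (My::Root, My::Seq, My::QualSeq are illustrative stand-ins, not BioPerl modules) and the core Benchmark module to compare a constructor that walks up the chain with a directly blessed hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;                    # core pragma since Perl 5.10
use Benchmark qw(cmpthese);

# Hypothetical classes standing in for a deep BioPerl-style chain;
# these are NOT BioPerl modules.
package My::Root;

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->_initialize(%args);    # each level of the chain adds its own work
    return $self;
}

sub _initialize {
    my ($self, %args) = @_;
    $self->{verbose} = $args{-verbose} || 0;
}

package My::Seq;
our @ISA = ('My::Root');

sub _initialize {
    my ($self, %args) = @_;
    $self->SUPER::_initialize(%args);
    $self->{seq} = $args{-seq};
}

package My::QualSeq;
our @ISA = ('My::Seq');

sub _initialize {
    my ($self, %args) = @_;
    $self->SUPER::_initialize(%args);
    $self->{qual} = $args{-qual};
}

package main;

# Method-resolution order Perl walks for methods called on My::QualSeq.
print join(' -> ', @{ mro::get_linear_isa('My::QualSeq') }), "\n";

# Run each way of building an equivalent object for at least 1 CPU second.
cmpthese(-1, {
    chained => sub { My::QualSeq->new(-seq => 'ACGT', -qual => [40, 40, 30, 20]) },
    direct  => sub { bless { verbose => 0, seq => 'ACGT', qual => [40, 40, 30, 20] },
                     'My::QualSeq' },
});
```

The first line printed is `My::QualSeq -> My::Seq -> My::Root`; the rates in the Benchmark table depend on the machine, so treat them only as a relative comparison.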

-Tristan

On Wednesday 17 June 2009 13:49:38 Elia Stupka wrote:
> I would suggest developing the "standard" version first,
> then moving onto potential optimizations.
>
> When we went through a similar argument in Ensembl about
> 8 years ago we ended up dropping Bio::Root completely...
>
> If one is truly after performance for these large
> next-gen projects, it'd be down to pure piping, shell,
> and worrying about location and copying of files,
> sticking to systems-level as much as possible, and quite
> far from Bioperl altogether, so I think it's a whole
> different level of optimization issues, probably outside
> the scope of Bioperl.
>
> Elia
>
> On 17 Jun 2009, at 18:09, Chris Fields wrote:
> > On Jun 17, 2009, at 8:27 AM, Tristan Lefebure wrote:
> >> Hello,
> >> Regarding next-gen sequences and bioperl, in my
> >> experience another issue is bioperl's speed. For
> >> example, if you want to trim bad-quality bases at the
> >> ends of 1E6 Solexa reads using Bio::SeqIO::fastq and
> >> some methods in Bio::Seq::Quality, well, you've got to
> >> be patient (but maybe I missed some shortcuts...).
> >
> > The key issues affecting speed in bioperl are contained
> > object instantiation and inheritance (and between those
> > two, the latter much more so as it plays a role with
> > contained objects as well as the container).
> >
> > http://www.bioperl.org/wiki/Why_BioPerl_is_slow
> >
> > Moose/Perl6 roles/traits are one way around that issue,
> > but we are a ways off from getting that running.  I
> > think to get that working decently would be a
> > from-ground-up endeavor (see my past posts on
> > biomoose/bioperl6).
> >
> >> A pure perl solution will be between 100 and 1000x
> >> faster... Would it be possible to have an ultra-light
> >> quality object with a few simple methods for next-gen
> >> reads?
> >>
> >> I can contribute some tests if that sounds like an
> >> important point.
> >>
> >> -Tristan
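A minimal sketch of the kind of "pure perl" end-trimming meant here, bypassing Bio::SeqIO entirely. It assumes unwrapped four-line FASTQ records and a fixed ASCII quality offset; trim_read, trim_fastq, the offset of 33 and the threshold of 20 are illustrative choices, not BioPerl API. Solexa/Illumina pipelines of that era often wrote ASCII+64 qualities (and older Solexa scores are not strictly Phred), so $OFFSET has to match the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative settings; adjust to your data.
my $OFFSET    = 33;   # ASCII offset of the quality string (Phred+33)
my $THRESHOLD = 20;   # lowest quality kept at either end of a read

# Trim bases with quality below $min from both ends of one read.
# Returns the trimmed sequence and quality strings (possibly empty).
sub trim_read {
    my ($seq, $qual, $offset, $min) = @_;
    my @q = map { ord($_) - $offset } split //, $qual;
    my ($start, $end) = (0, $#q);
    $start++ while $start <= $end && $q[$start] < $min;
    $end--   while $end >= $start && $q[$end] < $min;
    my $len = $end - $start + 1;
    return (substr($seq, $start, $len), substr($qual, $start, $len));
}

# Read unwrapped four-line FASTQ records from $in and write the
# trimmed records to $out; reads trimmed to nothing are dropped.
sub trim_fastq {
    my ($in, $out, $offset, $min) = @_;
    while (my $head = <$in>) {
        my $seq  = <$in>;
        my $plus = <$in>;    # separator line, rewritten as a bare '+'
        my $qual = <$in>;
        die "Truncated record: $head" unless defined $qual;
        chomp($head, $seq, $qual);
        my ($s, $q) = trim_read($seq, $qual, $offset, $min);
        next unless length $s;
        print {$out} "$head\n$s\n+\n$q\n";
    }
}

# Process each FASTQ file named on the command line, writing to STDOUT.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    trim_fastq($in, \*STDOUT, $OFFSET, $THRESHOLD);
    close $in;
}
```

Usage would be something like `perl trim_fastq.pl reads.fastq > trimmed.fastq`. Each record is handled as plain strings and an integer array, so no objects are created per read.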
> >
> > The quality objects themselves I don't think are that
> > heavy; I think the main impediment is inheritance.  One
> > could get around that a bit by using a direct_new
> > method to create a blessed hash directly, then
> > reimplement methods to lazily create any objects
> > contained on the fly.
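A sketch of the direct_new idea, under stated assumptions: My::LightRead and its methods are hypothetical, not BioPerl classes. The constructor blesses a hash of raw strings with no inheritance chain or argument processing, and qual() decodes the quality string only on first use and caches the result; a heavier contained object could be created lazily in the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lightweight read class: no base classes, no
# argument rearranging, just a blessed hash of raw strings.
package My::LightRead;

sub direct_new {
    my ($class, $id, $seq, $qual_str, $offset) = @_;
    # Nothing is decoded or built here; only raw strings are stored.
    return bless {
        id       => $id,
        seq      => $seq,
        raw_qual => $qual_str,
        offset   => defined $offset ? $offset : 33,   # assumed Phred+33
    }, $class;
}

sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Decode the quality string on first call and cache the array ref.
sub qual {
    my $self = shift;
    $self->{qual} ||= [ map { ord($_) - $self->{offset} } split //, $self->{raw_qual} ];
    return $self->{qual};
}

# A derived value, itself built on the lazily decoded qualities.
sub mean_qual {
    my $self = shift;
    my $q = $self->qual;
    return 0 unless @$q;
    my $sum = 0;
    $sum += $_ for @$q;
    return $sum / @$q;
}

package main;

my $read = My::LightRead->direct_new('r1', 'ACGT', 'II5#');
printf "%s %s mean=%.1f\n", $read->id, $read->seq, $read->mean_qual;
```

This prints `r1 ACGT mean=25.5`. Reads whose qualities are never inspected never pay for decoding them.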
> >
> > chris
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> ---
> Senior Lecturer, Bioinformatics
> UCL Cancer Institute
> Paul O'Gorman Building
> University College London
> Gower Street
> WC1E 6BT
> London
> UK
>
> Office (UCL): +44 207 679 6493
> Office (ICMS): +44 0207 8822374
>
> Mobile: +44 7597 566 194
> Mobile (Italy): +39 338 8448801



