[Bioperl-l] Next-gen modules
Chris Fields
cjfields at illinois.edu
Wed Jun 17 19:30:15 UTC 2009
On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
> Tristan Lefebure wrote:
>> Hello,
>> Regarding next-gen sequences and bioperl, in my experience
>> another issue is bioperl's speed. For example, if you want to trim
>> bad-quality bases at the ends of 1E6 Solexa reads using
>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,
>> you've got to be patient (but maybe I missed some shortcuts...).
>
> This is my concern as well. Or, rather, is there actually a
> significant set of users out there who are dealing with next-gen
> sequencing and would consider using BioPerl for their work?
>
> I'm working with all the 1000 Genomes data at the Sanger, and we, at
> least, are probably never going to use BioPerl for the work.
Are you using pure perl or (gasp) something else? ;>
Judging by the feedback, there is definitely a set of users who would
like to integrate next-gen data into bioperl somehow, probably to take
advantage of other parts of bioperl.
>> A pure perl solution will be between 100 and 1000x faster... Would
>> it be possible to have an ultra-light quality object with a few
>> simple methods for next-gen reads?
>
> The fastq parser itself already seems pretty fast. The way to get
> the speedup is to not create any Bio::Seq* objects but just return
> the data directly. At that point it's not taking much advantage of
> BioPerl. But certainly it could be done...
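To make the "return the data directly" approach concrete, here is a minimal sketch in Python. The idea is language-neutral, and a Perl version would have the same shape. It assumes unwrapped four-line FASTQ records and Phred+33 (Sanger) quality encoding. Solexa and early Illumina files use a different ASCII offset, and Solexa-scaled scores are also defined differently, so `offset` and the cutoff would need adjusting for them. The names `iter_fastq` and `trim_ends` and the default cutoff of 20 are hypothetical, not part of BioPerl.

```python
import io
from typing import Iterator, TextIO, Tuple

# (id, sequence, quality string): plain tuples instead of Bio::Seq-style objects
Read = Tuple[str, str, str]


def iter_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (id, seq, qual) from unwrapped four-line FASTQ records (assumption)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield header[1:].rstrip("\r\n"), seq, qual


def trim_ends(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop bases scoring below min_q from both ends of a read.

    offset=33 assumes Phred+33 (Sanger) encoding; other FASTQ variants differ.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    demo = "@read1\nACGTACGT\n+\n##IIII##\n"
    for read_id, seq, qual in iter_fastq(io.StringIO(demo)):
        print(read_id, *trim_ends(seq, qual))  # prints: read1 GTAC IIII
```

No per-read object is created here. The trade-off Sendu points out still applies: once reads are bare tuples, little of BioPerl's richer API is in play.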
I suppose the best way to assess what needs to be done is come up with
a set of 'use cases' specifying what users want so we can design
around them, otherwise we're shooting in the dark.
I'm personally wondering if this could be done as a sequence database,
something similar in theme to Lincoln's SeqFeature::Store, but
sequence-only, returning quality objects in a similar manner (à la
Storable)? Not sure whether that's feasible, but it at least appears
scalable.
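One way to picture the sequence-database idea is a sequence-only table keyed by read ID, with records materialized only when fetched. The sketch below uses Python's built-in `sqlite3` purely for illustration. It is not the Bio::DB::SeqFeature::Store API, and the class and method names (`ReadStore`, `load`, `get`) are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, Optional, Tuple

Read = Tuple[str, str, str]  # (id, sequence, quality string)


class ReadStore:
    """Hypothetical sequence-only read store backed by SQLite.

    Reads live in the database and become Python tuples only when fetched,
    so millions of reads need not be held in memory as objects at once.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS reads ("
            "id TEXT PRIMARY KEY, seq TEXT NOT NULL, qual TEXT NOT NULL)"
        )

    def load(self, reads: Iterable[Read]) -> None:
        """Bulk-insert (id, seq, qual) tuples in a single transaction."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT INTO reads (id, seq, qual) VALUES (?, ?, ?)", reads
            )

    def get(self, read_id: str) -> Optional[Read]:
        """Fetch one read by its primary key, or None if absent."""
        return self.conn.execute(
            "SELECT id, seq, qual FROM reads WHERE id = ?", (read_id,)
        ).fetchone()

    def __len__(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM reads").fetchone()[0]

    def __iter__(self) -> Iterator[Read]:
        """Stream every read in ID order via the cursor."""
        return iter(self.conn.execute("SELECT id, seq, qual FROM reads ORDER BY id"))
```

A design question this leaves open is whether fetched reads should stay as tuples or be wrapped in a quality object. Wrapping on fetch keeps storage cheap while still offering an object interface to callers who want one.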
chris