[Bioperl-l] Next-gen modules

Thu Jun 18 01:08:55 UTC 2009

On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my experience,
>> > > another issue is bioperl speed. For example, if you want to trim
>> > > bad quality bases at ends of 1E6 Solexa reads using
>> > > Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,
>> > > you've got to be patient (but may be I missed some shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a
>> > significant set of users out there who are dealing with next-gen
>> > sequencing and would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we at
>> > least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g. stability,
>> correctness, compatibility) that are contributing to your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris
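
For illustration only, here is a minimal sketch of the end-trimming task Tristan describes, done with plain strings instead of one sequence object per read. It is not taken from any tool mentioned in the thread. The function names, the quality cutoff of 20, and the ASCII offset of 64 (Solexa/early Illumina encoding; Sanger-style FASTQ uses 33) are all assumptions, and it only handles four-line FASTQ records with unwrapped sequence lines.

```python
import io


def trim_read(seq, qual, min_q=20, offset=64):
    """Trim low-quality bases from both ends of one read.

    min_q and offset are illustrative assumptions, not values from the thread.
    """
    # Compare quality characters directly against a threshold character,
    # which avoids converting every character to an integer score.
    threshold = chr(min_q + offset)
    start, end = 0, len(qual)
    while start < end and qual[start] < threshold:
        start += 1
    while end > start and qual[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_fastq(handle, out, min_q=20, offset=64):
    """Stream four-line FASTQ records from handle and write trimmed ones to out."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input (an empty line also stops the loop)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        seq, qual = trim_read(seq, qual, min_q, offset)
        out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")


if __name__ == "__main__":
    # Tiny in-memory example: 'B' is Q2 and 'h' is Q40 at offset 64.
    src = io.StringIO("@r1\nACGTACGT\n+\nBBhhhhBB\n")
    dst = io.StringIO()
    trim_fastq(src, dst)
    print(dst.getvalue(), end="")
```

Only one record is held in memory at a time, which is the kind of speed and memory trade-off discussed above. A read whose bases all fall below the cutoff is written out with empty sequence and quality lines. A real pipeline would probably want to drop such reads or filter on a minimum length.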