[Bioperl-l] Next-gen modules

Chris Fields cjfields at illinois.edu
Thu Jun 18 01:08:55 UTC 2009


On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:

> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa reads
>> > > using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
>> > > well, you've got to be patient (but may be I missed some
>> > > shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a
>> > significant set of users out there who are dealing with next-gen
>> > sequencing and would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and we
>> > at least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g.
>> stability, correctness, compatibility) that are contributing to
>> your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much  
> functionality in any case. If I have to write new parsers and  
> wrappers, I may as well make them fast (which means they don't "fit"  
> into BioPerl).

That's (unfortunately) true.  It may be easy to whip up something that  
works, but it probably won't be fast.
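
For what it's worth, the kind of lightweight end-trimming pass being
discussed can be sketched in a few lines. This is only an illustration
under stated assumptions, not code from anyone in this thread: it is
written in Python purely for brevity, the names read_fastq and trim_ends
and the min_q=20 cutoff are hypothetical, and it assumes strict
four-line FASTQ records with Phred+33 quality encoding (Solexa-scaled or
offset-64 files would need a different offset and conversion).

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes strict 4-line records (no wrapped sequence lines) and
    reads one record at a time, so memory use stays constant.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Strip bases scoring below min_q from both ends of a read.

    min_q and offset are illustrative defaults (assumed Phred+33).
    Comparing quality characters directly avoids decoding every score.
    """
    threshold = chr(min_q + offset)
    start, end = 0, len(qual)
    while start < end and qual[start] < threshold:
        start += 1
    while end > start and qual[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGTACGT\n+\n!!IIII!!\n")
    for name, seq, qual in read_fastq(demo):
        print(name, *trim_ends(seq, qual))
```

Nothing here builds per-read objects, which is roughly the trade-off
Sendu describes: it is quick to run but does not "fit" the
Bio::SeqIO / Bio::Seq::Quality object model.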

>> What *are* you using?
>
> There are already great tools written in C that do all the heavy  
> lifting and the rest is done in perl written for speed and low memory.

Like this one?

http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml

I suppose if one were inclined, this could be wrapped with SWIG in  
BioLib, but would it be worth it (maybe beyond grabbing the file  
indices)?

chris


