[Bioperl-l] Next-gen modules
Chris Fields
cjfields at illinois.edu
Thu Jun 18 01:08:55 UTC 2009
On Jun 17, 2009, at 5:24 PM, Sendu Bala wrote:
> George Hartzell wrote:
>> Sendu Bala writes:
>> > Tristan Lefebure wrote:
>> > > Hello,
>> > > Regarding next-gen sequences and bioperl, following my
>> > > experience, another issue is bioperl speed. For example, if
>> > > you want to trim bad quality bases at ends of 1E6 Solexa reads
>> > > using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
>> > > well, you've got to be patient (but may be I missed some
>> > > shortcuts...).
>> >
>> > This is my concern as well. Or, rather, is there actually a
>> > significant set of users out there who are dealing with next-gen
>> > sequencing and would consider using BioPerl for their work?
>> >
>> > I'm working with all the 1000-genomes data at the Sanger, and
>> > we at least are probably never going to use BioPerl for the work.
>> > [...]
>> Is it purely a speed issue, or are there other issues (e.g.
>> stability, correctness, compatibility) that are contributing to
>> your decision?
>
> Too heavy-weight, too slow, too memory intensive, missing too much
> functionality in any case. If I have to write new parsers and
> wrappers, I may as well make them fast (which means they don't "fit"
> into BioPerl).
That's (unfortunately) true. It may be easy to whip up something that
works, but it probably won't be fast.
>> What *are* you using?
>
> There are already great tools written in C that do all the heavy
> lifting and the rest is done in perl written for speed and low memory.
Like this one?
http://www.sanger.ac.uk/Users/lh3/parsefastq.shtml
I suppose if one were inclined, this could be wrapped with SWIG in
BioLib, but would it be worth it (maybe beyond grabbing the file
indices)?
chris