[Bioperl-l] Next-gen modules

Elia Stupka e.stupka at ucl.ac.uk
Wed Jun 17 16:36:31 EDT 2009


Better than colorspaced discussions for sure ;)

Elia

On 17 Jun 2009, at 21:35, Chris Fields wrote:

> So, #1 priority is to get fastq up-to-speed, then maybe assess other  
> options.
>
> Illuminating discussion, thanks Elia!
>
> urgh, excuse unintended bad pun above...
>
> chris
>
> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>
>> Interesting that you mention the database issue. We found that for  
>> specific memory/CPU intensive things we also switch to using dbs.  
>> For example, after many years of loyal use of disconnected_ranges  
>> we switched to a simple SQL implementation of it, because of the  
>> large performance gains it would give us.  Similarly in Ensembl as  
>> well as in the old days of bioperl-db we opted for doing subseq  
>> within SQL where possible.
>>
>> Some lean way of SQL'izing specific components could be less  
>> "disruptive" than avoiding object creation and provide significant  
>> gains in performance. Could be set as an optional flag, and could  
>> use temporary ad hoc SQL databases?
>>
>> Still, priority now is to make SeqIO compliant with all those  
>> formats, then we can worry about performance :)
>>
>> Elia
>>
>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>
>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>
>>>> Tristan Lefebure wrote:
>>>>> Hello,
>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads using  
>>>>> Bio::SeqIO::fastq and some methods in Bio::Seq::Quality, well,  
>>>>> you've got to be patient (but maybe I missed some shortcuts...).
>>>>
>>>> This is my concern as well. Or, rather, is there actually a  
>>>> significant set of users out there who are dealing with next-gen  
>>>> sequencing and would consider using BioPerl for their work?
>>>>
>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>> at least are probably never going to use BioPerl for the work.
>>>
>>> Are you using pure perl or (gasp) something else?  ;>
>>>
>>> Judging by the feedback there is definitely a set of users who  
>>> would like to integrate nextgen into bioperl somehow, probably to  
>>> take advantage of other aspects of bioperl.
>>>
>>>>> A pure perl solution will be between 100 and 1000x faster...  
>>>>> Would it be possible to have an ultra-light quality object with  
>>>>> few simple methods for next-gen reads?
>>>>
>>>> The fastq parser itself already seems pretty fast. The way to get  
>>>> the speedup is to not create any Bio::Seq* objects but just  
>>>> return the data directly. At that point it's not taking much  
>>>> advantage of BioPerl. But certainly it could be done...
>>>
>>>
>>> I suppose the best way to assess what needs to be done is come up  
>>> with a set of 'use cases' specifying what users want so we can  
>>> design around them, otherwise we're shooting in the dark.
>>>
>>> I'm personally wondering if this could be done as a sequence  
>>> database, something similar in theme to Lincoln's  
>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>> feasible, but it appears at least scalable.
>>>
>>> chris
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> ---
>> Senior Lecturer, Bioinformatics
>> UCL Cancer Institute
>> Paul O' Gorman Building
>> University College London
>> Gower Street
>> WC1E 6BT
>> London
>> UK
>>
>> Office (UCL): +44 207 679 6493
>> Office (ICMS): +44 0207 8822374
>>
>> Mobile: +44 7597 566 194
>> Mobile (Italy): +39 338 8448801
>>
>
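
For the trimming use case Tristan raises in the quoted thread, the "no Bio::Seq* objects, just return the data" approach Sendu describes can be sketched roughly as below. It is written in Python purely for illustration (the same loop ports directly to plain Perl) and is not code from BioPerl. The function names read_fastq and trim_ends, the min_q=20 cutoff, and the Sanger-style offset of 33 are all assumptions made for the example; Solexa/Illumina files of this era often use an offset of 64, so the offset is left as a parameter.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) string tuples from a FASTQ stream.

    Assumes strict four-line records (no wrapped sequence or quality lines).
    No per-read objects are built; callers get plain strings.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Drop bases from both ends until a base with quality >= min_q is met.

    min_q and offset are illustrative defaults, not values from the thread.
    """
    threshold = min_q + offset
    start, end = 0, len(qual)
    while start < end and ord(qual[start]) < threshold:
        start += 1
    while end > start and ord(qual[end - 1]) < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGTACGT\n+\n##IIII##\n")
    for name, seq, qual in read_fastq(demo):
        print(name, *trim_ends(seq, qual))
```

The trade-off is the one already noted in the thread: returning bare strings avoids object creation, but the caller also gives up the rest of the BioPerl object model.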

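The idea of pushing something like disconnected_ranges into a temporary ad hoc SQL database can be illustrated with the sketch below. This is not the implementation Elia refers to, which the thread does not show; it is one possible formulation, assuming inclusive integer coordinates and an in-memory SQLite database. The table name ranges, the column names range_start and range_end, and the Python wrapper are hypothetical. Ranges that share at least one position are merged; ranges that are merely adjacent (for example 1-5 and 6-10) are kept separate.

```python
import sqlite3

# A merged range starts at a start position not covered by any range that
# begins earlier, and ends at an end position not covered by any range that
# finishes later. Each start is paired with the nearest end at or after it.
MERGE_SQL = """
SELECT s.range_start, MIN(e.range_end)
FROM (
    SELECT DISTINCT a.range_start AS range_start
    FROM ranges a
    WHERE NOT EXISTS (
        SELECT 1 FROM ranges b
        WHERE b.range_start < a.range_start
          AND b.range_end >= a.range_start)
) AS s
JOIN (
    SELECT DISTINCT a.range_end AS range_end
    FROM ranges a
    WHERE NOT EXISTS (
        SELECT 1 FROM ranges b
        WHERE b.range_end > a.range_end
          AND b.range_start <= a.range_end)
) AS e ON e.range_end >= s.range_start
GROUP BY s.range_start
ORDER BY s.range_start
"""


def disconnected_ranges(ranges):
    """Merge overlapping (start, end) pairs using a throwaway SQLite table."""
    conn = sqlite3.connect(":memory:")  # temporary database, gone on close
    try:
        conn.execute(
            "CREATE TABLE ranges ("
            "range_start INTEGER NOT NULL, range_end INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO ranges VALUES (?, ?)", ranges)
        return conn.execute(MERGE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(disconnected_ranges([(1, 10), (5, 20), (30, 40), (35, 36), (50, 60)]))
```

This self-join form has no indexes and is meant only to show the shape of the query; whether it beats an in-memory implementation would depend on the data size and on indexing, which the sketch does not address.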


