[Bioperl-l] Next-gen modules

Chris Fields cjfields at illinois.edu
Fri Jun 19 20:57:36 UTC 2009


So, to follow up (and make sure we don't have any overlapping tuits)  
we should probably determine who wants to work on what (i.e. fastq  
updating, etc). I think it's possible to quickly add in Solexa/ 
Illumina/Sanger fastq similar to BioPython, just don't want to step on  
anyone's toes if they are halfway through doing this.
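
For anyone picking this up, a rough sketch of how the three variants differ, assuming the usual conventions: Sanger is Phred+33, Solexa is Solexa-scaled +64, and Illumina 1.3+ is Phred+64. It is in Python only for brevity and is not BioPerl or BioPython code; the names OFFSETS, solexa_to_phred and decode_quality are made up for illustration.

```python
import math

# Assumed ASCII offsets for the three FASTQ quality encodings.
OFFSETS = {"sanger": 33, "solexa": 64, "illumina": 64}


def solexa_to_phred(q):
    """Convert a Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (q / 10) + 1)


def decode_quality(qual, variant="sanger"):
    """Return integer Phred scores for a FASTQ quality string."""
    offset = OFFSETS[variant]
    scores = [ord(c) - offset for c in qual]
    if variant == "solexa":
        # Solexa scores use a different log scale and can be negative.
        scores = [round(solexa_to_phred(q)) for q in scores]
    return scores
```

A parser could take the variant as an option and hand back Phred scores in every case, so downstream code would not need to know which encoding the file used.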

chris

On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:

> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess  
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for  
>>> specific memory/CPU intensive things we also switch to using dbs.  
>>> For example, after many years of loyal use of disconnected_ranges  
>>> we switched to a simple SQL implementation of it, because of the  
>>> large performance gains it would give us.  Similarly in Ensembl as  
>>> well as in the old days of bioperl-db we opted for doing subseq  
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less  
>>> "disruptive" than avoiding object creation and provide significant  
>>> gains in performance. Could be set as an optional flag, and could  
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, priority now is to make SeqIO compliant with all those  
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, following my  
>>>>>> experience, another issue is bioperl speed. For example, if you  
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads  
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,  
>>>>>> well, you've got to be patient (but maybe I missed some  
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a  
>>>>> significant set of users out there who are dealing with next-gen  
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we  
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else?  ;>
>>>>
>>>> Judging by the feedback there is definitely a set of users who  
>>>> would like to integrate nextgen into bioperl somehow, probably to  
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 and 1000x faster...  
>>>>>> Would it be possible to have an ultra-light quality object with  
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to  
>>>>> get the speedup is to not create any Bio::Seq* objects but just  
>>>>> return the data directly. At that point it's not taking much  
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up  
>>>> with a set of 'use cases' specifying what users want so we can  
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence  
>>>> database, something similar in theme to Lincoln's  
>>>> SeqFeature::Store, but sequence only, and returns quality objects  
>>>> in a similar manner (ala Storable)?  Not sure whether that's  
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801
>>
>
