[Bioperl-l] Next-gen modules
Elia Stupka
e.stupka at ucl.ac.uk
Sat Jun 20 16:12:18 EDT 2009
Hi Chris,
I agree. I have not written a single line of code so far, while Heikki
has some (but has been silent for a while) and you have perhaps some
code ready to roll. I am happy to help where needed, just let me know
what you'd like me to focus on. If you want to go ahead and implement
the fastq stuff discussed, I can focus on bioperl-run.
cheers
Elia
On 19 Jun 2009, at 21:57, Chris Fields wrote:
> So, to follow up (and make sure we don't have any overlapping tuits)
> we should probably determine who wants to work on what (i.e. fastq
> updating, etc). I think it's possible to quickly add in Solexa/
> Illumina/Sanger fastq similar to BioPython, just don't want to step
> on anyone's toes if they are halfway through doing this.
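[Editorial sketch, not part of the original message.] The three fastq variants named above differ in how the quality line is encoded: Sanger stores Phred scores with an ASCII offset of 33, Illumina 1.3+ stores Phred scores with an offset of 64, and Solexa stores Solexa-scaled scores with an offset of 64. A minimal Python illustration of decoding all three to Phred follows. The function names and the `variant` labels are hypothetical, and this is not how Bio::SeqIO::fastq or BioPython implements it.

```python
import math

# ASCII offsets for the three fastq variants named in the thread.
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the nearest integer Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def decode_quality(qual, variant="sanger"):
    """Return a list of Phred scores for a fastq quality string."""
    offset = OFFSETS[variant]  # raises KeyError for an unknown variant
    scores = [ord(ch) - offset for ch in qual]
    if variant == "solexa":
        # Solexa scores use a different scale and need converting.
        scores = [solexa_to_phred(q) for q in scores]
    return scores
```

The Solexa and Phred scales nearly coincide at high quality and diverge at the low end, so the conversion only matters for poor bases.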
>
> chris
>
> On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
>
>> Better than colorspaced discussions for sure ;)
>>
>> Elia
>>
>> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>>
>>> So, #1 priority is to get fastq up-to-speed, then maybe assess
>>> other options.
>>>
>>> Illuminating discussion, thanks Elia!
>>>
>>> urgh, excuse unintended bad pun above...
>>>
>>> chris
>>>
>>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>>
>>>> Interesting that you mention the database issue. We found that
>>>> for specific memory/CPU-intensive things we also switch to using
>>>> dbs. For example, after many years of loyal use of
>>>> disconnected_ranges we switched to a simple SQL implementation of
>>>> it, because of the large performance gains it would give us.
>>>> Similarly in Ensembl as well as in the old days of bioperl-db we
>>>> opted for doing subseq within SQL where possible.
>>>>
>>>> Some lean way of SQL'izing specific components could be less
>>>> "disruptive" than avoiding object creation and provide
>>>> significant gains in performance. Could be set as an optional
>>>> flag, and could use temporary ad hoc SQL databases?
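[Editorial sketch, not part of the original message.] The idea of pushing subseq into a temporary, ad hoc SQL database could look roughly like the following, using Python's standard `sqlite3` module and an in-memory database. The table name `seqs`, its columns, and the helper names are assumptions made for illustration. This is not the bioperl-db or Ensembl schema.

```python
import sqlite3


def build_store(records):
    """Load (id, sequence) pairs into a temporary in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seqs (id TEXT PRIMARY KEY, seq TEXT NOT NULL)")
    conn.executemany("INSERT INTO seqs (id, seq) VALUES (?, ?)", records)
    conn.commit()
    return conn


def subseq(conn, seq_id, start, end):
    """Return bases start..end (1-based, inclusive), sliced inside SQL."""
    row = conn.execute(
        "SELECT substr(seq, ?, ?) FROM seqs WHERE id = ?",
        (start, end - start + 1, seq_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_store([("read1", "ACGTACGTAC")])
    print(subseq(conn, "read1", 3, 6))
```

Only the requested slice crosses from the database into the calling program, which is the point of doing the subseq in SQL rather than on an in-memory sequence object.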
>>>>
>>>> Still, priority now is to make SeqIO compliant with all those
>>>> formats, then we can worry about performance :)
>>>>
>>>> Elia
>>>>
>>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>>
>>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>>
>>>>>> Tristan Lefebure wrote:
>>>>>>> Hello,
>>>>>>> Regarding next-gen sequences and bioperl, following my
>>>>>>> experience, another issue is bioperl speed. For example, if
>>>>>>> you want to trim bad quality bases at ends of 1E6 Solexa reads
>>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
>>>>>>> well, you've got to be patient (but maybe I missed some
>>>>>>> shortcuts...).
>>>>>>
>>>>>> This is my concern as well. Or, rather, is there actually a
>>>>>> significant set of users out there who are dealing with next-
>>>>>> gen sequencing and would consider using BioPerl for their work?
>>>>>>
>>>>>> I'm working with all the 1000-genomes data at the Sanger, and
>>>>>> we at least are probably never going to use BioPerl for the work.
>>>>>
>>>>> Are you using pure perl or (gasp) something else? ;>
>>>>>
>>>>> Judging by the feedback there is definitely a set of users who
>>>>> would like to integrate nextgen into bioperl somehow, probably
>>>>> to take advantage of other aspects of bioperl.
>>>>>
>>>>>>> A pure perl solution will be between 100 and 1000x faster...
>>>>>>> Would it be possible to have an ultra-light quality object
>>>>>>> with few simple methods for next-gen reads?
>>>>>>
>>>>>> The fastq parser itself already seems pretty fast. The way to
>>>>>> get the speedup is to not create any Bio::Seq* objects but just
>>>>>> return the data directly. At that point it's not taking much
>>>>>> advantage of BioPerl. But certainly it could be done...
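[Editorial sketch, not part of the original message.] The "return the data directly" approach, applied to the end-trimming use case raised above, might look like the following in Python. It assumes strict four-line fastq records (no wrapped sequence lines), Sanger encoding (offset 33), and an arbitrary threshold of Q20. The function names are hypothetical and nothing here reflects the Bio::SeqIO::fastq or Bio::Seq::Quality API.

```python
import io


def read_fastq(handle):
    """Yield plain (name, sequence, quality) tuples from four-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual  # drop the leading '@'


def trim_ends(seq, qual, min_q=20, offset=33):
    """Trim bases scoring below min_q from both ends of a read."""
    cutoff = chr(min_q + offset)  # compare encoded characters directly
    start, end = 0, len(qual)
    while start < end and qual[start] < cutoff:
        start += 1
    while end > start and qual[end - 1] < cutoff:
        end -= 1
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    sample = io.StringIO("@r1\nACGTACGT\n+\n!!IIII!!\n")
    for name, seq, qual in read_fastq(sample):
        print(name, *trim_ends(seq, qual))
```

No per-read objects are built, and the quality string is never decoded into a list of integers. Avoiding that per-read work is the source of speedup suggested in the thread, though the sketch makes no claim about how large the gain would be.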
>>>>>
>>>>>
>>>>> I suppose the best way to assess what needs to be done is come
>>>>> up with a set of 'use cases' specifying what users want so we
>>>>> can design around them, otherwise we're shooting in the dark.
>>>>>
>>>>> I'm personally wondering if this could be done as a sequence
>>>>> database, something similar in theme to Lincoln's
>>>>> SeqFeature::Store, but sequence only, and returns quality
>>>>> objects in a similar manner (à la Storable)? Not sure whether
>>>>> that's feasible, but it appears at least scalable.
>>>>>
>>>>> chris
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> ---
>>>> Senior Lecturer, Bioinformatics
>>>> UCL Cancer Institute
>>>> Paul O'Gorman Building
>>>> University College London
>>>> Gower Street
>>>> WC1E 6BT
>>>> London
>>>> UK
>>>>
>>>> Office (UCL): +44 207 679 6493
>>>> Office (ICMS): +44 0207 8822374
>>>>
>>>> Mobile: +44 7597 566 194
>>>> Mobile (Italy): +39 338 8448801
>>>>
>>>
>>
>