[Bioperl-l] Next-gen modules
Chris Fields
cjfields at illinois.edu
Fri Jun 19 16:57:36 EDT 2009
So, to follow up (and make sure we don't have any overlapping tuits),
we should probably determine who wants to work on what (e.g. fastq
updating). I think it's possible to quickly add Solexa/Illumina/Sanger
fastq support similar to BioPython's; I just don't want to step on
anyone's toes if they are halfway through doing this.
chris
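
For readers unfamiliar with the three fastq variants mentioned above, here
is a minimal Python sketch of how their quality strings differ. It is an
illustration only, not BioPerl or BioPython code: the names OFFSETS,
solexa_to_phred and decode_quality are hypothetical. It assumes the usual
conventions (Sanger: Phred scores at ASCII offset 33; Solexa: Solexa scores
at offset 64; Illumina 1.3+: Phred scores at offset 64).

```python
import math

# ASCII offsets for the three fastq variants (assumed conventions):
#   sanger   -> Phred scores, offset 33
#   solexa   -> Solexa scores, offset 64
#   illumina -> Phred scores, offset 64 (Illumina 1.3+)
OFFSETS = {"sanger": 33, "solexa": 64, "illumina": 64}


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the nearest integer Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def decode_quality(qual_string, variant="sanger"):
    """Return a list of Phred scores for one fastq quality string."""
    offset = OFFSETS[variant]
    scores = [ord(ch) - offset for ch in qual_string]
    if variant == "solexa":
        # Solexa scores use a different scale and can be negative.
        scores = [solexa_to_phred(q) for q in scores]
    return scores
```

Once every variant is decoded to Phred scores, the rest of a parser can stay
variant-agnostic, which is presumably why supporting all three at once looks
like a small job.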
On Jun 17, 2009, at 3:36 PM, Elia Stupka wrote:
> Better than colorspaced discussions for sure ;)
>
> Elia
>
> On 17 Jun 2009, at 21:35, Chris Fields wrote:
>
>> So, #1 priority is to get fastq up-to-speed, then maybe assess
>> other options.
>>
>> Illuminating discussion, thanks Elia!
>>
>> urgh, excuse unintended bad pun above...
>>
>> chris
>>
>> On Jun 17, 2009, at 3:06 PM, Elia Stupka wrote:
>>
>>> Interesting that you mention the database issue. We found that for
>>> specific memory/CPU intensive things we also switch to using dbs.
>>> For example, after many years of loyal use of disconnected_ranges
>>> we switched to a simple SQL implementation of it, because of the
>>> large performance gains it would give us. Similarly in Ensembl as
>>> well as in the old days of bioperl-db we opted for doing subseq
>>> within SQL where possible.
>>>
>>> Some lean way of SQL'izing specific components could be less
>>> "disruptive" than avoiding object creation and provide significant
>>> gains in performance. Could be set as an optional flag, and could
>>> use temporary ad hoc SQL databases?
>>>
>>> Still, priority now is to make SeqIO compliant with all those
>>> formats, then we can worry about performance :)
>>>
>>> Elia
>>>
>>> On 17 Jun 2009, at 20:30, Chris Fields wrote:
>>>
>>>> On Jun 17, 2009, at 1:20 PM, Sendu Bala wrote:
>>>>
>>>>> Tristan Lefebure wrote:
>>>>>> Hello,
>>>>>> Regarding next-gen sequences and bioperl, following my
>>>>>> experience, another issue is bioperl speed. For example, if you
>>>>>> want to trim bad quality bases at ends of 1E6 Solexa reads
>>>>>> using Bio::SeqIO::fastq and some methods in Bio::Seq::Quality,
>>>>>> well, you've got to be patient (but maybe I missed some
>>>>>> shortcuts...).
>>>>>
>>>>> This is my concern as well. Or, rather, is there actually a
>>>>> significant set of users out there who are dealing with next-gen
>>>>> sequencing and would consider using BioPerl for their work?
>>>>>
>>>>> I'm working with all the 1000-genomes data at the Sanger, and we
>>>>> at least are probably never going to use BioPerl for the work.
>>>>
>>>> Are you using pure perl or (gasp) something else? ;>
>>>>
>>>> Judging by the feedback there is definitely a set of users who
>>>> would like to integrate nextgen into bioperl somehow, probably to
>>>> take advantage of other aspects of bioperl.
>>>>
>>>>>> A pure perl solution will be between 100 and 1000x faster...
>>>>>> Would it be possible to have an ultra-light quality object with
>>>>>> few simple methods for next-gen reads?
>>>>>
>>>>> The fastq parser itself already seems pretty fast. The way to
>>>>> get the speedup is to not create any Bio::Seq* objects but just
>>>>> return the data directly. At that point it's not taking much
>>>>> advantage of BioPerl. But certainly it could be done...
>>>>
>>>>
>>>> I suppose the best way to assess what needs to be done is come up
>>>> with a set of 'use cases' specifying what users want so we can
>>>> design around them, otherwise we're shooting in the dark.
>>>>
>>>> I'm personally wondering if this could be done as a sequence
>>>> database, something similar in theme to Lincoln's
>>>> SeqFeature::Store, but sequence only, and returns quality objects
>>>> in a similar manner (a la Storable)? Not sure whether that's
>>>> feasible, but it appears at least scalable.
>>>>
>>>> chris
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> ---
>>> Senior Lecturer, Bioinformatics
>>> UCL Cancer Institute
>>> Paul O' Gorman Building
>>> University College London
>>> Gower Street
>>> WC1E 6BT
>>> London
>>> UK
>>>
>>> Office (UCL): +44 207 679 6493
>>> Office (ICMS): +44 0207 8822374
>>>
>>> Mobile: +44 7597 566 194
>>> Mobile (Italy): +39 338 8448801
>>>
>>
>
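
The quoted suggestion of returning the data directly instead of creating
objects, applied to Tristan's example of trimming bad-quality bases at read
ends, could look roughly like the Python sketch below. It is not part of
BioPerl; read_fastq and trim_ends are hypothetical names, and the sketch
assumes four lines per record (no wrapped sequences), Sanger-style offset 33,
and an arbitrary cutoff of Q20.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) tuples, with no objects.

    Assumes exactly four lines per record and no blank lines.
    """
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def trim_ends(seq, qual, min_q=20, offset=33):
    """Trim bases scoring below min_q from both ends of a read."""
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


# Usage: trim every read from an in-memory example file.
example = io.StringIO("@r1\nACGTAC\n+\n!!II#!\n")
trimmed = [(name,) + trim_ends(seq, qual) for name, seq, qual in read_fastq(example)]
```

Working on plain strings and tuples like this avoids per-read object
construction, at the cost Sendu points out: nothing else in the toolkit can
consume the result without a conversion step.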