[Biojava-l] Codon count
Andy Yates
ayates at ebi.ac.uk
Thu Apr 21 12:06:35 UTC 2011
Translating first will cost you some performance, but counting directly means rewriting the translation code, so the speed-up may not be worth the recoding effort. Benchmark the translation approach before recoding. I can't remember the exact speed, but it isn't too slow.
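
One rough way to compare the two approaches before committing to a rewrite is sketched below in plain Java, deliberately not using the BioJava API. The names (CodonCountSketch, countViaTranslation, countDirect) are made up for illustration. The sketch assumes upper-case, unambiguous DNA and the standard genetic code (NCBI table 1), and it is not how the BJ3 translation engine works internally.

```java
import java.util.HashMap;
import java.util.Map;

final class CodonCountSketch {
    // Standard genetic code (NCBI table 1), codons enumerated with base order T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";
    private static final Map<String, Character> TABLE = new HashMap<>();
    static {
        int i = 0;
        for (char a : BASES.toCharArray())
            for (char b : BASES.toCharArray())
                for (char c : BASES.toCharArray())
                    TABLE.put("" + a + b + c, AMINO_ACIDS.charAt(i++));
    }

    /** Reverse complement; anything other than A, C, G, T becomes N. */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    /** Translate one frame (offset 0, 1 or 2); unrecognised codons become 'X'. */
    static String translate(String dna, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i + 3 <= dna.length(); i += 3)
            sb.append(TABLE.getOrDefault(dna.substring(i, i + 3), 'X'));
        return sb.toString();
    }

    /** Approach 1: translate all six frames, then count the amino acid. */
    static int countViaTranslation(String dna, char aminoAcid) {
        int count = 0;
        for (String strand : new String[] {dna, reverseComplement(dna)})
            for (int offset = 0; offset < 3; offset++)
                for (char aa : translate(strand, offset).toCharArray())
                    if (aa == aminoAcid) count++;
        return count;
    }

    /** Approach 2: look up every 3-mer on both strands directly, with no protein strings built. */
    static int countDirect(String dna, char aminoAcid) {
        int count = 0;
        for (String strand : new String[] {dna, reverseComplement(dna)})
            for (int i = 0; i + 3 <= strand.length(); i++)
                if (TABLE.getOrDefault(strand.substring(i, i + 3), 'X') == aminoAcid) count++;
        return count;
    }

    public static void main(String[] args) {
        String dna = "ATGAAATAG";
        System.out.println("via translation: " + countViaTranslation(dna, 'K'));
        System.out.println("direct:          " + countDirect(dna, 'K'));
    }
}
```

Both methods visit every 3-mer start position on each strand exactly once, so they should return the same count; wrapping each call in System.nanoTime() over a realistic batch of sequences would give a first idea of whether the direct version is worth it.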
Andy
Khalil El Mazouari <khalil.elmazouari at gmail.com> wrote:
>Hi Andy,
>
>I am currently counting codons by translating all 6 ORFs. I am working
>on ±100.000 seq/run => 600.000 ORFs to check, so performance is an
>issue for my job.
>
>I am just wondering whether counting codons directly on the NT seq
>(both strands) would be faster than translation + AA counting.
>
>Regards,
>
>khalil
>
>
>On 21 Apr 2011, at 13:40, Andy Yates wrote:
>
>> Hi Khalil,
>>
>> Then I think windowed sequence is the only way to go. Actually one
>> particularly "interesting" idea has just sprung to mind. What if you
>> translated the entire sequence in frame 1 forward & reverse? Then
>> finding the number of correct codons is a case of counting the amino
>> acids which are not a stop or unknown amino acid.
>>
>> Andy
>>
>> On 21 Apr 2011, at 12:37, Khalil El Mazouari wrote:
>>
>>> Thanks Andy,
>>> it's the second option I am looking for.
>>>
>>> Regards,
>>> khalil
>>>
>>>
>>>
>>> On 21 Apr 2011, at 13:23, Andy Yates wrote:
>>>
>>>> Hi Khalil,
>>>>
>>>> I'm not 100% sure what you want here. If you just want to know the
>>>> potential number of codons on both strands of DNA then it would be
>>>> (length / 3)*2. If what you are actually asking for is how many
>>>> codons code for an amino acid then you would have to perform work
>>>> similar to the transcription engine in BJ3. All codon tables are
>>>> available from the IUPACParser class & then it would be up to you
>>>> to use a WindowedSequence over the top of your NT sequence to get
>>>> the windows or SequenceMixin.nonOverlappingKmers() which shortcuts
>>>> the creation of the WindowedSequence.
>>>>
>>>> Regards,
>>>>
>>>> Andy
>>>>
>>>> On 21 Apr 2011, at 11:36, Khalil El Mazouari wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am looking for a simple method or class to count the codons for
>>>>> a specific AA in an NT seq, counting on both strands.
>>>>>
>>>>> Any suggestion is welcome.
>>>>>
>>>>> Regards,
>>>>>
>>>>> khalil
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Andrew Yates Ensembl Genomes Engineer
>>>> EMBL-EBI Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
>>>>
>>>>
>>>>
>>>>
>>>
>>