[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)
bugzilla-daemon at portal.open-bio.org
Thu Nov 6 15:27:07 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2381
------- Comment #45 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 10:27 EST -------
(In reply to comment #43)
> (In reply to comment #39)
> > I would be happy with EITHER of these options, as both can be used to
> > translate a complete coding sequence:
> >
> > (1) the "init" argument (under another name, maybe "cds_start"?)
> > illustrated in attachment 1032. This would check the start
> > codon is valid AND translate it as a methionine.
> >
> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch. This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons
> > AND check it ends with a stop codon AND check there are no extra in-frame
> > stop codons.
> >
>
>
> I support (1) but strongly disagree with (2) because 'cds' refers to
> a complete DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS: CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a
> stop codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within
> a CDS. Note that the CDS does not correspond to the actual mRNA sequence."
Starting with that definition but being aware of atypical start codons gives:
"The CDS includes start and stop codons, therefore coding sequences begin with
an "ATG" [or other valid start codon] and end with a stop codon."
This then fits exactly with what I'm doing in the "complete_cds" option
(attachment 1040). So why the disagreement?
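
To make the list of checks under option (2) concrete, here is a rough standalone sketch in plain Python. It is not the code from attachment 1040 and does not use Bio.Seq. The function name translate_complete_cds, the hard-coded standard table, and the start codon set (ATG, TTG, CTG, as listed for NCBI table 1) are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))
# Assumption: start codons listed for NCBI table 1.
START_CODONS = {"ATG", "TTG", "CTG"}


def translate_complete_cds(dna):
    """Hypothetical helper: translate a full CDS, raising ValueError on any failed check."""
    dna = dna.upper()
    # Check there is a whole number of codons.
    if len(dna) % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Check the start codon is valid.
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("first codon is not a valid start codon")
    # Check the sequence ends with a stop codon.
    if CODON_TABLE.get(codons[-1]) != "*":
        raise ValueError("final codon is not a stop codon")
    # Translate the interior; unknown codons become "X" in this sketch.
    body = "".join(CODON_TABLE.get(c, "X") for c in codons[1:-1])
    # Check there are no extra in-frame stop codons.
    if "*" in body:
        raise ValueError("extra in-frame stop codon found")
    # The start codon is always reported as methionine; the stop is dropped.
    return "M" + body
```

With these assumptions, translate_complete_cds("TTGGCCTAA") returns "MA": the atypical TTG start is reported as methionine rather than leucine, and the trailing stop is checked and dropped.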
> However, I do like being able to obtain the translation of the actual
> CDS - just not here.
Back in comment 11, I previously mooted having separate methods like
translate_to_stop, and translate_cds - but we currently seem to be leaning
towards one method with some options.
> I do not support the name 'init' because of reasons discussed.
I think that is settled: "init" is too ambiguous.
> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.
In such cases, if your sequence might include upstream and downstream
non-coding regions, then you shouldn't be trying to use the "init"/"cds_start"
option (or the "complete_cds" option). By the nature of your uncertain
dataset, you'll have to do some extra work to find the start/stop. I don't see
how this is an argument against providing an option useful for when you do know
where the CDS starts (or do already have the CDS).
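
A small sketch of why the option helps when the start position is known, and why it fails loudly when it is not. This is plain Python with a hard-coded standard table, not the Bio.Seq implementation. The function name, the "cds_start" keyword (one of the names still under discussion), and the start codon set (ATG, TTG, CTG) are assumptions for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))
# Assumption: start codons listed for NCBI table 1.
START_CODONS = {"ATG", "TTG", "CTG"}


def translate(dna, cds_start=False):
    """Hypothetical sketch: plain translation, optionally treating codon 1 as a start."""
    dna = dna.upper()
    # Ignore any trailing partial codon; unknown codons become "X".
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    if cds_start:
        # Only the first codon is checked; nothing is verified about the end.
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid start codon")
        protein = "M" + protein[1:]
    return protein
```

Under these assumptions, translate("TTGGCCTAA") gives "LA*" while translate("TTGGCCTAA", cds_start=True) gives "MA*". If two upstream bases are left on the front, as in translate("CCTTGGCCTAA", cds_start=True), the first codon is CCT and the call raises ValueError instead of silently returning a wrong protein.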
> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m
>
I'd been avoiding names with methionine in them, preferring to focus on
initiation or start codon based names.
I guess "starts_with_met" is OK. Or maybe "start_met"?