[Bioperl-l] parsing coded_by subfeature

Will Fischer wfischer at uts.cc.utexas.edu
Mon Jul 21 15:54:05 EDT 2003


On Sunday, July 20, 2003, at 01:33 PM, Jack Chen wrote:

> I am also curious how to handle the cases where the 'coded_by'
> subfeature contains the ">" and "<" signs. I am not really sure what
> they mean. And I noticed that wherever these signs appear, the protein
> sequences retrieved are different from the conceptual translation from
> the nucleotide sequences. For example:
>
> [nchen at whey blast_db_checked]$ ./test.pl "gi|8573628|gb|AAF77462.1|"
> Protein obtained from GenBank:
> MPQMAPISWLLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSMNWKW
> CDS sequence is: [deleted]
> Conceptual translation is:
> IPQIAPIR*LLLFIIFSITFILFCSINYYSYMPNSPKSNELKNINLNSIN*K**

and
> I think I was confused by the fact that the protein sequence provided
> by GenBank does not match the conceptually translated sequence. Say,
> most of the nucleotide sequences (after joining together) are actually
> longer than the protein sequences.

"<" means "before this base"; ">" means "after this base". These are
typically used for features that are not completely included in the
sequence, or (less often) where the actual start or end of a feature
is not precisely known.
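To make the "<" / ">" markers concrete, here is a minimal Python sketch (not BioPerl; in BioPerl these end up as fuzzy locations such as Bio::Location::Fuzzy) that splits a simple location string into spans and flags which ends are partial. The function name `parse_location` and the accession-stripping pattern are illustrative assumptions; it only handles plain ranges, single bases, join() and one outer complement(), not order(), between-base sites like 5^6, or complement() on individual spans.

```python
import re

# One span: optional '<' or '>', a position, then an optional '..' end position.
SPAN = re.compile(r"([<>]?)(\d+)(?:\.\.([<>]?)(\d+))?")
# Accession prefixes as they appear in coded_by values, e.g. "AF000001.1:".
ACCESSION = re.compile(r"[A-Za-z][\w.]*:")

def parse_location(loc):
    """Split a simple GenBank location into spans and a strand.

    Returns (spans, strand); each span is
    (start, end, start_is_partial, end_is_partial).
    Coordinates stay in the entry's forward orientation, so on a
    complement() feature a '<' marks the biological 3' end.
    """
    loc = ACCESSION.sub("", loc.replace(" ", ""))
    strand = 1
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = -1
        loc = loc[len("complement("):-1]
    spans = []
    # findall yields '' (not None) for optional groups that did not match.
    for lt, start, gt, end in SPAN.findall(loc):
        spans.append((int(start), int(end or start), bool(lt), bool(gt)))
    return spans, strand

if __name__ == "__main__":
    print(parse_location("join(<1..206,300..>450)"))
    print(parse_location("complement(AF000001.1:<10..>90)"))
```

A partial flag on either end is a cue that the CDS may not begin with a start codon or end with a stop, so a naive translation of the joined sequence will not line up exactly with the /translation.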

As for the differences in translation, they're due to the /transl_table
qualifier: different critters use different genetic codes, and the code
used in mitochondria (as in your example) reflects their bacterial
origins. Any code for translating GenBank entries ought to take these
translation tables into account. One example (in a non-bioperl
context) can be seen in my standalone translation script, nt2aa, at
http://sunflower.bio.indiana.edu/~wfischer/Perl_Scripts/#nt2aa .
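The effect of /transl_table can be sketched in a few lines of Python. This is an illustration, not nt2aa and not BioPerl's API (BioPerl users would reach for Bio::Tools::CodonTable or the codon-table argument of translate; check the docs for your version). The two 64-letter strings are NCBI's amino-acid listings for table 1 (Standard) and table 5 (Invertebrate Mitochondrial), with codons ordered TTT, TTC, TTA, TTG, TCT, ...; the function names are hypothetical.

```python
BASES = "TCAG"

# NCBI amino-acid strings, one letter per codon in TCAG order (first base slowest).
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    5: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",  # Invertebrate mito
}

def codon_index(codon):
    """Map a codon like 'TGA' to its 0..63 position in the table strings."""
    i, j, k = (BASES.index(b) for b in codon)
    return 16 * i + 4 * j + k

def translate(dna, table=1):
    """Translate in-frame DNA; codons with ambiguous bases become 'X'.

    A trailing partial codon is dropped. Alternative start codons are
    NOT converted to M here, which real CDS translation also has to handle.
    """
    aas = TABLES[table]
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[pos:pos + 3]
        if all(b in BASES for b in codon):
            protein.append(aas[codon_index(codon)])
        else:
            protein.append("X")
    return "".join(protein)

if __name__ == "__main__":
    cds = "ATGTGAATAAGA"  # made-up: ATG TGA ATA AGA
    print(translate(cds, table=1))  # M*IR
    print(translate(cds, table=5))  # MWMS
```

The substitutions in the quoted example (I→M, R→S, *→W) are the kind these tables produce (ATA, AGA and TGA change meaning), but the entry's actual /transl_table value isn't shown in the post, so take that as a plausible reading rather than a diagnosis.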

Mitochondria, in particular, often use the canonical stop codon TGA
(UGA) to encode tryptophan (W); I'm guessing that premature stop codons
(introduced by failure to account for this) explain your observed
"nucleotide sequences ... longer than the protein sequences".
_____________________________________________________________
Will Fischer                       wfischer at uts.cc.utexas.edu

University of Texas at Austin      Lab Ph.: 512-232-7114
Integrative Biology                Lab Fax: 512-471-3878
1 University Station C0930
Austin, TX 78712-0253


_____________________________________________________________
Will Fischer                       wfischer at uts.cc.utexas.edu

University of Texas at Austin      Lab Ph.: 512-232-7114
Integrative Biology                Lab Fax: 512-471-3878
1 University Station C0930
Austin, TX 78712-0253



More information about the Bioperl-l mailing list