[Bioperl-l] TGA as U in selenocystine fullCDS

Heikki Lehvaslaiho heikki at ebi.ac.uk
Fri Feb 18 06:28:12 EST 2005


Albert,

The best way to deal with this would be to have a genetic code that correctly 
translates TGA into selenocysteine. Unfortunately I could not find anything on 
the topic on the Taxonomy Genetic Codes home page: 
<http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi>.
I guess I should ask around whether there are plans to deal with this.
Are those CDSs from EMBL or GenBank? If so, could you send me a few accession 
numbers to check?

The translate method already has too many optional arguments, so I would rather 
not add any more solely for dealing with selenocysteine.

Could you put together (and send to me) data lines for @NAMES, @TABLES and 
@STARTS in Bio::Tools::CodonTables, call the table tentatively "Standard with 
selenocysteine", and use id 20, which has been merged with existing codes and 
is not currently in use? That should provide a working code for your purposes 
while I try to find a consensus on this.
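
To make the request concrete, here is a rough sketch in plain Perl (no BioPerl 
needed) of what such data lines could look like. It assumes the usual layout of 
one 64-character string per table with codons in T, C, A, G order (TTT, TTC, 
TTA, TTG, TCT, ...), and that the start codons are those of the standard code. 
The names codon_index and translate_cds are made up for this illustration and 
are not part of Bio::Tools::CodonTables; whether the module accepts the lines 
unchanged under id 20 is also an assumption.

```perl
use strict;
use warnings;

# Index of a codon in a 64-character table string laid out in T, C, A, G order.
# Hypothetical helper; assumes unambiguous A/C/G/T input.
sub codon_index {
    my ($codon) = @_;
    my %pos = (T => 0, C => 1, A => 2, G => 3);
    my ($b1, $b2, $b3) = split //, uc $codon;
    return 16 * $pos{$b1} + 4 * $pos{$b2} + $pos{$b3};
}

# Candidate data lines: a name, an amino acid string and a start codon string.
my $name  = 'Standard with selenocysteine';
my $table = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Start codons assumed to be those of the standard code: TTG, CTG and ATG.
my $starts = '-' x 64;
substr($starts, codon_index($_), 1) = 'M' for qw(TTG CTG ATG);

# TGA is a stop in the standard code; here it becomes selenocysteine (U).
substr($table, codon_index('TGA'), 1) = 'U';

# Translate complete codons only, using the modified table.
sub translate_cds {
    my ($dna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        $protein .= substr($table, codon_index(substr($dna, $i, 3)), 1);
    }
    return $protein;
}

print "$name\n$table\n$starts\n";
print translate_cds('ATGTGATGGTAA'), "\n";    # ATG TGA TGG TAA
```

Note that with such a table every TGA becomes U, including a TGA that really 
is the terminator, so it only suits CDSs that end in TAA or TAG.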

	-Heikki




On Thursday 17 February 2005 13:55, Albert Vilella wrote:
> Hi,
>
> I'm dealing with some CDSs that contain "U" selenocysteines, for which I use
> the translate method:
>
> while($seq = $input->next_seq()){
>     $pseq = $seq->translate(undef, undef, undef, undef, $tableid);
>     $aa_input->write_seq($pseq);
>     push (@seq_array, $pseq);
> }
>
> These being CDSs (aka fullCDS in translate() notation), they shouldn't
> have stop codons in the middle of the sequence, so after checking in
> GenBank, I found that the TGAs are actually selenocysteines.
>
> Right now, using bioperl's translate w/fullCDS, if a stop is found in
> the middle of the sequence, it will result in a warn or a throw.
>
> Maybe we could add another option to deal with selenocysteines.
>
> Comments?
>
>     Albert.
>
> Bio/PrimarySeqI.pm
> -------------------
>
> # only if we are expecting to translate a complete coding region
> if ($fullCDS) {
>     my $id = $self->display_id;
>     #remove the stop character
>     if( substr($output,-1,1) eq $stop ) {
>         chop $output;
>     } else {
>         $throw && $self->throw("Seq [$id]: Not using a valid terminator codon!");
>         $self->warn("Seq [$id]: Not using a valid terminator codon!");
>     }
>     # test if there are terminator characters inside the protein sequence!
>     if ($output =~ /\*/) {
>         $throw && $self->throw("Seq [$id]: Terminator codon inside CDS!");
>         $self->warn("Seq [$id]: Terminator codon inside CDS!");
>     }
>     # if the initiator codon is not ATG, the amino acid needs to changed into M
>     if ( substr($output,0,1) ne 'M' ) {
>         if ($codonTable->is_start_codon(substr($seq, 0, 3)) ) {
>             $output = 'M'. substr($output,1);
>         }
>         elsif ($throw) {
>             $self->throw("Seq [$id]: Not using a valid initiator codon!");
>         } else {
>             $self->warn("Seq [$id]: Not using a valid initiator codon!");
>         }
>     }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

