[Bioperl-l] TGA as U in selenocysteine fullCDS

Albert Vilella avilella at ub.edu
Thu Feb 17 08:55:07 EST 2005


Hi,

I'm dealing with some CDSs that contain selenocysteines (U), which I
translate with the translate() method:

while (my $seq = $input->next_seq()) {
    # translate() positional args: stop char, unknown char, frame,
    # codon table id, fullCDS flag
    my $pseq = $seq->translate(undef, undef, undef, $tableid, 1);
    $aa_input->write_seq($pseq);
    push @seq_array, $pseq;
}

Since these are complete CDSs (fullCDS in translate() terms), they
shouldn't contain stop codons in the middle of the sequence. After
checking GenBank, I found that the internal TGAs actually encode
selenocysteine.
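A quick way to spot such cases before translating is to scan each CDS codon by codon and list every stop codon that appears before the final one. The sketch below is plain Perl with no BioPerl dependency; the function name `internal_stops` is hypothetical, and it assumes the standard-code stops (TAA, TAG, TGA) and a CDS that starts in frame 0.

```perl
use strict;
use warnings;

# Hypothetical helper: return [position, codon] pairs for every in-frame
# stop codon (standard code: TAA, TAG, TGA) other than the final codon.
sub internal_stops {
    my ($cds) = @_;
    $cds = uc $cds;
    $cds =~ tr/U/T/;    # accept RNA input too
    my %stop = map { $_ => 1 } qw(TAA TAG TGA);
    my @hits;
    my $n_codons = int(length($cds) / 3);
    # stop one codon early so the terminal stop is not reported
    for my $i (0 .. $n_codons - 2) {
        my $codon = substr($cds, 3 * $i, 3);
        # record the 1-based nucleotide position of the codon
        push @hits, [3 * $i + 1, $codon] if $stop{$codon};
    }
    return @hits;
}

my $cds = 'ATGGCTTGAAAATAA';    # toy CDS: M A * K *
for my $hit (internal_stops($cds)) {
    printf "internal %s at nt %d\n", $hit->[1], $hit->[0];
}
```

Internal hits that are all TGA are candidates for selenocysteine and can be checked against the GenBank record; an internal TAA or TAG points to a different problem.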

Right now, when BioPerl's translate() is called with fullCDS set, a stop
found in the middle of the sequence results in a warning, or in an
exception if $throw is set.

Maybe we could add another option to deal with selenocysteines.

Comments?

    Albert.

Bio/PrimarySeqI.pm
-------------------

# only if we are expecting to translate a complete coding region
if ($fullCDS) {
    my $id = $self->display_id;
    # remove the stop character
    if (substr($output, -1, 1) eq $stop) {
        chop $output;
    } else {
        $throw && $self->throw("Seq [$id]: Not using a valid terminator codon!");
        $self->warn("Seq [$id]: Not using a valid terminator codon!");
    }
    # test if there are terminator characters inside the protein sequence!
    if ($output =~ /\*/) {
        $throw && $self->throw("Seq [$id]: Terminator codon inside CDS!");
        $self->warn("Seq [$id]: Terminator codon inside CDS!");
    }
    # if the initiator codon is not ATG, the amino acid needs to be changed to M
    if (substr($output, 0, 1) ne 'M') {
        if ($codonTable->is_start_codon(substr($seq, 0, 3))) {
            $output = 'M' . substr($output, 1);
        }
        elsif ($throw) {
            $self->throw("Seq [$id]: Not using a valid initiator codon!");
        } else {
            $self->warn("Seq [$id]: Not using a valid initiator codon!");
        }
    }
}
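One way such an option could behave, sketched outside BioPerl: after the terminal stop is removed, any internal '*' whose codon is TGA is recoded to U (the one-letter code for selenocysteine) before the internal-terminator test runs. The function `check_full_cds` and its `sec` and `throw` options are hypothetical, not part of Bio::PrimarySeqI. The sketch assumes a DNA (not RNA) CDS, a protein string translated in frame 0 with '*' as the stop character, and it omits the initiator-codon check for brevity.

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# Hypothetical stand-in for the fullCDS checks above, with an extra
# 'sec' flag that recodes internal TGA-encoded stops as selenocysteine (U).
sub check_full_cds {
    my ($cds, $protein, %opt) = @_;
    my $stop = '*';
    $cds = uc $cds;

    # remove the stop character, as the original code does
    if (substr($protein, -1, 1) eq $stop) {
        chop $protein;
    } else {
        croak "Not using a valid terminator codon!" if $opt{throw};
        carp "Not using a valid terminator codon!";
    }

    # proposed behaviour: internal stops encoded by TGA become U
    if ($opt{sec}) {
        for my $i (0 .. length($protein) - 1) {
            next unless substr($protein, $i, 1) eq $stop;
            substr($protein, $i, 1) = 'U' if substr($cds, 3 * $i, 3) eq 'TGA';
        }
    }

    # any remaining internal terminator is still an error
    if (index($protein, $stop) >= 0) {
        croak "Terminator codon inside CDS!" if $opt{throw};
        carp "Terminator codon inside CDS!";
    }
    return $protein;
}

print check_full_cds('ATGGCTTGAAAATAA', 'MA*K*', sec => 1), "\n";    # MAUK
```

Only TGA is recoded, so an internal TAA or TAG still triggers the warning or exception, which keeps the fullCDS sanity check meaningful when the option is on.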


More information about the Bioperl-l mailing list