[Bioperl-l] Bio::SeqFeature::Generic unexpected behavior

Chris Fields cjfields at illinois.edu
Wed Oct 7 23:52:12 EDT 2009


On Oct 7, 2009, at 5:13 PM, Jonathan Flowers wrote:

> Hi,
>
> I am building Bio::SeqFeature::Gene::Transcript objects to  
> facilitate easy
> parsing of cds and intron sequences.  I ran across an unexpected  
> problem
> when creating two or more Transcript objects and attaching independent
> Bio::Seq objects to them. In the following code, 2 transcript  
> objects are
> constructed independently (not cloned).  However, extracting the cds  
> for one
> transcript gives an errant (and unexpected result).  Curiously,  
> extracting
> the introns from the same transcript objects gives the correct  
> result.  I
> have highlighted the unexpected outcome below.  Finally, I noticed  
> that
> reversing the order of the attach_seq commands determines which cds is
> printed.

So, is this a problem with Bio::SeqFeature::Generic, or  
Bio::SeqFeature::Gene?  Just a bit confused by the subject line.

> The following example illustrates the problem.
>
> Can someone explain this?
>
> Jonathan
>
> #! usr/bin/env perl
> use strict;
> use warnings;
> use Bio::FeatureIO::gff;
> use Bio::SeqFeature::Annotated;
> use Bio::SeqFeature::Gene::Transcript;
> use Bio::SeqFeature::Gene::Exon;
> #use Bio::SeqFeature::Gene::UTR;
> #use Bio::SeqFeature::Gene::Intron;
>
> #Build the objects...
> my $seq1 = Bio::Seq->new(-seq => 'AAAAA');
> my $seq2 = Bio::Seq->new(-seq => 'TTTTT');
>
> my $tscript1 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, - 
> end =>
> 5, -strand => +1);
> my $tscript2 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, - 
> end =>
> 5, -strand => +1);
>
> my $exon1 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2,
> -strand => +1);
> my $exon2 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5,
> -strand => +1);
>
> $tscript1->add_exon($exon1);
> $tscript1->add_exon($exon2);
>
> $tscript2->add_exon($exon1);
> $tscript2->add_exon($exon2);
>
> $tscript1->attach_seq($seq1);
> $tscript2->attach_seq($seq2);

So, my guess is Bio::SF::Gene::Transcript is attaching the sequences  
to the contained exons (you are using the same two exon instances for  
both transcripts).  I don't think this is a bug; you should create a  
different set of exons for the second transcript, not use the same  
exon instances for both.  This behavior, to me, makes sense using the  
following logic: if two (or more) transcripts share the same exons  
(such as in alternatively spliced transcripts), underlying sequence  
changes should affect both transcripts and their shared exons.   
Otherwise they shouldn't share exons (i.e. require different instances).

If you change the above to the following:

my $exon3 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2,
-strand => +1);
my $exon4 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5,
-strand => +1);

$tscript2->add_exon($exon3);
$tscript2->add_exon($exon4);

it works; the first statement prints 'AAAA' and the second 'TTTT'.

There is a subtle bug pointed with your example, though.  The the two  
transcripts have two different sequences but the exons for both now  
point to the same sequence (your first transcript prints 'A', but  
everything else has T's).  I don't think there is an easy workaround  
for that, unless there is a way to call the parent transcript feature  
from the exon features.

> #now extract the desired features and print
> print $tscript1->cds->seq,"\n"; #prints 'TTTT' <- unexpected !!!
> print $tscript2->cds->seq,"\n"; #prints 'TTTT' <- as expected
>
> foreach($tscript1->introns){
>    print $_->seq->seq,"\n";     #prints 'A'   <- as expected
> }
>
> foreach($tscript2->introns){
>    print $_->seq->seq,"\n";    #prints 'T'  <- as expected
> }

Hope that explanation helps!

chris




More information about the Bioperl-l mailing list