[Bioperl-l] Bio::SeqFeature::Generic unexpected behavior
Chris Fields
cjfields at illinois.edu
Wed Oct 7 23:52:12 EDT 2009
On Oct 7, 2009, at 5:13 PM, Jonathan Flowers wrote:
> Hi,
>
> I am building Bio::SeqFeature::Gene::Transcript objects to
> facilitate easy
> parsing of cds and intron sequences. I ran across an unexpected
> problem
> when creating two or more Transcript objects and attaching independent
> Bio::Seq objects to them. In the following code, 2 transcript
> objects are
> constructed independently (not cloned). However, extracting the cds
> for one
> transcript gives an errant (and unexpected result). Curiously,
> extracting
> the introns from the same transcript objects gives the correct
> result. I
> have highlighted the unexpected outcome below. Finally, I noticed
> that
> reversing the order of the attach_seq commands determines which cds is
> printed.
So, is this a problem with Bio::SeqFeature::Generic, or
Bio::SeqFeature::Gene? Just a bit confused by the subject line.
> The following example illustrates the problem.
>
> Can someone explain this?
>
> Jonathan
>
> #! usr/bin/env perl
> use strict;
> use warnings;
> use Bio::FeatureIO::gff;
> use Bio::SeqFeature::Annotated;
> use Bio::SeqFeature::Gene::Transcript;
> use Bio::SeqFeature::Gene::Exon;
> #use Bio::SeqFeature::Gene::UTR;
> #use Bio::SeqFeature::Gene::Intron;
>
> #Build the objects...
> my $seq1 = Bio::Seq->new(-seq => 'AAAAA');
> my $seq2 = Bio::Seq->new(-seq => 'TTTTT');
>
> my $tscript1 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -
> end =>
> 5, -strand => +1);
> my $tscript2 = Bio::SeqFeature::Gene::Transcript->new(-start => 1, -
> end =>
> 5, -strand => +1);
>
> my $exon1 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2,
> -strand => +1);
> my $exon2 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5,
> -strand => +1);
>
> $tscript1->add_exon($exon1);
> $tscript1->add_exon($exon2);
>
> $tscript2->add_exon($exon1);
> $tscript2->add_exon($exon2);
>
> $tscript1->attach_seq($seq1);
> $tscript2->attach_seq($seq2);
So, my guess is Bio::SF::Gene::Transcript is attaching the sequences
to the contained exons (you are using the same two exon instances for
both transcripts). I don't think this is a bug; you should create a
different set of exons for the second transcript, not use the same
exon instances for both. This behavior, to me, makes sense using the
following logic: if two (or more) transcripts share the same exons
(such as in alternatively spliced transcripts), underlying sequence
changes should affect both transcripts and their shared exons.
Otherwise they shouldn't share exons (i.e. require different instances).
If you change the above to the following:
my $exon3 = Bio::SeqFeature::Gene::Exon->new(-start => 1 , -end => 2,
-strand => +1);
my $exon4 = Bio::SeqFeature::Gene::Exon->new(-start => 4 , -end => 5,
-strand => +1);
$tscript2->add_exon($exon3);
$tscript2->add_exon($exon4);
it works; the first statement prints 'AAAA' and the second 'TTTT'.
There is a subtle bug pointed with your example, though. The the two
transcripts have two different sequences but the exons for both now
point to the same sequence (your first transcript prints 'A', but
everything else has T's). I don't think there is an easy workaround
for that, unless there is a way to call the parent transcript feature
from the exon features.
> #now extract the desired features and print
> print $tscript1->cds->seq,"\n"; #prints 'TTTT' <- unexpected !!!
> print $tscript2->cds->seq,"\n"; #prints 'TTTT' <- as expected
>
> foreach($tscript1->introns){
> print $_->seq->seq,"\n"; #prints 'A' <- as expected
> }
>
> foreach($tscript2->introns){
> print $_->seq->seq,"\n"; #prints 'T' <- as expected
> }
Hope that explanation helps!
chris
More information about the Bioperl-l
mailing list