[Bioperl-l] Pulling exons out of a Genbank mRNA

Amir Karger akarger at CGR.Harvard.edu
Mon Feb 13 20:57:08 UTC 2006


I'm trying to get the sequences of each exon in a gene. I have a genbank
file with mRNA and exon features (among others) that look like: 
     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
the mRNA above. I tried writing the below code, but it doesn't do what I
want. (You'll note that the code is stolen from the Bio::Seq and Feature
HOWTOs.)

my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
while (my $seq = $inseq->next_seq) {
    my @features = $seq->get_SeqFeatures(); # just top level
    foreach my $feat ( @features ) {
        my $type = $feat->primary_tag;
        if ($type eq "mRNA") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
                my @feats = $feat->get_SeqFeatures();
                print "Found ", scalar @feats, " sub-features\n";
        } elsif ($type eq "exon") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
        }
     }
}

When I run the above, it says that the mRNA features have no sub-features.
So how do I pull out the 6 sequences?

Thanks,
- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626



More information about the Bioperl-l mailing list