[Bioperl-l] Pulling exons out of a Genbank mRNA

Brian Osborne osborne1 at optonline.net
Tue Feb 14 01:38:30 UTC 2006


Amir,

The idea is to look at the sub-locations in the SplitLocation object that
holds the mRNA's join(). This is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F
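
A minimal sketch of the sub-location approach, assuming the join() has been
parsed into an object implementing Bio::Location::SplitLocationI. The helper
name sub_location_coords is made up for illustration, and the input file is
whatever GenBank file you pass on the command line:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return [start, end, strand] for each sub-location
# of a feature. A feature with a simple location yields a single triple.
sub sub_location_coords {
    my ($feat) = @_;
    my $loc = $feat->location;
    my @parts = $loc->isa('Bio::Location::SplitLocationI')
              ? $loc->sub_Location
              : ($loc);
    return map { [ $_->start, $_->end, $_->strand ] } @parts;
}

# Runs only when a GenBank file name is given on the command line.
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures) {
            printf "%d..%d strand %d\n", @$_ for sub_location_coords($feat);
        }
    }
}
```

For the mRNA in your example this should print one line per exon, since each
complement(...) inside the join() becomes its own sub-location.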

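The sequence of the feature itself can be obtained with the seq() method, or
with spliced_seq() when the location is a join(). Note that entire_seq()
returns the whole sequence the feature is attached to, not just the feature: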

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences
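
To get one FASTA record per exon of the mRNA, one option is to cut each
sub-location out of the parent sequence yourself. This is only a sketch: the
helper name exon_seqs and the ID scheme (parent ID plus coordinates) are my
own assumptions, and it reverse-complements any piece on the minus strand:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: return one sequence object per sub-location of
# $feat, cut from the parent sequence $seq.
sub exon_seqs {
    my ($seq, $feat) = @_;
    my $loc = $feat->location;
    my @parts = $loc->isa('Bio::Location::SplitLocationI')
              ? $loc->sub_Location
              : ($loc);
    my @exons;
    for my $part (@parts) {
        my $exon = $seq->trunc($part->start, $part->end);
        # Pieces annotated with complement() come back with strand -1.
        $exon = $exon->revcom if ($part->strand || 0) == -1;
        $exon->display_id(sprintf "%s_%d_%d",
            $seq->display_id || 'seq', $part->start, $part->end);
        push @exons, $exon;
    }
    return @exons;
}

# Runs only when a GenBank file name is given on the command line.
if (@ARGV) {
    my $in  = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures) {
            $out->write_seq($_) for exon_seqs($seq, $feat);
        }
    }
}
```

If you want the exons joined into a single transcript instead, spliced_seq()
on the mRNA feature should do that in one call.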


Brian O.


On 2/13/06 3:57 PM, "Amir Karger" <akarger at CGR.Harvard.edu> wrote:

> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
> 
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
> 
> my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format);
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>             print "Feature ", $feat->primary_tag,
>                   " starts ", $feat->start, " ends ", $feat->end,
>                   " strand ", $feat->strand, "\n";
>             my @feats = $feat->get_SeqFeatures();
>             print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>             print "Feature ", $feat->primary_tag,
>                   " starts ", $feat->start, " ends ", $feat->end,
>                   " strand ", $feat->strand, "\n";
>         }
>     }
> }
> 
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
> 
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626