[Bioperl-l] Re: Getting CDS boundaries from Unflattener

Scott Cain cain at cshl.org
Fri Dec 19 10:48:17 EST 2003


On Thu, 2003-12-18 at 16:52, Chris Mungall wrote:
> On Thu, 18 Dec 2003, Scott Cain wrote:

> > The biggest problem with this set of data is that the CDS spans
> > introns.  The CDS really ought to be broken up into segments to match
> > the exon boundaries.  As it is, it breaks display in gbrowse whether it
> > is using chado or a GFF database as a backend.
> 
> When I use the unflattener on AE003644, the CDSs I get out have split
> locations which match the coding exon boundaries - are you sure this isn't
> a problem with the GFF code? Are you doing all the usual weird stuff like:
> 
>         if ($sf->location->isa("Bio::Location::SplitLocationI")) {
>             @locs = $sf->location->each_Location;
>         }

Oops--read that documentation, Scott.  OK, I fixed Bio::Tools::GFF to
deal with split locations.
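
A minimal sketch of the idea, not the actual Bio::Tools::GFF change: one GFF-style line is written per segment, and a shared ID ties the segments together. The helper name split_feature_to_gff_lines, the ID attribute, and the coordinates are all hypothetical. In BioPerl the segments would come from each_Location, as in the snippet quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::GFF: turn one feature with
# several [start, end] segments into one GFF-style line per segment.
sub split_feature_to_gff_lines {
    my (%f) = @_;
    my @lines;
    # Emit segments in ascending coordinate order.
    for my $seg (sort { $a->[0] <=> $b->[0] } @{ $f{segments} }) {
        my ($start, $end) = @$seg;
        push @lines, join("\t",
            $f{seqid}, $f{source}, $f{type},
            $start, $end,
            '.',            # score: not set in this sketch
            $f{strand},
            '.',            # phase: not computed in this sketch
            "ID=$f{id}",    # shared ID ties the segments together
        );
    }
    return @lines;
}

# Made-up coordinates, for illustration only.
my @lines = split_feature_to_gff_lines(
    seqid    => 'AE003644',
    source   => 'example',
    type     => 'CDS',
    strand   => '+',
    id       => 'cds1',
    segments => [ [100, 200], [300, 450] ],
);
print "$_\n" for @lines;
```

Each segment gets its own line, so a display that draws one glyph per line no longer paints the CDS across the introns.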
> 
> > The other problem is that the exons' parentage is incorrect.  The exons
> > should be features of the gene, not the mRNA.
> 
> I think you have this the wrong way round. Again, this must be a problem
> with how you're assigning parent tags in the GFF output, when I try
> AE003644 the exons are children of the mRNA, which is correct.
> 
I don't think so; here are the relevant lines from SO:

    @is_a@gene ; SO:0000704 ; SOFA:SOFA ; SOFA:region
     @part_of@transcript ; SO:0000673 ; SOFA:SOFA ; SOFA:region
      @part_of@exon ; SO:0000147 ; SOFA:SOFA ; SOFA:region
      @is_a@processed_transcript ; SO:0000233 ; SOFA:SOFA ; SOFA:region
       @is_a@mRNA ; SO:0000234 ; SOFA:SOFA ; SOFA:region ; synonym:messenger_RNA
        @part_of@CDS ; SO:0000316 ; SOFA:SOFA ; SOFA:region ; synonym:coding_sequence

Now, I am not one to be lecturing on ontologies, so I may have
misinterpreted something here, but it looks to me like exon is part of a
transcript, but not part of an mRNA.  And since we typically don't have
transcript features in GenBank records, exon should be part_of gene.  An
alternative would be to infer a transcript feature for each mRNA feature
and tie the exons to the transcript features, while leaving the mRNAs and
CDSs as is.
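
To make that reading concrete, here is a small sketch that encodes only the relationships in the excerpt above and looks up what exon is directly attached to. The %edge table and the is_a_ancestors helper are illustrative names, not part of any SO or BioPerl API. The sketch does not settle whether part_of should be inherited down the is_a chain.

```perl
use strict;
use warnings;

# Edges transcribed from the SO excerpt above: child => [relation, parent].
my %edge = (
    transcript           => [ 'part_of', 'gene' ],
    exon                 => [ 'part_of', 'transcript' ],
    processed_transcript => [ 'is_a',    'transcript' ],
    mRNA                 => [ 'is_a',    'processed_transcript' ],
    CDS                  => [ 'part_of', 'mRNA' ],
);

# Follow is_a links upward from a term and return every ancestor reached.
sub is_a_ancestors {
    my ($term) = @_;
    my @ancestors;
    while (my $e = $edge{$term}) {
        last unless $e->[0] eq 'is_a';
        $term = $e->[1];
        push @ancestors, $term;
    }
    return @ancestors;
}

my ($exon_rel, $exon_parent) = @{ $edge{exon} };
my @mrna_ancestors = is_a_ancestors('mRNA');

print "exon $exon_rel $exon_parent\n";
print "mRNA is_a: @mrna_ancestors\n";
```

The direct parent of exon here is transcript, and mRNA reaches transcript only through is_a links. That is the gap the inferred transcript feature would fill.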

Thanks,
Scott


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


