[Bioperl-l] Getting CDS boundaries from Unflattener

Scott Cain cain at cshl.org
Thu Dec 18 09:45:47 EST 2003


Hi Chris,

I very much want to reimplement Bio::DB::GFF::Adaptor::biofetch using
Unflattener, but I am running into a few problems.  Below is a
section of GFF that I generate using Unflattener from AE003644:

AE003644        EMBL/GenBank/SwissProt  gene    20111   23268   .       +       .       ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644        EMBL/GenBank/SwissProt  mRNA    20111   23268   .       +       .       ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644        EMBL/GenBank/SwissProt  CDS     20495   22410   .       +       .       Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGGGGV...
AE003644        EMBL/GenBank/SwissProt  exon    20111   20584   .       +       .       Parent=noc_mRNA_1
AE003644        EMBL/GenBank/SwissProt  exon    20887   23268   .       +       .       Parent=noc_mRNA_1

The biggest problem with this set of data is that the CDS spans
introns.  The CDS really ought to be broken up into segments to match
the exon boundaries.  As it is, it breaks display in gbrowse whether it
is using chado or a GFF database as a backend.
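
To make the segmentation concrete, here is a minimal sketch of the
interval arithmetic I have in mind.  It is plain Python rather than
BioPerl, it is not how Unflattener works today, and the function name
split_cds_by_exons is made up for illustration.  It assumes 1-based,
inclusive coordinates on a single strand and simply clips the CDS span
to each exon of the same mRNA:

```python
def split_cds_by_exons(cds, exons):
    """Clip a CDS span to each exon it overlaps.

    cds   -- (start, end) tuple, 1-based inclusive (hypothetical input shape)
    exons -- list of (start, end) tuples for the same transcript
    Returns the CDS segments in ascending coordinate order.
    """
    cds_start, cds_end = cds
    segments = []
    for exon_start, exon_end in sorted(exons):
        # The overlap of two closed intervals is [max of starts, min of ends].
        start = max(cds_start, exon_start)
        end = min(cds_end, exon_end)
        if start <= end:  # skip exons that lie entirely in the UTRs
            segments.append((start, end))
    return segments


# Coordinates taken from the noc example above.
exons = [(20111, 20584), (20887, 23268)]
cds = (20495, 22410)

for start, end in split_cds_by_exons(cds, exons):
    # Print one tab-delimited GFF line per CDS segment.
    print("\t".join([
        "AE003644", "EMBL/GenBank/SwissProt", "CDS",
        str(start), str(end), ".", "+", ".",
        "Parent=noc_mRNA_1",
    ]))
```

For the noc data this gives two CDS segments, 20495-20584 and
20887-22410, in place of the single 20495-22410 feature.  The sketch
leaves out phase and the other CDS attributes.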

The other problem is that the exons' parentage is incorrect.  The exons
should be features of the gene, not the mRNA.

Thanks,
Scott



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


