[Bioperl-l] Unflattener and GFF3 questions

Scott Cain cain at cshl.org
Mon Dec 15 13:50:28 EST 2003


Chris,

More Unflattener questions.  When I process the GenBank record for
AE003644, I produce the following GFF3:

AE003644        EMBL/GenBank/SwissProt  gene    20111   23268   .       +       .       ID=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;map=35B2-35B2;note=last+curated+on+Thu+Dec+13+16:51:32+PST+2001
AE003644        EMBL/GenBank/SwissProt  mRNA    20111   23268   .       +       .       ID=noc_mRNA_1;Parent=noc;db_xref=FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;product=CG4491-RA
AE003644        EMBL/GenBank/SwissProt  CDS     20495   22410   .       +       .       Parent=noc_mRNA_1;codon_start=1;db_xref=GI:7298163,FLYBASE:FBgn0005771;gene=noc;locus_tag=CG4491;note=noc+gene+product;product=CG4491-PA;protein_id=AAF53399.1;translation=MVVLEGG...
AE003644        EMBL/GenBank/SwissProt  exon    20111   20584   .       +       .       Parent=noc_mRNA_1
AE003644        EMBL/GenBank/SwissProt  exon    20887   23268   .       +       .       Parent=noc_mRNA_1



The first question directly relates to Unflattener: the bounds on the
CDS feature don't seem right.  The CDS comes out as a single span
(20495..22410) that includes the intron, whereas in the GenBank file
the CDS is indicated properly with a 'join':

  CDS             join(20495..20584,20887..22410)

I am guessing this is a problem with the way the CDS feature is created,
correct?
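
For illustration, here is a minimal Python sketch of the output I would
expect instead: one CDS line per segment of the 'join', all sharing one
ID, so that the intron is not covered.  This is not how Unflattener
works internally.  The function names and the ID "noc_CDS_1" are
hypothetical, and the parser assumes a simple forward-strand location
(no complement(), no fuzzy bounds).  The phase column is derived from
codon_start.

```python
import re


def parse_location(loc):
    """Split a simple GenBank location into (start, end) pairs.

    Handles only 'a..b' and 'join(a..b,c..d,...)' on the forward strand.
    """
    return [(int(s), int(e)) for s, e in re.findall(r"(\d+)\.\.(\d+)", loc)]


def cds_lines(seqid, source, loc, parent, cds_id, codon_start=1):
    """Return one tab-delimited GFF3 CDS line per segment of the location."""
    lines = []
    # Phase = bases to skip at the start of a segment to reach a codon start.
    phase = codon_start - 1
    for start, end in parse_location(loc):
        attrs = f"ID={cds_id};Parent={parent}"
        lines.append("\t".join([
            seqid, source, "CDS", str(start), str(end),
            ".", "+", str(phase), attrs,
        ]))
        length = end - start + 1
        # Carry the reading frame over to the next segment.
        phase = (phase - length) % 3
    return lines


if __name__ == "__main__":
    for line in cds_lines(
        "AE003644", "EMBL/GenBank/SwissProt",
        "join(20495..20584,20887..22410)",
        parent="noc_mRNA_1", cds_id="noc_CDS_1",  # hypothetical ID
    ):
        print(line)
```

With the location above this yields two CDS lines, 20495..20584 and
20887..22410, rather than one line spanning the intron.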

The second question has less to do with Unflattener and more to do with
GFF3.  Do you have any suggestions for encoding relationship types in
GFF3 generated this way?  It really matters that exons are 'part_of'
and CDSs are 'product_of' mRNAs.  I am trying to decide whether this
should be done when the GFF3 is produced or when the GFF3 is loaded
into the database.  Any suggestions?
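
To make the load-time option concrete, here is a rough Python sketch
that infers the relationship type from the (child type, parent type)
pair while reading the GFF3.  It is only one possible approach, not an
existing loader: the table and function names are hypothetical, and the
table contains only the two relationships mentioned above, so any other
pair is reported as None.

```python
# Hypothetical lookup table: (child type, parent type) -> relationship type.
RELATIONSHIP_BY_TYPES = {
    ("exon", "mRNA"): "part_of",
    ("CDS", "mRNA"): "product_of",
}


def parse_attributes(column9):
    """Turn 'key=value;key=value' into a dict (values left undecoded)."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if pair)


def relationships(gff3_lines):
    """Return (child_type, parent_id, relationship) for every Parent link."""
    records = []
    type_by_id = {}
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        records.append((cols[2], attrs))
        if "ID" in attrs:
            type_by_id[attrs["ID"]] = cols[2]

    result = []
    # Second pass, so a child may appear before its parent in the file.
    for ftype, attrs in records:
        for parent_id in attrs.get("Parent", "").split(","):
            if parent_id:
                key = (ftype, type_by_id.get(parent_id))
                result.append((ftype, parent_id, RELATIONSHIP_BY_TYPES.get(key)))
    return result
```

Doing the same thing at production time would instead mean writing the
relationship into the attributes column, which requires agreeing on an
attribute name first.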

Thanks,
Scott



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
