[Bioperl-l] module for unflattening GenBank/EMBL/DDBJ records

Chris Mungall cjm at fruitfly.org
Thu Jul 17 16:09:32 EDT 2003


Hi Peili

I thought I'd tested for this case, but apparently not.

Ok, I think the correct approach here is to just ditch the introns, and
attach the exons underneath the mRNAs. I'll make this fix and commit.
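
To make the intended result concrete, here is a minimal sketch of that
reparenting step. It is written in Python rather than Perl and is not the
Unflattener's actual code. The Feature class, the attach_exons function, and
all coordinates are hypothetical, chosen only for illustration. It assumes an
exon belongs under an mRNA when the exon's span falls inside one of the
segments of that mRNA's join() location.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""
    type: str
    start: int
    end: int
    symbol: str = "unknown"
    # (start, end) pairs taken from an mRNA's join() location
    segments: list = field(default_factory=list)
    children: list = field(default_factory=list)


def attach_exons(features):
    """Drop introns and nest each exon under every mRNA that contains it."""
    mrnas = [f for f in features if f.type == "mRNA"]
    top_level = []
    for f in features:
        if f.type == "intron":
            continue  # introns are discarded entirely
        if f.type == "exon":
            parents = [
                m for m in mrnas
                if any(s <= f.start and f.end <= e for s, e in m.segments)
            ]
            if parents:
                for m in parents:
                    m.children.append(f)
                continue
        top_level.append(f)  # anything unplaced stays at the top level
    return top_level


def render(features, depth=0):
    """Return indented 'type: symbol' lines for a feature tree."""
    lines = []
    for f in features:
        lines.append("        " * depth + f" {f.type}: {f.symbol}")
        lines.extend(render(f.children, depth + 1))
    return lines


# Invented coordinates; only the feature types and symbols echo the thread.
flat = [
    Feature("exon", 1, 100),
    Feature("exon", 201, 300),
    Feature("exon", 401, 500),
    Feature("mRNA", 1, 300, "AnnIX-RA", segments=[(1, 100), (201, 300)]),
    Feature("mRNA", 1, 500, "AnnIX-RB", segments=[(1, 100), (401, 500)]),
    Feature("intron", 101, 200),
]
top = attach_exons(flat)
print("\n".join(render(top)))
```

Note that an exon shared by two transcripts ends up under both mRNAs in this
sketch, and an exon that matches no mRNA is left at the top level rather than
dropped, so its tags are not lost.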

On Thu, 17 Jul 2003, Peili Zhang wrote:

> Hi Chris,
>
> I tried to use your Unflattener, but I don't seem to understand the results I got
> back. Can you take a look and let me know if I'm using the unflattener
> correctly?
>
> I have my test script (testUnflattener.pl) and one of the ARGS GB files
> (AnnIX.v003) attached. below is the output from running testUnflattener.pl
> ('unknown' is for features w/o the /symbol tag):
>
>  source: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  mRNA: AnnIX-RA
>  mRNA: AnnIX-RB
>  intron: unknown
>  CDS: AnnIX-P2
>  CDS: AnnIX-P1
>  intron: unknown
>  intron: unknown
>  intron: unknown
>  intron: unknown
>
>
> then I added the /gene tag for all the mRNA/CDS/exon/source features in
> AnnIX.v003 and changed the 'source' feature to be 'gene' feature. the output now
> changed to:
>
>  gene: unknown
>          mRNA: AnnIX-RA
>                  CDS: AnnIX-P1
>          mRNA: AnnIX-RB
>                  CDS: AnnIX-P2
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  exon: unknown
>  intron: unknown
>  intron: unknown
>  intron: unknown
>  intron: unknown
>  intron: unknown
>
> I'm not worried about the introns, they're not going into chado. but I'm
> concerned that exons are not put into the hierarchy but CDS's are instead. this
> comes back to our discussion on the chado feature graph/object model etc. I
> understand I can infer exons from the join locations of mRNA's, but I have to
> include the exons in the tree if they're explicitly listed in the GB file, since
> the tags for the exons are important annotation information to be loaded into
> chado. is it hard for you to make such changes to your code? furthermore,
> according to our chado implementation, I'll need to change CDS's to be 'protein'
> features.
>
> let me know what you think. thanks.
>
> Peili
>
> >Date: Tue, 15 Jul 2003 11:49:40 -0700 (PDT)
> >From: Chris Mungall <cjm at fruitfly.org>
> >To: Peili Zhang <peili at morgan.harvard.edu>
> >Cc: <birney at ebi.ac.uk>, <bioperl-l at bioperl.org>, <emmert at morgan.harvard.edu>
> >Subject: Re: [Bioperl-l] module for unflattening GenBank/EMBL/DDBJ records
> >
> >Yes, it is committed
> >
> >Bio::SeqFeature::Tools::Unflattener
> >
> >cheers
> >Chris
> >
>


