[Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?

Chris Fields cjfields at illinois.edu
Wed Mar 24 09:06:01 EDT 2010


On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote:

> Hi,
> 
> I'm surprised that this issue hasn't come up already, as the change to the
> gene model is quite significant.  For comparison, this is what the old
> bp_genbank2gff3.pl script would produce with --CDS:
> ...
> So, although the new script improves the parent-child relationships by
> identifying parents on the locus_tag field (guaranteed to be unique), rather
> than gene name (not guaranteed to be unique), the GFF3 gene model has
> apparently changed from canonical:
> 
> gene <- mRNA <- {polypeptide/CDS, exon}
> 
> to this:
> 
> region ; exon ; gene <- gene_component_region
> 
> So I guess I don't understand the region-exon-gene part of the new model,
> after all.  This new model doesn't appear to be Sequence Ontology-compatible
> any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon
> is no longer considered part_of the transcript.  In fact, there's not a
> transcript.  Given that the SO cites bp_genbank2gff3.pl as a way to get
> SO-compliant GFF3 
> (http://www.sequenceontology.org/resources/faq.html#convert), this might be
> an issue requiring a prompt fix or reversion.


I agree.  I think this commit needs more code review to understand the reasoning behind it, though undoing it will be a little trickier than a simple reversion (I think there have been additional unrelated commits since then).  Nathan, was this the intent, or is this a bug?  I would agree with Leighton that it's the latter.

chris

> For now, due to the downstream problems this model causes with GBROWSE and
> ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the
> script to use the locus_tag field rather than the gene field for the feature
> ID.
> 
> Cheers,
> 
> L.
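To make the difference between the two models concrete, here is a minimal Python sketch of the canonical gene <- mRNA <- {exon, CDS} hierarchy, with feature IDs derived from locus_tag as Leighton proposes. It is not BioPerl code and not what bp_genbank2gff3.pl actually does. The Locus record, the helper names, the ".t1"/".exonN"/".cds" ID suffixes and the sample values are all hypothetical. It assumes one transcript per locus whose exons are entirely coding (no UTRs), so exon and CDS coordinates coincide. exons_in_transcripts() checks the property the old output had and the new output lacks: every exon and CDS has a Parent, and every Parent is an mRNA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Locus:
    """Hypothetical container for one annotated locus (illustrative only)."""
    seqid: str
    locus_tag: str        # unique within a record, so used to build IDs
    strand: str           # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)  # 1-based, inclusive
    gene_name: str = ""   # may repeat across loci, so only used as Name


def cds_phases(exons: List[Tuple[int, int]], strand: str) -> Dict[Tuple[int, int], int]:
    """GFF3 phase of each CDS segment, assuming a complete CDS with no UTRs."""
    order = sorted(exons, reverse=(strand == "-"))  # 5' to 3' transcription order
    phases: Dict[Tuple[int, int], int] = {}
    done = 0  # coding bases in earlier segments
    for seg in order:
        phases[seg] = (3 - done % 3) % 3
        done += seg[1] - seg[0] + 1
    return phases


def to_gff3(locus: Locus, source: str = "GenBank") -> List[str]:
    """Emit gene <- mRNA <- {exon, CDS} rows with IDs derived from locus_tag."""
    exons = sorted(locus.exons)
    start, end = exons[0][0], max(e for _, e in exons)
    gene_id = locus.locus_tag
    mrna_id = gene_id + ".t1"  # hypothetical suffix convention
    phases = cds_phases(exons, locus.strand)

    def row(ftype: str, s: int, e: int, phase: str, attrs: str) -> str:
        return "\t".join([locus.seqid, source, ftype, str(s), str(e),
                          ".", locus.strand, phase, attrs])

    name = ";Name=" + locus.gene_name if locus.gene_name else ""
    lines = [row("gene", start, end, ".", "ID=" + gene_id + name),
             row("mRNA", start, end, ".", f"ID={mrna_id};Parent={gene_id}")]
    for i, (s, e) in enumerate(exons, 1):
        lines.append(row("exon", s, e, ".", f"ID={mrna_id}.exon{i};Parent={mrna_id}"))
    for s, e in exons:
        # All CDS segments share one ID: one discontinuous feature in GFF3
        lines.append(row("CDS", s, e, str(phases[(s, e)]),
                         f"ID={mrna_id}.cds;Parent={mrna_id}"))
    return lines


def exons_in_transcripts(lines: List[str]) -> bool:
    """True if every exon/CDS has a Parent and every Parent is an mRNA."""
    types: Dict[str, str] = {}
    child_parents: List[str] = []
    for line in lines:
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        if "ID" in attrs:
            types[attrs["ID"]] = cols[2]
        if cols[2] in ("exon", "CDS"):
            child_parents.append(attrs.get("Parent", ""))
    for field_value in child_parents:
        parents = [p for p in field_value.split(",") if p]
        if not parents or any(types.get(p) != "mRNA" for p in parents):
            return False
    return True


if __name__ == "__main__":
    demo = Locus("chr1", "LT_0001", "+", [(100, 199), (300, 450)], gene_name="abcA")
    for gff_row in to_gff3(demo):
        print(gff_row)
    print(exons_in_transcripts(to_gff3(demo)))  # True
```

Given a flat "region ; exon ; gene" layout in which exons carry no Parent, exons_in_transcripts() returns False. Giving every CDS row the same ID follows the GFF3 convention for a single feature that spans several lines.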
