[Bioperl-l] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
Leighton Pritchard
Leighton.Pritchard at scri.ac.uk
Tue Mar 23 16:35:42 UTC 2010
Hi,
I can't find any discussion of this in the mailing list archives (if
anyone has a link, I'll happily follow it), so I'd like to ask: what is the
rationale for the bp_genbank2gff3.pl script, as modified in bioperl-live,
mapping CDS features to gene_component_region?
For example, if I run the script on the E. coli sequence/annotation file
NC_000913.gbk, the gene:
     gene            190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /note="synonyms: ECK0001, JW4367"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
     CDS             190..255
                     /gene="thrL"
                     /locus_tag="b0001"
                     /function="leader; Amino acid biosynthesis: Threonine"
                     /function="1.5.1.8 metabolism; building block
                     biosynthesis; amino acids; threonine"
                     /note="GO_process: threonine biosynthetic process [goid
                     0009088]"
                     /codon_start=1
                     /transl_table=11
                     /product="thr operon leader peptide"
                     /protein_id="NP_414542.1"
                     /db_xref="ASAP:ABE-0000006"
                     /db_xref="UniProtKB/Swiss-Prot:P0AD86"
                     /db_xref="GI:16127995"
                     /db_xref="EcoGene:EG11277"
                     /db_xref="ECOCYC:EG11277"
                     /db_xref="GeneID:944742"
                     /translation="MKRISTTITTTITITTGNGAG"
is mapped to:
NC_000913 GenBank region 190 255 . + . ID=GenBank:region:NC_000913:190:255
NC_000913 GenBank exon 190 255 . + . ID=GenBank:exon:NC_000913:190:255
NC_000913 GenBank gene 190 255 . + . ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
NC_000913 GenBank gene_component_region 190 255 . + . Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTGNGAG
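For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch that lists the feature types parented to a given gene ID. It assumes tab-delimited GFF3 (as the specification requires; the tabs are lost in the listing above), and the names parse_gff3_line, child_types and SAMPLE are my own, not part of BioPerl. The sample lines are cut-down versions of the thrL records.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one tab-delimited GFF3 feature line into a dict.

    Attribute values are split on commas first and then percent-decoded,
    so an escaped comma (%2C) stays inside a single value.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": cols[0],
        "source": cols[1],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": attrs,
    }


def child_types(lines, parent_id):
    """Return the types of all features whose Parent includes parent_id."""
    features = (parse_gff3_line(l) for l in lines if not l.startswith("#"))
    return [f["type"] for f in features
            if parent_id in f["attributes"].get("Parent", [])]


# Hypothetical, shortened versions of the thrL lines (tabs restored).
SAMPLE = [
    "NC_000913\tGenBank\tgene\t190\t255\t.\t+\t.\t"
    "ID=b0001;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001",
    "NC_000913\tGenBank\tgene_component_region\t190\t255\t.\t+\t.\t"
    "Parent=b0001;gene=thrL;protein_id=NP_414542.1",
]

print(child_types(SAMPLE, "b0001"))  # -> ['gene_component_region']
```

On the shortened sample, the only child of b0001 is the gene_component_region feature, which is the behaviour I am asking about.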
I understand the region-exon-gene part of the model, but not
gene_component_region, which appears to be a catch-all type. I would have
assumed that a CDS is better mapped to a polypeptide, as described in the
Chado documentation:
http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
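As a stopgap, the type column could be rewritten after conversion. The sketch below is only a hypothetical post-processing step in Python, not anything bp_genbank2gff3.pl does itself; retype_feature is a name I made up, and it assumes tab-delimited GFF3 input. It changes column 3 only, so the feature stays parented to the gene, whereas the Chado model linked above attaches the polypeptide to an mRNA.

```python
def retype_feature(line, old_type, new_type):
    """Return the GFF3 line with column 3 changed from old_type to new_type.

    Comment/directive lines and lines without exactly 9 tab-separated
    columns are returned unchanged.
    """
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) == 9 and cols[2] == old_type:
        cols[2] = new_type
    return "\t".join(cols)


# Hypothetical, shortened version of the thrL CDS-derived line.
line = ("NC_000913\tGenBank\tgene_component_region\t190\t255\t.\t+\t.\t"
        "Parent=b0001;protein_id=NP_414542.1\n")

fixed = retype_feature(line, "gene_component_region", "polypeptide")
print(fixed.split("\t")[2])  # -> polypeptide
```

Whether "polypeptide" or "CDS" is the right target type is exactly the open question here, so the new type is left as an argument.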
There is no difference in script output whether --CDS or --noCDS is used.
Cheers,
L.
--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405