[BioPython] Re: finding poly_A signal and poly_A site
Ravinder Singh
Ravinder.Singh@colorado.edu
Mon, 20 May 2002 17:03:37 -0600
Hi,
I am interested in finding out which Drosophila genes use more than one
polyadenylation site, and then analyzing the sequences around the poly(A)
signal and the poly(A) site.
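One way to frame this is to collect the 3' end of every annotated transcript (for example, each mRNA feature) of a gene, drop ends marked as partial, and flag genes that still have more than one distinct end. This is only a sketch, and it rests on a strong assumption: annotated transcript ends reflect real poly(A) sites, which holds only where transcripts were built from full-length cDNA or EST evidence. The function name and the min_distance merging window are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def genes_with_multiple_ends(
    transcripts: Iterable[Tuple[str, int, bool]],
    min_distance: int = 1,
) -> Dict[str, List[int]]:
    """Map gene -> distinct 3' end coordinates, for genes with more than one.

    Each input item is (gene, 3' end coordinate, end_is_partial).
    An end closer than min_distance to the previously kept end is folded
    into it, on the (assumed) grounds that nearby ends may be one
    heterogeneous cleavage site. The default of 1 keeps every distinct end.
    """
    ends: Dict[str, Set[int]] = defaultdict(set)
    for gene, position, partial in transcripts:
        if partial:
            continue  # a partial end does not locate a poly(A) site
        ends[gene].add(position)

    result: Dict[str, List[int]] = {}
    for gene, positions in ends.items():
        kept: List[int] = []
        for position in sorted(positions):
            if kept and position - kept[-1] < min_distance:
                continue  # treat as the same site as the last kept end
            kept.append(position)
        if len(kept) > 1:
            result[gene] = kept
    return result
```

For example, with the made-up input [("geneA", 1000, False), ("geneA", 1500, False), ("geneB", 2000, False)] the function returns {"geneA": [1000, 1500]}.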
In Bio.GenBank.genbank_format.py I find a reference, around lines
324-325, to feature_key_names, which lists many feature keys, including:
"polyA_signal", # Signal for cleavage & polyadenylation
"polyA_site", # Site at which polyadenine is added to mRNA
At http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html,
there is a reference to polyA_signal in the example annotation shown below.
However, I don't find this feature in the GenBank files that I've
downloaded.
Is there any advice on how to find genes in the Drosophila genome that may
use alternative poly(A) sites, or even just the locations of poly(A)
signals? The GenBank sequence file that I have (excerpt at the end) does
not contain any poly(A) site annotation. Am I using the wrong GenBank file,
or am I expecting too much from the annotation?
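Where no poly(A) annotation exists, one fallback is to scan the sequence itself for the canonical poly(A) signal hexamer AATAAA (AAUAAA in the RNA) and its common variant ATTAAA. The sketch below is a minimal pure-Python scanner, assuming a plain DNA string; the function names and the choice of hexamers are illustrative, not part of Biopython. It reports 1-based forward-strand coordinates for hits on both strands, including overlapping matches.

```python
import re
from typing import Iterator, NamedTuple, Sequence

# Canonical poly(A) signal hexamer and its most common variant, DNA alphabet.
SIGNAL_HEXAMERS = ("AATAAA", "ATTAAA")

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


class SignalHit(NamedTuple):
    start: int    # 1-based, inclusive, forward-strand coordinate
    end: int      # 1-based, inclusive
    strand: int   # +1 on the given strand, -1 on its reverse complement
    hexamer: str  # the signal as read on its own strand


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_polya_signals(
    seq: str, hexamers: Sequence[str] = SIGNAL_HEXAMERS
) -> Iterator[SignalHit]:
    """Yield every occurrence of each hexamer on both strands, overlaps included."""
    seq = seq.upper()
    for hexamer in hexamers:
        for strand, motif in ((1, hexamer), (-1, reverse_complement(hexamer))):
            # A zero-width lookahead lets overlapping matches
            # (e.g. both AATAAA copies in AATAAATAAA) be found.
            for match in re.finditer(f"(?={motif})", seq):
                start = match.start() + 1
                yield SignalHit(start, start + len(motif) - 1, strand, hexamer)
```

A genome-wide hit list will be dominated by chance matches: in random sequence of uniform composition a given hexamer is expected about once every 4,096 bp per strand. In practice one would restrict the scan to the region just upstream of annotated transcript ends, since in well-studied (mostly mammalian) genes the signal usually lies about 10 to 30 nt upstream of the cleavage site.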
Many thanks.
Ravinder
-----
5 Examples of sequence annotation
Note that the examples given below are
only samples of one way a sequence may be annotated and other ways may
also be acceptable.
5.1 Eukaryotic gene
     source          1..1509
                     /organism="Mus musculus"
                     /strain="CD1"
     promoter        <1..9
                     /gene="ubc42"
     mRNA            join(10..567,789..1320)
                     /gene="ubc42"
     CDS             join(54..567,789..1254)
                     /gene="ubc42"
                     /product="ubiquitin conjugating enzyme"
                     /function="cell division control"
                     /translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
                     QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
                     AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
     exon            10..567
                     /gene="ubc42"
                     /number=1
     intron          568..788
                     /gene="ubc42"
                     /number=1
     exon            789..1320
                     /gene="ubc42"
                     /number=2
     polyA_signal    1310..1317
                     /gene="ubc42"
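To pull polyA_signal and polyA_site features out of a flat file without depending on a particular parser, a small pure-Python sketch can read the feature table by column position. It assumes the standard GenBank layout (feature key starting in column 6, location and qualifiers starting in column 22); feature_table_lines, parse_feature_table and features_of_type are hypothetical helper names. In current Biopython the same filtering can be done on the SeqFeature objects returned by Bio.SeqIO.parse(handle, "genbank"), by checking feature.type; the sketch just avoids that dependency.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple

QUALIFIER_COLUMN = 21  # 0-based column where locations/qualifiers start (column 22)


class Feature(NamedTuple):
    key: str
    location: str
    qualifiers: Dict[str, List[str]]


def feature_table_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines after the FEATURES header, up to the next top-level keyword."""
    inside = False
    for line in lines:
        if line.startswith("FEATURES"):
            inside = True
        elif inside and line[:1].strip():  # e.g. BASE COUNT, ORIGIN, CONTIG
            return
        elif inside:
            yield line


def parse_feature_table(lines: Iterable[str]) -> List[Feature]:
    """Group feature-table lines into features by indentation, then parse each."""
    blocks: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent < QUALIFIER_COLUMN:
            blocks.append([line])    # a new feature key line
        elif blocks:
            blocks[-1].append(line)  # location continuation or qualifier line
    return [_parse_block(block) for block in blocks]


def _parse_block(block: List[str]) -> Feature:
    parts = block[0].split(None, 1)
    key = parts[0]
    location = [parts[1].strip()] if len(parts) > 1 else []
    quals: List[List[str]] = []  # [name, raw value] pairs, in file order
    for line in block[1:]:
        text = line.strip()
        if text.startswith("/"):
            name, _, value = text[1:].partition("=")
            quals.append([name, value])
        elif quals:
            # Wrapped qualifier value; protein sequences are rejoined without spaces.
            sep = "" if quals[-1][0] == "translation" else " "
            quals[-1][1] += sep + text
        else:
            location.append(text)    # wrapped location
    qualifiers: Dict[str, List[str]] = {}
    for name, value in quals:
        qualifiers.setdefault(name, []).append(value.strip('"'))
    return Feature(key, "".join(location), qualifiers)


def features_of_type(features: Iterable[Feature], *keys: str) -> List[Feature]:
    return [f for f in features if f.key in keys]
```

Usage would look like features_of_type(parse_feature_table(feature_table_lines(open(path))), "polyA_signal", "polyA_site"), where path is your GenBank file. For a record that simply carries no such features, like the Drosophila scaffold quoted below, the result is an empty list.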
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
LOCUS       AE002566            10563131 bp    DNA     INV       18-OCT-2000
DEFINITION  Drosophila melanogaster genomic scaffold 142000013386054, complete
            sequence.
ACCESSION   AE002566
VERSION     AE002566
KEYWORDS    HTG.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 10563131)
  AUTHORS   Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,
...
     mRNA            join(<7998999..7999026,7999137..7999267,7999716..>7999727)
                     /gene="CG12658"
                     /product="CT35342"
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
     gene            <7998999..>7999727
                     /gene="CG12658"
                     /map="7D16-7D17"
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
     CDS             join(7998999..7999026,7999137..7999267,7999716..7999727)
                     /gene="CG12658"
                     /note="CG12658 gene product"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
                     /protein_id="AAF46347.2"
                     /db_xref="GI:10728526"
                     /translation="MPSSPDQIFSWDIGNSVQDAFVIAVEEHARERLQRLAALNRVTP
                     VDITQLSKKLRN"
...
....
...
BASE COUNT  2942912 a 2224721 c 2217264 g 2927577 t 250657 others
ORIGIN
        1 tttattttat ttattcggaa tctgtatttt ctcaatagca ttaaaaataa ctgtccaaga
       61 gcgaaatgcc atacctcatt gaattcgtaa caaaattccc catcgacctg catttagaaa
...
 10563061 taagacgaaa acggaggact cgagtagcca ctctctgaca ataaacttca tactgatttt
 10563121 aacttcaaga a
//
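The mRNA location in this record, join(<7998999..7999026,7999137..7999267,7999716..>7999727), already bears on the question. In the INSDC feature table syntax, '>' before the last coordinate means the feature extends beyond that base, so the annotated transcript end is partial; here it also coincides with the CDS end at 7999727, so no 3' UTR or poly(A) site is placed at all. A minimal sketch for extracting a feature's 3' end and whether it is partial follows, assuming simple locations built from join()/complement() and start..end spans; remote entries such as ACCESSION:1..10, order() semantics and mixed-strand joins are not handled, and three_prime_end is a hypothetical name.

```python
import re
from typing import NamedTuple

# One simple span: "<1..9", "7999716..>7999727", or a single base such as "1310".
_SPAN = re.compile(r"(<?)(\d+)(?:\.\.(>?)(\d+))?")


class ThreePrimeEnd(NamedTuple):
    position: int  # 1-based coordinate of the feature's 3'-most base
    strand: int    # +1, or -1 when the location uses complement(...)
    partial: bool  # True if the location says the real end lies beyond this base


def three_prime_end(location: str) -> ThreePrimeEnd:
    # Assumes every span is on the same strand (no mixed-strand joins).
    strand = -1 if "complement(" in location else 1
    spans = []
    for lt, start, gt, stop in _SPAN.findall(location):
        first = int(start)
        last = int(stop) if stop else first
        spans.append((first, last, lt == "<", gt == ">"))
    if not spans:
        raise ValueError(f"no coordinates found in {location!r}")
    if strand == 1:
        # Forward strand: the 3' end is the highest coordinate; '>' marks it partial.
        _, last, _, open_end = max(spans, key=lambda span: span[1])
        return ThreePrimeEnd(last, 1, open_end)
    # Reverse strand: the 3' end is the lowest coordinate; '<' marks it partial.
    first, _, open_start, _ = min(spans, key=lambda span: span[0])
    return ThreePrimeEnd(first, -1, open_start)
```

For the mRNA above, three_prime_end("join(<7998999..7999026,7999137..7999267,7999716..>7999727)") gives ThreePrimeEnd(position=7999727, strand=1, partial=True). Applied to a CDS location the same function returns the stop codon end, not a poly(A) site, so it is only meaningful on mRNA (or polyA_site) features.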
********************************************************************************
Dr. Ravinder Singh
Assistant Professor
MCD Biology
347 UCB
University of Colorado
Boulder, CO 80309-0347
(303)492-8886 (voice)
(303)492-7744 (fax)