[BioPython] Re: finding poly_A signal and poly_A site

Ravinder Singh Ravinder.Singh@colorado.edu
Mon, 20 May 2002 17:03:37 -0600


Hi,
I am interested in finding out which Drosophila genes use more than one
polyadenylation site, and then analyzing the sequences around the poly_A
signal and poly_A site.

In Bio.GenBank.genbank_format.py I find a reference, around lines
324-325, to feature_key_names, which lists many names, including these two:
    "polyA_signal",     # Signal for cleavage & polyadenylation
    "polyA_site",       # Site at which polyadenine is added to mRNA

At
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html,
there is a reference to polyA_signal in the annotation shown below.
However, in the GenBank files that I've downloaded, I don't find this
feature.
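
One way to check whether a downloaded file carries these keys at all is
to scan its FEATURES table directly. The sketch below is a minimal,
standard-library-only illustration and is not the Biopython API: the
function name count_feature_keys and the toy record are made up for the
example, and it assumes the usual flat-file layout in which feature keys
start in column 6 and qualifier lines are indented further.

```python
import io
from collections import Counter


def count_feature_keys(handle):
    """Count feature keys in the FEATURES table(s) of GenBank flat-file text.

    Hypothetical helper: assumes feature keys start in column 6 and that
    the table ends at the next unindented line (e.g. ORIGIN or //).
    """
    counts = Counter()
    in_features = False
    for line in handle:
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        # An unindented, non-blank line closes the feature table.
        if line.strip() and not line.startswith(" "):
            in_features = False
            continue
        # Exactly five leading spaces followed by text marks a feature key.
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            counts[line.split()[0]] += 1
    return counts


# Toy record (made up), loosely modelled on the EMBL example in this message.
sample = """\
LOCUS       TEST0001    1509 bp    DNA    ROD    01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..1509
                     /organism="Mus musculus"
     exon            10..567
                     /gene="ubc42"
     exon            789..1320
                     /gene="ubc42"
     polyA_signal    1310..1317
                     /gene="ubc42"
ORIGIN
//
"""

counts = count_feature_keys(io.StringIO(sample))
for key in ("polyA_signal", "polyA_site"):
    print(key, counts[key])
```

For a real file, the same function could be called on an open file
handle; a count of zero for both keys would suggest that the record
simply was not annotated with them.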

Any advice on how to obtain genes that may use alternative poly(A) sites
from the Drosophila genome, or just the locations of poly(A) signals? The
GenBank sequence file that I have is given at the end; it does not
contain poly(A) site annotation. Am I using the wrong GenBank file, or am I
expecting too much from the annotation?
Many thanks.
Ravinder
-----


5 Examples of sequence annotation

Note that the examples given below are only samples of one way a
sequence may be annotated and other ways may also be acceptable.

                              5.1 Eukaryotic gene

                              source          1..1509
                                              /organism="Mus musculus"
                                              /strain="CD1"
                              promoter        <1..9
                                              /gene="ubc42"
                              mRNA            join(10..567,789..1320)
                                              /gene="ubc42"
                              CDS             join(54..567,789..1254)
                                              /gene="ubc42"
                                              /product="ubiquitin conjugating enzyme"
                                              /function="cell division control"
                                              /translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY
                                              QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS
                                              AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"
                              exon            10..567
                                              /gene="ubc42"
                                              /number=1
                              intron          568..788
                                              /gene="ubc42"
                                              /number=1
                              exon            789..1320
                                              /gene="ubc42"
                                              /number=2
                              polyA_signal    1310..1317
                                              /gene="ubc42"

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

LOCUS       AE002566  10563131 bp    DNA    INV       18-OCT-2000
DEFINITION  Drosophila melanogaster genomic scaffold 142000013386054, complete
            sequence.
ACCESSION   AE002566
VERSION     AE002566
KEYWORDS    HTG.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 10563131)
  AUTHORS   Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D.,
...
     mRNA            join(<7998999..7999026,7999137..7999267,7999716..>7999727)
                     /gene="CG12658"
                     /product="CT35342"
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
     gene            <7998999..>7999727
                     /gene="CG12658"
                     /map="7D16-7D17"
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
     CDS             join(7998999..7999026,7999137..7999267,7999716..7999727)
                     /gene="CG12658"
                     /note="CG12658 gene product"
                     /codon_start=1
                     /db_xref="FLYBASE:FBan0012658"
                     /db_xref="FLYBASE:FBgn0030020"
                     /evidence=not_experimental
                     /protein_id="AAF46347.2"
                     /db_xref="GI:10728526"
                     /translation="MPSSPDQIFSWDIGNSVQDAFVIAVEEHARERLQRLAALNRVTP
                     VDITQLSKKLRN"
...
....
...
BASE COUNT  2942912 a 2224721 c 2217264 g 2927577 t 250657 others
ORIGIN
        1 tttattttat ttattcggaa tctgtatttt ctcaatagca ttaaaaataa ctgtccaaga
       61 gcgaaatgcc atacctcatt gaattcgtaa caaaattccc catcgacctg catttagaaa
...
 10563061 taagacgaaa acggaggact cgagtagcca ctctctgaca ataaacttca tactgatttt
 10563121 aacttcaaga a
//
********************************************************************************

Dr. Ravinder Singh
Assistant Professor
MCD Biology
347 UCB
University of Colorado
Boulder, CO 80309-0347

(303)492-8886 (voice)
(303)492-7744 (fax)