[BioPython] splice variants in GenBank/Entrez

Bruce Southey bsouthey at gmail.com
Mon Jun 9 13:25:44 UTC 2008


Albert Krewinkel wrote:
> Hi Steve,
>
> On Sun, Jun 08, 2008 at 10:21:50PM -0700, C. G. wrote:
>   
>> I've been using BioPython for a few projects over the last
>> two months to process BLAST results, but now I need to
>> take those results and determine which of them have
>> known splice variants. By "known" I mean those that
>> have annotations contained in a database that indicate
>> they have (or are) splice variants.
>>     
>
> Depending on which organism you are looking at, you might want to use
> the Ensembl genome database.  There is no Biopython interface, but you
> can use the Jython interface from their website (at least they once
> had one; I didn't check if that's still the case).  Otherwise you
> might have to use Perl or Java packages for that.
>
> Another good resource for this is the Alternative Splicing Database:
> http://www.ebi.ac.uk/asd/
>
> Hope that helps,
>
> Albert
>
>
>   
The 'ALTERNATIVE PRODUCTS' topic in the CC lines of a UniProt (Swiss-Prot)
record can contain alternative splicing information. See, for example,
the manual section:
3.12.5. Syntax of the topic 'ALTERNATIVE PRODUCTS'
http://ca.expasy.org/sprot/userman.html#CCAP
(The examples from that section are reproduced below for completeness.)
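
If the records are already available as UniProt flat files, a first pass is to check whether the ALTERNATIVE PRODUCTS topic lists 'Alternative splicing' as one of its events. This is a minimal sketch, assuming you have the raw CC lines of one entry as a list of strings ('CC' plus three spaces, then the text). It uses only the Python standard library rather than any Biopython API, and the helper names cc_topic and has_alternative_splicing are hypothetical.

```python
import re


def cc_topic(cc_lines, topic):
    """Return the joined text of one '-!- TOPIC:' block, or None if absent.

    Hypothetical helper; assumes flat-file CC lines ('CC' + 3 spaces + text).
    """
    body = []
    inside = False
    for line in cc_lines:
        if not line.startswith("CC"):
            continue
        text = line[5:].rstrip()
        if text.startswith("-----"):
            # Copyright box that closes the CC section.
            inside = False
        elif text.startswith("-!- "):
            # A new topic starts; keep it only if it is the one we want.
            inside = text[4:].startswith(topic + ":")
            if inside:
                body.append(text[4 + len(topic) + 1:].strip())
        elif inside:
            # Continuation line of the current topic.
            body.append(text.strip())
    if not body:
        return None
    return " ".join(part for part in body if part)


def has_alternative_splicing(cc_lines):
    """True if the ALTERNATIVE PRODUCTS Event field lists 'Alternative splicing'."""
    block = cc_topic(cc_lines, "ALTERNATIVE PRODUCTS")
    if block is None:
        return False
    match = re.search(r"Event=([^;]*)", block)
    if match is None:
        return False
    events = [event.strip() for event in match.group(1).split(",")]
    return "Alternative splicing" in events


record_cc = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;",
]
print(has_alternative_splicing(record_cc))  # True
```

Matching on the Event field rather than on the topic alone matters because ALTERNATIVE PRODUCTS can also describe other events (for example alternative initiation) without any splicing.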

Bruce

Example of the CC lines and the corresponding FT lines for an entry with 
alternative splicing:

    CC   -!- ALTERNATIVE PRODUCTS:
    CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;
    CC         Comment=Additional isoforms seem to exist;
    CC       Name=1; Synonyms=Non-muscle isozyme;
    CC         IsoId=Q15746-1; Sequence=Displayed;
    CC       Name=2;
    CC         IsoId=Q15746-2; Sequence=VSP_004791;
    CC       Name=3A;
    CC         IsoId=Q15746-3; Sequence=VSP_004792, VSP_004794;
    CC       Name=3B;
    CC         IsoId=Q15746-4; Sequence=VSP_004791, VSP_004792, VSP_004794;
    CC       Name=4;
    CC         IsoId=Q15746-5; Sequence=VSP_004792, VSP_004793;
    CC       Name=Del-1790;
    CC         IsoId=Q15746-6; Sequence=VSP_004795;
    CC       Name=5; Synonyms=Smooth-muscle isozyme;
    CC         IsoId=Q15746-7; Sequence=VSP_018845;
    CC         Note=Produced by alternative initiation at Met-923 of isoform 1;
    CC       Name=6; Synonyms=Telokin;
    CC         IsoId=Q15746-8; Sequence=VSP_018846;
    CC         Note=Produced by alternative initiation at Met-1761 of isoform
    CC         1. Has no catalytic activity;
    ...
    FT   VAR_SEQ       1   1760       Missing (in isoform 6).
    FT                                /FTId=VSP_018846.
    FT   VAR_SEQ       1    922       Missing (in isoform 5).
    FT                                /FTId=VSP_018845.
    FT   VAR_SEQ     437    506       VSGIPKPEVAWFLEGTPVRRQEGSIEVYEDAGSHYLCLLKA
    FT                                RTRDSGTYSCTASNAQGQVSCSWTLQVER -> G (in
    FT                                isoform 2 and isoform 3B).
    FT                                /FTId=VSP_004791.
    FT   VAR_SEQ    1433   1439       DEVEVSD -> MKWRCQT (in isoform 3A,
    FT                                isoform 3B and isoform 4).
    FT                                /FTId=VSP_004792.
    FT   VAR_SEQ    1473   1545       Missing (in isoform 4).
    FT                                /FTId=VSP_004793.
    FT   VAR_SEQ    1655   1705       Missing (in isoform 3A and isoform 3B).
    FT                                /FTId=VSP_004794.
    FT   VAR_SEQ    1790   1790       Missing (in isoform Del-1790).
    FT                                /FTId=VSP_004795.
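
Once an entry is known to have alternative products, the Name/IsoId/Sequence entries in a block like the one above can be pulled out by splitting on ';'. This is a minimal sketch, assuming the CC lines of just the ALTERNATIVE PRODUCTS topic are passed in and that ';' only ever separates fields. parse_isoforms is a hypothetical name, and the Synonyms and Note fields are simply skipped.

```python
def parse_isoforms(cc_lines):
    """Return a list of {'name', 'isoid', 'sequence'} dicts, one per Name= entry.

    Assumes cc_lines holds only the CC lines of one ALTERNATIVE PRODUCTS topic
    and that ';' only ever separates fields.
    """
    # Drop the 'CC' prefix and join continuation lines into one string.
    text = " ".join(line[2:].strip() for line in cc_lines if line.startswith("CC"))
    isoforms = []
    current = None
    for field in text.split(";"):
        key, sep, value = field.partition("=")
        if not sep:
            continue  # e.g. the empty field after the last ';'
        key, value = key.strip(), value.strip()
        if key == "Name":
            # Every isoform description starts with its Name field.
            current = {"name": value, "isoid": None, "sequence": []}
            isoforms.append(current)
        elif current is not None and key == "IsoId":
            current["isoid"] = value
        elif current is not None and key == "Sequence":
            # 'Displayed', 'External', or a comma-separated list of VSP ids.
            current["sequence"] = [item.strip() for item in value.split(",")]
        # Event, Comment, Synonyms and Note fields are ignored here.
    return isoforms


block = [
    "CC   -!- ALTERNATIVE PRODUCTS:",
    "CC       Event=Alternative splicing, Alternative initiation; Named isoforms=8;",
    "CC       Name=3B;",
    "CC         IsoId=Q15746-4; Sequence=VSP_004791, VSP_004792, VSP_004794;",
]
for iso in parse_isoforms(block):
    print(iso["isoid"], iso["name"], iso["sequence"])
```

An isoform whose Sequence is 'Displayed' is the sequence shown in the entry itself; 'External' (as for O77618-1 in the second example) points to a different entry.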
      

    CC   -!- ALTERNATIVE PRODUCTS:
    CC       Event=Alternative splicing, Alternative initiation; Named isoforms=3;
    CC         Comment=Isoform 1 and isoform 2 arise due to the use of two
    CC         alternative first exons joined to a common exon 2 at the same
    CC         acceptor site but in different reading frames, resulting in two
    CC         completely different isoforms;
    CC       Name=1; Synonyms=p16INK4a;
    CC         IsoId=O77617-1; Sequence=Displayed;
    CC       Name=3;
    CC         IsoId=O77617-2; Sequence=VSP_018701;
    CC         Note=Produced by alternative initiation at Met-35 of isoform 1.
    CC         No experimental confirmation available;
    CC       Name=2; Synonyms=p19ARF;
    CC         IsoId=O77618-1; Sequence=External;
    ...
    FT   VAR_SEQ       1     34       Missing (in isoform 3).
    FT                                /FTId=VSP_004099.
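
To see what each isoform actually changes, the VSP identifiers from the CC block can be resolved against the FT VAR_SEQ lines. This is another standard-library sketch, and parse_var_seq is a hypothetical name. It assumes whitespace-separated integer positions (no '<', '>' or '?' qualifiers). It also joins wrapped description lines with a single space, so a wrapped replacement sequence such as the one for VSP_004791 would come back with a stray space inside it.

```python
def parse_var_seq(ft_lines):
    """Map each VAR_SEQ /FTId (e.g. 'VSP_004099') to its start, end and description."""
    features = {}
    current = None
    for line in ft_lines:
        if not line.startswith("FT"):
            continue
        body = line[5:]
        if body[:1].strip():
            # A feature key in column 6 starts a new feature.
            parts = body.split(None, 3)
            if parts[0] == "VAR_SEQ" and len(parts) >= 3:
                current = {
                    "start": int(parts[1]),
                    "end": int(parts[2]),
                    "desc": parts[3].strip() if len(parts) > 3 else "",
                }
            else:
                current = None  # some other feature type; skip it
        elif current is not None:
            text = body.strip()
            if text.startswith("/FTId="):
                # The /FTId qualifier closes the feature's description.
                features[text[len("/FTId="):].rstrip(".")] = current
                current = None
            elif text:
                # Wrapped description text is joined with a single space.
                current["desc"] = (current["desc"] + " " + text).strip()
    return features


example = [
    "FT   VAR_SEQ       1     34       Missing (in isoform 3).",
    "FT                                /FTId=VSP_004099.",
]
for ftid, feature in parse_var_seq(example).items():
    print(ftid, feature["start"], feature["end"], feature["desc"])
# prints: VSP_004099 1 34 Missing (in isoform 3).
```

Combined with the isoform list, this gives, for example, isoform 3A of Q15746 (VSP_004792, VSP_004794) as the displayed sequence with residues 1433-1439 replaced and residues 1655-1705 removed.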
      