[Biopython] The problem of using Bio.SwissProt

De-Chang Yang yangdc at mail.cbi.pku.edu.cn
Tue Sep 10 15:21:23 UTC 2019


Dear Biopython team,
    Hi, this is Dechang Yang.
    I want to look up some information in the Swiss-Prot database using Biopython, and I found the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc139.
    To my surprise, the KeyWList module of Bio.SwissProt seems to be out of date.
    When I type: help(KeyWList)
    I get the following class information:

     |      ---------  ---------------------------     ----------------------
     |      Line code  Content                         Occurrence in an entry
     |      ---------  ---------------------------     ----------------------
     |      ID         Identifier (keyword)            Once; starts a keyword entry
     |      IC         Identifier (category)           Once; starts a category entry
     |      AC         Accession (KW-xxxx)             Once
     |      DE         Definition                      Once or more
     |      SY         Synonyms                        Optional; once or more
     |      GO         Gene ontology (GO) mapping      Optional; once or more
     |      HI         Hierarchy                       Optional; once or more
     |      WW         Relevant WWW site               Optional; once or more
     |      CA         Category                        Once per keyword entry; absent
     |                                                 in category entries

   As you can see, the help text lists a fixed set of line codes, but those codes are inconsistent with the latest Swiss-Prot KeyWList file, whose content looks like the excerpt below (DR, CC, RX and most of the other lines are ignored by the KeyWList module):


RP   TISSUE SPECIFICITY, AND SUBCELLULAR LOCATION.
RX   PubMed=24154973; DOI=10.1002/ijc.28557;
RA   Peltekova V.D., Lemire M., Qazi A.M., Zaidi S.H., Trinh Q.M.,
RA   Bielecki R., Rogers M., Hodgson L., Wang M., D'Souza D.J., Zandi S.,
RA   Chong T., Kwan J.Y., Kozak K., De Borja R., Timms L., Rangrej J.,
RA   Volar M., Chan-Seng-Yue M., Beck T., Ash C., Lee S., Wang J.,
RA   Boutros P.C., Stein L.D., Dick J.E., Gryfe R., McPherson J.D.,
RA   Zanke B.W., Pollett A., Gallinger S., Hudson T.J.;
RT   "Identification of genes expressed by immune cells of the colon that
RT   are regulated by colorectal cancer-associated variants.";
RL   Int. J. Cancer 134:2330-2341(2014).
CC   -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:24154973};
CC       Single-pass membrane protein {ECO:0000269|PubMed:24154973}.
CC       Note=Co-localizes with crystalloid granules of eosinophils and
CC       granular organelles of mast cells, neutrophils, macrophages and
CC       dendritic cells.
CC   -!- TISSUE SPECIFICITY: Expressed in gastrointestinal and immune
CC       tissue, as well as prostate, testis and ovary. Expressed in lamina
CC       propria and eosinophils but not in epithelial cells. Expression is
CC       greater in benign adjacent tissues than in colon tumors.
CC       {ECO:0000269|PubMed:24154973}.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   -----------------------------------------------------------------------
DR   EMBL; AK127703; -; NOT_ANNOTATED_CDS; mRNA.
DR   EMBL; AP002448; -; NOT_ANNOTATED_CDS; Genomic_DNA.
DR   RefSeq; NP_001289573.1; NM_001302644.1.
DR   RefSeq; NP_001289574.1; NM_001302645.1.
DR   RefSeq; NP_001289575.1; NM_001302646.1.
DR   RefSeq; NP_001289576.1; NM_001302647.1.
DR   RefSeq; NP_001289577.1; NM_001302648.1.
DR   RefSeq; NP_997312.1; NM_207429.3.
DR   BioMuta; HGNC:33789; -.
DR   DMDM; 74711342; -.
DR   PaxDb; Q6ZS62; -.
DR   PRIDE; Q6ZS62; -.
DR   ProteomicsDB; 68193; -.
DR   GeneID; 399948; -.
DR   KEGG; hsa:399948; -.
DR   CTD; 399948; -.
DR   DisGeNET; 399948; -.
DR   GeneCards; COLCA1; -.
DR   HGNC; HGNC:33789; COLCA1.
DR   MIM; 615693; gene.
DR   neXtProt; NX_Q6ZS62; -.
DR   PharmGKB; PA164716768; -.
DR   eggNOG; ENOG410JDIH; Eukaryota.
DR   eggNOG; ENOG4111630; LUCA.
DR   HOGENOM; HOG000111748; -.
DR   InParanoid; Q6ZS62; -.
DR   OrthoDB; 1566774at2759; -.
DR   PhylomeDB; Q6ZS62; -.
DR   TreeFam; TF354066; -.
DR   ChiTaRS; COLCA1; human.
DR   GenomeRNAi; 399948; -.
DR   PRO; PR:Q6ZS62; -.
DR   Proteomes; UP000005640; Unplaced.
DR   GO; GO:0016021; C:integral component of membrane; IEA:UniProtKB-KW.
DR   GO; GO:0016020; C:membrane; IDA:UniProtKB.
PE   2: Evidence at transcript level;
KW   Complete proteome; Membrane; Reference proteome; Transmembrane;
KW   Transmembrane helix.
FT   CHAIN         1    124       Colorectal cancer-associated protein 1.
FT                                /FTId=PRO_0000340692.
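
To make the mismatch concrete, here is a minimal sketch in plain Python (no Biopython calls) that tallies the two-letter line codes in a few lines from the excerpt above and reports which ones are missing from the KeyWList help table. The names count_line_codes and KEYWLIST_CODES are made up for illustration, and the set of codes is simply copied from the help output quoted earlier.

```python
from collections import Counter

# Line codes copied from the KeyWList help text quoted above.
KEYWLIST_CODES = {"ID", "IC", "AC", "DE", "SY", "GO", "HI", "WW", "CA"}


def count_line_codes(lines):
    """Count the two-letter line codes at the start of each flat-file line."""
    counts = Counter()
    for line in lines:
        code = line[:2]
        # Data lines start with two uppercase letters followed by a space;
        # blank lines and the "//" record terminator are skipped.
        if len(line) > 2 and code.isalpha() and code.isupper() and line[2] == " ":
            counts[code] += 1
    return counts


# A few lines taken from the excerpt above.
sample = [
    "RX   PubMed=24154973; DOI=10.1002/ijc.28557;",
    "CC   -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:24154973};",
    "DR   MIM; 615693; gene.",
    "DR   HGNC; HGNC:33789; COLCA1.",
    "KW   Transmembrane helix.",
]

counts = count_line_codes(sample)
# Codes present in the file but absent from the KeyWList table.
unknown = sorted(set(counts) - KEYWLIST_CODES)
print(counts)
print(unknown)
```

Every code in this sample falls outside the KeyWList table. Codes such as RX, DR, PE and FT are typical of Swiss-Prot protein entries rather than of the keyword list, so the file may simply be of a different type than the one KeyWList expects; that is an assumption and has not been confirmed here.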




   Could you please help me find out whether I have made a mistake somewhere?

Best Regards,
Dechang


