[Biopython] The problem of using Bio.SwissProt

Peter Cock p.j.a.cock at googlemail.com
Tue Sep 10 16:26:11 UTC 2019


Hello Dechang,

It is entirely possible that the file format has changed a little
since the last major work on Bio.SwissProt.KeyWList back in 2008:

https://github.com/biopython/biopython/blob/master/Bio/SwissProt/KeyWList.py

I suggest you open an issue on our GitHub repository with a specific
example (UniProt URL), showing the mismatch in fields. If you want to
work on a pull request to cope with the changes, even better :)
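To make "mismatch in fields" concrete, here is a minimal, stdlib-only sketch of how a parser driven by the line-code table in the quoted help(KeyWList) output might dispatch each line. The single-occurrence codes (ID, IC, AC, CA) are stored as strings, the repeatable codes (DE, SY, GO, HI, WW) are collected into lists, "//" closes an entry, and any other code is reported as ignored. This is modelled on that table and is not a copy of Biopython's KeyWList.py. The function name parse_keyword_entries and the toy entry (using the table's own "KW-xxxx" placeholder) are hypothetical.

```python
# Illustrative dispatcher modelled on the KeyWList line-code table
# (hypothetical helper, not Biopython's implementation).

# Codes that occur once per entry: keep the value as a string.
SINGLE = {"ID", "IC", "AC", "CA"}
# Codes that may repeat: collect every value in a list.
MULTI = {"DE", "SY", "GO", "HI", "WW"}


def new_record():
    """Start an empty entry with a list for each repeatable code."""
    return {c: [] for c in MULTI}


def parse_keyword_entries(lines):
    """Split keywlist-style lines into entry dicts.

    Returns (entries, ignored), where `ignored` lists the line codes
    not covered by the table, in the order they were seen.
    """
    entries, ignored = [], []
    record = new_record()
    for line in lines:
        code = line[:2]
        if code == "//":
            # End of entry: join multi-line definitions into one string.
            record["DE"] = " ".join(record["DE"])
            entries.append(record)
            record = new_record()
        elif code in SINGLE:
            record[code] = line[5:].strip()
        elif code in MULTI:
            record[code].append(line[5:].strip())
        elif code.strip():
            # Any other non-blank code is skipped, as the table implies.
            ignored.append(code)
    return entries, ignored


toy_entry = [
    "ID   Example keyword.",
    "AC   KW-xxxx",
    "DE   First line of a definition",
    "DE   continued here.",
    "CA   Example category.",
    "DR   Not in the table.",
    "//",
]
entries, ignored = parse_keyword_entries(toy_entry)
```

Running it on a real entry and listing what ends up in `ignored` is one quick way to show which fields no longer match.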

If you are familiar with SwissProt / UniProt and can find a relevant
announcement about these changes, that would also be very helpful.

Thank you,

Peter

On Tue, Sep 10, 2019 at 4:25 PM De-Chang Yang
<yangdc at mail.cbi.pku.edu.cn> wrote:
>
> Dear Biopython team,
>     Hi, this is Dechang Yang.
>     I want to search for some information in the SwissProt database using Biopython, and I found the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc139.
>     But to my surprise, I found that the KeyWList module of Bio.SwissProt seems to be out of date.
>     When I type help(KeyWList),
>     I get the class information:
>
>      |      ---------  ---------------------------     ----------------------
>      |      Line code  Content                         Occurrence in an entry
>      |      ---------  ---------------------------     ----------------------
>      |      ID         Identifier (keyword)            Once; starts a keyword entry
>      |      IC         Identifier (category)           Once; starts a category entry
>      |      AC         Accession (KW-xxxx)             Once
>      |      DE         Definition                      Once or more
>      |      SY         Synonyms                        Optional; once or more
>      |      GO         Gene ontology (GO) mapping      Optional; once or more
>      |      HI         Hierarchy                       Optional; once or more
>      |      WW         Relevant WWW site               Optional; once or more
>      |      CA         Category                        Once per keyword entry; absent
>      |                                                 in category entries
>
>    You can see that the line codes include some keys, but I have to say those keys are inconsistent with the latest SwissProt KeyWList file, which looks like the content below (DR, CC, RX and most of the other lines are ignored by the KeyWList module):
>
>
> RP   TISSUE SPECIFICITY, AND SUBCELLULAR LOCATION.
> RX   PubMed=24154973; DOI=10.1002/ijc.28557;
> RA   Peltekova V.D., Lemire M., Qazi A.M., Zaidi S.H., Trinh Q.M.,
> RA   Bielecki R., Rogers M., Hodgson L., Wang M., D'Souza D.J., Zandi S.,
> RA   Chong T., Kwan J.Y., Kozak K., De Borja R., Timms L., Rangrej J.,
> RA   Volar M., Chan-Seng-Yue M., Beck T., Ash C., Lee S., Wang J.,
> RA   Boutros P.C., Stein L.D., Dick J.E., Gryfe R., McPherson J.D.,
> RA   Zanke B.W., Pollett A., Gallinger S., Hudson T.J.;
> RT   "Identification of genes expressed by immune cells of the colon that
> RT   are regulated by colorectal cancer-associated variants.";
> RL   Int. J. Cancer 134:2330-2341(2014).
> CC   -!- SUBCELLULAR LOCATION: Membrane {ECO:0000269|PubMed:24154973};
> CC       Single-pass membrane protein {ECO:0000269|PubMed:24154973}.
> CC       Note=Co-localizes with crystalloid granules of eosinophils and
> CC       granular organelles of mast cells, neutrophils, macrophages and
> CC       dendritic cells.
> CC   -!- TISSUE SPECIFICITY: Expressed in gastrointestinal and immune
> CC       tissue, as well as prostate, testis and ovary. Expressed in lamina
> CC       propria and eosinophils but not in epithelial cells. Expression is
> CC       greater in benign adjacent tissues than in colon tumors.
> CC       {ECO:0000269|PubMed:24154973}.
> CC   -----------------------------------------------------------------------
> CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
> CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
> CC   -----------------------------------------------------------------------
> DR   EMBL; AK127703; -; NOT_ANNOTATED_CDS; mRNA.
> DR   EMBL; AP002448; -; NOT_ANNOTATED_CDS; Genomic_DNA.
> DR   RefSeq; NP_001289573.1; NM_001302644.1.
> DR   RefSeq; NP_001289574.1; NM_001302645.1.
> DR   RefSeq; NP_001289575.1; NM_001302646.1.
> DR   RefSeq; NP_001289576.1; NM_001302647.1.
> DR   RefSeq; NP_001289577.1; NM_001302648.1.
> DR   RefSeq; NP_997312.1; NM_207429.3.
> DR   BioMuta; HGNC:33789; -.
> DR   DMDM; 74711342; -.
> DR   PaxDb; Q6ZS62; -.
> DR   PRIDE; Q6ZS62; -.
> DR   ProteomicsDB; 68193; -.
> DR   GeneID; 399948; -.
> DR   KEGG; hsa:399948; -.
> DR   CTD; 399948; -.
> DR   DisGeNET; 399948; -.
> DR   GeneCards; COLCA1; -.
> DR   HGNC; HGNC:33789; COLCA1.
> DR   MIM; 615693; gene.
> DR   neXtProt; NX_Q6ZS62; -.
> DR   PharmGKB; PA164716768; -.
> DR   eggNOG; ENOG410JDIH; Eukaryota.
> DR   eggNOG; ENOG4111630; LUCA.
> DR   HOGENOM; HOG000111748; -.
> DR   InParanoid; Q6ZS62; -.
> DR   OrthoDB; 1566774at2759; -.
> DR   PhylomeDB; Q6ZS62; -.
> DR   TreeFam; TF354066; -.
> DR   ChiTaRS; COLCA1; human.
> DR   GenomeRNAi; 399948; -.
> DR   PRO; PR:Q6ZS62; -.
> DR   Proteomes; UP000005640; Unplaced.
> DR   GO; GO:0016021; C:integral component of membrane; IEA:UniProtKB-KW.
> DR   GO; GO:0016020; C:membrane; IDA:UniProtKB.
> PE   2: Evidence at transcript level;
> KW   Complete proteome; Membrane; Reference proteome; Transmembrane;
> KW   Transmembrane helix.
> FT   CHAIN         1    124       Colorectal cancer-associated protein 1.
> FT                                /FTId=PRO_0000340692.
>
>
>
>
>    Could you please help me find out whether I have made any mistakes?
>
> Best Regards,
> Dechang
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython
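A quick way to list the mismatching fields for an issue report is to tally the two-letter line codes in the pasted text that are absent from the KeyWList table. The sketch below does that with the standard library only; the helper name unhandled_line_codes is hypothetical, and the excerpt is a handful of lines taken verbatim from the quoted message. For reference, codes such as RX, CC, DR, PE, KW and FT are the line types of UniProtKB/Swiss-Prot protein entries, which Biopython reads with Bio.SwissProt.parse / Bio.SwissProt.read, whereas KeyWList targets the separate keyword list file (keywlist.txt).

```python
from collections import Counter

# Line codes listed in the help(KeyWList) table quoted above.
KEYWLIST_CODES = {"ID", "IC", "AC", "DE", "SY", "GO", "HI", "WW", "CA"}


def unhandled_line_codes(text):
    """Count line codes in `text` that the KeyWList table does not list."""
    counts = Counter()
    for line in text.splitlines():
        code = line[:2]
        # Skip blank lines and "//" entry terminators.
        if not code.strip() or code == "//":
            continue
        if code not in KEYWLIST_CODES:
            counts[code] += 1
    return counts


# A few lines copied from the excerpt in the quoted message.
excerpt = """\
RX   PubMed=24154973; DOI=10.1002/ijc.28557;
CC   -!- TISSUE SPECIFICITY: Expressed in gastrointestinal and immune
DR   GeneID; 399948; -.
DR   KEGG; hsa:399948; -.
PE   2: Evidence at transcript level;
KW   Complete proteome; Membrane; Reference proteome; Transmembrane;
KW   Transmembrane helix.
FT   CHAIN         1    124       Colorectal cancer-associated protein 1.
"""
counts = unhandled_line_codes(excerpt)
print(counts)
```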


