[Biopython] Local Uniprot

Peter Cock p.j.a.cock at googlemail.com
Wed May 13 19:10:49 UTC 2015


Hi David,

I think you are looking for this page: http://www.uniprot.org/downloads

The reviewed set (Swiss-Prot) is available as XML, simple FASTA, and
the legacy plain text "swiss" format:

* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz
* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz

The same formats are available for Unreviewed (TrEMBL).

Once decompressed, I would suggest trying these with the
Bio.SeqIO.index_db(...) function, which gives efficient random access
to individual records without loading the whole file into memory.
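
As a rough sketch of what that could look like for the FASTA file: the
file names, the index name and the two helper functions below are
placeholders for illustration, not anything UniProt or Biopython define.
UniProt FASTA identifiers look like sp|P12345|NAME_SPECIES, so a
key_function can be used to index by accession alone. For the other two
files the format name would be "uniprot-xml" or "swiss" instead of
"fasta".

```python
def accession_from_fasta_id(identifier):
    """Turn 'sp|P12345|NAME_SPECIES' into 'P12345'.

    Identifiers that do not follow the db|accession|name pattern are
    returned unchanged.
    """
    parts = identifier.split("|")
    return parts[1] if len(parts) >= 3 else identifier


def open_fasta_index(fasta_path, index_path):
    """Build (or reopen) an SQLite index over a decompressed FASTA file."""
    # Imported lazily so the helper above can be used without Biopython.
    from Bio import SeqIO

    return SeqIO.index_db(
        index_path,
        [fasta_path],
        "fasta",
        key_function=accession_from_fasta_id,
    )


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the download was unpacked.
    records = open_fasta_index("uniprot_sprot.fasta", "uniprot_sprot.idx")
    print(len(records), "records indexed")
    records.close()
```

The first call scans the file and writes the index; later calls with the
same index file reuse it, so only the first run should be slow.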

If you are tight on space, try recompressing with bgzip as BGZF
files (blocked gzip), which also work with SeqIO.index_db - see
http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html
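
If the bgzip command line tool is not to hand, Biopython's own Bio.bgzf
module can write BGZF as well. A minimal sketch, assuming the download
is an ordinary gzip file; the function names and paths are made up for
the example:

```python
import gzip
import shutil


def gzip_to_bgzf(gz_path, bgzf_path):
    """Recompress an ordinary gzip file as BGZF, streaming chunk by chunk."""
    # Imported lazily so this module can be imported without Biopython.
    from Bio import bgzf

    with gzip.open(gz_path, "rb") as src, bgzf.BgzfWriter(bgzf_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def open_bgzf_index(bgzf_path, index_path, fmt="fasta"):
    """Index a BGZF-compressed file; index_db detects BGZF by itself."""
    from Bio import SeqIO

    return SeqIO.index_db(index_path, [bgzf_path], fmt)


if __name__ == "__main__":
    # Hypothetical local paths for the Swiss-Prot FASTA download.
    gzip_to_bgzf("uniprot_sprot.fasta.gz", "uniprot_sprot.fasta.bgz")
    records = open_bgzf_index("uniprot_sprot.fasta.bgz", "uniprot_sprot_bgz.idx")
    print(len(records), "records indexed")
    records.close()
```

Pointing index_db at the original .gz file directly will not work, since
plain gzip does not allow jumping to an arbitrary record; BGZF does
because it is compressed in small independent blocks.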

Peter

On Wed, May 13, 2015 at 4:49 PM, Sauer, David <David.Sauer at med.nyu.edu> wrote:
> Hi all,
> I have a script where I query the UniProt website for particular protein
> entries, following BioPython and UniProt’s own python access notes. However,
> as a way to be polite, I wait a few seconds between queries, but this makes
> my script fairly slow. I would like to keep the database locally, but the
> only downloads I can find for the UniProtKB are as two huge xml files for
> Swiss-Prot and TrEMBL. I am unclear how to parse these compared to the
> individual protein xml files on the website, which are easily parsable by
> BioPython.
>
> Does anyone have guidance on parsing and running the UniProtKB locally?
>
> Thanks in advance!
>
> David Sauer
>
> Da-Neng Wang Lab
> Structural Biology Program
> New York University School of Medicine
>
> Publications via Google Scholar
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython


