[Biopython] Local Uniprot
Peter Cock
p.j.a.cock at googlemail.com
Wed May 13 19:10:49 UTC 2015
Hi David,
I think you are looking for this page: http://www.uniprot.org/downloads
Reviewed (Swiss-Prot) is available as XML, simple FASTA, and the legacy
plain text "swiss" format:
* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz
* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
* ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
Or similarly for Unreviewed (TrEMBL).
Once unzipped, I would suggest trying these with the
Bio.SeqIO.index_db(...) function for efficient random access.
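As a rough sketch of what that could look like (the helper names
accession_from_fasta_id and open_uniprot_index, and the index filename
in the usage comment, are made up for illustration): the first call
builds an SQLite index file next to the data, later calls simply reuse
it. For FASTA the default key is the whole "sp|ACCESSION|NAME"
identifier, so this assumes you would rather look entries up by the
bare accession.

```python
from Bio import SeqIO


def accession_from_fasta_id(identifier):
    """Turn 'sp|P12345|NAME_SPECIES' into 'P12345'; leave other IDs alone."""
    parts = identifier.split("|")
    return parts[1] if len(parts) >= 3 else identifier


def open_uniprot_index(index_path, data_path, fmt="fasta"):
    """Build (first call) or reload (later calls) an SQLite index.

    fmt is a Bio.SeqIO format name, e.g. "fasta", "swiss" or "uniprot-xml".
    The key function is only applied to FASTA here (an assumption of this
    sketch); other formats keep Biopython's default keys.
    """
    key_function = accession_from_fasta_id if fmt == "fasta" else None
    return SeqIO.index_db(index_path, [data_path], fmt, key_function=key_function)


# Hypothetical usage, after decompressing uniprot_sprot.fasta.gz:
#   db = open_uniprot_index("uniprot_sprot.idx", "uniprot_sprot.fasta")
#   record = db["P12345"]   # parsed on demand as a SeqRecord
#   db.close()
```

Only the file offsets are kept in the index, so memory use should stay
small even for TrEMBL, although building the index the first time will
take a while.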
If you are tight on space, try recompressing with bgzip as BGZF
files (blocked gzip), which also work with SeqIO.index_db - see
http://blastedbio.blogspot.co.uk/2011/11/bgzf-blocked-bigger-better-gzip.html
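If you do not have the bgzip command line tool to hand, the
recompression can also be done from Python with Bio.bgzf. This is only
a sketch: the function names gzip_to_bgzf and index_bgzf, the chunk
size, and the filenames in the usage comment are my own choices for
illustration.

```python
import gzip

from Bio import SeqIO, bgzf


def gzip_to_bgzf(gz_path, bgzf_path, chunk_size=1 << 20):
    """Stream a plain gzip file into a BGZF file without loading it all."""
    with gzip.open(gz_path, "rb") as source:
        with bgzf.BgzfWriter(bgzf_path, "wb") as target:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                target.write(chunk)


def index_bgzf(index_path, bgzf_path, fmt):
    """Index a BGZF-compressed file; index_db detects BGZF by itself."""
    return SeqIO.index_db(index_path, [bgzf_path], fmt)


# Hypothetical usage:
#   gzip_to_bgzf("uniprot_sprot.dat.gz", "uniprot_sprot.dat.bgz")
#   db = index_bgzf("uniprot_sprot_bgz.idx", "uniprot_sprot.dat.bgz", "swiss")
```

The BGZF file is still valid gzip, so ordinary gzip tools can read it,
it is just a little larger than the original .gz file.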
Peter
On Wed, May 13, 2015 at 4:49 PM, Sauer, David <David.Sauer at med.nyu.edu> wrote:
> Hi all,
> I have a script where I query the UniProt website for particular protein
> entries, following BioPython and UniProt’s own python access notes. However,
> as a way to be polite, I wait a few seconds between queries, but this makes
> my script fairly slow. I would like to keep the database locally, but the
> only downloads I can find for the UniProtKB are as two huge xml files for
> Swiss-Prot and TrEMBL. I am unclear how to parse these compared to the
> individual protein xml files on the website, which are easily parsable by
> BioPython.
>
> Does anyone have guidance on parsing and running the UniProtKB locally?
>
> Thanks in advance!
>
> David Sauer
>
> Da-Neng Wang Lab
> Structural Biology Program
> New York University School of Medicine
>
> Publications via Google Scholar
>
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython