[Biopython] Fwd: problems searching swiss prot

Peter Cock p.j.a.cock at googlemail.com
Mon Sep 13 20:40:30 UTC 2010


Forwarding a query from Jessica Grant since she appears
to have had trouble posting to the mailing list.

Jessica wrote:

> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of uniprot.  One program called AutoFACT gives me ID numbers
> associated with that database.  Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in biopython, are
> fine with this.  Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad.  My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
>        handle = ExPASy.get_sprot_raw(RepID, cgi=None)
>        seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica

Hi Jessica,

I think the problem is that these unusual identifiers are
not UniProt/SwissProt accession identifiers. The URL
this Biopython function uses was originally on
www.expasy.ch but has since moved to www.uniprot.org,
as described here:

http://www.expasy.ch/expasy_urls.html
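As a rough sketch, you could check which kind of identifier
AutoFACT gave you before calling ExPASy.get_sprot_raw(). The
regular expressions and the classify_identifier() helper are my
own approximations, not official UniProt definitions: entry names
like Q48KU2_PSE14 are treated as "accession or mnemonic, underscore,
species code", and UniParc IDs as "UPI" plus 10 hex characters.

```python
import re

# Approximate patterns (assumptions, not official UniProt definitions).
# UniProtKB entry names: e.g. "Q48KU2_PSE14" (code, underscore, species code).
UNIPROTKB_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")
# UniParc identifiers: "UPI" followed by 10 hexadecimal characters.
UNIPARC_ID = re.compile(r"UPI[0-9A-F]{10}")


def classify_identifier(identifier):
    """Return 'uniparc', 'uniprotkb' or 'unknown' for an AutoFACT-style ID."""
    identifier = identifier.strip()
    if UNIPARC_ID.fullmatch(identifier):
        return "uniparc"
    if UNIPROTKB_NAME.fullmatch(identifier):
        return "uniprotkb"
    return "unknown"


for rep_id in ["D2V5S4_NAEGR", "Q48KU2_PSE14", "Q22B72_TETTH", "UPI00006CC162"]:
    print(rep_id, classify_identifier(rep_id))
```

Only the "uniprotkb" IDs are the kind your script already handles
with get_sprot_raw() and the "swiss" parser; the "uniparc" ones
need a different route.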

The ID UPI00006CC162 looks like a UniParc (UniProt
Archive) identifier, so it may still be possible to access
the information you want. See for example:

http://www.uniprot.org/uniparc/UPI00006CC162

However, it is not clear to me right away whether you can
get this record back as a plain text "swiss" format entry...
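One way to sidestep the "swiss" question is to ask for FASTA
instead. The sketch below assumes that UniProt's REST service
serves UniParc entries at https://rest.uniprot.org/uniparc/<ID>.fasta;
that URL pattern and the helper names uniparc_fasta_url() and
fetch_uniparc_record() are assumptions for illustration, so please
check the current UniProt documentation before relying on them.

```python
from io import StringIO
from urllib.request import urlopen

# Assumed URL pattern for UniParc FASTA downloads (check UniProt's docs).
UNIPARC_FASTA_URL = "https://rest.uniprot.org/uniparc/{}.fasta"


def uniparc_fasta_url(upi):
    """Build the (assumed) FASTA download URL for a UniParc identifier."""
    upi = upi.strip().upper()
    # UniParc IDs are "UPI" plus 10 hex characters, 13 characters in total.
    if not (upi.startswith("UPI") and len(upi) == 13):
        raise ValueError("not a UniParc identifier: %r" % upi)
    return UNIPARC_FASTA_URL.format(upi)


def fetch_uniparc_record(upi):
    """Download a UniParc entry and parse it with Biopython as FASTA."""
    from Bio import SeqIO  # imported here so URL building works without Biopython

    with urlopen(uniparc_fasta_url(upi)) as handle:
        text = handle.read().decode("utf-8")
    # The download is expected to hold exactly one FASTA record.
    return SeqIO.read(StringIO(text), "fasta")
```

Usage would be something like
record = fetch_uniparc_record("UPI00006CC162"). Bear in mind that
a FASTA record only gives you an ID, a description line and the
sequence, not the annotation you get from a "swiss" entry.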

Peter
