[Biopython] Fwd: problems searching swiss prot
Peter Cock
p.j.a.cock at googlemail.com
Mon Sep 13 20:40:30 UTC 2010
Forwarding a query from Jessica Grant since she appears
to have had trouble posting to the mailing list.
Jessica wrote:
> Hello,
>
> I am running a few scripts to try to extract sequence information
> out of uniprot. One program called AutoFACT gives me ID numbers
> associated with that database. Most of these look like this:
>
> D2V5S4_NAEGR
> Q48KU2_PSE14
> Q22B72_TETTH
>
>
> and my downstream scripts, which are written in biopython, are
> fine with this. Then, every once in a while, a sequence will come
> back with a name that looks like this:
>
> UPI00006CC162
>
> and everything goes bad. My script can't handle these names,
> apparently, although if I go to uniprot.org and search for it, the
> sequence comes up.
>
> My script uses the following, where RepID is the number
> extracted from AutoFACT:
>
> handle = ExPASy.get_sprot_raw(RepID, cgi=None)
> seq_record = SeqIO.read(handle, "swiss")
>
> Any thoughts?
>
> Thank you,
>
> Jessica
Hi Jessica,
I think the problem is that these unusual identifiers are
not UniProt/SwissProt accession identifiers. The URL
this Biopython function uses was originally from
www.expasy.ch but is now on www.uniprot.org as
described here:
http://www.expasy.ch/expasy_urls.html
The ID UPI00006CC162 looks like a UniParc (UniProt Archive)
identifier rather than a UniProtKB entry name or accession,
so it may be possible to access the information you want
another way. See for example:
http://www.uniprot.org/uniparc/UPI00006CC162
However, it is not immediately clear to me whether you can
get this record back as a plain text "swiss" format entry...
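In the meantime, a script could at least avoid failing on these
identifiers by checking the ID before calling ExPASy. The following
is only a rough sketch: it assumes UniParc identifiers always look
like "UPI" followed by ten hexadecimal characters (an assumption
based on the example above), and the helper names is_uniparc_id and
fetch_swiss_record are made up for illustration.

```python
import re

# Assumption: UniParc IDs are "UPI" plus 10 hexadecimal characters,
# as in the example UPI00006CC162.
UNIPARC_RE = re.compile(r"^UPI[0-9A-F]{10}$")


def is_uniparc_id(identifier):
    """Return True if the identifier looks like a UniParc ID."""
    return bool(UNIPARC_RE.match(identifier.strip()))


def fetch_swiss_record(rep_id):
    """Return a SeqRecord for a UniProtKB ID, or None for a UniParc ID."""
    if is_uniparc_id(rep_id):
        # These are not UniProtKB entries, so skip them rather than
        # asking for a "swiss" format record that may not exist.
        return None
    # Imported here so the ID check above works without Biopython.
    from Bio import ExPASy, SeqIO
    handle = ExPASy.get_sprot_raw(rep_id)
    try:
        return SeqIO.read(handle, "swiss")
    finally:
        handle.close()
```

With this, fetch_swiss_record("D2V5S4_NAEGR") would go to the server
as before, while fetch_swiss_record("UPI00006CC162") returns None
without making a request, so the calling loop can log the ID and
carry on.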
Peter