[Biopython] Dealing with Non-RefSeq IDs / InParanoid

Matthew Strand stran104 at chapman.edu
Wed Jul 1 03:01:14 UTC 2009


For the benefit of future users who find this thread through a search, I
would like to share how to retrieve a sequence from NCBI given a non-NCBI
protein ID (or other ID). This was question 3 in my original message.

Suppose you have a non-NCBI protein ID, say CE23997 (from WormBase) and you
want to retrieve the sequence from NCBI.

You can use Bio.Entrez.esearch(db="protein", term="CE23997") to get a list
of NCBI GIs that reference this identifier. In this case there is only one
(17554770).
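
The search step can be sketched roughly as follows. This assumes Biopython is
installed and NCBI is reachable. The function name find_protein_ids, the
email argument, and the optional entrez argument (there only so the function
can be tried with a stand-in object instead of the network) are my own
illustrative choices, not part of Biopython.

```python
def find_protein_ids(term, email, entrez=None):
    """Return the NCBI protein IDs (GIs) whose records mention `term`."""
    if entrez is None:
        # Imported lazily; `entrez` may instead be a stand-in object
        # (an assumption of this sketch, useful for offline testing).
        from Bio import Entrez as entrez
    entrez.email = email  # NCBI asks callers to identify themselves
    handle = entrez.esearch(db="protein", term=term)
    try:
        # Entrez.read parses the XML reply into a dict-like record.
        result = entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])
```

Calling find_protein_ids("CE23997", "you@example.org") with your own address
should give a list containing the GI above, although the exact result depends
on the current contents of the NCBI database.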

Then you can get the sequence using Bio.Entrez.efetch(db="protein",
id="17554770", rettype="fasta").
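
A rough sketch of the fetch step, under the same assumptions (Biopython
installed, NCBI reachable). The function name fetch_protein_fasta, the email
argument, and the optional entrez argument are hypothetical names used only
for illustration. I pass retmode="text" explicitly to ask for plain FASTA
text.

```python
def fetch_protein_fasta(gi, email, entrez=None):
    """Return the FASTA text for one NCBI protein ID (GI)."""
    if entrez is None:
        # Imported lazily; `entrez` may instead be a stand-in object
        # (an assumption of this sketch, useful for offline testing).
        from Bio import Entrez as entrez
    entrez.email = email  # NCBI asks callers to identify themselves
    handle = entrez.efetch(db="protein", id=gi, rettype="fasta",
                           retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

For example, fetch_protein_fasta("17554770", "you@example.org") should return
the record as a FASTA string. If you would rather have a SeqRecord, the
handle returned by efetch can be passed to Bio.SeqIO.read(handle, "fasta")
instead of calling handle.read().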

This may be obvious to some, but it was not to me, primarily because I was
unaware of the esearch functionality.

-- 
Matthew Strand
