[BioPython] Bio.Entrez - Help

Thu Mar 5 10:42:02 UTC 2009

On Thu, Mar 5, 2009 at 4:04 AM, Rodrigo faccioli
<rodrigo_faccioli at uol.com.br> wrote:
> I want to know where I can find examples about Bio.Entrez. Specifically, I'm
> developing a program which has a protein primary sequence and I need to
> search for its conserved domain and read it to show to the user.
>
> I'm reading this link
> http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc64 . However,
> I'm not understanding very well. I know that I will work with CDD database.

The CDD database is one of several protein motif databases the NCBI
make available for use with their tool RPS-BLAST.  CDD is a composite
database which includes domains from PFAM, SMART, KOG etc.

Try your example sequence at
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml and you'll get a
hit to pfam00321.

It sounds like what you want is a script which runs RPS-BLAST using
your query protein against the CDD motif database.

You can run BLASTN, BLASTP etc online at the NCBI using a script, but
as far as I know, the NCBI do not make RPS-BLAST (or PSI-BLAST)
available in this way.  I haven't checked this in recent months.

However, I have done this task myself using standalone BLAST installed
on my computer, i.e. the tool rpsblast from the NCBI.  You'll also need
to install the databases (which are big - you'll need plenty of disk
space and RAM).  Once this is installed and working, you can call
rpsblast from Biopython using the Bio.Blast.NCBIStandalone.rpsblast(...)
function.
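
As a rough sketch of that workflow, here is one way to drive a locally
installed rpsblast from Python 3 without going through Biopython. It
assumes the newer BLAST+ command line tools (with the options -query,
-db, -evalue and -outfmt) rather than the legacy binary, that rpsblast
is on your PATH, and that a formatted CDD database called "Cdd" can be
found by BLAST. The helper names and the database name are made up for
this example.

```python
import os
import subprocess
import tempfile


def build_rpsblast_command(query_path, db="Cdd", evalue=0.01):
    """Return the argument list for a BLAST+ rpsblast search (nothing is run)."""
    return [
        "rpsblast",
        "-query", query_path,     # FASTA file holding the protein query
        "-db", db,                # assumed name of the local CDD database
        "-evalue", str(evalue),   # expectation value cutoff
        "-outfmt", "5",           # 5 = BLAST XML output
    ]


def run_rpsblast(sequence, db="Cdd", evalue=0.01):
    """Write the protein to a temporary FASTA file, run rpsblast, return XML text."""
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as fasta:
        fasta.write(">query\n%s\n" % sequence)
    try:
        cmd = build_rpsblast_command(fasta.name, db, evalue)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(fasta.name)  # clean up the temporary query file
```

The XML text returned by run_rpsblast(...) should be parsable with
Bio.Blast.NCBIXML once wrapped in a StringIO handle.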

> I made a simple example which is below.
>
> from Bio import Entrez
> Entrez.email = "rodrigo.faccioli at gmail.com" # Always tell NCBI who you are
> handle = Entrez.esearch(db="cdd",
> term="TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN")
> record = Entrez.read(handle)
> print record["IdList"]
>
> Thanks for any help.

I think if you use Entrez to access the CDD database, you can only
look up the domains themselves (by name, not by searching with a
sequence), e.g.

>>> from Bio import Entrez
>>> Entrez.email = "Your.Name.Here@example.com"  # always tell NCBI who you are
>>> handle = Entrez.esearch(db="cdd", term="pfam00321", retmode="XML")
>>> record = Entrez.read(handle)
>>> print(record["IdList"])
['109381']

You can check this ID works via their website:
http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=109381
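
For reference, Bio.Entrez.esearch is a wrapper around the NCBI
E-utilities web service. A minimal sketch of the same name search using
only the Python 3 standard library follows. It assumes the esearch.fcgi
endpoint at eutils.ncbi.nlm.nih.gov and that the result XML lists the
matches as <Id> elements inside <IdList>. The function names are
invented for this example and are not part of Biopython.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumed base URL of the NCBI ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, email):
    """Build the ESearch URL for a database and a search term."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "email": email})


def parse_id_list(xml_text):
    """Pull the record IDs out of an ESearch XML result."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


def search_cdd(term, email):
    """Search the CDD database by domain name (needs network access)."""
    with urlopen(esearch_url("cdd", term, email)) as response:
        return parse_id_list(response.read())
```

Calling search_cdd("pfam00321", "Your.Name.Here@example.com") should
give the same ID list as the Bio.Entrez call above.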

I've tried a few variations but efetch doesn't seem to support the CDD
database (yet).

Peter