[Biopython] Searching a local copy of the PDB

Fri Jul 24 14:15:09 UTC 2009

Greetings,

You could also do this using the PDB Advanced Search option.  Although not a scriptable solution, it's perfect for a few manual queries.  Here are my suggested parameters:

Match **all** of the following conditions

Subquery 1: Keyword: Advanced, Keywords: **carbonic andydrade** (did you mean anhydrase?), Search Scope: **Full Text**
Subquery 2: Sequence Features: Number of Chains, Between: **2** and **<blank>**

**<checkbox>** Remove Similar Sequences at **30%** Identity

Query comes back with 12 structures and 25 unreleased structures for "carbonic anhydrase."  No results for "andydrade."
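For anyone who wants to script the "Remove Similar Sequences at 30% Identity" step locally, here is a minimal sketch of one way such a filter can work. It makes a greedy pass that keeps an entry only if it is less than 30% identical to every entry already kept. This is an assumption about the general idea, not a description of how the RCSB site implements it. The identity function is passed in, and both `remove_similar` and the toy `hamming_identity` helper are hypothetical names. `hamming_identity` only makes sense for equal-length, already-aligned sequences.

```python
from typing import Callable, Dict, List


def hamming_identity(a: str, b: str) -> float:
    """Toy percent identity for equal-length, pre-aligned sequences (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def remove_similar(
    sequences: Dict[str, str],
    identity: Callable[[str, str], float],
    threshold: float = 30.0,
) -> List[str]:
    """Greedily keep entries whose identity to every kept entry is below threshold.

    Entries are visited in dict insertion order, so the first member of
    each group of similar sequences is the one that survives.
    """
    kept: List[str] = []
    for name, seq in sequences.items():
        if all(identity(seq, sequences[other]) < threshold for other in kept):
            kept.append(name)
    return kept
```

For example, `remove_similar({"s1": "AAAAAAAAAA", "s2": "AAAAAAAAAC", "s3": "CCCCCCCCCC"}, hamming_identity)` keeps s1 and s3 and drops s2, which is 90% identical to s1. Because the pass is greedy, the input order decides which representative survives.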

Regards,
Steve Darnell

-----Original Message-----
From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter
Sent: Friday, July 24, 2009 4:39 AM
To: chen Ku
Cc: biopython at lists.open-bio.org
Subject: [Biopython] Searching a local copy of the PDB

Hi Chen,

When replying to a digest email, it is a good idea to change the subject line to something specific.

On Fri, Jul 24, 2009 at 3:28 AM, chen Ku<biopython.chen at gmail.com> wrote:
> Hi
> I succeeded in downloading all the PDB files with the Biopython module.

Good.

> But now I want to fetch an output file where my keyword is
> ('carbonic andydrade'),
> the second criterion is >=2 chains,
> and the third criterion is homology = 30%.
>
> Can you please write me a few lines of code to do it, as I have some
> problems doing this? Please suggest it step by step if possible, as I
> have been struggling with this for a few days.

If I understand you correctly, you have downloaded all the PDB files to your computer (as plain text PDB format data), and now you want to search them?

Are you using Unix or Windows? There are several Unix command line tools like grep, which are very good at searching plain text files. That might be a good way to look for PDB files containing the words 'carbonic andydrade'.
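On Unix, `grep -ril 'carbonic anhydrase' <directory>` lists the matching files. Below is a rough Python equivalent, in case the Windows route is needed. It assumes the downloaded files sit under one directory with names ending in .ent, .pdb or .ent.gz; that is an assumption, so adjust the patterns to match your local copy. The search is case-insensitive because legacy PDB header text is usually upper case. Note the spelling "anhydrase". A phrase split across two wrapped COMPND lines will not be found by a line-by-line search like this one (nor by grep). The function names are hypothetical.

```python
import gzip
from pathlib import Path


def open_pdb_text(path):
    """Open a PDB file as text, transparently handling .gz compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def files_containing(directory, keyword, patterns=("*.ent", "*.pdb", "*.ent.gz")):
    """Yield PDB files under `directory` whose text contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    for pattern in patterns:
        for path in sorted(Path(directory).rglob(pattern)):
            with open_pdb_text(path) as handle:
                # Stop reading a file at its first matching line, like `grep -l`.
                if any(keyword in line.lower() for line in handle):
                    yield path
```

Usage would be something like `for path in files_containing("pdb_files", "carbonic anhydrase"): print(path)`, where "pdb_files" stands in for wherever your download lives.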

I'm not sure what the fastest way to count the chains in a PDB file would be. If you only find a few hundred PDB files with 'carbonic andydrade', it might be OK just to parse them with Bio.PDB and count the chains that way.
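If parsing every hit with Bio.PDB turns out to be slow, a lighter-weight alternative is to read the chain identifier straight from column 22 of the ATOM records. This is a swapped-in technique, not the Bio.PDB route suggested above. The sketch counts distinct chain IDs in the first model only, so NMR ensembles are not counted once per model. It deliberately ignores HETATM records so that water-only or ligand-only chains do not inflate the count. The name `count_chains` is hypothetical.

```python
def count_chains(lines):
    """Count distinct chain IDs among ATOM records of the first model.

    HETATM records are ignored, so chains made only of waters or ligands
    are not counted. Reading stops at the first ENDMDL record.
    """
    chain_ids = set()
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only look at the first model
        if line.startswith("ATOM  ") and len(line) > 21:
            chain_ids.add(line[21])  # chain identifier is column 22 (1-based)
    return len(chain_ids)
```

For a file, `with open("pdb1abc.ent") as handle: print(count_chains(handle) >= 2)` would apply the ">= 2 chains" test (the filename is only an example). With Bio.PDB the count is roughly `len(PDBParser(QUIET=True).get_structure("x", path)[0])`. That counts every chain in the first model, including any chain ID that carries only HETATM records.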

Finally, your third criterion is homology = 30%, but homology to what?
And how are you measuring homology? I guess you mean 30% sequence identity to a reference carbonic andydrade protein?
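To make the "30% identity" question concrete: percent identity is only defined with respect to an alignment, and there is more than one convention for the denominator. The sketch below assumes the two sequences are already aligned, with gaps written as '-'. It divides the number of identical columns by the alignment length, which I believe matches how BLAST reports Identities. Dividing by the length of the shorter ungapped sequence would give a different number for the same pair. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Identical non-gap columns are divided by the full alignment length
    (gap columns included in the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)
```

For example, `percent_identity("AC-DE", "ACGDF")` gives 60.0: three of the five columns are identical.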

If what you want to do is take a known carbonic andydrade protein and search the PDB for similar sequences, then there are better ways to do this. I would run BLASTP against the PDB sequences.
You can do this at the NCBI via their webpages, or from within Biopython using the Bio.Blast.NCBIWWW.qblast function.
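Below is a minimal sketch of the qblast route, assuming Biopython is installed and the machine can reach the NCBI servers. It searches NCBI's "pdb" protein database with blastp. It then keeps hits at or above a chosen percent identity, computed from the first (normally best-scoring) HSP of each hit as identities divided by alignment length. The names `blast_pdb` and `filter_hits` are hypothetical. The XML is read with the older Bio.Blast.NCBIXML parser; recent Biopython releases may prefer a newer parsing interface. Treating "homology = 30%" as "at least 30% identity" is also an assumption.

```python
def blast_pdb(sequence, hitlist_size=50):
    """Run blastp against NCBI's pdb database; return (accession, identities, align_length) tuples.

    Needs Biopython and network access. The import is inside the function
    so that filter_hits can be used without Biopython installed.
    """
    from Bio.Blast import NCBIWWW, NCBIXML

    handle = NCBIWWW.qblast("blastp", "pdb", sequence, hitlist_size=hitlist_size)
    try:
        record = NCBIXML.read(handle)
    finally:
        handle.close()
    hits = []
    for alignment in record.alignments:
        hsp = alignment.hsps[0]  # first HSP, normally the best-scoring one
        hits.append((alignment.accession, hsp.identities, hsp.align_length))
    return hits


def filter_hits(hits, min_identity=30.0):
    """Keep accessions whose percent identity (identities / align_length) is >= min_identity."""
    return [
        accession
        for accession, identities, align_length in hits
        if align_length and 100.0 * identities / align_length >= min_identity
    ]
```

Usage would be `print(filter_hits(blast_pdb(my_sequence), min_identity=30.0))`, where `my_sequence` is a placeholder for your reference carbonic anhydrase protein sequence as a plain string.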

Peter

_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython