[Biopython] Searching a local copy of the PDB

Fri Jul 24 09:38:43 UTC 2009

Hi Chen,

When replying to a digest email, it is a good idea to change the subject
line to something specific.

On Fri, Jul 24, 2009 at 3:28 AM, chen Ku<biopython.chen at gmail.com> wrote:
> Hi
> I succeeded in downloading all the PDB files with the Biopython module.

Good.

> But now I want to fetch an output file where my
> keyword is 'carbonic anhydrase',
> second criterion is >= 2 chains,
> third criterion is homology = 30%.
>
> Can you please write me a few lines of code to do this, as I am having
> some problems doing it. Please suggest it step by step if possible, as I have
> been struggling with this for a few days.

If I understand you correctly, you have downloaded all the PDB files to your
computer (as plain text PDB format data), and now you want to search them?

Are you using Unix or Windows? There are several Unix command line
tools like grep which are very good at searching plain text files. That
might be a good way to look for PDB files containing the words 'carbonic
anhydrase'. The PDB header text is upper case, so use a case-insensitive
search (grep -i).
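If you would rather stay in Python (say, on Windows without grep), here is a rough sketch of the same idea. It assumes the downloaded files are named like pdb1abc.ent or pdb1abc.ent.gz somewhere under one directory, and it only looks at the HEADER, TITLE, COMPND and KEYWDS records, stopping at the first coordinate record instead of reading the whole file as grep would. The function names are made up for this example.

```python
import gzip
from pathlib import Path

# Records that usually carry the molecule name (my choice, not a PDB rule).
HEADER_RECORDS = ("HEADER", "TITLE", "COMPND", "KEYWDS")
# Header records come before these, so we can stop reading once we hit one.
COORDINATE_RECORDS = ("ATOM", "HETATM", "MODEL")


def open_pdb_text(path):
    """Open a PDB file as text, decompressing it first if it ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path)


def pdb_files_mentioning(pdb_dir, keyword):
    """Return the PDB files under pdb_dir whose header records contain keyword.

    The comparison is case-insensitive because PDB header text is upper case.
    """
    keyword = keyword.upper()
    hits = []
    # Hypothetical layout: *.ent or *.ent.gz files anywhere below pdb_dir.
    for path in sorted(Path(pdb_dir).rglob("*.ent*")):
        with open_pdb_text(path) as handle:
            for line in handle:
                if line.startswith(COORDINATE_RECORDS):
                    break  # past the header, stop reading this file
                if line.startswith(HEADER_RECORDS) and keyword in line.upper():
                    hits.append(path)
                    break
    return hits
```

For example, `pdb_files_mentioning("/data/pdb", "carbonic anhydrase")`, where /data/pdb is a placeholder for wherever you saved the files. A phrase split across two continuation lines of a record would be missed, so it may be worth also trying just 'anhydrase'.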

I'm not sure what the fastest way to count the chains in a PDB file would
be. If you only find a few hundred PDB files mentioning 'carbonic anhydrase',
it might be OK just to parse them with Bio.PDB and count the chains
that way.
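A sketch of the Bio.PDB approach might look like this. I am assuming you want protein chains, so a chain with no standard amino acids (for example one holding only waters or ligands) is not counted, and only the first model is used so that NMR ensembles are not counted several times over. PDBParser and is_aa are real Bio.PDB names; the two helper functions are hypothetical.

```python
import gzip

from Bio.PDB import PDBParser
from Bio.PDB.Polypeptide import is_aa


def count_protein_chains(path):
    """Count chains in the first model with at least one standard amino acid."""
    parser = PDBParser(QUIET=True)  # QUIET hides warnings about odd files
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        structure = parser.get_structure("query", handle)
    first_model = next(structure.get_models())
    return sum(
        1
        for chain in first_model
        if any(is_aa(residue, standard=True) for residue in chain)
    )


def filter_by_chain_count(paths, min_chains=2):
    """Keep only the files with at least min_chains protein chains."""
    return [p for p in paths if count_protein_chains(p) >= min_chains]
```

Given a list of candidate file paths (for example from the keyword search), `filter_by_chain_count(paths, min_chains=2)` keeps the entries with two or more protein chains.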

Finally, your third criterion is homology = 30%, but homology to what?
And how are you measuring homology? I guess you mean 30%
sequence identity to a reference carbonic anhydrase protein?
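If it is sequence identity, note that there is more than one way to turn an alignment into a percentage. As a small illustration (not how any particular tool defines it), this function counts identical columns over the columns where neither sequence has a gap. BLAST's reported identity instead divides by the full alignment length, gaps included, so the two numbers can differ for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two already-aligned sequences of equal length.

    Convention used here: identical columns divided by the columns where
    neither sequence has a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a.upper() == b.upper())
    return 100.0 * identical / len(compared)
```

For example, `percent_identity("AC-GT", "ACTGA")` gives 75.0: four gap-free columns, three of them identical.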

If what you want to do is take a known carbonic anhydrase protein
and search the PDB for similar sequences, then there are better
ways to do this. I would run BLASTP against the PDB sequences.
You can do this at the NCBI via their web pages, or from within
Biopython using the Bio.Blast.NCBIWWW.qblast function.
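Roughly, that could look like the sketch below. It assumes network access, a reference protein given as a plain one-letter sequence string, and that you want hits at or above 30% identity (you wrote "=30%", so adjust the comparison if you meant something else). It uses the long-standing Bio.Blast.NCBIXML parser; the helper names and the hitlist_size of 500 are my own choices, not anything the NCBI prescribes.

```python
from Bio.Blast import NCBIWWW, NCBIXML


def blast_against_pdb(protein_sequence, hitlist_size=500):
    """Run BLASTP at the NCBI against the 'pdb' database and parse the XML.

    Needs network access; the NCBI may take a while to respond.
    """
    result_handle = NCBIWWW.qblast(
        "blastp", "pdb", protein_sequence, hitlist_size=hitlist_size
    )
    try:
        return NCBIXML.read(result_handle)
    finally:
        result_handle.close()


def hits_with_identity(record, min_identity=30.0):
    """Return (title, percent identity) for hits whose best HSP reaches min_identity.

    Identity is computed as BLAST reports it: identities / alignment length.
    """
    hits = []
    for alignment in record.alignments:
        best = max(
            100.0 * hsp.identities / hsp.align_length for hsp in alignment.hsps
        )
        if best >= min_identity:
            hits.append((alignment.title, best))
    return hits
```

Usage would be something like `record = blast_against_pdb(my_sequence)` followed by `hits_with_identity(record, 30.0)`, where my_sequence is a placeholder for your reference carbonic anhydrase sequence.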

Peter