[Biopython] Searching a local copy of the PDB

Fri Jul 24 15:19:27 UTC 2009

Just for the record, a few years back I ran some Biopython-based code
to compute structural statistics over a local copy of the entire PDB.  I
was parsing down to the level of each alpha-carbon, and it was still fast
enough to be a viable way to run the calculations.  Bio.PDB is clearly
not the best tool for this particular query, but if you have a
local mirror then there's no reason you couldn't do it via
structure parsing.
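
A rough sketch of what such a structure-parsing pass might look like
(not the code referred to above; the function names, the "*.ent" file
pattern and the per-chain tally are all assumptions made for
illustration).  It walks a mirror directory with Bio.PDB's PDBParser
and counts alpha-carbons per chain in the first model of each file:

```python
from pathlib import Path

from Bio.PDB import PDBParser


def alpha_carbons(structure):
    """Yield (chain_id, residue_number, coord) for each CA atom.

    Only the first model is visited, and hetero residues (waters,
    ligands, ions such as calcium) are skipped.
    """
    for model in structure:
        for chain in model:
            for residue in chain:
                if residue.id[0] != " ":
                    continue  # hetero flag set: not a standard residue
                if "CA" in residue:
                    yield chain.id, residue.id[1], residue["CA"].coord
        break  # first model only


def survey_mirror(mirror_dir, pattern="*.ent"):
    """Map each file name to a {chain_id: CA count} dict.

    `pattern` is an assumption about how the mirror names its files;
    change it to e.g. "*.pdb" if yours differ.
    """
    parser = PDBParser(QUIET=True)  # suppress construction warnings
    summary = {}
    for path in sorted(Path(mirror_dir).rglob(pattern)):
        try:
            structure = parser.get_structure(path.stem, str(path))
        except Exception as exc:  # keep going past unparseable entries
            print(f"skipping {path.name}: {exc}")
            continue
        per_chain = {}
        for chain_id, _resnum, _coord in alpha_carbons(structure):
            per_chain[chain_id] = per_chain.get(chain_id, 0) + 1
        summary[path.name] = per_chain
    return summary
```

Calling survey_mirror() on the mirror's top-level directory returns one
entry per file; the length of each inner dict is the number of chains
that contain at least one alpha-carbon.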

Also, the PDB Advanced search should be scriptable, just not in a
convenient way.  The Python module ClientForm should handle it.

Jonathan

On Fri, Jul 24, 2009 at 8:15 AM, Steve Darnell<darnells at dnastar.com> wrote:
> Greetings,
>
> You could also do this using the PDB Advanced Search option.  Although not a scriptable solution, it's perfect for a few manual queries.  Here are my suggested parameters:
>
> Match **all** of the following conditions
>
> Subquery 1: Keyword: Advanced, Keywords: **carbonic andydrade** (did you mean anhydrase?), Search Scope: **Full Text**
> Subquery 2: Sequence Features: Number of Chains, Between: **2** and **<blank>**
>
> **<checkbox>** Remove Similar Sequences at **30%** Identity
>
> Query comes back with 12 structures and 25 unreleased structures for "carbonic anhydrase."  No results for "andydrade."
>
> Regards,
> Steve Darnell
>
>
> -----Original Message-----
> From: biopython-bounces at lists.open-bio.org [mailto:biopython-bounces at lists.open-bio.org] On Behalf Of Peter
> Sent: Friday, July 24, 2009 4:39 AM
> To: chen Ku
> Cc: biopython at lists.open-bio.org
> Subject: [Biopython] Searching a local copy of the PDB
>
> Hi Chen,
>
> When replying to a digest email, it is a good idea to change the subject line to something specific.
>
> On Fri, Jul 24, 2009 at 3:28 AM, chen Ku<biopython.chen at gmail.com> wrote:
>> Hi
>>         I succeeded in downloading all the PDB files with the Biopython module.
>
> Good.
>
>> But now I want to fetch an output file where my keyword is
>> ('carbonic andydrade'),
>> the second criterion is >=2 chains, and
>> the third criterion is homology =30%.
>>
>> Can you please write me a few lines of code to do it, as I have had some
>> problems doing this? Please suggest it step by step if possible, as I
>> have been struggling with this for a few days.
>
> If I understand you correctly, you have downloaded all the PDB files to your computer (as plain text PDB format data), and now you want to search them?
>
> Are you using Unix or Windows? There are several Unix command line tools like grep, which are very good at searching plain text files. That might be a good way to look for PDB files containing the words 'carbonic andydrade'.
>
> I'm not sure what the fastest way to count the chains in a PDB file would be. If you only find a few hundred PDB files with 'carbonic andydrade', it might be OK just to parse them with Bio.PDB and count the chains that way.
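
The keyword and chain-count steps described above can also be sketched
with a plain-text scan and no parsing library.  This is only one
possible approach, under stated assumptions: files follow the
fixed-column PDB format (chain ID in column 22 of ATOM records,
descriptive text from column 11 of HEADER/TITLE/COMPND/KEYWDS
records), and they match a "*.ent" pattern.  The function names are
made up for illustration, and the 30% identity criterion is not
covered here.

```python
from pathlib import Path

# Records whose free text is searched for the keyword.
TEXT_RECORDS = ("HEADER", "TITLE ", "COMPND", "KEYWDS")


def scan_pdb_file(path):
    """Return (header_text, chain_ids) for one plain-text PDB file."""
    text_parts = []
    chain_ids = set()
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            record = line[:6]
            if record in TEXT_RECORDS:
                # Text starts at column 11; continuation lines are joined
                # so that a phrase split across lines is still found.
                text_parts.append(line[10:80].strip())
            elif record == "ATOM  " and len(line) > 21:
                chain_ids.add(line[21])  # column 22 holds the chain ID
            elif record == "ENDMDL":
                break  # first model is enough to count chains
    header_text = " ".join(" ".join(text_parts).upper().split())
    return header_text, chain_ids


def find_matches(mirror_dir, keyword, min_chains=2, pattern="*.ent"):
    """List (file name, chain count) for files passing both filters."""
    wanted = " ".join(keyword.upper().split())
    hits = []
    for path in sorted(Path(mirror_dir).rglob(pattern)):
        header_text, chain_ids = scan_pdb_file(path)
        if wanted in header_text and len(chain_ids) >= min_chains:
            hits.append((path.name, len(chain_ids)))
    return hits
```

For example, find_matches(mirror_dir, "carbonic anhydrase") would list
the files whose header text contains that phrase and that have at
least two chains.  Note the spelling: a search for 'carbonic
andydrade' will match nothing.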
>
> Finally, your third criterion is homology =30% - but homology to what?
> And how are you measuring homology? I guess you mean 30% sequence identity to a reference carbonic anhydrase protein?
>
> If what you want to do is take a known carbonic anhydrase protein and search the PDB for similar sequences, then there are better ways to do this. I would run BLASTP against the PDB sequences.
> You can do this at the NCBI via their webpages, or from within Biopython using the Bio.Blast.NCBIWWW.qblast function.
>
> Peter
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython