[Biopython] Mining the PDB header

Thu May 11 20:57:11 UTC 2017

I believe that corresponds to the classification field of the HEADER record
in the PDB file, which Biopython stores under the 'head' key. You can get it
like so:

>>> from Bio import PDB
>>> header_dict = PDB.parse_pdb_header("3pdc.pdb")
>>> header_dict['head']
'hydrolase/hydrolase inhibitor'
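
If all you need is the classification and you would rather not depend on
Biopython, a minimal sketch is to read it straight from the HEADER record. This
assumes a standard fixed-column PDB file, where the classification occupies
columns 11-50 of the HEADER line. The function name header_classification is
hypothetical. Unlike the Biopython output above, it keeps the original upper
case.

```python
from typing import Optional


def header_classification(pdb_text: str) -> Optional[str]:
    """Return the HEADER classification of a PDB-format text, or None.

    Assumes the fixed-column PDB layout: classification in columns 11-50
    (1-based), i.e. slice [10:50] in Python.
    """
    for line in pdb_text.splitlines():
        if line.startswith("HEADER"):
            # Pad to 80 columns so short or trimmed lines still slice safely.
            return line.ljust(80)[10:50].strip()
    return None  # no HEADER record found
```

Usage: `with open("3pdc.pdb") as handle: print(header_classification(handle.read()))`.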

On Thu, May 11, 2017 at 4:38 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
wrote:

> I occasionally see it in the PDB. For example, "Classification:
> HYDROLASE / HYDROLASE INHIBITOR" at
> http://www.rcsb.org/pdb/explore.do?structureId=3PDC
>
>
>
> On Thu, May 11, 2017 at 10:34 PM, Lenna Peterson
> <lenna.peterson at gmail.com> wrote:
> > I am not aware of any formal information about either function or
> > localization within PDB files.
> >
> > On Thu, May 11, 2017 at 3:55 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
> > wrote:
> >>
> >> Sorry for the misunderstanding, but I'm mining the structures
> >> themselves. To be more precise, I'm doing structure prediction
> >> analysis, not text mining of the headers. I'm not sure which
> >> classification is used in the PDB, how consistent it is, etc. I would
> >> be interested in any information regarding function, localization,
> >> etc. If you think I'm not quite sure what I want, you'd be correct.
> >> Ultimately I will be clustering the structures based on that
> >> classification.
> >>
> >> On Thu, May 11, 2017 at 8:28 PM, Lenna Peterson
> >> <lenna.peterson at gmail.com> wrote:
> >> > Well, your original question did ask how to mine the PDB header with
> >> > Python.
> >> >
> >> > "protein classification" is not a specific term (do you mean organism?
> >> > function? fold? etc.) - is this something that appears in the PDB
> >> > header? If so, what PDB header field is it in?
> >> >
> >> > Lenna
> >> >
> >> >
> >> > On Thu, May 11, 2017 at 1:49 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
> >> > wrote:
> >> >>
> >> >> I'm not trying to mine the actual header. I would definitely be
> >> >> interested in an option that retrieves the protein classification
> >> >> without having to write any additional code. Does such an option exist?
> >> >>
> >> >> Regards.
> >> >>
> >> >> On Thu, May 11, 2017 at 7:06 AM, João Rodrigues
> >> >> <j.p.g.l.m.rodrigues at gmail.com> wrote:
> >> >> > You can do *some* mining. Look at parse_pdb_header.
> >> >> >
> >> >> > 2017-05-10 18:58 GMT-07:00 Ahmad Abdelzaher <underoath006 at gmail.com>:
> >> >> >>
> >> >> >> Hey guys,
> >> >> >>
> >> >> >> Unfortunately, I read this on the FAQ page:
> >> >> >>
> >> >> >> "If you are interested in data mining the PDB header, you might want
> >> >> >> to look elsewhere because there is only limited support for this."
> >> >> >>
> >> >> >> So if I can't do it with Biopython, what other alternatives do I
> >> >> >> have? I'm doing some PDB mining and I'm interested in retrieving
> >> >> >> the classification of each structure, to do some clustering
> >> >> >> analysis later.
> >> >> >>
> >> >> >> Cheers.
> >> >> >> _______________________________________________
> >> >> >> Biopython mailing list  -  Biopython at mailman.open-bio.org
> >> >> >> http://mailman.open-bio.org/mailman/listinfo/biopython
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >
> >
>
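
Since the goal stated in this thread is to cluster structures by their
classification, one small preprocessing step is to normalize the labels before
grouping, because the same class can appear as "HYDROLASE / HYDROLASE
INHIBITOR" on the RCSB page and as 'hydrolase/hydrolase inhibitor' from
parse_pdb_header. The sketch below assumes that splitting on "/" into
lowercase component classes is an acceptable normalization; that is my
assumption, not a PDB rule, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_classification(label: str) -> Tuple[str, ...]:
    """Split a classification label into lowercase component classes.

    Assumption: "/" separates classes, so 'HYDROLASE / HYDROLASE INHIBITOR'
    and 'hydrolase/hydrolase inhibitor' map to the same tuple.
    """
    parts = (part.strip().lower() for part in label.split("/"))
    # Drop empty pieces and collapse internal runs of whitespace.
    return tuple(" ".join(p.split()) for p in parts if p)


def group_by_classification(
    entries: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, ...], List[str]]:
    """Group PDB IDs by normalized classification, given (pdb_id, label) pairs."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for pdb_id, label in entries:
        groups[normalize_classification(label)].append(pdb_id.upper())
    return dict(groups)
```

The resulting groups are only a starting point; how coarse or fine the
clusters should be (whole label vs. individual component classes) is a design
choice for the analysis.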