[Biopython] Mining the PDB header

Ahmad Abdelzaher underoath006 at gmail.com
Thu May 11 20:38:57 UTC 2017


I occasionally see it in the PDB. For example, "Classification:
HYDROLASE / HYDROLASE INHIBITOR" at
http://www.rcsb.org/pdb/explore.do?structureId=3PDC
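
As far as I can tell, that "Classification" value is the classification
field of the HEADER record in the PDB file (columns 11-50 in the PDB
format description). Below is a minimal sketch of pulling it out with plain
Python and no Biopython. The function name header_classification is made
up for illustration, and the sample line is hand-built with a placeholder
date and ID code, so it is not the real 3PDC record.

```python
def header_classification(pdb_lines):
    """Return the classification labels from the HEADER record, or []."""
    for line in pdb_lines:
        if line.startswith("HEADER"):
            # Columns 11-50 (1-based) hold the classification field.
            field = line[10:50].strip()
            # Complexes list one label per entity, separated by "/".
            return [part.strip() for part in field.split("/") if part.strip()]
    # No HEADER record found.
    return []


# Hand-built sample line; the date and ID code are placeholders.
sample = [
    "HEADER    "
    + "HYDROLASE/HYDROLASE INHIBITOR".ljust(40)
    + "01-JAN-00"
    + "   "
    + "XXXX"
]
print(header_classification(sample))
# → ['HYDROLASE', 'HYDROLASE INHIBITOR']
```

For a real file, pass the open file handle instead of the sample list. The
parse_pdb_header function suggested earlier in the thread should expose
the same field in the dictionary it returns (I believe under the "head"
key, lowercased), but I have not checked how it treats multi-label
entries like this one.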



On Thu, May 11, 2017 at 10:34 PM, Lenna Peterson
<lenna.peterson at gmail.com> wrote:
> I am not aware of any formal information about either function or
> localization within PDB files.
>
> On Thu, May 11, 2017 at 3:55 PM, Ahmad Abdelzaher <underoath006 at gmail.com>
> wrote:
>>
>> Sorry for the misunderstanding, but I'm mining the structures
>> themselves. To be more precise, I'm doing structure prediction
>> analysis. I'm not doing text mining of the headers themselves. I'm not
>> sure which classification is used in the PDB, how consistent it
>> is, etc. I would be interested in any information regarding
>> function, localization, etc. If you think I'm not quite sure what
>> I want, you'd be correct. Ultimately I will be clustering the
>> structures based on that classification.
>>
>> On Thu, May 11, 2017 at 8:28 PM, Lenna Peterson
>> <lenna.peterson at gmail.com> wrote:
>> > Well, your original question did ask how to mine the PDB header with
>> > Python.
>> >
>> > "protein classification" is not a specific term (do you mean organism?
>> > function? fold? etc.) - is this something that appears in the PDB
>> > header? If so, what PDB header field is it in?
>> >
>> > Lenna
>> >
>> >
>> > On Thu, May 11, 2017 at 1:49 PM, Ahmad Abdelzaher
>> > <underoath006 at gmail.com>
>> > wrote:
>> >>
>> >> I'm not trying to mine the actual header. I would definitely be
>> >> interested in an option that retrieves the protein classification
>> >> without having to write any additional code. Does such an option exist?
>> >>
>> >> Regards.
>> >>
>> >> On Thu, May 11, 2017 at 7:06 AM, João Rodrigues
>> >> <j.p.g.l.m.rodrigues at gmail.com> wrote:
>> >> > You can do *some* mining. Look at parse_pdb_header.
>> >> >
>> >> > 2017-05-10 18:58 GMT-07:00 Ahmad Abdelzaher <underoath006 at gmail.com>:
>> >> >>
>> >> >> Hey guys,
>> >> >>
>> >> >> Unfortunately, I read this on the FAQ page:
>> >> >>
>> >> >> "If you are interested in data mining the PDB header, you might
>> >> >> want to look elsewhere because there is only limited support
>> >> >> for this."
>> >> >>
>> >> >> So if I can't do it with Biopython, what other alternatives do I
>> >> >> have? I'm doing some PDB mining and I'm interested in retrieving
>> >> >> the classification of the structure, to do some clustering
>> >> >> analysis later.
>> >> >>
>> >> >> Cheers.
>> >> >> _______________________________________________
>> >> >> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> >> >> http://mailman.open-bio.org/mailman/listinfo/biopython
>> >> >
>> >> >
>> >>
>> >
>> >
>
>