[BioPython] PDB parser

Mon, 06 May 2002 15:48:51 +0200

"Thomas Hamelryck" wrote:
> Hi Catherine,
> 
> I was wondering what you want to use the PDB/Structure object for, i.e.
> looking at structural information, extracting information from the file
> headers, etc.? I think it is difficult to make one Structure object that does
> everything that people expect. I wrote down some considerations below.
> Everybody feel free to criticize & comment, of course.
> 
> In the most frequent case, you want convenient access to the data via the
> Structure/Model/Chain/Residue/Atom (SMCRA, for short) hierarchy. Let's say
> you want to use a single superfamily representative from the SCOP database
> (Structural Classification of Proteins). In that case, you will need to
> extract a number of domains from a set of structures. Each domain is
> specified in the SCOP definition by its structure, chain(s) and residues. So
> it would be convenient to have a class (let's say the Structure class) that
> allows a flexible use of the SMCRA hierarchy, i.e., to do slicing, traversal
> etc. Basically, this representation would do the bookkeeping. This class
> could also contain the information in the parsed header (cell, spacegroup,
> etc.).

I think such a class would be very useful indeed. It's a "PDB structure" class,
isn't it, where you just have the information extracted from the PDB file?
There is already a Bio.PDB module, so why not a PDB class for this
purpose (the equivalent of the ClustalAlignment class)?
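
To make the bookkeeping idea concrete, here is a minimal sketch of an SMCRA
container hierarchy. All class and method names (Entity, add, residue_range)
are hypothetical and only illustrate the idea; this is not the API of any
existing Bio.PDB code, and the structure id, space group and coordinates are
invented.

```python
class Entity:
    """Generic SMCRA node: stores children by key, in insertion order."""

    def __init__(self, key):
        self.id = key
        self.parent = None
        self._children = {}

    def add(self, child):
        # Refuse duplicates so that lookups by key stay unambiguous.
        if child.id in self._children:
            raise KeyError(f"duplicate child id: {child.id!r}")
        child.parent = self
        self._children[child.id] = child
        return child

    def __getitem__(self, key):
        return self._children[key]

    def __iter__(self):
        return iter(self._children.values())

    def __len__(self):
        return len(self._children)


class Structure(Entity):
    def __init__(self, key, header=None):
        super().__init__(key)
        # Parsed header fields (cell, space group, ...) live here.
        self.header = header or {}


class Model(Entity):
    pass


class Chain(Entity):
    def residue_range(self, start, end):
        """Return residues whose number lies in [start, end], inclusive."""
        return [res for res in self if start <= res.id <= end]


class Residue(Entity):
    def __init__(self, number, name):
        super().__init__(number)
        self.name = name


class Atom:
    """Leaf node: an atom name plus an (x, y, z) coordinate tuple."""

    def __init__(self, name, coord):
        self.id = name
        self.coord = coord
        self.parent = None


# Build a tiny structure by hand (made-up data).
structure = Structure("1abc", header={"spacegroup": "P 21 21 21"})
model = structure.add(Model(0))
chain = model.add(Chain("A"))
for number, name in [(1, "MET"), (2, "CYS"), (3, "GLY")]:
    residue = chain.add(Residue(number, name))
    residue.add(Atom("CA", (float(number), 0.0, 0.0)))

# Slicing out a "domain" and walking down the hierarchy.
domain = chain.residue_range(2, 3)
domain_names = [res.name for res in domain]
atom = structure[0]["A"][2]["CA"]
```

A SCOP-style domain definition (structure, chain, residue range) would then map
onto a lookup like structure[0]["A"] followed by residue_range(start, end).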

> 
> A second class (let's say a Connectivity class) could contain a simple graph
> of atoms with all the bonds between those atoms (so including inter and
> intra residue bonds). This would be convenient for people who want to use
> the python package to prepare the input for some kind of refinement or
> visualization program. Maybe this representation could also do nearest
> neighbor lookup, angle calculations, rotations & translations etc. Note that
> this implies figuring out which atoms are bonded to which atoms, which
> is not specified in the PDB file itself. Implementing the previous approach
> (the Structure class) is trivial, while implementing this last approach is
> much harder.

Yes, other information that is an interpretation, a computation for
visualization, etc. could go in another set of classes, I guess? The same as
Clustalw vs. the Align classes (Alignment, SummaryInfo, PSSM, identity_match, ...)?
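
As an illustration of why the connectivity part is harder, here is a minimal
sketch of inferring bonds from coordinates alone. It assumes a single fixed
distance cutoff (1.9 Angstrom, an arbitrary choice for this example), which is
a crude heuristic: a real implementation would probably use per-element
covalent radii and residue templates. The function name infer_bonds and the
coordinates are made up.

```python
import math
from itertools import combinations


def infer_bonds(atoms, cutoff=1.9):
    """Build an undirected bond graph from atom coordinates.

    atoms  -- dict mapping atom name to an (x, y, z) tuple
    cutoff -- maximum distance for two atoms to count as bonded
              (assumed value, not taken from any standard)
    Returns a dict mapping each atom name to the set of its bonded neighbours.
    """
    graph = {name: set() for name in atoms}
    # Naive all-pairs check: O(n^2), fine for a sketch, too slow for big files.
    for (name_a, pos_a), (name_b, pos_b) in combinations(atoms.items(), 2):
        if math.dist(pos_a, pos_b) <= cutoff:
            graph[name_a].add(name_b)
            graph[name_b].add(name_a)
    return graph


# Invented backbone-like coordinates for one residue.
atoms = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.46, 0.0, 0.0),
    "C": (2.0, 1.42, 0.0),
    "O": (3.2, 1.6, 0.0),
}
graph = infer_bonds(atoms)
```

The all-pairs loop is where a nearest-neighbour lookup structure would pay off
for whole structures.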

> Of course, in many cases you would like to take a look at what is in the PDB
> file, e.g., you might want to examine all disulphide bridges. In that case,
> you would want to work with a number of class instances (let's say from the
> Polymer class) that represent various structural entities (polypeptides,
> disulphide bridges, alpha-helices etc.). For this, you need the connectivity
> information of course. This representation would be the structural
> interpretation of the PDB file.
> 
> It is clear that more than one approach should be possible for a structure
> object. One way to combine the requirements is, e.g., to attach the Polymer
> objects as observers to the Structure objects. In this approach, you could
> extract e.g. all Polymer objects from a Chain object. Each of these Polymer
> objects would contain at least one residue from that chain. In this way, you
> could e.g. ask questions like "give me all disulphide bridges that involve
> chain A in model 1", which combines the bookkeeping with the structural
> interpretation demands. The Structure class would also produce the raw
> connectivity information in a convenient data structure on demand. I think
> this can all be done quite efficiently using Numpy, kjbuckets etc.
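
A query such as "all disulphide bridges that involve chain A" could be
sketched as below. This is a simplified stand-in that skips the observer
mechanism and works on a flat list of cysteines. The Cys record, both function
names, the 2.5 Angstrom cutoff and the coordinates are assumptions made for
this example (an S-S bond is roughly 2.05 Angstrom long, so the cutoff leaves
some slack).

```python
import math
from collections import namedtuple
from itertools import combinations

# Hypothetical record: chain id, residue number, coordinate of the SG atom.
Cys = namedtuple("Cys", "chain number sg")


def disulphide_bridges(cysteines, cutoff=2.5):
    """Return pairs of cysteines whose SG atoms lie within cutoff of each other."""
    return [
        (a, b)
        for a, b in combinations(cysteines, 2)
        if math.dist(a.sg, b.sg) <= cutoff
    ]


def bridges_involving(bridges, chain):
    """Keep only the bridges with at least one partner in the given chain."""
    return [(a, b) for a, b in bridges if chain in (a.chain, b.chain)]


# Invented cysteine positions for one model.
cysteines = [
    Cys("A", 10, (0.0, 0.0, 0.0)),
    Cys("A", 45, (2.0, 0.3, 0.0)),
    Cys("B", 7, (20.0, 0.0, 0.0)),
    Cys("B", 30, (21.5, 1.4, 0.0)),
    Cys("A", 80, (40.0, 0.0, 0.0)),
]

bridges = disulphide_bridges(cysteines)
chain_a = bridges_involving(bridges, "A")
chain_a_numbers = [(a.number, b.number) for a, b in chain_a]
```
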
> 
> I'm still working on a Structure object, which is mainly reworking older
> code that I'm not happy with. I have a lot of code lying around that I would
> like to put into shape for general use. I hope to make a Structure class
> that does the bookkeeping available next week.
> 
> Friendly regards,
> 
> ---
> Thomas Hamelryck      Vrije Universiteit Brussel (VUB)
> Institute for Molecular Biology           ULTR Department
> Paardenstraat 65    1640 Sint-Genesius-Rode, Belgium
>                  http://ultr.vub.ac.be/~thomas

--
Catherine Letondal -- Pasteur Institute Computing Center