[Biopython] Biopython & p3d

Wed Oct 21 11:01:35 UTC 2009

On Wed, Oct 21, 2009 at 11:31 AM, Christian Fufezan
<fufezan at uni-muenster.de> wrote:
>
> A data structure that is built like that of Biopython.pdb imposes
> multiple nested loops and condition queries.

Not really - see below.

> p3d's data structure is not nested and gains strength through combination
> of sets and BSPTree.
> This allows faster and more flexible looping. Looping over all alpha and
> beta-carbons for example and printing x-coordinates
>
> p3d:
> for atom in pdb.query('protein and atom type CB or atom type CA'):
>        print atom.x

The Bio.PDB structure, model and chain objects all offer direct access
to a "flat" list of atoms via the get_atoms() method, e.g.

from Bio import PDB
# Only the PDB subpackage is imported, so refer to it as PDB, not Bio.PDB
structure = PDB.PDBParser().get_structure("Test", "XXXX.pdb")
for atom in structure.get_atoms():
    if atom.name in ["CA", "CB"]:
        print(atom.coord)

(I'd have to think a bit longer about how in general to restrict this to
proteins, here that is implicit since CA and CB are protein specific)
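
One way to make the protein restriction explicit is sketched below. It
checks the residue name of each atom's parent against the 20 standard
amino acids, which also avoids picking up a calcium ion, whose atom name
is "CA" as well. The helper name protein_atoms and the hard-coded name
set are assumptions for illustration only; Bio.PDB.Polypeptide also has
an is_aa() function that could be used for the residue test instead.

```python
# The 20 standard amino-acid residue names as they appear in PDB files.
STANDARD_AA = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def protein_atoms(entity, names=("CA", "CB")):
    """Yield atoms named in `names` whose parent residue is a standard amino acid.

    `entity` is anything with a get_atoms() method, e.g. a Bio.PDB
    structure, model or chain. (Hypothetical helper, not part of Biopython.)
    """
    for atom in entity.get_atoms():
        residue = atom.get_parent()
        if atom.name in names and residue.get_resname().strip() in STANDARD_AA:
            yield atom
```

With a parsed structure this would be used as
"for atom in protein_atoms(structure): print(atom.coord)".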

You can also of course use a list comprehension, e.g. to get all
the x-coordinates (which I guess is what your example does),

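from Bio import PDB
# Only the PDB subpackage is imported, so refer to it as PDB, not Bio.PDB
structure = PDB.PDBParser().get_structure("Test", "XXXX.pdb")
# coord[0] is the x-coordinate; no backslash is needed inside the brackets
x_list = [atom.coord[0] for atom in structure.get_atoms()
          if atom.name in ["CA", "CB"]]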

You can also drill down through the nested structure of models,
chains and residues to get to the atoms that way.
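
For comparison, the nested route could look roughly like the sketch
below. It relies only on each Bio.PDB level being iterable over its
children and carrying an id attribute; the function name
ca_cb_by_residue and the tuple layout it returns are made up for this
example.

```python
def ca_cb_by_residue(structure, names=("CA", "CB")):
    """Walk models -> chains -> residues -> atoms and collect matching atoms.

    Returns a list of (model id, chain id, residue id, atom name, coord)
    tuples, so each hit keeps track of where in the hierarchy it came from.
    """
    hits = []
    for model in structure:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.name in names:
                        hits.append(
                            (model.id, chain.id, residue.id, atom.name, atom.coord)
                        )
    return hits
```

The extra loops cost a few lines, but in exchange every atom comes with
its chain and residue context, which the flat get_atoms() loop only
gives you via get_parent().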

To me these are more Pythonic than the clever natural language
parsing in p3d (which seems ideal for a user interface, rather than
a programming API). Biopython might be improved by defining an
atoms property (list or iterator?) instead of the get_atoms() method.

One might also ask for x, y and z properties on the atom object
to provide direct access to the three coordinates as floats. Do
you think this sort of little thing would help improve Bio.PDB?
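
As a rough illustration of what such properties could look like from the
user's side, here is a minimal sketch written as wrappers around the
existing objects. AtomView and FlatEntity are hypothetical names, not
part of Biopython; the only things assumed of the wrapped objects are a
coord sequence on atoms and a get_atoms() method on the entity.

```python
class AtomView:
    """Hypothetical thin wrapper exposing x, y and z as plain floats."""

    def __init__(self, atom):
        self._atom = atom

    @property
    def x(self):
        return float(self._atom.coord[0])

    @property
    def y(self):
        return float(self._atom.coord[1])

    @property
    def z(self):
        return float(self._atom.coord[2])

    def __getattr__(self, attr):
        # Only reached when normal lookup fails: delegate to the wrapped
        # atom so that .name, .coord and so on keep working.
        if attr == "_atom":
            raise AttributeError(attr)
        return getattr(self._atom, attr)


class FlatEntity:
    """Hypothetical wrapper giving an `atoms` property on top of get_atoms()."""

    def __init__(self, entity):
        self._entity = entity

    @property
    def atoms(self):
        # An iterator rather than a list, so nothing is copied up front.
        return (AtomView(atom) for atom in self._entity.get_atoms())
```

The p3d-style loop then becomes something like
"for atom in FlatEntity(structure).atoms: print(atom.x)". Whether atoms
should be a list or an iterator is exactly the open question above; the
sketch picks an iterator only to keep it short.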

> Still I think both methods could exist side by side. If it is efficient - I
> don't know. Replacing Biopython's PDB parser was never the intention
> and I think it has features that are really good and fast!

Yes, it should be possible to offer nice nested access and nice flat
access from the same objects. Internally the current Biopython PDB
structure could perhaps be handled as filtered views of a complete
list of all the atoms (using sets and trees or a database or whatever).
That might make some things faster too.
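
To make the "filtered views of one flat list" idea a bit more concrete,
here is a toy sketch using plain tuples and sets of indices. It is not
how Bio.PDB works today, and FlatAtomStore, its select() method and the
tuple layout are all invented for the illustration.

```python
from collections import defaultdict


class FlatAtomStore:
    """One flat atom list, with sets of indices acting as filtered views."""

    def __init__(self, atoms):
        # atoms: iterable of (chain_id, residue_number, atom_name, (x, y, z))
        self.atoms = list(atoms)
        self.by_chain = defaultdict(set)
        self.by_name = defaultdict(set)
        for i, (chain_id, _resnum, name, _xyz) in enumerate(self.atoms):
            self.by_chain[chain_id].add(i)
            self.by_name[name].add(i)

    def select(self, chains=None, names=None):
        """Return atoms matching all given criteria, in original order."""
        hits = set(range(len(self.atoms)))
        if chains is not None:
            # Union over the requested chains, then intersect with the rest.
            hits &= set().union(*(self.by_chain.get(c, set()) for c in chains))
        if names is not None:
            hits &= set().union(*(self.by_name.get(n, set()) for n in names))
        return [self.atoms[i] for i in sorted(hits)]
```

A chain or residue object would then just be a stored index set, and a
query such as select(chains=["A"], names=["CA", "CB"]) reduces to set
intersections instead of nested loops.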

> Yes that was one thing that we were really missing. Also the fact that
> biopython requires the unfolded entity to be converted to vectors and so
> forth was a bit complex and we needed fast and direct access to the
> coordinates, the very essence of pdb files.

I'm not quite sure what you mean here by "vectors". Could you
be a little more specific? Do you want NumPy style objects or
something else?

Peter