[Biopython] Biopython & p3d

Mon Nov 9 10:57:13 UTC 2009

back ! :)

Let's get back into the discussion (or sum it up).

The consensus was
a) both packages (biopython.pdb and p3d) have advantages
b) merging both modules, while keeping the best of each,
could be an interesting step forward.

On 22 Oct 2009, at 00:14, Peter wrote:

> On Wed, Oct 21, 2009 at 7:22 PM, Christian Fufezan wrote:
>>> Biopython might be improved by defining an atom
>>> property (list or iterator?) instead of the get_atoms() method.
>>
>> agree.  I would argue that p3d's atom/vector class seems the way to  
>> go.
>
> We can probably have similar things for chains etc. Any other
> views on this? I never liked the get_* and set_* methods in
> Bio.PDB myself, and using Python properties seems more
> natural here (they may not have existed when Bio.PDB was
> first started - I'd have to check).
>
> [We should probably break out specific suggestions like this
> into new mailing list threads, and CC people like Thomas H.]
>
>>> One might also ask for x, y and z properties on the atom object
>>> to provide direct access to the three coordinates as floats. Do
>>> you think this sort of little thing would help improve Bio.PDB?
>>>
>> yes indeed, that is _the_ information a pdb module should offer
>> without any addition. Better would be even if the atoms are
>> treatable as vectors (see below). p3d has a series of atom
>> object attributes that are convenient.
>
> I would argue that the x-y-z triple (which Biopython has) is
> more important than separate x, y, and z floats. We seem
> to agree here.
>
What I meant is that I think the most important thing a PDB module
should offer is the ability to do vector operations directly with
atom objects, i.e. without converting them first. Whether the values
are stored in three attributes (.x, .y, .z, as in p3d) or as a tuple
(Biopython) seems not really important, as long as simple vector
operations are possible.
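
To make that concrete, here is a minimal sketch of what I mean by "atoms as vectors". The Vector and Atom classes below are hypothetical and written only for this mail; they are neither p3d's nor Bio.PDB's actual classes. The sketch offers both the separate .x/.y/.z attributes and an x-y-z triple (as a property), and arithmetic on atoms returns a plain, non-atom vector:

```python
import math


class Vector:
    """Hypothetical 3D vector with x, y, z attributes (illustration only)."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = float(x), float(y), float(z)

    @property
    def coord(self):
        # The x-y-z triple, exposed as a property rather than a get_* method.
        return (self.x, self.y, self.z)

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Vector(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, factor):
        # Multiplication by a scalar.
        return Vector(self.x * factor, self.y * factor, self.z * factor)

    __rmul__ = __mul__

    def length(self):
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


class Atom(Vector):
    """Hypothetical atom: a vector that also carries an atom name."""

    def __init__(self, name, x, y, z):
        super().__init__(x, y, z)
        self.name = name


ca = Atom("CA", 1.0, 2.0, 3.0)
cb = Atom("CB", 1.0, 2.0, 4.5)

bond = cb - ca            # a plain Vector, not an Atom
distance = bond.length()  # distance between the two atoms
```

Whether the coordinates live in three floats or in one array underneath does not change how such code reads, which is the point.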

> The Biopython atom's coord property is an x-y-z triple (as a
> one dimensional numpy array). The Bio.PDB code also
> defines its own vector objects on top of this, but my memory
> of the details is hazy here. As I recall, I personally stuck
> with the numpy objects in my scripts using Bio.PDB.
>
In the version I used, one had to convert the entity into a vector. But
that was some time ago, I guess.

>>> Yes, it should be possible to offer nice nested access and nice flat
>>> access from the same objects. Internally the current Biopython PDB
>>> structure could perhaps be handled as filtered views of a complete
>>> list of all the atoms (using sets and trees or a database or  
>>> whatever).
>>> That might make some things faster too.
>>
>> I agree to some extent. As above, I can only say that I
>> cannot see the advantage of a nested data structure.
>> Maybe you can explain with an example where drilling
>> through the nested structure could come in handy.
>
> The drill down is great for selecting a particular residue or
> chain (or for NMR, a particular model). It is also good for
> looping over these structures - e.g. to process psi/phi
> angles along a protein backbone.

I cannot really see an advantage here. If one can directly access all
the atoms one is interested in with one line and then just collect
phi/psi angles, why would one need to drill down through the structures?

Looping over structure elements is even more refined with the natural
human language interface. Imagine:

residues_of_interest = protein.query('alpha and residue 12..51 and model 2')

If you like looping, you can also do

for model in models:
    protein.query('alpha and residue 12..51 and model', model)

or

# range() excludes its end value, so 52 is needed to cover residues 12..51
for residue in range(12, 52):
    protein.query('alpha and residue', residue, 'and model 2')

But looping over each residue and then doing a conditional check whether
the residue is in the range 12-51 and whether the atom is an alpha carbon
seems to me a bit of an overhead. In fact, that is one of the points I
like most about p3d: one can define the query in a way that nested loops
are rarely needed. Imagine you want to collect chi1 angles of all His...
>
>>>> Yes that was one thing that we were really missing. Also the fact  
>>>> that
>>>> biopython requires the unfolded entity to be converted to vectors  
>>>> and so
>>>> forth was a bit complex and we needed fast and direct access to the
>>>> coordinates, the very essence of pdb files.
>>>
>>> I'm not quite sure what you mean here by "vectors". Could you
>>> be a little more specific? Do you want NumPy style objects or
>>> something else?
>>
>> In p3d the atom objects are vectors,
>
> I don't immediately see what the intention is here. What does
> "adding" or "subtracting" two atom/vector objects give you? A
> new non-atom vector would be my guess? What about
> multiplying by a scalar? Again, getting a non-atom vector
> object back makes most sense.
>
Yes, right, one gets a vector back. This vector can then be used in
the query function. Imagine you want to survey residues that span
a membrane along a given path.
With p3d you can easily generate a series of vectors and, more
importantly, one can use these vectors in the query function.

# step from startVector towards endVector in tenths of the path
for c in [startVector + k/10.0 * (endVector - startVector) for k in range(1, 10)]:
    pdb.query('protein and within 3 of ', c)

To visualize the path in e.g. VMD, one can also print those vectors in
PDB format.
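
For readers without p3d at hand, the same idea can be sketched in plain Python. Everything here is an assumption made for illustration: points_along_path() and within() are hypothetical helpers, atoms are simple (name, coordinate) pairs, and the brute-force distance check merely stands in for the 'within 3 of' part of a query:

```python
import math


def points_along_path(start, end, steps):
    """Return the steps - 1 evenly spaced interior points between start and end."""
    return [
        tuple(s + (k / steps) * (e - s) for s, e in zip(start, end))
        for k in range(1, steps)
    ]


def within(atoms, centre, radius):
    """Names of all atoms no further than radius from centre (brute force)."""
    return [name for name, coord in atoms if math.dist(coord, centre) <= radius]


# Toy coordinates, invented for this example.
atoms = [
    ("CA1", (0.0, 0.0, 1.0)),
    ("CA2", (0.0, 2.0, 5.0)),
    ("CA3", (9.0, 9.0, 9.0)),
]
start, end = (0.0, 0.0, 0.0), (0.0, 0.0, 10.0)

path = points_along_path(start, end, 10)
found = set()
for point in path:
    found.update(within(atoms, point, 3.0))
```

Here CA1 and CA2 lie within 3 units of the path and CA3 does not. Note that math.dist requires Python 3.8 or newer.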

From the following (I chopped some bits ...), I gather that
Biopython's PDB module (with numpy) works similarly to p3d. Or, to be
more correct, p3d works like Biopython in combination with numpy, in
the sense that one can use atoms as vectors.

>> so writing a structural alignment script is straightforward
>> (see e.g. http://p3d.fufezan.net/index.php?title=alignByATP).
>
> Structural alignment is not so different in Biopython - just the  
> details. e.g.
> http://www.warwick.ac.uk/go/peter_cock/python/protein_superposition/
>
very nice - like the Bio.PDB.Superimposer(). It does all the vector  
operations needed to align structures, nice. Involvement of numpy  
certainly makes it powerful.
The nested loops to find all alpha carbons is a biopython.pdb classic ;)

To round things up:
p3d's strength comes from the natural human language interface, which
allows combining sets with spatial information (fewer nested
loops). However, I am not sure whether the Biopython community wants
such an extension. Biopython.pdb has a long history, it works as it is
and users are comfortable with it, so maybe there is not much to merge
after all.