[Biopython] Biopython & p3d

Mon Nov 9 11:21:21 UTC 2009

On Mon, Nov 9, 2009 at 10:57 AM, Christian Fufezan
<fufezan at uni-muenster.de> wrote:
>
> back ! :)
>
> let's get back into the discussion (or sum it up)
>
> The consensus was
> a) both packages (biopython.pdb and p3d) have advantages
> b) possibly merge both modules while keeping the best of both of them could
> be an interesting step forward.

Hi Christian - thanks for getting back to us. That seems like a fair
summary. For those that missed it, the thread is archived here:
http://lists.open-bio.org/pipermail/biopython/2009-October/005721.html

> On 22 Oct 2009, at 00:14, Peter wrote:
>
>> On Wed, Oct 21, 2009 at 7:22 PM, Christian Fufezan wrote:
>>>>
>>>> Biopython might be improved by defining an atom
>>>> property (list or iterator?) instead of the get_atoms() method.
>>>
>>> agree.  I would argue that p3d's atom/vector class seems the way to go.
>>
>> We can probably have similar things for chains etc. Any other
>> views on this? I never liked the get_* and set_* methods in
>> Bio.PDB myself, and using Python properties seems more
>> natural here (they may not have existed when Bio.PDB was
>> first started - I'd have to check).
>>
>> [We should probably break out specific suggestions like this
>> into new mailing list threads, and CC people like Thomas H.]

I must do that... without looking into the details, it seems like a
relatively straightforward addition which should make Bio.PDB
easier to use.
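
As a rough sketch of what such an addition might look like (the
Residue class below is a made-up stand-in, not the real Bio.PDB
class, and the attribute name "atoms" is only an assumption), a
read-only property can sit alongside the existing getter:

```python
class Residue:
    """Toy stand-in for a Bio.PDB-style residue (hypothetical, not the real class)."""

    def __init__(self, name, atoms):
        self.name = name
        self._atoms = list(atoms)

    def get_atoms(self):
        """Existing getter style, kept so that old scripts continue to work."""
        return iter(self._atoms)

    @property
    def atoms(self):
        """Property style: the same data via plain attribute access."""
        return list(self._atoms)


res = Residue("HIS", ["N", "CA", "C", "O"])
atom_names_via_getter = list(res.get_atoms())
atom_names_via_property = res.atoms
```

Keeping get_atoms() next to the property would let existing code
carry on unchanged while new code uses the shorter spelling.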

>> The drill down is great for selecting a particular residue or
>> chain (or for NMR, a particular model). It is also good for
>> looping over these structures - e.g. to process psi/phi
>> angles along a protein backbone.
>
> cannot really see an advantage here. If one can directly access all the
> atoms one's interested in with one line and then just collect phi,psi
> angles, why would one need to drill down through the structures?
>
> Looping over structure elements is even more refined with the natural
> human language interface:
> imagine: residues_of_interest = protein.query('alpha and residue
> 12..51 and model 2')
>
> if you like looping you can also do for model in models:
> protein.query('alpha and residue 12..51 and model',model)
>
> or
>
> for residue in range (12,51):
>  protein.query('alpha and residue' , residue , 'and model 2')
>
> but looping over each residue and then do a conditional check if the residue
> is in range (12-51) and if atom type is alpha carbon seems for me a bit of
> an overhead. In fact that's one of the points I like most about p3d. One can
> define the query in a way that nested loops are rarely needed. Imagine you
> want to collect chi1 angles of all His...

In pseudocode, I would picture something like this:

[residue.chi1 for residue in model.residues if residue.name == "His"]

(That almost certainly won't work as is with Bio.PDB, I'm just trying
to convey how I would expect to be able to tackle the problem with
a list comprehension)
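
To make the idea concrete, here is a self-contained toy version.
The Residue and Model classes, the residues attribute and the
precomputed chi1 values are all invented for illustration; none of
this is the Bio.PDB or p3d API. PDB files normally spell the residue
name in upper case, so the sketch compares against "HIS":

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Residue:
    # Hypothetical toy residue: a name plus a precomputed chi1 angle in degrees.
    name: str
    chi1: Optional[float] = None


@dataclass
class Model:
    # Hypothetical toy model holding a flat list of residues.
    residues: List[Residue]


model = Model([
    Residue("ALA"),
    Residue("HIS", -62.5),
    Residue("GLY"),
    Residue("HIS", 178.0),
])

# One line to collect chi1 for every histidine, no explicit nested loops.
his_chi1 = [residue.chi1 for residue in model.residues if residue.name == "HIS"]
```

In a real library the chi1 value would have to be calculated from
the atom coordinates rather than stored, but the selection step
would look much the same.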

> from the following (I chopped some bits ... ), I can read that biopythons
> pdb module (with numpy) works similar to p3d - or to be more correct
> p3d works like biopython in combination with numpy, in the sense that one
> can use atoms as vectors.

That seems like a fair summary. In p3d, the atoms are (also) vector
like objects, while in Biopython, the atoms have a numpy coord
property. As long as you are happy with numpy, this allows fast
and efficient vector operations.

>>> so writing a structural alignment script is straightforward
>>> (see e.g. http://p3d.fufezan.net/index.php?title=alignByATP).
>>
>> Structural alignment is not so different in Biopython - just the details.
>> e.g.
>> http://www.warwick.ac.uk/go/peter_cock/python/protein_superposition/
>>
> very nice - like the Bio.PDB.Superimposer(). It does all the vector
> operations needed to align structures, nice. Involvement of numpy certainly
> makes it powerful.

Indeed - numpy is *very* powerful.

> The nested loops to find all alpha carbons is a biopython.pdb classic ;)

I would probably write that with a list comprehension nowadays,
but they are essentially just syntactic sugar for (nested) loops.
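
For illustration, here are the two spellings side by side on a toy
nested structure. The data layout (a list of models, each a dict of
chains) is made up for this sketch and is not how Bio.PDB stores
things:

```python
# Toy structure: models -> chains -> residues -> atom names (made-up data).
structure = [
    {
        "A": [("MET", ["N", "CA", "C", "O"]),
              ("HIS", ["N", "CA", "C", "O", "CB"])],
        "B": [("GLY", ["N", "CA", "C", "O"])],
    }
]

# The classic nested loops.
ca_loops = []
for model in structure:
    for chain_id, residues in model.items():
        for res_name, atoms in residues:
            for atom in atoms:
                if atom == "CA":
                    ca_loops.append((chain_id, res_name, atom))

# The same traversal written as a single list comprehension.
ca_comp = [
    (chain_id, res_name, atom)
    for model in structure
    for chain_id, residues in model.items()
    for res_name, atoms in residues
    for atom in atoms
    if atom == "CA"
]
```

Both versions visit the atoms in the same order and build the same
list; the comprehension is just more compact.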

> to round things up:
> p3d's strength comes with the natural human user interface that allows the
> combination of sets and the spatial information (less nested loops).
> However, I am not sure if the biopython's community wants such an extension.
> Biopython.pdb has a long history, it works like it is and users are
> comfortable with it, so maybe there is not much to merge after all.

That seems fair, although that doesn't mean there aren't things we
can improve in Bio.PDB (moving from get/set methods to properties
for example).

My personal view (and I did not write Bio.PDB and have only made
relatively light usage of it) is that working with the nested structures
(or the flattened lists) it provides is fairly natural with Python lists, or
list comprehensions. The p3d "natural language" interface is an
interesting abstraction, and may be easier for some, but to me is
just another layer on top of the raw functionality - and another
query syntax to learn. That said, it probably would be possible to
layer something like this on top of the existing Bio.PDB objects
(but I personally have no interest in doing this, and no need for it -
keeping on top of the sequence side of things in Biopython is
enough to keep me busy!).

I would be delighted if other people on the mailing list
who *do* work with PDB files could comment. e.g. Thomas and
Kristian, cc'd.

Peter