[Biopython] Biopython & p3d

Peter biopython at maubp.freeserve.co.uk
Wed Oct 21 09:18:17 UTC 2009


On Wed, Oct 21, 2009 at 8:25 AM, Christian Fufezan
<fufezan at uni-muenster.de> wrote:
> Hello Biopython,
>
> we (Michael Specht & I) recently published p3d, a Python module for
> structural bioinformatics, and were wondering if it wouldn't be a good
> thing if it could join the Biopython project. We understand that Biopython
> already has a PDB parser, but we programmed an alternative version since we
> found the Bio.PDB syntax to be too unpythonic. One example why is shown
> below:
>
> Biopython:
>
> def test6(structure):
>        '''get protein surrounding (5) of NAG'''
>        bucket = set()
>        atom_list=Selection.unfold_entities(structure,'A')
>        ns = NeighborSearch(atom_list)
>        for model in structure.get_list():
>                for chain in model.get_list():
>                        for residue in chain.get_list():

I'm not very familiar with the NeighborSearch code, but
I'm pretty sure the above for loops can be just:

for model in structure:
    for chain in model:
        for residue in chain:
            ...

And regarding detecting oxygen atoms, I think there is
a patch on Bugzilla to record the (relatively) new element
column from the PDB file (which will help tell mercury,
Hg, apart from hydrogen).

Still, I would agree with you that some parts of Bio.PDB
are not very pythonic - too many methods named get_*()
which could be replaced with properties. This is something
we could evolve gradually (add new properties, keep the
old methods in place but gradually deprecate them).
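
As a rough sketch of what that gradual transition might look like
(the Residue class here is a toy stand-in invented for illustration,
not the real Bio.PDB class, and the warning text is only a suggestion):

```python
import warnings


class Residue:
    """Toy stand-in for a Bio.PDB entity (hypothetical, illustration only)."""

    def __init__(self, resname):
        self._resname = resname

    @property
    def resname(self):
        """New attribute-style access."""
        return self._resname

    def get_resname(self):
        """Old accessor, kept working but flagged as deprecated."""
        warnings.warn(
            "get_resname() is deprecated; use the resname property",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.resname
```

Existing scripts calling get_resname() would keep working and only
see a warning, while new code could use residue.resname directly.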

Specific suggestions would be welcome.

> def test6(pdb):
>        ''' protein surrounding (5) of resname NAG'''
>        bgl = pdb.query('resname NAG')
>        bucket = pdb.query('protein and oxygen and within 5 of ',bgl)
>        print '     found',len(bucket),' oxygens around NAG'
>        return
>
> Certainly, Biopython's PDB module has its advantages and there is no way
> p3d could replace it, but both modules have their advantages :) The fact
> that the Bio.PDB parser uses a KDTree written in C and we wrote one in
> Python makes certain queries to the protein structure faster in Biopython;
> however, if the query involves more complex demands, multiple loops are
> inevitable in Biopython, whereas p3d offers a human readable query function
> that combines all aspects. The link to our publication is:
> http://www.biomedcentral.com/1471-2105/10/258

I remember skim reading it a month or so ago. The final line of the
abstract stated a very strong opinion ("a perfect tool"), and I was rather
surprised the reviewers and editor let you keep it - regardless of any bias
I might feel towards Biopython ;)

> Looking forward to hearing from you. Maybe one can also envision a
> combined module that brings all the advantages together.

That would be a good outcome.

From the snippet of code and the examples in the paper, the big feature
you have that Bio.PDB lacks is "fancy selections", and that is certainly
something which could be improved in Biopython.

It is interesting that you have implemented (invented?) a string-based
language with logical and, within, etc. In some ways it reminds me of the
selection formulae in VMD - have you used that 3D visualisation tool?

This also reminds me of the SQL language for database selections, and
how classic database code in Python just embedded SQL statements in
Python strings. Have you ever used SQLAlchemy, and looked at how
it handles SQL constructs like filters, ands, ors, etc. with a clever
object-based interface? Perhaps something like that could work for
a 3D structure query API.
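
To make the idea a bit more concrete, here is a minimal sketch of an
object-based selection interface. Everything in it (Atom, Query,
resname, element, within) is a hypothetical name made up for this
email, not part of Biopython, p3d or SQLAlchemy. The distance test is
brute force rather than a KD tree, and "protein" is crudely
approximated as "not NAG":

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Minimal atom record (hypothetical, illustration only)."""
    name: str
    element: str
    resname: str
    coord: tuple


class Query:
    """A composable predicate over atoms, combined with &, | and ~."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __and__(self, other):
        return Query(lambda a: self.predicate(a) and other.predicate(a))

    def __or__(self, other):
        return Query(lambda a: self.predicate(a) or other.predicate(a))

    def __invert__(self):
        return Query(lambda a: not self.predicate(a))

    def select(self, atoms):
        """Return the atoms that satisfy this query."""
        return [a for a in atoms if self.predicate(a)]


def resname(name):
    return Query(lambda a: a.resname == name)


def element(symbol):
    return Query(lambda a: a.element == symbol)


def within(distance, targets):
    """Atoms within `distance` of any target atom (brute force)."""
    targets = list(targets)
    return Query(
        lambda a: any(math.dist(a.coord, t.coord) <= distance for t in targets)
    )


# Made-up coordinates, purely to exercise the interface.
atoms = [
    Atom("C1", "C", "NAG", (0.0, 0.0, 0.0)),
    Atom("OG", "O", "SER", (3.0, 0.0, 0.0)),
    Atom("CA", "C", "SER", (4.0, 0.0, 0.0)),
    Atom("O", "O", "GLY", (9.0, 0.0, 0.0)),
]

nag = resname("NAG").select(atoms)
# Rough analogue of 'protein and oxygen and within 5 of NAG'.
bucket = (~resname("NAG") & element("O") & within(5.0, nag)).select(atoms)
```

The point is only that the and/or/not logic can live in Python
operators instead of being parsed out of a string, so queries can be
built up and reused as ordinary objects.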

Regards,

Peter



