[Biopython] Overhauling of Bio.PDB module

João Rodrigues j.p.g.l.m.rodrigues at gmail.com
Wed Oct 16 17:54:28 UTC 2019


Hi Patrick,

Thank you again for bringing this up. I do agree with you that this is a
necessity.

When Bio.PDB first showed up, there were not many Python libraries out
there for molecular structures. Now there are a few, so we should think
carefully about which features we want to offer, so that we do not overlap
with others and duplicate efforts. In my opinion, Biopython is very good at
handling structures in general, allowing you to change fields, select parts,
etc., and to do very simple calculations like distances or superimpositions.
At the CodeFest in Basel, we talked about how awesome it would be to have a
selection language built into Biopython, allowing us to do something like
`mol.select("chain A").write("chain_A.pdb")`. This also requires an
overhaul of the data structures we use to store atomic data.
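To make the `mol.select("chain A").write("chain_A.pdb")` idea a bit more concrete, here is a minimal sketch of how a selection could be parsed into one boolean mask over column-wise (per-atom array) storage. `AtomTable`, its keyword set (`chain`, `resname`, `name`), and the `select`/`to_pdb`/`write` methods are hypothetical illustrations, not existing Biopython API. The PDB output is simplified (no alternate locations, occupancy fixed at 1.00, B-factor at 0.00).

```python
import numpy as np


class AtomTable:
    """Hypothetical columnar atom container (not an existing Biopython class)."""

    # Hypothetical selection keywords, each mapped to the column it filters.
    KEYWORDS = {"chain": "chain_id", "resname": "res_name", "name": "atom_name"}

    def __init__(self, coord, chain_id, res_id, res_name, atom_name, element):
        self.coord = np.asarray(coord, dtype=float).reshape(-1, 3)  # (n_atoms, 3)
        self.chain_id = np.asarray(chain_id, dtype=str)
        self.res_id = np.asarray(res_id, dtype=int)
        self.res_name = np.asarray(res_name, dtype=str)
        self.atom_name = np.asarray(atom_name, dtype=str)
        self.element = np.asarray(element, dtype=str)

    def __len__(self):
        return len(self.coord)

    def _subset(self, mask):
        # Boolean indexing applies the same mask to every column at once.
        return AtomTable(self.coord[mask], self.chain_id[mask], self.res_id[mask],
                         self.res_name[mask], self.atom_name[mask], self.element[mask])

    def select(self, query):
        """Parse '<keyword> <value> [and <keyword> <value> ...]' into one mask."""
        mask = np.ones(len(self), dtype=bool)
        for clause in query.split(" and "):
            keyword, value = clause.split()
            mask &= getattr(self, self.KEYWORDS[keyword]) == value
        return self._subset(mask)

    def to_pdb(self):
        """Format simplified fixed-column PDB ATOM records."""
        lines = []
        for i in range(len(self)):
            name = str(self.atom_name[i])
            name = name if len(name) == 4 else " " + name.ljust(3)
            x, y, z = self.coord[i]
            lines.append(
                f"ATOM  {i + 1:5d} {name:4s} {str(self.res_name[i]):>3s} "
                f"{str(self.chain_id[i]):1s}{int(self.res_id[i]):4d}    "
                f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          "
                f"{str(self.element[i]):>2s}"
            )
        lines.append("END")
        return "\n".join(lines) + "\n"

    def write(self, path):
        with open(path, "w") as handle:
            handle.write(self.to_pdb())


mol = AtomTable(
    coord=[[0.0, 0.0, 0.0], [1.458, 0.0, 0.0], [10.0, 2.0, -1.5]],
    chain_id=["A", "A", "B"],
    res_id=[1, 1, 7],
    res_name=["GLY", "GLY", "ALA"],
    atom_name=["N", "CA", "CA"],
    element=["N", "C", "C"],
)
chain_a = mol.select("chain A")
print(len(chain_a))  # 2
```

Because `select` returns another `AtomTable`, the chained form `mol.select("chain A").write("chain_A.pdb")` works in this sketch; a real design would need a proper grammar (or, not, ranges, distances) rather than a split on `" and "`.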

I have some time in the next few months that I could spare to work on this.
Interfacing with Biotite (and with other packages) would be interesting as
well. I'll start a #biopython channel on the 3DSIG Slack so that we can
coordinate efforts. How does this sound?

Cheers and again, thanks for bringing this up!

Joao

Patrick Kunzmann <padix.kleber at gmail.com> wrote on Wednesday,
16/10/2019 at 09:38:

> Hello Biopythoneers,
>
> at BOSC this year we talked about overhauling the Bio.PDB module.
> The problem is that currently the atom coordinates are stored in a
> separate NumPy array for each atom. This design prevents efficient
> computation of all kinds of analyses (distances, angles,
> superimpositions, etc.). One possible solution we talked about was to
> put the coordinates of the entire structure in one NumPy array, and let
> the Atom, Residue, Chain and Structure objects point to positions in
> this array. The benefit of this approach is that functions could be
> applied directly to the entire array, harnessing the power of
> vectorization.
>
> For the analysis we could adapt the vectorized functions from the Python
> package Biotite, a project I am currently working on
> (https://www.biotite-python.org/apidoc/biotite.structure.html). Usually,
> these functions already accept the coordinates as a NumPy array, so I
> think only a few tweaks would be necessary for each function.
>
> However, we would need one person or a small team who makes the
> effort to implement the new structure types and adapt the analysis
> functions. I could offer a helping hand with adapting the analysis
> functions, but I don't have the time for anything more.
>
> So the question is: is there anyone out there who is willing to do this
> work? Alternatively, I would propose writing a 'bridge' package between
> Biopython and Biotite that converts the Biopython structure
> representation into the Biotite representation and vice versa. I
> think this solution is less elegant, but it would also require less effort.
>
> Best regards
>
> Patrick Kunzmann
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython
>
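The shared-array design Patrick describes in the quoted message (one NumPy array for the whole structure, with Atom and Residue objects pointing into it) can be sketched as below. The class names mirror Bio.PDB's, but these are hypothetical stand-ins, not the current Bio.PDB classes, which give each atom its own coordinate array. The key assumption is that child objects keep only indices or slices, and read coordinates through NumPy views, so a whole-structure operation and a per-atom lookup always see the same data.

```python
import numpy as np


class Structure:
    """Hypothetical container that owns ONE (n_atoms, 3) coordinate array."""

    def __init__(self, coords, residue_bounds):
        self.coords = np.asarray(coords, dtype=float).reshape(-1, 3)
        # Child objects store only indices into self.coords, never copies.
        self.atoms = [Atom(self, i) for i in range(len(self.coords))]
        self.residues = [Residue(self, start, stop) for start, stop in residue_bounds]


class Atom:
    def __init__(self, parent, index):
        self._parent = parent
        self._index = index

    @property
    def coord(self):
        # Integer indexing of one row returns a view into the shared array.
        return self._parent.coords[self._index]

    @coord.setter
    def coord(self, value):
        self._parent.coords[self._index] = value


class Residue:
    def __init__(self, parent, start, stop):
        self._parent = parent
        self._slice = slice(start, stop)

    @property
    def coords(self):
        # Slicing also returns a view, so residue-level edits are shared too.
        return self._parent.coords[self._slice]


def distance_matrix(coords):
    """All pairwise distances via broadcasting: (n, 1, 3) - (1, n, 3)."""
    diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


structure = Structure([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]],
                      residue_bounds=[(0, 2), (2, 3)])
structure.coords += 1.0           # translate everything in one operation
print(structure.atoms[1].coord)   # [4. 5. 1.]
print(distance_matrix(structure.coords)[0, 1])  # 5.0
```

The in-place `+=` matters here: rebinding `structure.coords` to a new array would still work for objects that look up `_parent.coords` each time, but views handed out earlier would then point at stale data.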

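As an illustration of the kind of analysis function Patrick mentions, one that takes coordinates as plain NumPy arrays, here is a sketch of a least-squares superimposition using the Kabsch algorithm. It is written from scratch for this example and is not copied from Biotite or Biopython; the name `superimpose` and its return convention are assumptions. Biopython already ships `Bio.PDB.Superimposer`, and Biotite has its own superimposition function, each with its own interface.

```python
import numpy as np


def superimpose(fixed, mobile):
    """Least-squares fit of `mobile` onto `fixed` (Kabsch algorithm).

    Both inputs are (n_atoms, 3) arrays with atoms in matching order.
    Returns (rotation, translation, rmsd) such that
    mobile @ rotation.T + translation best matches fixed.
    """
    fixed = np.asarray(fixed, dtype=float)
    mobile = np.asarray(mobile, dtype=float)
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    p = mobile - mobile_center
    q = fixed - fixed_center
    # 3x3 covariance matrix, computed for all atoms in one matrix product.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = fixed_center - mobile_center @ rotation.T
    fitted = mobile @ rotation.T + translation
    rmsd = np.sqrt(((fitted - fixed) ** 2).sum(axis=1).mean())
    return rotation, translation, rmsd
```

With a single shared coordinate array per structure, `superimpose(ref.coords, model.coords)` would run without first gathering per-atom arrays into a new one. The reflection correction via `d` prevents the SVD from returning a mirror image instead of a rotation.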

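At its core, the 'bridge' package proposed in the quoted message has to convert in two directions. It flattens a chain → residue → atom hierarchy into per-atom annotation arrays plus one coordinate array, and it regroups those arrays back into a hierarchy. The sketch below assumes plain nested dicts/lists as a stand-in for Bio.PDB objects, and a dict of NumPy arrays as a stand-in for a Biotite `AtomArray`. The field names follow Biotite's (`chain_id`, `res_id`, `res_name`, `atom_name`, `element`, `coord`), but the functions `hierarchy_to_columns` and `columns_to_hierarchy` are hypothetical. A real bridge would call both libraries' APIs and also handle models, insertion codes, alternate locations and hetero flags, all of which this sketch ignores.

```python
import numpy as np

FIELDS = ("chain_id", "res_id", "res_name", "atom_name", "element")


def hierarchy_to_columns(chains):
    """Flatten {chain_id: [(res_id, res_name, [(atom_name, element, xyz), ...]), ...]}."""
    columns = {key: [] for key in FIELDS}
    coords = []
    for chain_id, residues in chains.items():
        for res_id, res_name, atoms in residues:
            for atom_name, element, xyz in atoms:
                for key, value in zip(FIELDS, (chain_id, res_id, res_name, atom_name, element)):
                    columns[key].append(value)
                coords.append(xyz)
    arrays = {key: np.array(values) for key, values in columns.items()}
    arrays["coord"] = np.array(coords, dtype=float).reshape(-1, 3)
    return arrays


def columns_to_hierarchy(arrays):
    """Regroup per-atom arrays into the nested chain -> residue -> atom layout."""
    chains = {}
    for i in range(len(arrays["coord"])):
        residues = chains.setdefault(str(arrays["chain_id"][i]), [])
        res_id = int(arrays["res_id"][i])
        # Start a new residue whenever the residue number changes within a chain.
        if not residues or residues[-1][0] != res_id:
            residues.append((res_id, str(arrays["res_name"][i]), []))
        xyz = tuple(float(v) for v in arrays["coord"][i])
        residues[-1][2].append((str(arrays["atom_name"][i]), str(arrays["element"][i]), xyz))
    return chains


nested = {
    "A": [(1, "GLY", [("N", "N", (0.0, 0.0, 0.0)), ("CA", "C", (1.458, 0.0, 0.0))])],
    "B": [(5, "ALA", [("CA", "C", (9.0, 1.0, 2.0))])],
}
flat = hierarchy_to_columns(nested)
print(flat["coord"].shape)  # (3, 3)
```

Keeping the flat form as the "native" one on the bridge's Biotite side means vectorized analyses never see the hierarchy. Only the conversion touches it, once per direction.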