[Biopython] Bio.PDB local MMCIF files

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 19 14:51:59 UTC 2014


On Wed, Feb 19, 2014 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello,
>
> The implementation I was referring to by the EBI people is here. I tested it
> during a workshop and it is very fast and robust (they use it, that should
> be enough reason) so maybe we could benefit a lot from either its
> incorporation or adaptation?
>
> As for what I suggested: since my GSoC period, already 4 years ago, I
> noticed that the PDB module is a bit messy in terms of organization. The
> module itself is named after the databank, which can be confused with the
> format name, the mmCIF parser is defined in a subfolder, and there are
> application wrappers there too (DSSP, NACCESS). Besides this issue, which is
> not an issue at all and just my own pet peeve, there is a lot that the
> entire module could gain from a thorough revision. I've been using it very
> often and some normal manipulations of structures are not straightforward to
> carry out (calculating a center of mass for example, removing double
> occupancies) due to the parser being slow and quite memory hungry. In fact,
> trying to run the parser on a very large collection of structures often
> results in a random crash due to memory issues.
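
For reference, the center-of-mass calculation mentioned above can be sketched
in a few lines. This is only a minimal illustration in plain Python, not
existing Bio.PDB code: the function name center_of_mass and the
(mass, (x, y, z)) input layout are assumptions made for this example.

```python
def center_of_mass(atoms):
    """Return the mass-weighted mean position of a set of atoms.

    `atoms` is any iterable of (mass, (x, y, z)) pairs. This input layout
    is an assumption for the sketch, not a Bio.PDB data structure.
    """
    total_mass = 0.0
    sum_x = sum_y = sum_z = 0.0
    for mass, (x, y, z) in atoms:
        total_mass += mass
        sum_x += mass * x
        sum_y += mass * y
        sum_z += mass * z
    if total_mass == 0.0:
        raise ValueError("total mass is zero; center of mass is undefined")
    return (sum_x / total_mass, sum_y / total_mass, sum_z / total_mass)


# Two point masses on the x axis: the heavier one pulls the center toward it.
print(center_of_mass([(1.0, (0.0, 0.0, 0.0)), (3.0, (4.0, 0.0, 0.0))]))
```

Because the function consumes a plain iterable, it can be fed from a generator
and does not need a second in-memory copy of the structure.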
>
> I've been toying with a lot of changes, performance improvements, etc., but
> I'm not satisfied with them at all. Some things I've been trying are to
> have the structure coordinates defined as a single NumPy array instead of N
> arrays per structure (one per atom), or to use __slots__ to reduce
> memory usage (managed to get it down 33% this way). This would also go in
> line with a suggestion from Eric a long time ago to make a Bio.Struct module
> which would be the perfect "playground" to implement and test these changes.
> Other developments that I think are worth looking into are for example
> making a nice library to link a parsed structure to the PDB database and
> fetch information on it using the REST services they provide.
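
To make the two memory ideas above concrete, here is a rough sketch under
stated assumptions. The Atom class and build_atoms helper are hypothetical and
not part of Bio.PDB, and the standard-library array module stands in for the
NumPy array described above so that the example has no dependencies.

```python
from array import array


class Atom:
    """Hypothetical lightweight atom without a per-instance __dict__."""

    # __slots__ fixes the attribute set, so no dict is allocated per atom.
    __slots__ = ("name", "_coords", "_index")

    def __init__(self, name, coords, index):
        self.name = name
        self._coords = coords  # shared flat buffer: x0, y0, z0, x1, y1, z1, ...
        self._index = index    # position of this atom in the shared buffer

    @property
    def coord(self):
        start = 3 * self._index
        return tuple(self._coords[start:start + 3])


def build_atoms(records):
    """Build atoms from (name, x, y, z) records sharing one coordinate buffer."""
    coords = array("d")
    atoms = []
    for index, (name, x, y, z) in enumerate(records):
        coords.extend((x, y, z))
        atoms.append(Atom(name, coords, index))
    return atoms, coords


atoms, coords = build_atoms([("CA", 1.0, 2.0, 3.0), ("CB", 4.0, 5.0, 6.0)])
print(atoms[1].name, atoms[1].coord)
```

Each atom holds only an index into the shared buffer, so whole-structure
operations can work on one contiguous block instead of visiting N small
arrays.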
>
> I'd like to hear your opinion (as in, everybody, developers and users) on
> this and if it makes sense to indeed give a bit of TLC to the Bio.PDB
> module. Also, on what changes you think should be carried out to improve the
> module, like which features are missing, which applications are worth
> wrapping.
>
> Just to kick off some discussion. Maybe a new thread should be opened for
> this later on.
>
> Cheers,
>
> João

+1 on a new thread, and Bio.Struct (or better lower case, Bio.struct
or Bio.structure or something to be a bit more PEP8 like?).

Peter



