[Biopython-dev] Future of Bio.PDB

João Rodrigues anaryin at gmail.com
Wed Feb 19 14:54:13 UTC 2014

From another thread:

> As for what I suggested: since my GSoC period, already 4 years ago, I
> noticed that the PDB module is a bit messy in terms of organization. The
> module itself is named after the databank, which can be confused with the
> format name, the mmCIF parser is defined in a subfolder, and there
> are application wrappers there too (DSSP, NACCESS). Besides this point,
> which is not a real problem and just my own pet peeve, there is a lot that
> the entire module could gain from a thorough revision. I've been using it
> very often and some normal manipulations of structures are not
> straightforward to carry out (calculating a center of mass for example,
> removing double occupancies) due to the parser being slow and quite memory
> hungry. In fact, trying to run the parser on a very large collection of
> structures often results in a random crash due to memory issues.
> I've been toying with a lot of changes, performance improvements, etc., but
> I'm not satisfied at all with them. Some things I've been trying are
> storing the structure coordinates as a single NumPy array instead of N
> arrays per structure (one per atom), and using __slots__ to reduce
> memory usage (I managed to get it down 33% this way). This would also be
> in line with a suggestion from Eric a long time ago to make a Bio.Struct
> module which would be the perfect "playground" to implement and test these
> changes. Other developments that I think are worth looking into are for
> example making a nice library to link a parsed structure to the PDB
> database and fetch information on it using the REST services they provide.
> I'd like to hear your opinion (as in, everybody, developers and users) on
> this and if it makes sense to indeed give a bit of TLC to the Bio.PDB
> module. Also, on what changes you think should be carried out to improve
> the module, like which features are missing, which applications are worth
> wrapping.
> Just to kick off some discussion. Maybe a new thread should be opened for
> this later on.
> Cheers,
> João
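To make the center of mass example concrete, here is a minimal sketch of the
calculation in plain Python. It does not use Bio.PDB at all: atoms are
assumed to be simple (element, x, y, z) tuples, and both center_of_mass and
the ATOMIC_MASS table are hypothetical names made up for illustration, with
only a handful of elements filled in.

```python
# Approximate atomic masses in daltons for a few common elements.
# Illustrative subset only; a real table would cover every element.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Return the mass-weighted mean position of (element, x, y, z) tuples."""
    total_mass = 0.0
    cx = cy = cz = 0.0
    for element, x, y, z in atoms:
        mass = ATOMIC_MASS[element]  # KeyError for elements not in the table
        total_mass += mass
        cx += mass * x
        cy += mass * y
        cz += mass * z
    if total_mass == 0.0:
        raise ValueError("cannot compute a center of mass for zero atoms")
    return (cx / total_mass, cy / total_mass, cz / total_mass)


if __name__ == "__main__":
    # Two carbons on the x axis: the result lies midway between them.
    two_carbons = [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]
    print(center_of_mass(two_carbons))
```

With all coordinates in a single array, as suggested in the quoted message,
the loop would presumably collapse into one weighted average over the rows.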

As for the name of the module, yes, Bio.Struct is just the "legacy" name I
remember. Bio.structure would probably be better and clearer.
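
On the __slots__ idea from the quoted message, a rough sketch of what it
looks like in practice. PlainAtom and SlottedAtom are hypothetical classes
written for this example, not the actual Bio.PDB Atom class, and the sketch
makes no claim about how much memory is saved in any particular case.

```python
class PlainAtom:
    """Ordinary class: every instance carries its own attribute dict."""

    def __init__(self, name, x, y, z, occupancy):
        self.name = name
        self.x = x
        self.y = y
        self.z = z
        self.occupancy = occupancy


class SlottedAtom:
    """Same fields, but __slots__ reserves fixed storage and drops the dict."""

    __slots__ = ("name", "x", "y", "z", "occupancy")

    def __init__(self, name, x, y, z, occupancy):
        self.name = name
        self.x = x
        self.y = y
        self.z = z
        self.occupancy = occupancy


if __name__ == "__main__":
    plain = PlainAtom("CA", 1.0, 2.0, 3.0, 1.0)
    slotted = SlottedAtom("CA", 1.0, 2.0, 3.0, 1.0)
    print(hasattr(plain, "__dict__"))    # the plain instance has a dict
    print(hasattr(slotted, "__dict__"))  # the slotted instance does not
```

The trade-off is that slotted instances cannot grow arbitrary new attributes
at runtime, which matters if existing code attaches extra data to atoms.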
