[Biopython] Bio.PDB local MMCIF files

Fields, Christopher J cjfields at illinois.edu
Thu Feb 20 14:16:16 UTC 2014


On Feb 19, 2014, at 8:55 AM, "Peter Cock" <p.j.a.cock at googlemail.com> wrote:
> 
>> On Wed, Feb 19, 2014 at 2:39 PM, João Rodrigues <anaryin at gmail.com> wrote:
>> Hello,
>> 
>> The implementation I was referring to by the EBI people is here. I tested it
>> during a workshop and it is very fast and robust (they use it, that should
>> be enough reason) so maybe we could benefit a lot from either its
>> incorporation or adaptation?
>> 
>> As for what I suggested: since my GSoC period, already 4 years ago, I have
>> noticed that the PDB module is a bit messy in terms of organization. The
>> module itself is named after the databank, which can be confused with the
>> format name, the mmCIF parser is defined inside a subfolder, and there are
>> application wrappers there too (DSSP, NACCESS). Besides this issue, which is
>> not an issue at all and just my own pet peeve, there is a lot that the
>> entire module could gain from a thorough revision. I've been using it very
>> often and some normal manipulations of structures are not straightforward to
>> carry out (calculating a center of mass for example, removing double
>> occupancies) due to the parser being slow and quite memory hungry. In fact,
>> trying to run the parser on a very large collection of structures often
>> results in a random crash due to memory issues.
>> 
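For reference, the center-of-mass calculation mentioned above is short in itself. The following is a minimal sketch in plain Python, assuming atoms are supplied as hypothetical (mass, x, y, z) tuples rather than Bio.PDB objects; the function name and the rounded masses are illustrative only and are not part of any Biopython API.

```python
def center_of_mass(atoms):
    """Return the mass-weighted mean position of (mass, x, y, z) tuples."""
    total_mass = 0.0
    sx = sy = sz = 0.0
    for mass, x, y, z in atoms:
        total_mass += mass
        sx += mass * x
        sy += mass * y
        sz += mass * z
    if total_mass == 0.0:
        raise ValueError("cannot compute a center of mass with zero total mass")
    return (sx / total_mass, sy / total_mass, sz / total_mass)


# Illustrative input: two carbons and one oxygen with rounded masses.
atoms = [
    (12.0, 0.0, 0.0, 0.0),
    (12.0, 2.0, 0.0, 0.0),
    (16.0, 1.0, 4.0, 0.0),
]
com = center_of_mass(atoms)
```

With a real parsed structure, the same loop would run over whatever atom objects the parser returns, taking masses from an element lookup table.
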
>> I've been toying with a lot of changes, performance improvements, etc., but
>> I'm not satisfied at all with them. Some things that I've been trying are to
>> have the structure coordinates defined as a full numpy array instead of N
>> arrays per structure (one per atom) or the usage of __slots__ to mitigate
>> memory usage (managed to get it down 33% this way). This would also go in
>> line with a suggestion from Eric a long time ago to make a Bio.Struct module
>> which would be the perfect "playground" to implement and test these changes.
>> Other developments that I think are worth looking into are for example
>> making a nice library to link a parsed structure to the PDB database and
>> fetch information on it using the REST services they provide.
>> 
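The two memory ideas in the paragraph above (one shared coordinate buffer instead of one array per atom, and __slots__ on the atom class) can be sketched roughly as follows. This is not João's code and not the Bio.PDB API: the Atom and Structure classes here are hypothetical, and the standard library's array module stands in for a numpy array so the sketch has no dependencies.

```python
from array import array


class Atom:
    """Lightweight view onto one row of a shared coordinate buffer."""

    # __slots__ suppresses the per-instance __dict__, which is where the
    # memory saving comes from when very many atoms are created.
    __slots__ = ("name", "_coords", "_index")

    def __init__(self, name, coords, index):
        self.name = name
        self._coords = coords  # shared flat buffer: x0, y0, z0, x1, ...
        self._index = index

    @property
    def coord(self):
        start = 3 * self._index
        return tuple(self._coords[start:start + 3])


class Structure:
    """Owns a single flat buffer holding the coordinates of all atoms."""

    def __init__(self):
        self.coords = array("d")
        self.atoms = []

    def add_atom(self, name, x, y, z):
        index = len(self.atoms)
        self.coords.extend((x, y, z))
        atom = Atom(name, self.coords, index)
        self.atoms.append(atom)
        return atom

    def translate(self, dx, dy, dz):
        # One pass over the shared buffer moves every atom at once.
        shift = (dx, dy, dz)
        for i in range(len(self.coords)):
            self.coords[i] += shift[i % 3]


structure = Structure()
ca = structure.add_atom("CA", 1.0, 2.0, 3.0)
cb = structure.add_atom("CB", 4.0, 5.0, 6.0)
structure.translate(1.0, 0.0, -1.0)
```

Because every Atom only stores an index into the shared buffer, whole-structure operations such as the translation touch one contiguous block of memory; with numpy the loop in translate would become a single vectorized addition on an (N, 3) array.
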
>> I'd like to hear your opinion (as in, everybody, developers and users) on
>> this and if it makes sense to indeed give a bit of TLC to the Bio.PDB
>> module. Also, on what changes you think should be carried out to improve the
>> module, like which features are missing, which applications are worth
>> wrapping.
>> 
>> Just to kick off some discussion. Maybe a new thread should be opened for
>> this later on.
>> 
>> Cheers,
>> 
>> João
> 
> +1 on a new thread, and Bio.Struct (or better lower case, Bio.struct
> or Bio.structure or something to be a bit more PEP8 like?).
> 
> Peter

The similarly designed (but terribly maintained) BioPerl code is Bio::Structure. I think it was designed years back to be agnostic to a specific database, but of course much of its design was based on PDB data.

Chris


