[Biopython-dev] pull request: Handle MMCIF with multiple models (closes 2943)

Eric Talevich eric.talevich at gmail.com
Mon Apr 23 16:10:27 EDT 2012


On Mon, Apr 23, 2012 at 1:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Sun, Apr 22, 2012 at 7:48 AM, Lenna Peterson <arklenna at gmail.com> wrote:
>> I've implemented the parser changes (written by Paul Bathen; see bug
>> report) to allow the MMCIF parser to handle multiple models.
>>
>> Models are now accessed by a string key of their model number, rather
>> than an arbitrary index (structure['1'] versus structure[0]).
>>
>> I updated the MMCIF unit test for the new model access method and
>> added a test file with multiple models.
>>
>> I'm not sure if there is documentation to be updated re: accessing the models.
>>
>> issue: https://redmine.open-bio.org/issues/2943
>> pull request: https://github.com/biopython/biopython/pull/34
>
> I've applied that to the trunk, thank you, but on reading this, why are the
> model keys strings and not integers? Does MMCIF allow odd keys or
> something?
>

Ack, I didn't look at that closely enough. Check out this patch to see
the current situation:
https://github.com/biopython/biopython/commit/abdab1a1132ec811f9636f8ba805bbb6cda6dbe9

The models associated with a structure are numbered with a sequential
integer id, starting from 0. It's always been like that in our PDB
parser and we haven't changed it. To ensure that model numbers
specified in the PDB file are preserved when writing the structure
back to a file, the above patch introduced a new attribute on the
Model object called serial_num (also an integer, equal to model.id
unless specified otherwise). That attribute is only used when writing
a new PDB file; Structure.__getitem__ still looks models up by
Model.id, as before.

Perhaps that's surprising now that we read the serial numbers, but it
keeps backward compatibility. Plus, it preserves list-like behavior
(item access via integers counting from 0), even though the models are
actually stored in a dict.
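
To make the id versus serial_num distinction concrete, here is a minimal
sketch. ToyModel and ToyStructure are hypothetical stand-ins written for
this message, not the actual Bio.PDB classes; only the attribute names
id and serial_num come from the patch described above.

```python
class ToyModel:
    """Hypothetical stand-in for a model; not the Bio.PDB class."""

    def __init__(self, id, serial_num=None):
        self.id = id
        # serial_num falls back to the sequential id when not given.
        self.serial_num = id if serial_num is None else serial_num


class ToyStructure:
    """Hypothetical container: models live in a dict keyed by integer id."""

    def __init__(self):
        self._models = {}

    def add(self, model):
        self._models[model.id] = model

    def __getitem__(self, model_id):
        # Lookup uses the sequential id, never the serial number.
        return self._models[model_id]


structure = ToyStructure()
structure.add(ToyModel(0, serial_num=5))  # file numbered this model 5
structure.add(ToyModel(1))                # no number given in the file

first = structure[0]
second = structure[1]
```

Here structure[0] still returns the first model, as a list would, while
first.serial_num carries the number from the file for use when writing
the structure back out.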

So!

In the mmCIF parser, the calls to structure_builder.init_model should
be given two arguments instead of one: an integer id counting from 0,
and then another integer (probably) containing the model "serial
number" specified in the mmCIF file. In the event that an mmCIF file
doesn't specify the model number, the serial number should be the same
as the sequential id.
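
Roughly, the bookkeeping could look like the sketch below. It is only an
illustration under assumptions: model_init_args is a hypothetical helper
name, and its input is assumed to be the per-atom model numbers already
pulled out of the mmCIF file (strings, or None when the file gives no
model number). It returns the (id, serial number) pairs that would be
passed to init_model, one pair per model.

```python
def model_init_args(atom_model_numbers):
    """Return one (model_id, serial_num) pair per model encountered.

    Hypothetical helper: atom_model_numbers holds the model number read
    for each atom, as a string, or None if the file does not give one.
    """
    args = []
    current = object()  # sentinel that never equals a real value
    for raw in atom_model_numbers:
        if raw == current:
            continue  # still inside the same model
        current = raw
        model_id = len(args)  # sequential id, counting from 0
        # Fall back to the sequential id when no number is specified.
        serial_num = model_id if raw is None else int(raw)
        args.append((model_id, serial_num))
    return args


numbered = model_init_args(["1", "1", "2", "5", "5"])
unnumbered = model_init_args([None, None, None])
```

Each pair would then feed a call along the lines of
structure_builder.init_model(model_id, serial_num).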

Cool? This will also help us convert between PDB and mmCIF formats in
the future.

As for accessing the models by their serial number, using string keys
seems like an effective workaround, but it is still obviously a
workaround rather than an ideal situation. Let's discuss that a little
more, and perhaps file another bug once we've reached some consensus.

Best,
Eric

