[Biopython-dev] pull request: Handle MMCIF with multiple models (closes 2943)

João Rodrigues anaryin at gmail.com
Tue Apr 24 13:59:10 EDT 2012


Hi Lenna,

IMO, chains should be accessed by A, B, C; accessing them numerically
doesn't make sense.
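Here is a toy sketch of the access pattern suggested above. It uses a plain nested-dict stand-in for the Structure -> Model -> Chain hierarchy, not Bio.PDB's classes. Models are keyed by sequential integer id and chains by their chain letter. The residue strings are placeholder data.

```python
# Toy stand-in for Structure -> Model -> Chain; the residue labels are
# placeholder data, not parsed from any real file.
structure = {
    0: {"A": ["ALA 1", "GLY 2"], "B": ["SER 1"]},
    1: {"A": ["ALA 1", "GLY 2"], "B": ["SER 1"]},
}

# Models by integer id, chains by letter: structure[model_id][chain_id]
chain_a = structure[0]["A"]
print(chain_a)  # ['ALA 1', 'GLY 2']

# A positional chain index only reflects dict insertion order, whereas the
# chain letter is what the file itself records.
second_chain_id = list(structure[0])[1]
print(second_chain_id)  # B
```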

Congrats on the GSoC application and on the good work so far!

Cheers,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao



On 24 April 2012 19:56, Lenna Peterson <arklenna at gmail.com> wrote:

> On Tue, Apr 24, 2012 at 11:38 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
> >
> > On Tue, Apr 24, 2012 at 12:25 AM, Lenna Peterson <arklenna at gmail.com> wrote:
> > > On Mon, Apr 23, 2012 at 4:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> > >>
> > >> Ack, I didn't look at that closely enough. Check out this patch to see
> > >> the current situation:
> > >>
> > >> https://github.com/biopython/biopython/commit/abdab1a1132ec811f9636f8ba805bbb6cda6dbe9
> > >>
> > >> The models associated with a structure are numbered with a sequential
> > >> integer id, starting from 0. It's always been like that in our PDB
> > >> parser and we haven't changed it. To ensure that model numbers
> > >> specified in the PDB file are preserved when writing the PDB back to
> > >> file, the above patch introduced a new attribute on the Model object
> > >> called serial_num (also an integer, equal to model.id unless specified
> > >> otherwise). That attribute is only used when writing a new PDB file;
> > >> Structure.__getitem__ (i.e. structure[i]) still uses Model.id as before.
> > >>
> > >> Perhaps that's surprising now that we read the serial numbers, but it
> > >> kept backward compatibility. Plus, it preserves list-like behavior
> > >> (item access via integers), even though the models are actually stored
> > >> in a dict.
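Here is a minimal sketch of the id/serial_num split described above. It uses hypothetical, simplified Model and Structure classes rather than Bio.PDB's own, and the two-model example is made-up data. Item access goes through the sequential id, and only the writer reads serial_num.

```python
class Model:
    """Simplified stand-in: a sequential id plus the serial number from the file."""

    def __init__(self, model_id, serial_num=None):
        self.id = model_id
        # Fall back to the sequential id when the file gives no serial number.
        self.serial_num = model_id if serial_num is None else serial_num


class Structure:
    """Simplified stand-in: models live in a dict keyed by Model.id."""

    def __init__(self):
        self.child_dict = {}

    def add(self, model):
        self.child_dict[model.id] = model

    def __getitem__(self, model_id):
        # List-like access via integers, even though storage is a dict.
        return self.child_dict[model_id]


def model_record(model):
    # Only the writer uses serial_num; the PDB MODEL record puts the
    # serial number right-justified in columns 11-14.
    return "MODEL     %4d" % model.serial_num


structure = Structure()
structure.add(Model(0, serial_num=1))
structure.add(Model(1, serial_num=2))

first = structure[0]
print(first.id, first.serial_num)  # 0 1
print(repr(model_record(first)))
```

With this split, structure[0] keeps meaning "the first model" regardless of what serial numbers the file used.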
> > >>
> > >> So!
> > >>
> > >> In the mmCIF parser, the calls to structure_builder.init_model should
> > >> be given two arguments instead of one: an integer id counting from 0,
> > >> and then another integer (probably) containing the model "serial
> > >> number" specified in the mmCIF file. In the event that an mmCIF file
> > >> doesn't specify the model number, the serial number should be the same
> > >> as the sequential id.
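Here is a hedged sketch of how the parser side could look. `ToyBuilder` is a hypothetical stand-in that only records its init_model(model_id, serial_num) calls. The input list stands in for the values of the mmCIF _atom_site.pdbx_PDB_model_num column. The sketch assumes the atoms of each model are contiguous, and it falls back to a serial number equal to the sequential id when the column is absent.

```python
class ToyBuilder:
    """Hypothetical stand-in for a structure builder; it only records calls."""

    def __init__(self):
        self.models = []  # (model_id, serial_num) pairs, in call order

    def init_model(self, model_id, serial_num=None):
        if serial_num is None:
            serial_num = model_id
        self.models.append((model_id, serial_num))


def build_models(builder, model_column=None):
    """Call init_model once per model.

    model_column: per-atom model numbers as strings, or None if the file
    has no model column (assumed: atoms of one model are contiguous).
    """
    if model_column is None:
        # No model numbers given: a single model whose serial equals its id.
        builder.init_model(0, 0)
        return
    model_id = -1
    current_serial = None
    for value in model_column:
        serial = int(value)
        if serial != current_serial:
            # Model number changed: start the next sequential model.
            model_id += 1
            current_serial = serial
            builder.init_model(model_id, serial)


builder = ToyBuilder()
build_models(builder, ["1", "1", "2", "2", "5"])
print(builder.models)  # [(0, 1), (1, 2), (2, 5)]
```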
> > >>
> > >> Cool? This will also help us convert between PDB and mmCIF formats in
> > >> the future.
> > >>
> > >> As for accessing the models by their serial number, using string keys
> > >> seems like an effective workaround, but still obviously a workaround
> > >> rather than an ideal situation. Let's discuss that a little more,
> > >> perhaps file another bug when we've reached some consensus.
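For illustration only, the string-key workaround could be sketched with a plain dict. The model tuples are made-up data. Each model is stored under its integer id and again under str(serial_num). Note how similar the two kinds of key look.

```python
from collections import namedtuple

# Hypothetical minimal model record: sequential id plus file serial number.
Model = namedtuple("Model", ["id", "serial_num"])

models = {}
for model in [Model(0, 1), Model(1, 2)]:
    models[model.id] = model               # list-like access: models[0]
    models[str(model.serial_num)] = model  # serial-number access: models["1"]

# Integer key = sequential id, string key = serial number from the file,
# so these two lookups return the same model.
print(models[0] is models["1"])  # True
print(models[1] is models["2"])  # True
```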
> > >>
> > >> Best,
> > >> Eric
> > >
> > >
> > > Hi Eric,
> > >
> > > I believe I've implemented the model_id/serial_num system found in PDB:
> > >
> > >
> > > https://github.com/lennax/biopython/commit/b453a2968d18e157aac1f99f9f3cfeb4c09bc77d
> > >
> > > Please let me know if you think that looks right. I couldn't find an
> > > mmCIF file without a model column to test, but I believe in that case
> > > it will assign model_id and serial_num to 0. Would that be the correct
> > > behavior?
> > >
> > > I also modified the unit test to check the model serial_num.
> > >
> > > https://github.com/lennax/biopython/commit/b0443e788438b8ff72979c7a3bc0e531d4cd5cf6
> > >
> > > Currently serial_num is int() of the CIF model column. Regarding
> > > access by string serial_num, I am concerned that the int/string access
> > > would be too subtle (structure[0] == structure['1']; structure[1] ==
> > > structure['2']?). Perhaps an accessor function instead, e.g.
> > > structure.get_model('1')?
> > >
> > > Let me know if you think I should write get_model() or something along
> > > those lines.
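If an accessor were added, a minimal sketch might look like the following. get_model() is the hypothetical method floated here, not an existing Bio.PDB API, and the classes are simplified stand-ins. The accessor accepts the serial number as either int or str, so plain indexing keeps its integer, sequential-id meaning.

```python
class Model:
    """Simplified stand-in with a sequential id and a file serial number."""

    def __init__(self, model_id, serial_num=None):
        self.id = model_id
        self.serial_num = model_id if serial_num is None else serial_num


class Structure:
    """Simplified stand-in; item access stays keyed by the sequential id."""

    def __init__(self, models):
        self.child_dict = {m.id: m for m in models}

    def __getitem__(self, model_id):
        return self.child_dict[model_id]

    def get_model(self, serial_num):
        """Hypothetical accessor: find a model by its serial number."""
        wanted = int(serial_num)  # treat "2" and 2 the same
        for model in self.child_dict.values():
            if model.serial_num == wanted:
                return model
        raise KeyError(serial_num)


structure = Structure([Model(0, 1), Model(1, 2)])
print(structure.get_model("2") is structure[1])  # True
```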
> > >
> > > Lenna
> >
> > I left another nitpick on b453a, but besides that it looks exactly right to me.
> >
> > The string/int distinction would indeed be weird, especially for newer
> > Python users coming from Perl or Javascript. I don't see a direct
> > analogue for get_model(serial_num) in the other Entities (Residue,
> > Chain, Model, Structure), so I'm inclined to put off the decision for
> > now (i.e. leave it out of this patch set).
> >
> > -Eric
>
>
> Eric,
>
> Okay, I've changed the bad model num generic warning to a
> PDBConstructionException.
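Here is a sketch of the stricter behaviour, assuming the serial number is taken as int() of the mmCIF model column. An unparseable value raises an exception instead of issuing a warning. The exception class below is a local stand-in for Bio.PDB.PDBExceptions.PDBConstructionException, so the snippet runs without Biopython. parse_model_serial is a hypothetical helper name.

```python
class PDBConstructionException(Exception):
    """Local stand-in for Bio.PDB.PDBExceptions.PDBConstructionException."""


def parse_model_serial(value):
    """Hypothetical helper: convert a model-column value to an int serial."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # A bad model number is a hard error rather than a generic warning.
        raise PDBConstructionException(
            "Invalid model number: %r" % (value,)
        ) from None


print(parse_model_serial("3"))  # 3
```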
>
> New pull request to get MMCIF to the same state as PDB:
> https://github.com/biopython/biopython/pull/36
>
> So are chains accessed by 0, 1, 2 or by A, B, C?
>
> Lenna
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


