[Biopython-dev] creating Protein(structure) object

Eric Talevich eric.talevich at gmail.com
Mon Jun 14 20:27:24 UTC 2010

Hi guys,

Another convention with the Decorator pattern is to ensure that all of the
method arguments that existed in the original class are also present in the
decorated one. This includes the constructor. Decoration simply adds another
feature to whatever was already there.

João Rodrigues <anaryin at gmail.com> wrote:

> Hello Kristian,
> The way I'm doing it as a workaround is:
> class Protein(Structure):
>     def __init__(self, protein):
>         Structure.__init__(self, protein.id)
>         self.full_id = protein.full_id
>         self.child_list = protein.child_list
>         self.child_dict = protein.child_dict
>         self.parent = protein.parent
>         self.xtra = protein.xtra

The way the constructors of Structure and other Entity subclasses work is to
create a new object with the appropriate, empty attributes -- i.e. no
children. Other code then attaches children to the new object.

To decorate a Structure with Protein-specific functionality, I would reason
as follows:

1. The Entity constructor takes an ID, and creates empty containers for
child Entities (Models, in this case). So Protein.__init__ needs to start
like this:

class Protein(Structure):
    def __init__(self, id):  # take any keyword arguments?
        Structure.__init__(self, id)
        # handle any keyword arguments here

2. We need to be able to convert an existing Structure to a new Protein.
That's new functionality, so it needs either a keyword argument in __init__,
or a separate method or function. If we add a keyword argument to __init__,
then the implementation is basically two completely different operations
depending on whether a Structure was passed. Plus, there's still that 'id'
argument to deal with.

3. Instantiating a Protein directly would mean importing the
Bio.Struct.Protein module manually, in addition to "from Bio import Struct".
More to the point, Bio.Struct.Protein consists of lower-level functionality
that a casual Struct user shouldn't have to dig into, as long as
Structure.as_protein() exists. So there's no value in making
Protein.__init__ "do what I mean" at the expense of clarity in the code.
Better to make the code very obvious and explicit here, and focus on API
prettiness from a different angle.

4. The next most convenient place for Structure-to-Protein conversion is on
the Structure class. This presents a nice API that will be sufficient for
most users:

from Bio import Struct
prot = Struct.read('1ABC.pdb').as_protein()

But, going back to OOP principles, the Structure class shouldn't need to
know anything about the Protein class's internals -- though it's free to
call any public method and make things nicer for the user. So, finally, we
need a class method* on Protein that Structure.as_protein() can call.

Hence, Protein.from_structure().

[*] A class method can be called without first instantiating the class.
Since we're trying to construct a new object here, we need to be able to
call this Protein method before the Protein object exists. No worries, just
use the @classmethod decorator.
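
Putting points 1-4 together, here is a rough sketch of how the pieces could
fit. The Entity and Structure classes below are minimal stand-ins I wrote so
the example runs on its own; they are not the real Bio.PDB classes, and the
body of from_structure() (deep-copy the source, then re-attach its children)
is just one assumed way to do the copy.

```python
import copy


class Entity(object):
    """Minimal stand-in for an Entity; NOT the real Bio.PDB class."""

    def __init__(self, id):
        # Start with empty containers; children are attached later.
        self.id = id
        self.parent = None
        self.child_list = []
        self.child_dict = {}
        self.xtra = {}

    def add(self, child):
        # Attach a child and index it by its id.
        child.parent = self
        self.child_list.append(child)
        self.child_dict[child.id] = child


class Structure(Entity):
    def as_protein(self):
        # Only uses Protein's public class method, not its internals.
        return Protein.from_structure(self)


class Protein(Structure):
    def __init__(self, id):
        # Same arguments as Structure.__init__; starts out empty.
        Structure.__init__(self, id)

    @classmethod
    def from_structure(cls, structure):
        # Copy the whole tree once, then move the copied children over,
        # so the new Protein shares nothing with the original Structure.
        clone = copy.deepcopy(structure)
        prot = cls(clone.id)
        for child in clone.child_list:
            prot.add(child)
        prot.xtra = clone.xtra
        return prot
```

With this, Structure("1ABC").as_protein() returns a Protein whose children
are independent copies, and Protein("1ABC") still works exactly like the
Structure constructor.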

> It works because every method I'm using deepcopies this anyway..

If someone modifies the original Structure object after you've created a
Protein this way -- e.g. renumbering residues, or with their own function --
it will also modify the Protein object, since lists and dicts are shared. Is
this what you want?
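
To make the sharing issue concrete, here is a tiny illustration. Node is a
hypothetical stand-in for an Entity with a child list, not a Biopython class,
and the strings stand in for residues.

```python
import copy


class Node(object):
    """Hypothetical stand-in for an Entity that holds a child list."""

    def __init__(self, children):
        self.child_list = children


original = Node(["ALA", "GLY"])

# Assigning the list directly shares one list object between both nodes.
shared = Node(original.child_list)
# A deep copy gives the new node its own independent list.
copied = Node(copy.deepcopy(original.child_list))

# Modify the original after the other two were created.
original.child_list.append("SER")

print(shared.child_list)  # ['ALA', 'GLY', 'SER']
print(copied.child_list)  # ['ALA', 'GLY']
```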

If you're concerned about memory usage, you can also look at implementing

> The way of adding the childs seems the correct way to go but it won't copy
> headers... should we want this?

Your code for copying the Structure's children looks right to me, except I
think it's best to be a little paranoid with Python lists and make deep copies
anyway. I suppose you could also copy any header info that's relevant to
proteins, using the same approach.

