[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Hongbo Zhu 朱宏博 macrozhu at gmail.com
Fri Dec 2 09:41:30 UTC 2011


Hi,

I propose to add slicing to class Bio.PDB.Chain by changing function
Bio.PDB.Chain.__getitem__().

* Why is slicing necessary for Bio.PDB.Chain?
Protein domain definitions are usually given as the start and end
positions of the domain in the protein primary structure, e.g. in SCOP or
CATH. Slicing comes in handy when extracting such domains from PDB files.
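
For comparison, here is a rough sketch of how such an extraction can be
written today without slicing, by filtering on the sequence number in each
residue id. The names Residue and extract_domain are hypothetical stand-ins
made up for this illustration, not part of Bio.PDB; the only assumption is
that each residue carries an id of the form (hetero flag, sequence
identifier, insertion code).

```python
from collections import namedtuple

# Hypothetical stand-in for a Bio.PDB residue: only the .id tuple matters here.
Residue = namedtuple("Residue", ["id", "resname"])


def extract_domain(residues, start, end):
    """Return the residues whose sequence number lies in [start, end]."""
    # r.id is (hetero flag, sequence identifier, insertion code)
    return [r for r in residues if start <= r.id[1] <= end]


# A toy "chain" with residues numbered 1 to 10.
chain = [Residue((" ", i, " "), "ALA") for i in range(1, 11)]
domain = extract_domain(chain, 3, 6)
print([r.id[1] for r in domain])  # prints [3, 4, 5, 6]
```

Note that this filter looks only at the sequence number, so it also picks
up hetero residues and insertion-code variants that fall in the range.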

* Why is slicing not available at the moment?
I understand that the majority of Bio.PDB.Entity objects are not lists, and
there is no internal *sequential order* for the child entities in these
objects. For example, in Bio.PDB.Model, the child Chain entities do not
really have a sequential order within the Model, so slicing does not seem to
make sense. But Bio.PDB.Chain is an exception: Residue entities in
Bio.PDB.Chain have a sequential order, as presented in the primary structure,
and slicing becomes a reasonable operation.

* How to slice a Chain entity?
I think it can be realized by revising the
function Bio.PDB.Chain.__getitem__(). For example:

    def __getitem__(self, id):
        """Return the residue with given id, or a list of residues for a slice.

        The id of a residue is (hetero flag, sequence identifier, insertion
        code). If id is an int, it is translated to (" ", id, " ") by the
        _translate_id method.

        Arguments:
        o id - (string, int, string), int, or a slice whose bounds are such ids
        """
        if isinstance(id, slice):
            # Map residue ids to list positions, so that the slice bounds
            # are residue ids rather than list indices.
            res_id_list = [r.id for r in self.get_iterator()]
            if id.start is not None:
                start_index = res_id_list.index(self._translate_id(id.start))
            else:
                start_index = 0
            if id.stop is not None:
                stop_index = res_id_list.index(self._translate_id(id.stop))
            else:
                # An open-ended slice such as chain[10:] runs to the end.
                stop_index = None
            return self.get_list()[start_index:stop_index:id.step]
        else:
            id = self._translate_id(id)
            return Entity.__getitem__(self, id)
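
To try the idea without touching Bio.PDB, the following is a minimal,
self-contained sketch of the same lookup logic. MiniResidue and MiniChain
are hypothetical classes written only for this illustration; they mimic the
child list and id translation of a chain but are not the Bio.PDB
implementation. The sketch also assumes that a missing stop should mean
"up to the end of the chain".

```python
class MiniResidue:
    """Hypothetical stand-in for a residue; it only stores an id tuple."""

    def __init__(self, id):
        self.id = id


class MiniChain:
    """Hypothetical stand-in for a chain holding residues in sequence order."""

    def __init__(self, residues):
        self.child_list = list(residues)
        self.child_dict = {r.id: r for r in self.child_list}

    def _translate_id(self, id):
        # An int n is shorthand for the full residue id (" ", n, " ").
        if isinstance(id, int):
            id = (" ", id, " ")
        return id

    def __getitem__(self, id):
        if isinstance(id, slice):
            # Translate residue ids into positions in the child list.
            ids = [r.id for r in self.child_list]
            start = 0 if id.start is None else ids.index(self._translate_id(id.start))
            stop = None if id.stop is None else ids.index(self._translate_id(id.stop))
            return self.child_list[start:stop:id.step]
        return self.child_dict[self._translate_id(id)]


# Residue numbering with a gap between 7 and 10, as is common in PDB files.
chain = MiniChain(MiniResidue((" ", n, " ")) for n in (5, 6, 7, 10, 11))
print([r.id[1] for r in chain[6:10]])  # prints [6, 7]
```

Two properties of this approach are visible here. As with list slicing, the
stop residue itself is excluded, so a domain given as an inclusive range
needs the id of the residue after its last one. And both bounds must be ids
of residues that are actually present in the chain; otherwise list.index()
raises a ValueError, e.g. for chain[8:10] above.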


