[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Hongbo Zhu 朱宏博 macrozhu at gmail.com
Mon Dec 5 11:46:59 UTC 2011


Hi, Peter,

I just realized there is a special issue with slicing Bio.PDB.Chain.
Normally, a Python slice is given by three arguments: start, stop and
step, where the element at position *stop* is not included in the output.
For example,

mylist[2:40:1]  would return: [ mylist[2],mylist[3], ...., mylist[39] ]

But in CATH and SCOP, the sequence segments that make up a domain are given
by a start and an end position, and the residue at the end position is
included in the domain definition. For example, if a domain is defined to
run from residue (' ', 2, ' ') to residue (' ', 40, ' '), a slice such as
mychain[(' ', 2, ' '):(' ', 40, ' ')] or mychain[2:40] would not include
residue (' ', 40, ' '). Nor is mychain[(' ', 2, ' '):(' ', 41, ' ')]
guaranteed to give the correct result, because the residue after
(' ', 40, ' ') is not necessarily (' ', 41, ' '). Of course we could
change the code in __getitem__() so that it includes the end position,
but that would go against the general Python slicing convention.
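
To make the problem concrete, here is a minimal sketch using a plain list of
residue ids. The ids are invented for illustration (a chain with a numbering
gap after residue 40); no Bio.PDB objects are involved.

```python
# Hypothetical residue ids for a chain with a numbering gap after residue 40.
res_ids = [(' ', n, ' ') for n in (1, 2, 3, 39, 40, 45, 46)]

start = res_ids.index((' ', 2, ' '))
stop = res_ids.index((' ', 40, ' '))

# Python convention: the element at the stop index is excluded.
exclusive = res_ids[start:stop]

# CATH/SCOP convention: the end residue belongs to the domain.
inclusive = res_ids[start:stop + 1]

# Asking for residue 41 as the stop would fail here, because it does not exist.
has_41 = (' ', 41, ' ') in res_ids
```

Here the exclusive slice ends at residue 39, the inclusive one ends at
residue 40, and residue 41 is simply not in the chain.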

So I think a separate method may be needed:

class Chain(Entity):

    def get_slice(self, start, end, step=None):
        """Return a slice of the chain from start to end (end position included).

        Arguments:
        o start - (string, int, string) or int
        o end - (string, int, string) or int
        o step - None or int
        """
        res_id_list = [r.id for r in self.get_iterator()]
        start_index = res_id_list.index(self._translate_id(start))
        stop_index = res_id_list.index(self._translate_id(end))

        # +1 so that the residue at the end position is included
        return self.get_list()[start_index:stop_index + 1:step]
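
The intended behaviour can be tried out without Bio.PDB using a small
stand-in. This is only a sketch: MockResidue and the translate() helper are
hypothetical, and translate() assumes that an integer i is shorthand for the
id (' ', i, ' ').

```python
class MockResidue:
    """Hypothetical stand-in for a residue; only the id attribute is used."""
    def __init__(self, res_id):
        self.id = res_id


def get_slice(residues, start, end, step=None):
    """Return the residues from start to end, end position included."""
    def translate(res_id):
        # Assumption: an int i is shorthand for the full id (' ', i, ' ').
        return (' ', res_id, ' ') if isinstance(res_id, int) else res_id

    ids = [r.id for r in residues]
    start_index = ids.index(translate(start))
    stop_index = ids.index(translate(end))
    # +1 turns Python's exclusive stop into an inclusive end position.
    return residues[start_index:stop_index + 1:step]


residues = [MockResidue((' ', n, ' ')) for n in (1, 2, 3, 39, 40, 45)]
segment = get_slice(residues, 2, (' ', 40, ' '))
segment_ids = [r.id[1] for r in segment]
every_other_ids = [r.id[1] for r in get_slice(residues, 2, 40, step=2)]
```

With these residues, segment_ids contains residues 2 through 40 (with 40
included), and a ValueError is raised if either end point is missing from
the chain.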

As for overloading the __add__() operator, is it meant for concatenating
chain segments? I think that is very important (if I chop the sequence into
pieces, I should also be able to glue them back together, right?). But this
implies that get_slice should return a Chain instance, not just a
list of Residue instances, right?
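
A rough sketch of what I mean by gluing pieces back together. Segment is a
hypothetical container invented for this example, not an existing Bio.PDB
class, and refusing to join segments from different chains is just one
possible design choice.

```python
class Segment:
    """Hypothetical container for an ordered run of residue ids from one chain."""
    def __init__(self, chain_id, res_ids):
        self.id = chain_id
        self.res_ids = list(res_ids)

    def __add__(self, other):
        # Only segments can be concatenated with segments.
        if not isinstance(other, Segment):
            return NotImplemented
        # Assumption: segments from different chains should not be merged.
        if self.id != other.id:
            raise ValueError("cannot concatenate segments from different chains")
        # Return a new Segment so that the result can be sliced or added again.
        return Segment(self.id, self.res_ids + other.res_ids)


first = Segment("A", [1, 2, 3])
second = Segment("A", [4, 5])
joined = first + second
```

Because __add__ returns the same type as its operands, slicing and
concatenation can be chained freely, which a plain list of residues would
not allow.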

--Hongbo


>
> Hi Hongbo,
>
> I agree defining integer based slicing of Chain objects sounds like a good
> idea.
>
> Could you write a couple of unit tests for the new slicing please (in
> file Tests/test_PDB.py)? You can just give code snippets, a patch, or
> create a branch on GitHub if you would prefer.
>
> Does it make sense to consider __add__ for the Chain as well?
>
> Peter
>



-- 
Hongbo


