[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Peter Cock p.j.a.cock at googlemail.com
Mon Dec 5 13:50:44 UTC 2011


On Mon, Dec 5, 2011 at 1:38 PM, Hongbo Zhu 朱宏博 <macrozhu at gmail.com> wrote:
>
>> Perhaps I misunderstood; I would not want to allow the syntax
>> mychain[(' ', 2, ' '):(' ', 40, ' ')], which is unclear. Rather, I would only
>> allow the user to use mychain[2:41], which requires Python counting.
>
> But even in mychain[2:41], the 2 and 41 should be residue sequence numbers.
> Then it is consistent with the currently accepted syntax mychain[2], where 2
> also refers to a sequence number. At the moment, Biopython also
> accepts mychain[(' ', 2, ' ')]. So I think mychain[(' ', 2, ' '):(' ', 40, ' ')]
> would be just a natural extension of mychain[(' ', 2, ' ')].
>
> According to the source code, mychain[2] is considered an abbreviation of
> mychain[(' ', 2, ' ')]. Internally, mychain[2] is translated to
> mychain[(' ', 2, ' ')] by the method Bio.PDB.Chain._translate_id(). So if
> mychain[2:40] were allowed, internally it would also first be translated
> to mychain[(' ', 2, ' '):(' ', 40, ' ')]. So from my point of view,
> mychain[2:40] is just an abbreviation for mychain[(' ', 2, ' '):(' ', 40, ' ')],
> just like mychain[2] is a short version of mychain[(' ', 2, ' ')].
>
> hongbo
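For illustration, the proposal quoted above can be sketched as a toy. This is not Biopython code: `ToyChain` and `translate_id` are hypothetical names that only mimic the integer-to-tuple expansion Hongbo describes. Excluding the stop ID (as in an ordinary Python slice) is an assumption, since the thread has not settled whether the end point is inclusive.

```python
# Hypothetical sketch of ID-based slicing; ToyChain and translate_id are
# illustrative names, not Biopython's Chain API.


def translate_id(res_id):
    """Expand integer shorthand such as 2 into the full ID tuple (' ', 2, ' ')."""
    if isinstance(res_id, int):
        return (" ", res_id, " ")
    return res_id


class ToyChain:
    """Holds residue ID tuples in file order (numbering may have gaps)."""

    def __init__(self, residue_ids):
        self.ids = list(residue_ids)

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step is not None:
                raise ValueError("step is not supported")
            # Assumption: start ID included, stop ID excluded, like a Python
            # slice; both endpoints must be IDs present in this chain.
            start = 0 if key.start is None else self.ids.index(translate_id(key.start))
            stop = len(self.ids) if key.stop is None else self.ids.index(translate_id(key.stop))
            return ToyChain(self.ids[start:stop])
        res_id = translate_id(key)
        if res_id not in self.ids:
            raise KeyError(res_id)
        return res_id  # stands in for the Residue object


chain = ToyChain([(" ", n, " ") for n in range(1, 51)])
sub = chain[2:40]  # same as chain[(" ", 2, " "):(" ", 40, " ")]
print(sub.ids[0], sub.ids[-1], len(sub.ids))
```

Under these assumptions the integer form and the full tuple form select exactly the same residues, which is the equivalence Hongbo is arguing for.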

I've never really liked these strange tuple IDs, which are usually
but not always full of empty values. I understand some of
the corner cases they handle, but they are very complicated.

You cannot assume 2 will map to (' ', 2, ' ') in general; this
is what the _translate_id method handles. Consider the case
where you have sliced the Chain as discussed: since the
first two elements have been removed, that mapping will shift.
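To make the shifting concrete, here is a small sketch using plain lists of ID tuples rather than Biopython objects. The `residue_ids` list and the `lookup_by_number` helper are hypothetical; the `"W"` hetfield for water follows the Bio.PDB convention. It shows that positional ("Python counting") indices move after a slice while residue numbers do not, and that the integer shorthand cannot reach residues with insertion codes or non-blank hetfields.

```python
# Hypothetical illustration with plain lists, not Biopython objects.
# IDs use the (hetfield, resseq, icode) layout; "W" marks a water,
# as in Bio.PDB.
residue_ids = [
    (" ", 1, " "),
    (" ", 2, " "),
    (" ", 3, " "),
    (" ", 52, " "),
    (" ", 52, "A"),   # insertion code: same resseq, distinct residue
    ("W", 101, " "),  # water: hetfield is not blank
]


def lookup_by_number(ids, resseq):
    """Integer shorthand: only ever matches the plain (' ', resseq, ' ') ID."""
    return ids.index((" ", resseq, " "))  # ValueError if no such ID


# A positional ("Python counting") slice dropping the first two residues.
sliced = residue_ids[2:]

# Position 0 used to hold residue 1; after slicing it holds residue 3,
# so any mapping from integers to positions has shifted.
print(residue_ids[0], sliced[0])

# Lookup by residue number still finds residue 3 in both lists,
# even though its position differs between them.
print(lookup_by_number(residue_ids, 3), lookup_by_number(sliced, 3))

# The shorthand 52 only finds the plain residue, never 52A or the water.
print(residue_ids[lookup_by_number(residue_ids, 52)])
```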

We definitely would need a test case covering non-trivial
ID tuples (e.g. using insertion codes), and tests slicing a
previously sliced Chain.
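A test along the lines requested above might look like the following sketch. It runs against a hypothetical `ToyChain` (not `Bio.PDB.Chain`) that slices by residue ID and, by assumption, excludes the stop ID, so the expected values reflect that toy's semantics rather than any agreed Biopython behaviour.

```python
import unittest


def translate_id(res_id):
    # Assumption: integer shorthand expands to a plain (' ', n, ' ') ID.
    return (" ", res_id, " ") if isinstance(res_id, int) else res_id


class ToyChain:
    """Hypothetical stand-in for a chain, holding only residue ID tuples."""

    def __init__(self, ids):
        self.ids = list(ids)

    def _position(self, res_id):
        try:
            return self.ids.index(translate_id(res_id))
        except ValueError:
            raise KeyError(res_id) from None

    def __getitem__(self, key):
        if not isinstance(key, slice):
            return self.ids[self._position(key)]
        start = 0 if key.start is None else self._position(key.start)
        stop = len(self.ids) if key.stop is None else self._position(key.stop)
        return ToyChain(self.ids[start:stop])


# Non-trivial IDs: residue 52 has two insertion-code variants.
IDS = [(" ", 50, " "), (" ", 51, " "), (" ", 52, " "), (" ", 52, "A"),
       (" ", 52, "B"), (" ", 53, " "), (" ", 54, " ")]


class TestChainSlicing(unittest.TestCase):
    def setUp(self):
        self.chain = ToyChain(IDS)

    def test_slice_spans_insertion_codes(self):
        # 52, 52A and 52B all fall between 51 (included) and 53 (excluded).
        self.assertEqual(self.chain[51:53].ids, IDS[1:5])

    def test_slice_from_insertion_code(self):
        self.assertEqual(self.chain[(" ", 52, "A"):54].ids, IDS[3:6])

    def test_slice_of_slice(self):
        # Re-slicing by ID must agree with slicing the original chain once.
        sub = self.chain[51:54]
        self.assertEqual(sub[52:53].ids, self.chain[52:53].ids)

    def test_endpoint_removed_by_earlier_slice(self):
        # Residue 50 is no longer in the sliced chain, so it is not a valid endpoint.
        sub = self.chain[51:54]
        with self.assertRaises(KeyError):
            sub[50:53]
```

Saved to a file, this runs under `python -m unittest`; the slice-of-slice case is the one that would catch a positional mapping that shifts after the first slice.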

Peter
