[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?

Mon Dec 5 15:53:36 UTC 2011

On Mon, Dec 5, 2011 at 2:48 PM, Hongbo Zhu 朱宏博 <macrozhu at gmail.com> wrote:
>
> On Mon, Dec 5, 2011 at 2:50 PM, Peter Cock wrote:
>>
>> I've never really liked these strange tuple IDs, which are usually
>> but not always full of empty values. I understand some of
>> the corner cases they handle, but they are very complicated.
>
>
> This seems to be the problem of PDB.

Yes.

> I don't know how other packages handle the issue.
> In addition, I once proposed to remove the HETERO-flag in the residue ID.
> http://biopython.org/pipermail/biopython-dev/2011-January/008640.html
> It is only retained for the backwards compatibility with PDB files before
> remediation in 2007. Removing only HETERO-flag does not solve
> the problem totally, but to some extent (say, around 50%).

Breaking the API without making the ID much easier to use is a bad idea.

> PDB entry 1h4w is a good example with icode and the sequence of chain A
> starts with resnum 16.

That shows the problem nicely:

>>> from Bio import PDB
>>> structure = PDB.PDBParser().get_structure("1h4w", "1h4w.pdb")
>>> chain = structure[0]['A']
>>> len(chain)
351
>>> chain[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Bio/PDB/Chain.py", line 67, in __getitem__
    return Entity.__getitem__(self, id)
  File "Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 0, ' ')

However, you can access the first residue like this:

>>> chain[16]
<Residue ILE het=  resseq=16 icode= >

Likewise,

>>> for index, residue in enumerate(chain):
...     print index, residue
...     assert chain[index] == residue
...
0 <Residue ILE het=  resseq=16 icode= >
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "Bio/PDB/Chain.py", line 67, in __getitem__
    return Entity.__getitem__(self, id)
  File "Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 0, ' ')

So as you say, the current implementation does map
an integer index to the middle field of the ID tuple
(the residue sequence number), rather than to the
position in the list as I had assumed. Sadly this means
it is incompatible with Pythonic slicing, where chain[0:10]
would have to mean positions 0 to 9, so we can't extend
__getitem__ to offer that.
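
To make the distinction concrete, here is a minimal sketch using a
toy stand-in class. ToyChain and its residue IDs are made up for
illustration (numbering starts at 16 and one residue carries an
insertion code, loosely like 1h4w); this is not Biopython's actual
code, it only mimics the integer-to-ID-tuple lookup seen in the
tracebacks above. Positional slicing is done on a plain list of
the children rather than through __getitem__.

```python
class ToyChain:
    """Hypothetical stand-in for a chain; not Biopython's implementation."""

    def __init__(self, residue_ids):
        # Children are kept in file order and also keyed by full ID tuple.
        self.child_list = list(residue_ids)
        self.child_dict = {rid: rid for rid in residue_ids}

    def __len__(self):
        return len(self.child_list)

    def __iter__(self):
        return iter(self.child_list)

    def __getitem__(self, id):
        # A bare integer is read as a residue number, not a list position.
        if isinstance(id, int):
            id = (" ", id, " ")
        return self.child_dict[id]


# Invented IDs: (hetero flag, residue number, insertion code).
chain = ToyChain([
    (" ", 16, " "),
    (" ", 17, " "),
    (" ", 18, " "),
    (" ", 18, "A"),
    (" ", 19, " "),
])

# Position 0 is not a residue number, so the lookup fails.
try:
    chain[0]
    zero_lookup_failed = False
except KeyError:
    zero_lookup_failed = True

# The residue number of the first residue works.
first_by_number = chain[16]

# Positional slicing goes through an ordinary list of the children.
first_three = list(chain)[:3]
print(zero_lookup_failed, first_by_number, first_three)
```

Since iterating over a real chain yields its residues in order (as
the enumerate loop above shows), the same list(chain)[start:stop]
pattern should give positional slices there too, without touching
__getitem__.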

Peter
