[Biopython-dev] slicing in Bio.PDB.Chain.__getitem__() ?
Peter Cock
p.j.a.cock at googlemail.com
Mon Dec 5 15:53:36 UTC 2011
On Mon, Dec 5, 2011 at 2:48 PM, Hongbo Zhu 朱宏博 <macrozhu at gmail.com> wrote:
>
> On Mon, Dec 5, 2011 at 2:50 PM, Peter Cock wrote:
>>
>> I've never really liked these strange tuple IDs, which are usually
>> but not always full of empty values. I understand some of
>> the corner cases they handle, but they are very complicated.
>
>
> This seems to be a problem with the PDB format itself.
Yes.
> I don't know how other packages handle the issue.
> In addition, I once proposed removing the HETERO flag from the residue ID:
> http://biopython.org/pipermail/biopython-dev/2011-January/008640.html
> It is only retained for backwards compatibility with PDB files from
> before the 2007 remediation. Removing only the HETERO flag does not
> solve the problem entirely, but it helps to some extent (say, around 50%).
Breaking the API without making the ID much easier to use is a bad idea.
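For anyone following along: each residue ID is a (hetfield, resseq, icode) tuple. Below is a toy sketch using made-up IDs (not taken from any real entry) of why dropping only the hetero flag is not enough. The remaining (resseq, icode) pairs still have to be tuples, because once insertion codes appear the residue number alone is no longer unique.

```python
# Toy model of Bio.PDB residue IDs: (hetfield, resseq, icode).
# The values below are hypothetical, for illustration only.
residue_ids = [
    (" ", 16, " "),       # ordinary residue, no insertion code
    (" ", 17, " "),
    (" ", 17, "A"),       # insertion code: same resseq as the previous one
    ("H_GOL", 500, " "),  # hetero residue ("H_" + residue name)
    ("W", 501, " "),      # water
]

# Dropping the hetero flag leaves (resseq, icode) pairs...
without_het = [(resseq, icode) for _, resseq, icode in residue_ids]
# ...which are still unique in this example, but resseq on its own is not.
resseqs = [resseq for resseq, _ in without_het]

print(len(set(without_het)) == len(without_het))  # True
print(len(set(resseqs)) == len(resseqs))          # False: 17 appears twice
```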
> PDB entry 1h4w is a good example: it uses insertion codes (icode),
> and the sequence of chain A starts at residue number 16.
That shows the problem nicely:
>>> from Bio import PDB
>>> structure = PDB.PDBParser().get_structure("1h4w", "1h4w.pdb")
>>> chain = structure[0]['A']
>>> len(chain)
351
>>> chain[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Bio/PDB/Chain.py", line 67, in __getitem__
    return Entity.__getitem__(self, id)
  File "Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 0, ' ')
However, you can access the first residue like this:
>>> chain[16]
<Residue ILE het= resseq=16 icode= >
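To make the lookup rule concrete, here is a minimal sketch that mimics the behaviour in the transcript: an int key is expanded to the full ID (' ', n, ' ') before the dictionary lookup. ToyChain and its method names are hypothetical stand-ins, not the real Bio.PDB classes, and the chain contents are made up apart from starting at 16.

```python
class ToyChain:
    """Hypothetical stand-in for Bio.PDB.Chain (not the real class).

    It mimics the behaviour in the transcript: an int key is expanded to
    the full residue ID (' ', n, ' ') before the dictionary lookup.
    """

    def __init__(self, residue_ids):
        self.child_list = list(residue_ids)
        self.child_dict = {rid: f"residue {rid}" for rid in self.child_list}

    def _translate_id(self, key):
        # An int is read as a residue number, not as a list position.
        if isinstance(key, int):
            return (" ", key, " ")
        return key

    def __getitem__(self, key):
        return self.child_dict[self._translate_id(key)]

    def __len__(self):
        return len(self.child_list)


# Hypothetical chain whose numbering starts at 16, like chain A of 1h4w.
chain = ToyChain([(" ", n, " ") for n in range(16, 20)])
print(len(chain))  # 4
print(chain[16])   # residue (' ', 16, ' ')
try:
    chain[0]       # looks up (' ', 0, ' '), which does not exist
except KeyError as err:
    print("KeyError:", err)
```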
Likewise,
>>> for index, residue in enumerate(chain):
...     print(index, residue)
...     assert chain[index] == residue
...
0 <Residue ILE het= resseq=16 icode= >
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "Bio/PDB/Chain.py", line 67, in __getitem__
    return Entity.__getitem__(self, id)
  File "Bio/PDB/Entity.py", line 38, in __getitem__
    return self.child_dict[id]
KeyError: (' ', 0, ' ')
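The loop above fails because the enumerate counter is a list position, while the int key is read as a residue number. A sketch of the alternative is to look each residue back up by its own full ID. The ToyResidue class and the IDs here are hypothetical; with the real classes the equivalent check would presumably be chain[residue.id] == residue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyResidue:
    # Hypothetical stand-in for Bio.PDB.Residue: only the ID is modelled.
    id: tuple  # (hetfield, resseq, icode)


# Hypothetical chain: numbering starts at 16 and residue 17 has an
# insertion code, so list positions and residue numbers never line up.
residues = [ToyResidue((" ", 16, " ")), ToyResidue((" ", 17, " ")),
            ToyResidue((" ", 17, "A")), ToyResidue((" ", 18, " "))]
by_id = {res.id: res for res in residues}  # plays the role of child_dict

for index, residue in enumerate(residues):
    # Looking a residue up by its own full ID always round-trips...
    assert by_id[residue.id] is residue
    # ...whereas using the loop counter as a residue number finds nothing here.
    print(index, residue.id, (" ", index, " ") in by_id)
```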
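So, as you say, the current implementation maps an integer key to
the middle field of the ID tuple (the residue number) rather than to
the position in the list, which is what I had assumed: chain[16] means
chain[(' ', 16, ' ')], and chain[0] means chain[(' ', 0, ' ')].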
Sadly, this means integer keys can't also support Pythonic (positional)
slicing, so we can't extend __getitem__ to offer that.
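One way to keep both meanings is to leave int keys as residue numbers and put positional access (including slices) behind a separate accessor. The sketch below assumes exactly that design; ToyChain and by_position are hypothetical names, not part of Biopython.

```python
class ToyChain:
    """Hypothetical chain (not Bio.PDB.Chain): int keys mean residue
    numbers, as in the transcript, while positional access, including
    slices, goes through a separate method so the two never collide."""

    def __init__(self, residue_ids):
        self.child_list = list(residue_ids)
        self.child_dict = {rid: rid for rid in self.child_list}

    def __getitem__(self, key):
        # Key-based lookup: an int is shorthand for (' ', n, ' ').
        if isinstance(key, int):
            key = (" ", key, " ")
        return self.child_dict[key]

    def by_position(self, index):
        # Hypothetical helper with plain list semantics: ints and slices.
        return self.child_list[index]


chain = ToyChain([(" ", n, " ") for n in range(16, 21)])
print(chain[16])                       # residue number 16
print(chain.by_position(0))            # first residue (also number 16)
print(chain.by_position(slice(1, 3)))  # second and third residues
```

With the real classes, since a chain can already be iterated, list(chain)[1:3] should give the same positional slice today, at the cost of building a list.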
Peter