[BioRuby] bio.pdb doubt
Alex Gutteridge
alexg at ruggedtextile.com
Thu Feb 21 11:08:08 UTC 2008
On 21 Feb 2008, at 10:27, K. Shameer wrote:
> Alex,
>
>> I shouldn't have posted code without testing first!
>
> :)
>
>>
>> The problem is that the PDB parser reads the solvent (water) molecules
>> into a separate chain. So in this case we have the protein chain and
>> the water 'chain'. My naive multichain? method then reports you have
>> two chains.
>
> Is this something unusual? In a structural bioinformatics scenario
> solvent/water belongs to the HETATM definition. I am not able to
> understand the logic behind treating ATOM records as well as HETATM
> records as part of a chain.
It's a kludge, no doubt about that. There may well be a better
solution, but finding one is not trivial. A couple of problems come up in
practice:
1. How do you know what the solvent is? In 99% of cases it's HOH, but not
always. Sometimes you have all sorts of other weird molecules floating
around. If you siphon off all HOH molecules into a separate 'solvent'
data structure you'll lose information for some structures.
2. The HETATM/ATOM distinction is tricky as well. Some HETATM records
(including the solvent in some PDB files) are given distinct chain IDs
and in some cases do represent linear, chain-like molecules. Bound DNA,
for instance: ATOM? HETATM? Chain? Not a chain? There is no consistent
representation of these things in (legacy) PDB files, so any choice you
make will be a compromise.
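To make the trade-off concrete, here is a rough sketch in plain Ruby that
does not use the BioRuby parser at all. The chain_ids helper and the
three-line PDB fragment are made up for illustration; the only thing taken
from the PDB format is the fixed column layout (record name in columns 1-6,
chain identifier in column 22). It shows how the chain count changes
depending on whether HETATM records are counted:

```ruby
# Hypothetical helper, not part of BioRuby. Reads fixed-column PDB text and
# returns the distinct chain IDs found in the selected record types.
def chain_ids(pdb_text, record_types)
  ids = pdb_text.each_line.map do |line|
    record = line[0, 6].to_s.strip   # record name, columns 1-6
    next unless record_types.include?(record)
    line[21]                         # chain identifier, column 22
  end
  ids.compact.uniq
end

# Made-up fragment: one protein residue on chain A, one water on chain W.
SAMPLE = <<~PDB
  ATOM      1  N   MET A   1
  ATOM      2  CA  MET A   1
  HETATM  100  O   HOH W 201
PDB

protein_only = chain_ids(SAMPLE, %w[ATOM])
everything   = chain_ids(SAMPLE, %w[ATOM HETATM])

puts "ATOM only:     #{protein_only.inspect}"
puts "ATOM + HETATM: #{everything.inspect}"
```

Counting ATOM records only would give the answer you expect for this
fragment, but it would also drop any HETATM records that really do form a
chain, which is the compromise described in point 2.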
That said, if you want to have a poke through the PDB parser and make
some changes, then be my guest. It's been a while since I did any PDB
stuff (and, God willing, it will be a while until I do some more!) so
it's an area that could probably do with a fresh pair of eyes.