[BioRuby] Bug in writing PDB ATOM
Alex Gutteridge
alexg at kuicr.kyoto-u.ac.jp
Fri Feb 9 01:15:46 UTC 2007
On 9 Feb 2007, at 07:54, Yen-Ju Chen wrote:
> In bio/db/pdb/pdb.rb line 1019,
> the ATOM entry is written as:
>
> sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s
>
> It results an ATOM entry as:
> ATOM     61 OD1  ASN A   8     102.025  27.929 144.984  1.00
> 88.56           O
>
> But the right ATOM entry should be
> ATOM     61  OD1 ASN A   8     102.025  27.929 144.984  1.00
> 88.56           O
>
> Note there are 2 spaces after '61' and one space before 'ASN'
> I change this line to:
>
> sprintf("%-6s%5d  %-3s%-1s%3s %-1s%4d%-1s
>
> and it works fine now.
> But I am new to Ruby and not familiar with the format yet.
>
> Yen-Ju
Hi Yen-Ju,
Thanks for your bug report. In fact, as far as I can tell, the PDB
format (http://www.wwpdb.org/documentation/format23/sect9.html) is
ambiguous in this case. Columns 13-16 are specified for the 'Atom
name' ('OD1' in the case you mention), but the justification of the
field is not specified. Note that the field requires four columns, so
your fix (which reduces it to three) may break if you encounter an
atom name with four characters.
However, you are quite correct that the convention in most PDB files
is that when fewer than four characters are used for the atom name, the
field is aligned as you show. In summary, each of the following is a
valid name according to my reading of the specification, but the
convention in many files is to use the form shown in the third and
fourth examples rather than the first and second. Note that the fifth
example is also a valid atom name and may break your fix:
'OD1 '
'N   '
' OD1'
' N  '
'OD12'
I will change the code to use the conventional form where possible,
but be careful with your fix because it may break on some (rare) PDB
files.
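As a rough illustration of the conventional alignment, here is a minimal
sketch of padding an atom name to the four columns (13-16). The helper
name format_atom_name and its element argument are hypothetical and are
not part of BioRuby; the rule that names start in column 14 unless the
element symbol has two letters is the usual convention, not something the
format description linked above spells out.

```ruby
# Hypothetical helper (not BioRuby API): pad an atom name to the
# four-column atom name field of an ATOM record.
def format_atom_name(name, element = nil)
  name = name.to_s.strip
  raise ArgumentError, "atom name too long: #{name}" if name.length > 4
  # A four-character name fills the whole field.
  return name if name.length == 4
  if element && element.to_s.strip.length == 2
    # Assumed convention: two-letter elements start in column 13.
    name.ljust(4)
  else
    # Assumed convention: otherwise the name starts in column 14.
    " " + name.ljust(3)
  end
end

# Start of an ATOM record using the padded name.
prefix = format("%-6s%5d %s", "ATOM", 61, format_atom_name("OD1"))
puts prefix
```

With this approach the width of the field never changes, so a name such
as 'OD12' still fits without shifting the columns that follow.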
An important general point: PDB files (particularly older ones) are
*very* messy. Efforts have been made within the PDB and at the EBI
MSD to clean these files up, but there are still issues. This means
that it is very hard to write a parser that can read any PDB file
and then write it back out in exactly the same format (including
spacing, etc.). The BioRuby parser should be able to parse any valid
PDB file and output the data as a valid PDB format string, but
the input and output are *not* guaranteed to be identical.
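To make the round-trip point concrete, here is a small sketch (plain
Ruby string slicing, not the BioRuby parser) that reads the atom name
from columns 13-16 of two differently aligned records. The helper name
atom_name_field and the two sample lines are only for illustration.

```ruby
# Hypothetical helper: return the raw four-column atom name field
# (columns 13-16, i.e. zero-based offset 12, length 4).
def atom_name_field(line)
  line[12, 4]
end

conventional = "ATOM     61  OD1 ASN A   8"
left_aligned = "ATOM     61 OD1  ASN A   8"

raw_a = atom_name_field(conventional)  # name starts in column 14
raw_b = atom_name_field(left_aligned)  # name starts in column 13

# Both fields carry the same atom name once the padding is stripped,
# so a parser that keeps only the stripped name cannot tell which
# alignment the input file used.
same_name = raw_a.strip == raw_b.strip
puts [raw_a.inspect, raw_b.inspect, same_name].join(" ")
```

A writer then has to pick one alignment for output, which is why a file
can parse correctly and still come back with different spacing.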
I have not had time to actively maintain the PDB parsing in BioRuby,
so if you are interested in Ruby and PDB files, feel free to submit
more bug reports and patches.
Thanks again.
Alex Gutteridge
Bioinformatics Center
Kyoto University