[BioRuby] Bug in writing PDB ATOM

Fri Feb 9 01:15:46 UTC 2007

On 9 Feb 2007, at 07:54, Yen-Ju Chen wrote:

> In bio/db/pdb/pdb.rb line 1019,
> the ATOM entry is written as:
>
>           sprintf("%-6s%5d %-4s%-1s%3s %-1s%4d%-1s
>
> It results in an ATOM entry like:
> ATOM     61 OD1  ASN A   8     102.025  27.929 144.984  1.00 88.56           O
>
> But the right ATOM entry should be
> ATOM     61  OD1 ASN A   8     102.025  27.929 144.984  1.00 88.56           O
>
> Note there are 2 spaces after '61' and one space before 'ASN'.
> I changed this line to:
>
>           sprintf("%-6s%5d  %-3s%-1s%3s %-1s%4d%-1s
>
> and it works fine now.
> But I am new to Ruby and not familiar with the format yet.
>
> Yen-Ju

Hi Yen-Ju,

Thanks for your bug report. As far as I can tell, the PDB format
(http://www.wwpdb.org/documentation/format23/sect9.html) is actually
ambiguous in this case. Columns 13-16 are specified for the 'Atom
name' ('OD1' in the case you mention), but the justification within
the field is not specified. Note that the field is four columns wide,
so your fix (which reduces it to three) may break if you encounter an
atom name with 4 characters.
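
To make the four-character case concrete, here is a minimal Ruby
sketch that uses only the leading part of the two format strings
quoted above. The rest of the real format string in pdb.rb is not
reproduced, and the constant and method names are my own, chosen
purely for illustration:

```ruby
# Only the start of the ATOM format is shown. These names are
# illustrative and are not taken from bio/db/pdb/pdb.rb.
ORIGINAL_FMT = "%-6s%5d %-4s"   # record name, serial, 1 blank, 4-column name
PROPOSED_FMT = "%-6s%5d  %-3s"  # record name, serial, 2 blanks, 3-column name

# Build the first part of an ATOM line (up to the end of the name field).
def atom_prefix(fmt, serial, name)
  format(fmt, "ATOM", serial, name)
end

short_orig = atom_prefix(ORIGINAL_FMT, 61, "OD1")   # "ATOM     61 OD1 "
short_new  = atom_prefix(PROPOSED_FMT, 61, "OD1")   # "ATOM     61  OD1"
long_orig  = atom_prefix(ORIGINAL_FMT, 61, "OD12")  # name fills columns 13-16
long_new   = atom_prefix(PROPOSED_FMT, 61, "OD12")  # name spills into column 17

[short_orig, short_new, long_orig, long_new].each do |prefix|
  puts "#{prefix.inspect} (#{prefix.length} columns)"
end
```

With a three-character name both formats end at column 16, but with a
four-character name the proposed format produces 17 columns, so every
later field on the line is shifted one column to the right.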

However, you are quite correct that the convention in most PDB files
is that when fewer than 4 characters are used for the atom name, the
field is aligned as you show. In summary, all of the following are
valid names according to my reading of the specification, but the
convention in many files is to use the form shown in the third and
fourth examples rather than the first and second. Note that the fifth
example is also a valid atom name and may break your fix:

OD1
N
  OD1
  N
OD12

I will change the code to use the conventional form where possible,  
but be careful with your fix because it may break on some (rare) PDB  
files.
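
Roughly, the conventional form could be produced along these lines.
This is only a sketch under my own assumptions, not the change that
will go into pdb.rb, and justify_atom_name is a hypothetical helper
name:

```ruby
# Hypothetical helper, not part of BioRuby: place an atom name in the
# four-column field (columns 13-16) using the common convention.
def justify_atom_name(name)
  name = name.to_s.strip
  if name.length > 4
    raise ArgumentError, "atom name longer than 4 characters: #{name}"
  end
  # Names shorter than 4 characters start in column 14 and are padded
  # on the right; 4-character names need the whole field.
  name.length < 4 ? format(" %-3s", name) : name
end

%w[OD1 N OD12].each do |atom_name|
  puts justify_atom_name(atom_name).inspect
end
```

This simple length-based rule is an assumption. It does not try to
handle files that start names for two-letter elements (calcium 'CA',
for example) in column 13, so it should be read as an illustration of
the idea rather than a complete treatment.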

An important general point: PDB files (particularly older ones) are
*very* messy. Efforts have been made within the PDB and at the EBI
MSD to clean these files up, but there are still issues. This means
that it is very hard to write a parser that can read in any PDB file
and then write it out in exactly the same format (including spacing
etc.). The BioRuby parser should be able to parse any valid PDB
file and write the data back out as a valid PDB format string, but
the input and output are *not* guaranteed to be identical.

I have not had time to actively maintain the PDB parsing in BioRuby,  
so if you are interested in Ruby and PDB files feel free to submit  
more bug reports and patches.

Thanks again.

Alex Gutteridge

Bioinformatics Center
Kyoto University