[Bioperl-guts-l] [Bug 1895] New: Bio::Structure::IO::pdb missing and extra TER records

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Sat Nov 5 11:45:09 EST 2005


http://bugzilla.open-bio.org/show_bug.cgi?id=1895

           Summary: Bio::Structure::IO::pdb  missing and extra TER records
           Product: Bioperl
           Version: main-trunk
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: saul at alien-science.org


=== background ===========

use Bio::Structure::IO;
$in  = Bio::Structure::IO->new(-file    => $ARGV[0],
                               -format  => 'pdb');
$s   = $in->next_structure();
$out = Bio::Structure::IO->new(-file   => ">woot.pdb",
                               -format => 'pdb');
$out->write_structure($s);

The above copies a pdb file by loading it into an Entry object and
then writing out the entry object. I am using it as a sanity check
for processing PDBS -- if the copy script gets things wrong then
I've hit an issue in the object->pdb serialisation rather than any
accident I've managed to code up. Its also instructive to diff the
bioperl built file with the raw file that it was created from.

I've hit a problem where some PDB files (eg 1A07, 1AOT, 1EFN, 1AOU)
change structure slightly when copied with the script. This behaviour
is down to Bio::Structure::IO::pdb and the insertion of TER records.
There a change added in version 1.6, 2002/07/19 10:32:10 of pdb.pm
at around line 748 that adds a TER when an ATOM->HETATM change is
detected when writing out a pdb.

=== problem =======

I believe there are two problems with this change
1) For the example pdb files given the following happens when
   processing chains:
   i)   Write TER (set $wr_ter)
   ii)  Write chain
   iii) Don't write TER ($wr_ter == 1) so chain doesn't end in
        TER.
   iv)  Write 'TER       1          D   0' as variables are
        not initialised
   v)   Write chain
   vi)  Don't write TER ($wr_ter == 1) so chain doesn't end in
        TER.


This gives the following example diff between a raw 1A07.pdb and a
processed one:
1419a1420
> TER       1          C   0
1456d1456
< TER    1081      DIP C 103
2480a2481
> TER       1          D   0
2517d2517
< TER    2142      DIP D 103

The processed PDB contains some additional TER records containing
zeroed variables (state iv) and is missing some of the original TER 
records (state vi).

2) There is a comment to suggest the change is there to add a TER 
   when moving from ATOM -> HETATM. However, I am not sure this
   is actually needed as the example given in the PDB format
   description actually has an example showing ATOM -> HETATM
   changes without a TER insertion:
   http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html


=== solution ========

My suggested patch is below. It keeps the existing behaviour of
ATOM->HETATM transition TER (although I'm not sure it is needed).
However, the $wr_ter flag has been removed and in its place a
$last_record variable is used to spot ATOM -> HETATM transitions.
I have run this on a selection of 40 pdbs. I have found that the
TERs are more correctly preserved.

--- IO/pdb.pm   2005-10-09 14:53:22.000000000 +0000
+++ /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/IO/pdb.pm  2005-11-05
16:08:24.000000000 +0000
@@ -728,7 +728,7 @@
        for my $chain ($struc->get_chains($model)) {
            my ($residue, $atom, $resname, $resnum, $atom_line, $atom_serial,
$atom_icode, $chain_id);
            my ($prev_resname, $prev_resnum, $prev_atomicode); # need these for
TER record
-           my $wr_ter = 0; # have we already written out a TER for this chain
+           my $last_record; # Used to spot an ATOM -> HETATM change within a
chain
            $chain_id = $chain->id;
            if ( $chain_id eq "default" ) {
                $chain_id = " ";
@@ -738,7 +738,8 @@
                ($resname, $resnum) = split /-/, $residue->id;
                for $atom ($struc->get_atoms($residue)) {
                    if ($het_res{$resname}) {  # HETATM
-                       if ( ! $wr_ter && $resname ne "HOH" ) { # going from
ATOM -> HETATM, we have to write TER
+                       if ( $last_record eq "ATOM  " && $resname ne "HOH" ) {
+                           # going from ATOM -> HETATM, we have to write TER
                            my $ter_line = "TER   ";
                            $ter_line .= sprintf("%5d", $atom_serial + 1);
                            $ter_line .= "      ";
@@ -748,12 +749,12 @@
                            $ter_line .= $atom_icode ? $prev_atomicode : " "; #
27
                            $ter_line .= " " x (80 - length $ter_line);  #
extend to 80 chars
                            $self->_print($ter_line,"\n");
-                           $wr_ter = 1;
                        }
                        $atom_line = "HETATM";
                    } else {
                        $atom_line = "ATOM  ";
                    }
+                   $last_record = $atom_line;
                    $atom_line .= sprintf("%5d ", $atom->serial);
                    $atom_serial = $atom->serial; # we need it for TER record
                    $atom_icode = $atom->icode;
@@ -820,8 +821,8 @@
                    $self->_print($atom_line,"\n");
                }
            }
-           # write out TER record if it hasn't been written yet
-           if ( $resname ne "HOH" && ! $wr_ter ) {
+           # write out TER record
+           if ( $resname ne "HOH" ) {
                my $ter_line = "TER   ";
                $ter_line .= sprintf("%5d", $atom_serial + 1);
                $ter_line .= "      ";
@@ -831,7 +832,6 @@
                $ter_line .= $atom_icode ? $atom_icode : " "; # 27
                $ter_line .= " " x (80 - length $ter_line);  # extend to 80
chars
                $self->_print($ter_line,"\n");
-               $wr_ter = 1;
            }
        }
        if ($struc->get_models > 1) { # we need ENDMDL




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


More information about the Bioperl-guts-l mailing list