[Bioperl-guts-l] [Bug 1895] New: Bio::Structure::IO::pdb missing
and extra TER records
bugzilla-daemon at portal.open-bio.org
bugzilla-daemon at portal.open-bio.org
Sat Nov 5 11:45:09 EST 2005
http://bugzilla.open-bio.org/show_bug.cgi?id=1895
Summary: Bio::Structure::IO::pdb missing and extra TER records
Product: Bioperl
Version: main-trunk
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Core Components
AssignedTo: bioperl-guts-l at bioperl.org
ReportedBy: saul at alien-science.org
=== background ===========
use Bio::Structure::IO;
$in = Bio::Structure::IO->new(-file => $ARGV[0],
-format => 'pdb');
$s = $in->next_structure();
$out = Bio::Structure::IO->new(-file => ">woot.pdb",
-format => 'pdb');
$out->write_structure($s);
The above copies a pdb file by loading it into an Entry object and
then writing out the entry object. I am using it as a sanity check
for processing PDBS -- if the copy script gets things wrong then
I've hit an issue in the object->pdb serialisation rather than any
accident I've managed to code up. Its also instructive to diff the
bioperl built file with the raw file that it was created from.
I've hit a problem where some PDB files (eg 1A07, 1AOT, 1EFN, 1AOU)
change structure slightly when copied with the script. This behaviour
is down to Bio::Structure::IO::pdb and the insertion of TER records.
There a change added in version 1.6, 2002/07/19 10:32:10 of pdb.pm
at around line 748 that adds a TER when an ATOM->HETATM change is
detected when writing out a pdb.
=== problem =======
I believe there are two problems with this change
1) For the example pdb files given the following happens when
processing chains:
i) Write TER (set $wr_ter)
ii) Write chain
iii) Don't write TER ($wr_ter == 1) so chain doesn't end in
TER.
iv) Write 'TER 1 D 0' as variables are
not initialised
v) Write chain
vi) Don't write TER ($wr_ter == 1) so chain doesn't end in
TER.
This gives the following example diff between a raw 1A07.pdb and a
processed one:
1419a1420
> TER 1 C 0
1456d1456
< TER 1081 DIP C 103
2480a2481
> TER 1 D 0
2517d2517
< TER 2142 DIP D 103
The processed PDB contains some additional TER records containing
zeroed variables (state iv) and is missing some of the original TER
records (state vi).
2) There is a comment to suggest the change is there to add a TER
when moving from ATOM -> HETATM. However, I am not sure this
is actually needed as the example given in the PDB format
description actually has an example showing ATOM -> HETATM
changes without a TER insertion:
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
=== solution ========
My suggested patch is below. It keeps the existing behaviour of
ATOM->HETATM transition TER (although I'm not sure it is needed).
However, the $wr_ter flag has been removed and in its place a
$last_record variable is used to spot ATOM -> HETATM transitions.
I have run this on a selection of 40 pdbs. I have found that the
TERs are more correctly preserved.
--- IO/pdb.pm 2005-10-09 14:53:22.000000000 +0000
+++ /usr/lib/perl5/site_perl/5.8.5/Bio/Structure/IO/pdb.pm 2005-11-05
16:08:24.000000000 +0000
@@ -728,7 +728,7 @@
for my $chain ($struc->get_chains($model)) {
my ($residue, $atom, $resname, $resnum, $atom_line, $atom_serial,
$atom_icode, $chain_id);
my ($prev_resname, $prev_resnum, $prev_atomicode); # need these for
TER record
- my $wr_ter = 0; # have we already written out a TER for this chain
+ my $last_record; # Used to spot an ATOM -> HETATM change within a
chain
$chain_id = $chain->id;
if ( $chain_id eq "default" ) {
$chain_id = " ";
@@ -738,7 +738,8 @@
($resname, $resnum) = split /-/, $residue->id;
for $atom ($struc->get_atoms($residue)) {
if ($het_res{$resname}) { # HETATM
- if ( ! $wr_ter && $resname ne "HOH" ) { # going from
ATOM -> HETATM, we have to write TER
+ if ( $last_record eq "ATOM " && $resname ne "HOH" ) {
+ # going from ATOM -> HETATM, we have to write TER
my $ter_line = "TER ";
$ter_line .= sprintf("%5d", $atom_serial + 1);
$ter_line .= " ";
@@ -748,12 +749,12 @@
$ter_line .= $atom_icode ? $prev_atomicode : " "; #
27
$ter_line .= " " x (80 - length $ter_line); #
extend to 80 chars
$self->_print($ter_line,"\n");
- $wr_ter = 1;
}
$atom_line = "HETATM";
} else {
$atom_line = "ATOM ";
}
+ $last_record = $atom_line;
$atom_line .= sprintf("%5d ", $atom->serial);
$atom_serial = $atom->serial; # we need it for TER record
$atom_icode = $atom->icode;
@@ -820,8 +821,8 @@
$self->_print($atom_line,"\n");
}
}
- # write out TER record if it hasn't been written yet
- if ( $resname ne "HOH" && ! $wr_ter ) {
+ # write out TER record
+ if ( $resname ne "HOH" ) {
my $ter_line = "TER ";
$ter_line .= sprintf("%5d", $atom_serial + 1);
$ter_line .= " ";
@@ -831,7 +832,6 @@
$ter_line .= $atom_icode ? $atom_icode : " "; # 27
$ter_line .= " " x (80 - length $ter_line); # extend to 80
chars
$self->_print($ter_line,"\n");
- $wr_ter = 1;
}
}
if ($struc->get_models > 1) { # we need ENDMDL
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
More information about the Bioperl-guts-l
mailing list