[Bioperl-l] writing PDB format from BioPerl

Ed Green ed@compbio.berkeley.edu
Thu, 10 Jan 2002 18:02:28 -0800 (PST)


On Fri, 11 Jan 2002, Kris Boulez wrote:

> I've just checked in a new version of Bio::Structure::IO::pdb.pm which
> has a working write_structure . Meaning that you can now write out PDB
> records from BioPerl.
> 
> This is not the final version, but it is what I have now ready for 0.9.3
> (Ewan, I hope I'm on time). I think it is fairly complete.
> 
> Things it doesn't do (right) at the moment
> - no ANISOU,SIGUIJ,SIGATM records for the moment
> - placement of TER record is sometimes different than in original
>    (someone has the exact algorithm ?)
> - MASTER record (contains checksums) is not calculated, but used from
>    original
> - minor glitches as PDB records are mostly created by humans and
>    not computers
> 
> In the near future I hope to add the missing records, add documentation
> and examples, and see if the DSSP and STRIDE modules can be integrated
> better.

Kris-
Very nice.  I think I can help integrate the SecStr modules.  Since the 
Structure objects can now output pdb files, interaction between 
STRIDE/DSSP and Structure objects can be completely abstracted away, 
bringing the DSSP and STRIDE modules into bioperl API conformance.

What I have in mind is a method (perhaps called getSecStr) for Structure 
objects which will take 'DSSP' or 'STRIDE' as a parameter.  The indicated 
executable will be invoked with proper pdb input.  The results will be 
parsed.  These results, rather than being encapsulated in an object as 
they are now, will just add information to the existing Structure object 
at the level of Residue.  This would require additional Residue fields for 
secondary structure, exposed surface area, and any other information you 
want to hang on to from DSSP/STRIDE.  I guess what I'm describing is 
letting go of STRIDE and DSSP as separate objects and just folding their 
functionality into Structure objects.
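
As a rough illustration of the design described above, here is a minimal
sketch. It is written in Python rather than Perl so that nothing in it is
mistaken for actual BioPerl API. Every name in it (Structure, Residue,
get_sec_str, the runners table) is hypothetical, and the step "invoke the
executable and parse its output" is replaced by a stub, since the real
DSSP/STRIDE invocation and output parsing are not specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical illustrations, not BioPerl API.


@dataclass
class Residue:
    chain: str
    number: int
    name: str
    # Extra per-residue fields that the proposal would add:
    sec_str: Optional[str] = None
    surface_area: Optional[float] = None


# (chain, residue number) -> (secondary structure code, exposed area)
Annotations = Dict[Tuple[str, int], Tuple[str, float]]
# A runner receives PDB text and returns parsed per-residue results.
Runner = Callable[[str], Annotations]


@dataclass
class Structure:
    residues: List[Residue] = field(default_factory=list)

    def to_pdb(self) -> str:
        # Stand-in for a real PDB writer; the format here is made up.
        return "\n".join(f"{r.chain} {r.number} {r.name}" for r in self.residues)

    def get_sec_str(self, program: str, runners: Dict[str, Runner]) -> None:
        """Annotate residues in place using the named program."""
        if program not in runners:
            raise ValueError(f"unknown program: {program!r}")
        results = runners[program](self.to_pdb())
        for residue in self.residues:
            hit = results.get((residue.chain, residue.number))
            if hit is not None:
                residue.sec_str, residue.surface_area = hit


def fake_dssp(pdb_text: str) -> Annotations:
    # Stub for "run the executable on pdb_text and parse its output".
    # The values are invented for the example.
    return {("A", 1): ("H", 42.0), ("A", 2): ("E", 10.5)}


structure = Structure(
    [Residue("A", 1, "ALA"), Residue("A", 2, "GLY"), Residue("A", 3, "SER")]
)
structure.get_sec_str("DSSP", {"DSSP": fake_dssp})
```

The point of the sketch is that the caller only ever sees the Structure
object: the results land on the Residue entries, and a residue the
program says nothing about simply keeps empty fields.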

The only problem with this design is that often I'm not interested in any 
structural feature except secondary structure information.  In this case 
the only purpose of having a Structure object would be to call the 
getSecStr method on it.  That's fine, except that creating a Structure 
object parses the pdb file, then getSecStr communicates with STRIDE/DSSP 
through a pdb file which it writes back out.  So the only purpose of 
parsing the pdb file is to write the pdb file back out.  I understand that 
this very thing happens with Seq objects, but parsing and writing sequence 
files doesn't involve nearly as much overhead as parsing and writing 
structure files.  

It would be faster (and less error-prone) if there were a switch, set 
when creating a new Structure object, that just associates a pdb file with 
the object and delays parsing until some method is invoked that requires 
it or until the user requests the full structure object.  Then 
getSecStr, or any other future method that analyzes a structure and 
requires a pdb file as input, could just be passed the pdb file directly.
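
A minimal sketch of such a delayed-parsing switch, again in Python and
with purely hypothetical names (LazyStructure, the lazy flag, parse_count,
the runner argument). The only thing taken from the PDB format itself is
the column layout of ATOM records (residue name in columns 18-20, chain ID
in column 22, residue number in columns 23-26); the rest is an assumption
made for illustration.

```python
import os
import tempfile
from typing import Callable, List, Optional, Tuple

ResidueKey = Tuple[str, int, str]  # (chain, residue number, residue name)


class LazyStructure:
    """Hypothetical Structure that can defer parsing its PDB file."""

    def __init__(self, pdb_path: str, lazy: bool = True) -> None:
        self.pdb_path = pdb_path
        self.parse_count = 0  # only here to make the laziness observable
        self._residues: Optional[List[ResidueKey]] = None
        if not lazy:
            self._parse()

    def _parse(self) -> None:
        self.parse_count += 1
        seen: List[ResidueKey] = []
        with open(self.pdb_path) as handle:
            for line in handle:
                if line.startswith("ATOM"):
                    key = (line[21], int(line[22:26]), line[17:20])
                    if key not in seen:
                        seen.append(key)
        self._residues = seen

    @property
    def residues(self) -> List[ResidueKey]:
        # First access triggers the parse; later accesses reuse the result.
        if self._residues is None:
            self._parse()
        return self._residues

    def get_sec_str(self, runner: Callable[[str], object]) -> object:
        # The external-program wrapper gets the file path directly,
        # so no parse and no re-write of the PDB file is needed.
        return runner(self.pdb_path)


def atom_line(serial: int, atom: str, res: str, chain: str, num: int) -> str:
    # Builds just the leading columns of an ATOM record (no coordinates).
    return f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{num:>4}\n"


with tempfile.NamedTemporaryFile("w", suffix=".pdb", delete=False) as tmp:
    tmp.write(atom_line(1, " N", "ALA", "A", 1))
    tmp.write(atom_line(2, " CA", "ALA", "A", 1))
    tmp.write(atom_line(3, " N", "GLY", "A", 2))
    demo_path = tmp.name

lazy_structure = LazyStructure(demo_path)
passed_path = lazy_structure.get_sec_str(lambda path: path)  # stub runner
parses_before = lazy_structure.parse_count
parsed_residues = lazy_structure.residues  # forces the parse
parses_after = lazy_structure.parse_count
os.remove(demo_path)
```

With the flag on, calling get_sec_str alone never touches the parser; the
file is read only once something asks for the residues.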

Comments/suggestions are welcome.

Ed Green

***********************
Brenner Research Group
UC Berkeley
ed@compbio.berkeley.edu
510-642-9614
***********************