[Bioperl-l] writing PDB format from BioPerl

Kris Boulez Kris.Boulez@algonomics.com
Fri, 11 Jan 2002 11:07:23 +0100


Quoting Ed Green (ed@compbio.berkeley.edu):
> 
> What I have in mind is a method (perhaps called getSecStr) for Structure 
> objects which will take 'DSSP' or 'STRIDE' as a parameter.  The indicated 
> executable will be invoked with proper pdb input.  The results will be 
> parsed.  These results, rather than being encapsulated in an object as 
> they are now, will just add information to the existing Structure object 
> at the level of Residue.  This would require additional Residue fields for 
> secondary structure, exposed surface area, and any other information you 
> want to hang on to from DSSP/STRIDE.  I guess what I'm describing is 
> letting go of STRIDE and DSSP as separate objects and just folding their 
> functionality into Structure objects.
> 
This was also my idea.

> The only problem with this design is that often I'm not interested in any 
> structural feature except secondary structure information.  In this case 
> the only purpose of having a Structure object would be to call the 
> getSecStr method on it.  That's fine, except that creating a Structure 
> object parses the pdb file, then getSecStr communicates with STRIDE/DSSP 
> through a pdb which it writes back out.  Then, the only purposes of 
> parsing the pdb file is to write the pdb file back out.  I understand that 
> this very thing happens with Seq objects, but parsing and writing sequence 
> files doesn't involve nearly as much overhead as parsing and writing 
> structure files.  
> 
Do I understand correctly that these are the steps you want to take:

a) run STRIDE/DSSP on pdb_file. Produces output file (stride.out)

b) read in pdb_file and stride.out, giving you Structure object with
   additional residue fields

c) do something on Structure object

I think there is a separate design pattern for running an external
application in BioPerl (look under Bio::Tools::Run). That would cover a).

Adding the STRIDE output to the Structure object (step b) can then be
done from a new Bio::Structure::IO::stride object (or from SearchIO ?).
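
To make step b) concrete, here is a rough sketch of the merge, written in
Python for brevity and not as BioPerl code. Everything in it is hypothetical:
the Residue and Structure classes, the sec_struct and surface_area fields, and
the annotate() helper are illustration only, and the per-residue assignments
are assumed to have been parsed from the STRIDE/DSSP output already.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ResidueKey = Tuple[str, int]  # (chain id, residue number)


@dataclass
class Residue:
    # Hypothetical residue record; the last two fields are the
    # "additional residue fields" that step b) fills in.
    name: str
    chain: str
    number: int
    sec_struct: Optional[str] = None
    surface_area: Optional[float] = None


@dataclass
class Structure:
    # Hypothetical container: residues indexed by (chain, number).
    residues: Dict[ResidueKey, Residue] = field(default_factory=dict)

    def add(self, residue: Residue) -> None:
        self.residues[(residue.chain, residue.number)] = residue


def annotate(
    structure: Structure,
    assignments: Dict[ResidueKey, Tuple[str, float]],
) -> List[ResidueKey]:
    """Copy (secondary structure, surface area) onto matching residues.

    Returns the keys that appear in the assignments but not in the
    structure, so the caller can notice a mismatch between the two files.
    """
    unmatched = []
    for key, (sec_struct, area) in assignments.items():
        residue = structure.residues.get(key)
        if residue is None:
            unmatched.append(key)
            continue
        residue.sec_struct = sec_struct
        residue.surface_area = area
    return unmatched


# Made-up data, only to show the call.
structure = Structure()
structure.add(Residue("MET", "A", 1))
structure.add(Residue("GLN", "A", 2))
missing = annotate(structure, {("A", 1): ("C", 143.2), ("A", 3): ("E", 10.0)})
```

Keying on chain plus residue number is an assumption; whatever identifies a
residue in both the pdb file and the STRIDE/DSSP output would do.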


> It would be faster (and less  error prone) if there could be a switch 
> when invoking new Structures, which just associates a pdb file with the 
> object and delays the parsing until some method is invoked which requires 
> parsing or until the user requests the full structure object.  Then 
> getSecStr or any other future method which does an analysis on a structure and 
> requires a pdb file as input could just be passed the pdb file directly.
> 

The BioPerl IO system is stream based and it is all or nothing. Once you
call next_structure() it has to go to the end.
I do agree that parsing every header line can get slow, and a method for
specifying which lines to parse and which to skip might be something to
look into.
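
As a rough idea of what such a filter could look like (again a Python sketch,
not existing BioPerl code): a PDB line carries its record name in columns 1-6,
so a reader can decide cheaply whether a line is worth parsing. The
select_records() function and the sample lines are made up for illustration.

```python
import io
from typing import Iterable, Iterator, Set


def select_records(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield only the lines whose record name is in `wanted`.

    The record name is taken from the first six characters of the line.
    The stream is still read to the end, but unwanted lines are skipped
    before any further parsing is attempted.
    """
    for line in lines:
        record = line[:6].strip()
        if record in wanted:
            yield line.rstrip("\n")


# Made-up input: only the record names matter for this sketch.
sample = io.StringIO(
    "HEADER    EXAMPLE ENTRY\n"
    "REMARK   1 THIS LINE IS SKIPPED\n"
    "ATOM      1  N   MET A   1      27.340  24.430   2.614\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842\n"
    "END\n"
)
kept = list(select_records(sample, {"ATOM", "HETATM"}))
```

This keeps the stream-based model intact; it only changes how much work is
done per line.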

Kris,