[Bioperl-l] Parsing PDB entries in BioPerl

Kris Boulez Kris.Boulez@algonomics.com
Tue, 13 Nov 2001 19:10:56 +0100


As I found myself writing ad-hoc scripts to get certain data out of a
PDB entry, I've decided to write a PDB parser for BioPerl. 

The idea is to parse every line in the entry and to have access to all
the data via some Bio:: object. The work on the SeqIO parser (Bio::SeqIO::pdb)
is progressing nicely.

For the moment I'm working on parsing all the different 'records' (PDB-speak
for the different line types) and not so much on how to store the info in a
Bio:: object (references are already stored in Bio::Annotation::Reference
objects). The moment to start thinking about 'how' to store 'what' inside
'which' Bio::* object has arrived.
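
To make the record-by-record parsing concrete, here is a minimal sketch of
reading one ATOM/HETATM record by column position, as the PDB format lays
it out. It is written in Python only for brevity and is not code from
Bio::SeqIO::pdb; the names Atom and parse_atom_record are assumptions made
up for this illustration, and the sample line is invented rather than taken
from a real entry.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical container for the fields of one ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line):
    """Parse one ATOM/HETATM line by fixed columns; return None otherwise."""
    record = line[0:6].strip()           # columns 1-6: record name
    if record not in ("ATOM", "HETATM"):
        return None
    return Atom(
        serial=int(line[6:11]),          # columns 7-11: atom serial number
        name=line[12:16].strip(),        # columns 13-16: atom name
        res_name=line[17:20].strip(),    # columns 18-20: residue name
        chain_id=line[21],               # column 22: chain identifier
        res_seq=int(line[22:26]),        # columns 23-26: residue number
        x=float(line[30:38]),            # columns 31-38: x coordinate
        y=float(line[38:46]),            # columns 39-46: y coordinate
        z=float(line[46:54]),            # columns 47-54: z coordinate
    )


# Invented sample record, for illustration only.
line = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
atom = parse_atom_record(line)
print(atom.res_name, atom.chain_id, atom.res_seq, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here, because
adjacent fields in a PDB record may touch each other or be blank.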

My first thought was to inherit from a Bio::Seq object, but this does
not seem to be the right approach:
  - which sequence to store (the one from Swiss-Prot?)
  - not every residue has coordinates (e.g. at the C- and N-termini)
  - PDB entries can consist of multiple 'chains' (i.e. a complex of two
    proteins)
  - how to handle post-translational modifications
  - there is no easy access to the data that makes PDB special (x,y,z
    coordinates, ...)
  - how to handle 'models' (structures determined by NMR consist not of
    one, but of multiple models).
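
The points above can be read as a containment hierarchy: an entry holds one
or more models, a model holds chains, a chain holds residues, and a residue
holds atoms with coordinates. The following is only a sketch of that idea,
in Python for brevity; every class and method name in it is hypothetical
and none of it is an existing Bio::* interface. The entry ID "XXXX" and the
coordinates are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    xyz: tuple                      # (x, y, z) coordinates


@dataclass
class Residue:
    name: str                       # three-letter code, e.g. "MET"
    number: int
    atoms: list = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list = field(default_factory=list)


@dataclass
class Model:
    number: int
    chains: dict = field(default_factory=dict)   # chain_id -> Chain


@dataclass
class Entry:
    pdb_id: str
    models: list = field(default_factory=list)   # more than one for NMR

    def observed_residues(self, chain_id, model=0):
        """Residue names that actually have coordinates in one chain."""
        chain = self.models[model].chains[chain_id]
        return [r.name for r in chain.residues if r.atoms]

    def coordinates(self, chain_id, res_number, atom_name, model=0):
        """Return the (x, y, z) of one atom, or None if it is absent."""
        for residue in self.models[model].chains[chain_id].residues:
            if residue.number == res_number:
                for atom in residue.atoms:
                    if atom.name == atom_name:
                        return atom.xyz
        return None


# Placeholder data: one model, one chain, second residue lacks coordinates.
met = Residue("MET", 1, [Atom("CA", (1.0, 2.0, 3.0))])
gly = Residue("GLY", 2)
entry = Entry("XXXX", [Model(1, {"A": Chain("A", [met, gly])})])

print(entry.observed_residues("A"))
print(entry.coordinates("A", 1, "CA"))
```

A layout like this keeps the sequence question separate from the coordinate
question: a residue can exist in a chain without any atoms attached to it.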

This suggests that a new type of object might be needed. To start
thinking about this, it might be good to consider how the user would
use such an object (i.e. 'which questions would you ask?'). So I would
like to ask you which data in a PDB entry you're typically interested in
and which questions you would want to ask of such an object.



Kris,
-- 
Kris Boulez 				Tel: +32-9-241.11.00
AlgoNomics NV 				Fax: +32-9-241.11.02
Technologiepark 4 			email: kris.boulez@algonomics.com
B 9052 Zwijnaarde 			http://www.algonomics.com/