[Biopython-dev] Module reorganization for upcoming Bio.PDB enhancements
Eric Talevich
eric.talevich at gmail.com
Mon May 31 11:38:51 EDT 2010
Hi all,
This summer our GSoC student João Rodrigues will be implementing a number of
enhancements to Biopython's structural biology modules. Since Bio.PDB is one
of the most widely used parts of Biopython, I'd like to find a way to
let João add major new features without breaking existing code and
documentation.
There are a few issues I'd like to address:
1. The I/O conventions of parse/read/write/convert seem to work very well in
SeqIO, AlignIO, Phylo, and other Biopython sub-packages. Bio.PDB supports
I/O in several formats, but the API is lower-level and isn't unified in the
same way (yet).
2. PDB headers seem to have become better structured in recent years, in
both the wwPDB spec and submitted files. But header info isn't well
integrated with the PDB Structure object, and parse_pdb_header needs some
attention as well.
3. Kristian asked on this list a while ago about the proper location for his
new code that works with RNA structures. While RCSB's PDB contains some RNA
structures, the RNA world doesn't revolve around it. Similarly, João needs a
place to put code for structure prediction/validation servers, command-line
wrappers, secondary structures, etc.
I propose a new sub-package called Bio.Struct for these enhancements:
from Bio import Struct
mystruct = Struct.read("1MOT.pdb", "pdb")
# Or, letting the format argument default to "pdb":
mystruct = Struct.read("1MOT.pdb")
# Eventually this will work too:
Struct.convert("1MOT.pdb", "pdb", "1MOT.xml", "pdbxml")
from Bio.Struct.Applications import DSSP
# Like the other command-line wrappers
# (I'm curious about Peter's cunning new scheme...)
from Bio.Struct import WHATIF, Jpred
# Servers each get their own module
from Bio.Struct import RNA
# Would this work for you, Kristian?
Alternatively, we could do all of this within the PDB module -- so picture
the above examples with "PDB" in place of "Struct". This raises the chance
of naming collisions, though, and doesn't solve issue #3 above.
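To make the read() convention from issue #1 concrete, here is a minimal sketch of how a format-dispatching read() could be laid out. Everything in it is an assumption for illustration: the registry, the register() decorator, and the stand-in parser (which only counts ATOM/HETATM records) are hypothetical and not existing Biopython code. A real version would presumably hand the file to the existing Bio.PDB parsers instead.

```python
import io

# Hypothetical registry mapping a format name to a parser function.
_PARSERS = {}


def register(fmt):
    """Decorator that files a parser function under a format name."""
    def decorator(func):
        _PARSERS[fmt] = func
        return func
    return decorator


@register("pdb")
def _parse_pdb(handle):
    """Stand-in parser: only counts coordinate records.

    A real implementation would build a full structure object here.
    """
    n_atoms = sum(1 for line in handle if line.startswith(("ATOM", "HETATM")))
    return {"format": "pdb", "n_atoms": n_atoms}


def read(source, fmt="pdb"):
    """Read one structure from a filename or an open handle."""
    try:
        parser = _PARSERS[fmt.lower()]
    except KeyError:
        raise ValueError("Unknown format: %r" % fmt)
    if hasattr(source, "read"):
        # Already a file-like object, so use it directly.
        return parser(source)
    with open(source) as handle:
        return parser(handle)


# Tiny in-memory example so the sketch runs without any files on disk.
demo = io.StringIO("ATOM record one\nATOM record two\nEND\n")
result = read(demo)
```

With this layout, supporting another format would only mean registering one more parser function, and a convert() could be a read() followed by the matching writer.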
We'll leave the existing PDB module layout alone, in general. I think it
will be necessary to add a few more attributes to the
Bio.PDB.Structure.Structure class, but we can do this without breaking
compatibility. We believe fewer people depend on the exact format of the
Structure.header data, so it should be safer to change this dictionary,
for example by moving the more essential entries to a separate attribute,
or whatever seems reasonable once we dig into it.
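One way to add attributes without breaking code that reads the header dictionary is sketched below. It assumes a stand-in class (StructureSketch is hypothetical, not Bio.PDB.Structure.Structure), and the key names and values are illustrative only. The idea is that the new attributes are read-only views onto the dictionary, so both access styles keep working.

```python
class StructureSketch:
    """Stand-in for a structure object; not the real Bio.PDB class."""

    def __init__(self, id, header=None):
        self.id = id
        # The existing free-form dictionary stays available unchanged.
        self.header = dict(header or {})

    @property
    def resolution(self):
        """Essential entry exposed as an attribute; None if absent."""
        return self.header.get("resolution")

    @property
    def deposition_date(self):
        """Second example of a promoted entry; None if absent."""
        return self.header.get("deposition_date")


# Made-up values, purely to show both access styles side by side.
s = StructureSketch("example", {"resolution": 1.5, "deposition_date": "2010-05-31"})
old_style = s.header["resolution"]
new_style = s.resolution
```

Because the properties read from the dictionary rather than copying it, existing code that updates header entries would still be reflected in the new attributes.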
Comments?
Thanks,
Eric