[Biopython-dev] Module reorganization for upcoming Bio.PDB enhancements

Eric Talevich eric.talevich at gmail.com
Tue Jun 1 03:44:11 UTC 2010

On Mon, May 31, 2010 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, May 31, 2010 at 4:38 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi all,
> >
> > This summer our GSoC student João Rodrigues will be implementing a number
> > of enhancements to Biopython's structural biology modules. Since Bio.PDB is
> > one of the most widely used parts of Biopython, I'd like to find a way to
> > let João add major new features without breaking existing code and
> > documentation.
> >
> > There are a few issues I'd like to address:
> >
> > 1. The I/O conventions of parse/read/write/convert seem to work very well
> > in SeqIO, AlignIO, Phylo, and other Biopython sub-packages. Bio.PDB supports
> > I/O in several formats, but the API is lower-level and isn't unified in
> > the same way (yet).
>
> Currently Bio.PDB supports the plain text PDB format, and has partial
> support for mmCIF. It lacks support for the XML PDB format, PDBML -
> Protein Data Bank Markup Language.

Yeah, it would be good to implement that at some point. For now, I'd be
happy to be able to read and write PDB files with a single function call
each, and design the I/O wrapper for easy extension to mmCIF and PDBML.
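
To make the idea concrete, here is a minimal sketch of such a wrapper, assuming a
registry that maps a format name to a reader/writer pair. Everything in it
(_FORMATS, read, write, and the toy line-based reader and writer) is
hypothetical and only stands in for the real parsers; it is not existing
Biopython code.

```python
import io

# Toy stand-ins for real parsers/writers; a real "pdb" entry would wrap the
# existing Bio.PDB machinery instead of handling bare lines.
def _read_lines(handle):
    """Return the non-empty lines of the handle as a list of strings."""
    return [line.rstrip("\n") for line in handle if line.strip()]

def _write_lines(records, handle):
    """Write one record per line and return how many were written."""
    count = 0
    for record in records:
        handle.write(record + "\n")
        count += 1
    return count

# Hypothetical registry: format name -> (reader, writer).
# Supporting mmCIF or PDBML later would mean adding one entry here.
_FORMATS = {"pdb": (_read_lines, _write_lines)}

def _lookup(fmt):
    """Find the reader/writer pair for a format name (case-insensitive)."""
    try:
        return _FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError("Unknown format %r" % fmt) from None

def read(handle, fmt):
    """Read a single structure from the handle with one call."""
    reader, _ = _lookup(fmt)
    return reader(handle)

def write(structure, handle, fmt):
    """Write a single structure to the handle with one call."""
    _, writer = _lookup(fmt)
    return writer(structure, handle)

# Round trip through in-memory handles.
source = io.StringIO("HEADER    TEST\nEND\n")
records = read(source, "pdb")
target = io.StringIO()
write(records, target, "pdb")
```

The point of the registry is that callers only ever see read() and write();
which parser runs is decided by the format string, as in SeqIO.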

> Under this proposed scheme, what would you see as the basic record type
> (analogous to a SeqRecord, alignment or tree in Bio.SeqIO, Bio.AlignIO and
> Bio.Phylo)? It would be nice to say a protein chain, but there is the issue
> of multiple models (e.g. from NMR). I presume you'd go with the model as the
> basic unit (where each model may contain multiple chains).

I'd consider a structure to be the basic unit of I/O. If we're going to make
better use of header info, that's generally associated with the whole
structure and not individual models -- we'd have to duplicate the header
info in each Model object emitted, which would be weird.
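
As a rough illustration of why, here is a sketch with made-up classes
(StructureSketch and ModelSketch are hypothetical names, not the Bio.PDB
classes): the header is stored once on the structure, and every model is
reached through it.

```python
from dataclasses import dataclass, field

@dataclass
class ModelSketch:
    """One model (e.g. one NMR conformer); carries no header of its own."""
    serial: int
    chains: dict = field(default_factory=dict)  # chain ID -> residues

@dataclass
class StructureSketch:
    """The unit of I/O: header information plus all models in the file."""
    id: str
    header: dict = field(default_factory=dict)
    models: list = field(default_factory=list)

# A placeholder multi-model structure: three models, one shared header.
nmr = StructureSketch(
    id="example",
    header={"method": "NMR"},
    models=[ModelSketch(serial=n) for n in range(3)],
)
```

If models were emitted one at a time instead, each ModelSketch would need its
own copy of the header dict, which is the duplication I'd like to avoid.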

Are there any formats that store more than one structure in a file? If not,
then there's probably no need for a parse() function in Bio.Struct.

> > from Bio.Struct import WHATIF, Jpred
> > # Servers each get their own module
> Hmm - perhaps we may need to have another level here, Bio.Struct.Servers
> or Bio.Struct.WWW or something. How many of these do you expect?

João's project plan includes Dali and WHATIF:

These servers do different things so I wouldn't expect any similarity in the
code between them. There are lots of servers that we *could* support...
Aesthetically, a Servers or WWW subdirectory would match
Bio.Struct.Applications and make the whole package a little more

Here's one more idea: Fetching a single PDB file from RCSB requires a
separate import and a couple of calls. Should we make this even easier by
mimicking the efetch function in Bio.Entrez? Something like

>>> handle = Bio.PDB.fetch("1MOT")

or

>>> from Bio.Struct.WWW import RCSB
>>> handle = RCSB.fetch("1MOT", "pdb")
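
Either way the implementation could be tiny. A minimal sketch, assuming entries
are served from a URL of the form shown in RCSB_URL (an assumption that should
be checked against RCSB's own documentation); build_url and fetch are
hypothetical names, not existing Biopython functions:

```python
from urllib.request import urlopen

# Assumed download location; verify against RCSB's documentation.
RCSB_URL = "https://files.rcsb.org/download/%s.%s"

# Formats this sketch knows how to request.
_KNOWN_FORMATS = ("pdb", "cif")

def build_url(pdb_id, fmt="pdb"):
    """Return the download URL for one entry in the given format."""
    if fmt not in _KNOWN_FORMATS:
        raise ValueError("Unsupported format %r" % fmt)
    return RCSB_URL % (pdb_id.upper(), fmt)

def fetch(pdb_id, fmt="pdb"):
    """Return an open binary handle for the entry (needs network access)."""
    return urlopen(build_url(pdb_id, fmt))
```

Keeping the URL construction separate from the network call means the former
can be unit-tested offline.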


