[Biopython-dev] Module reorganization for upcoming Bio.PDB enhancements

Eric Talevich eric.talevich at gmail.com
Mon May 31 23:44:11 EDT 2010


On Mon, May 31, 2010 at 11:53 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, May 31, 2010 at 4:38 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi all,
> >
> > This summer our GSoC student João Rodrigues will be implementing a number
> > of enhancements to Biopython's structural biology modules. Since Bio.PDB
> > is one of the most widely used parts of Biopython, I'd like to find a way
> > to let João add major new features without breaking existing code and
> > documentation.
> >
> > There are a few issues I'd like to address:
> >
> > 1. The I/O conventions of parse/read/write/convert seem to work very well
> > in SeqIO, AlignIO, Phylo, and other Biopython sub-packages. Bio.PDB
> > supports I/O in several formats, but the API is lower-level and isn't
> > unified in the same way (yet).
>
> Currently Bio.PDB supports the plain text PDB format, and has partial
> support for mmCIF. It lacks support for the XML PDB format, PDBML -
> Protein Data Bank Markup Language.
>

Yeah, it would be good to implement that at some point. For now, I'd be
happy to be able to read and write PDB files with a single function call
each, and design the I/O wrapper for easy extension to mmCIF and PDBML.
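
To make that concrete, here is a rough sketch of the kind of wrapper I have
in mind. Everything in it is hypothetical: read(), write(), the _READERS and
_WRITERS registries and the toy record handling are placeholders, not
existing Bio.PDB API. The only point is that a format string selects the
parser, so mmCIF and PDBML could be added later by registering one more
entry.

```python
import io

# Hypothetical registries mapping a format name to a reader/writer function.
# None of these names exist in Bio.PDB; they only illustrate the dispatch idea.
_READERS = {}
_WRITERS = {}


def _read_pdb(handle):
    """Toy stand-in for a real PDB parser: keep only the coordinate lines."""
    return [line.rstrip("\n") for line in handle
            if line.startswith(("ATOM", "HETATM"))]


def _write_pdb(records, handle):
    """Toy stand-in for a real PDB writer; returns the number of records written."""
    count = 0
    for record in records:
        handle.write(record + "\n")
        count += 1
    handle.write("END\n")
    return count


_READERS["pdb"] = _read_pdb
_WRITERS["pdb"] = _write_pdb


def read(file, format):
    """Read one structure from a filename or an open handle."""
    try:
        reader = _READERS[format]
    except KeyError:
        raise ValueError("Unsupported format: %r" % format) from None
    if isinstance(file, str):
        with open(file) as handle:
            return reader(handle)
    return reader(file)


def write(records, file, format):
    """Write one structure to a filename or an open handle."""
    try:
        writer = _WRITERS[format]
    except KeyError:
        raise ValueError("Unsupported format: %r" % format) from None
    if isinstance(file, str):
        with open(file, "w") as handle:
            return writer(records, handle)
    return writer(records, file)
```

Supporting another format would then mean adding one reader and one writer
to the registries, without touching the calling code.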


> Under this proposed scheme, what would you see as the basic record type
> (analogous to a SeqRecord, alignment or tree in Bio.SeqIO, Bio.AlignIO and
> Bio.Phylo)? It would be nice to say a protein chain, but there is the issue
> of multiple models (e.g. from NMR). I presume you'd go with the model as the
> basic unit (where each model may contain multiple chains).
>

I'd consider a structure to be the basic unit of I/O. If we're going to make
better use of header info, that's generally associated with the whole
structure and not individual models -- we'd have to duplicate the header
info in each Model object emitted, which would be weird.
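
Roughly, I picture something shaped like the sketch below. These classes are
simplified, hypothetical stand-ins written for this email, not the existing
Bio.PDB Structure and Model classes; the point is just that the header lives
once on the structure and the models hang off it.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """One model (e.g. one NMR conformer); carries no header of its own."""
    serial: int
    chains: dict = field(default_factory=dict)  # chain ID -> list of residues


@dataclass
class Structure:
    """The unit of I/O: header information is stored once, here."""
    id: str
    header: dict = field(default_factory=dict)
    models: list = field(default_factory=list)


# Example: a two-model structure sharing a single (made-up) header.
example = Structure("XXXX", header={"name": "illustrative entry"})
example.models.append(Model(serial=0))
example.models.append(Model(serial=1))
```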

Are there any formats that store more than one structure in a file? If not,
then there's probably no need for a parse() function in Bio.Struct.



> > from Bio.Struct import WHATIF, Jpred
> > # Servers each get their own module
>
> Hmm - perhaps we may need to have another level here, Bio.Struct.Servers
> or Bio.Struct.WWW or something. How many of these do you expect?
>

João's project plan includes Dali and WHATIF:
http://biopython.org/wiki/GSOC2010_Joao

These servers do different things so I wouldn't expect any similarity in the
code between them. There are lots of servers that we *could* support...
Aesthetically, a Servers or WWW subdirectory would match
Bio.Struct.Applications and make the whole package a little more
self-documenting.

Here's one more idea: Fetching a single PDB file from RCSB requires a
separate import and a couple of calls. Should we make this even easier by
mimicking the efetch function in Bio.Entrez, something like

>>> handle = Bio.PDB.fetch("1MOT")

or

>>> from Bio.Struct.WWW import RCSB
>>> handle = RCSB.fetch("1MOT", "pdb")

?
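
For what it's worth, the second form could be as thin as the sketch below.
The function names, the format table and the download URL pattern are all
assumptions for illustration (the URL is my understanding of how RCSB serves
files, so please check it before relying on it); none of this exists in
Biopython.

```python
from urllib.request import urlopen

# Assumed URL pattern for RCSB file downloads; verify before relying on it.
_RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.{ext}"

# Hypothetical mapping from a format name to the file extension on the server.
_EXTENSIONS = {"pdb": "pdb", "mmcif": "cif"}


def build_url(pdb_id, format="pdb"):
    """Return the download URL for a 4-character PDB ID in the given format."""
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("Not a 4-character PDB ID: %r" % pdb_id)
    try:
        ext = _EXTENSIONS[format]
    except KeyError:
        raise ValueError("Unsupported format: %r" % format) from None
    return _RCSB_URL.format(pdb_id=pdb_id.upper(), ext=ext)


def fetch(pdb_id, format="pdb"):
    """Open the remote file and return a binary handle, in the spirit of efetch."""
    return urlopen(build_url(pdb_id, format))
```

The caller would get back a handle to read from, the same way Bio.Entrez
hands back a handle from efetch.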

-Eric


