[Biopython-dev] Module reorganization for upcoming Bio.PDB enhancements

Rodrigo Faccioli rodrigo_faccioli at uol.com.br
Mon May 31 13:51:10 EDT 2010


Hi,

I would like to offer a few ideas:

First, I suggest keeping the getStructure command. Its purpose is to load
the whole structure (models, chains, ATOM and HETATM records, etc.), so it
would be called as:  structure = getStructure(id)

Afterwards, users can work with the loaded structure as they need. Below I
try to show some specific examples.

The structure object holds the whole loaded structure, including its errors.
The command could look like: structure.get_StructureErrors().getStructureErrors()
It returns a dictionary containing all errors of the structure. A complete
example is in [1]. One idea: this dictionary could be created by the WHATIF
module.

Another example is the convert command. It could take more options, such as
models and chains, so it could be called as:
convert(structure, SelectedModels, SelectedChains, "1MOT.xml", "pdbxml")
When SelectedModels and SelectedChains are None, all models and all chains
of the protein, respectively, are used.
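
To make the "None means everything" rule concrete, here is a minimal
sketch. It is not Bio.PDB code: the function name select_subset and the
nested-dictionary layout {model_id: {chain_id: [residue names]}} are
assumptions made only for this illustration.

```python
def select_subset(structure, selected_models=None, selected_chains=None):
    """Return the models/chains to convert from a toy nested-dict structure.

    Hypothetical helper: `structure` is assumed to look like
    {model_id: {chain_id: [residue names]}}. None means "keep everything".
    """
    subset = {}
    for model_id, chains in structure.items():
        # Skip models that were not requested (None selects all models).
        if selected_models is not None and model_id not in selected_models:
            continue
        # Keep only the requested chains (None selects all chains).
        subset[model_id] = {
            chain_id: residues
            for chain_id, residues in chains.items()
            if selected_chains is None or chain_id in selected_chains
        }
    return subset


toy = {
    0: {"A": ["MET", "ALA"], "B": ["GLY"]},
    1: {"A": ["MET", "ALA"], "B": ["GLY"]},
}
print(select_subset(toy))                                            # everything
print(select_subset(toy, selected_models=[0], selected_chains=["A"]))  # model 0, chain A
```

A convert command could apply such a filter first and then hand the result
to the writer for the requested output format.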

Along these lines, we have developed a new Bio.PDB.Parser methodology. Please
read the loadStructureFromFile function in [2]. This methodology is an
alternative developed by my research group. With it, we work with both PDB
files and our database using a single parser. The example in [2] shows it
working with a PDB file only.
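
As a rough illustration of the "one parser for files and database" idea,
the sketch below parses ATOM/HETATM records from any iterable of lines, so
the same function accepts an open file or rows fetched from a database. It
is not the code in [2]: the function name parse_atom_records and the
returned dictionary keys are assumptions made for this example. The column
positions follow the fixed-width PDB coordinate record layout.

```python
import io


def parse_atom_records(lines):
    """Parse ATOM/HETATM records from any iterable of PDB-format lines.

    Hypothetical helper for illustration; the source of the lines (file,
    database rows, list) does not matter to the parser.
    """
    atoms = []
    for line in lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue  # ignore headers, remarks and other record types
        atoms.append({
            "record": line[0:6].strip(),
            "serial": int(line[6:11]),
            "name": line[12:16].strip(),
            "resname": line[17:20].strip(),
            "chain": line[21],
            "resseq": int(line[22:26]),
            "coord": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


# One coordinate record, written field by field so the columns stay aligned.
SAMPLE = (
    "ATOM  " "    1" " " " N  " " " "MET" " " "A" "   1" " " "   "
    "  27.340" "  24.430" "   2.614"
)

from_file = parse_atom_records(io.StringIO("REMARK toy entry\n" + SAMPLE + "\n"))
from_rows = parse_atom_records([SAMPLE])  # e.g. rows returned by a database query
print(from_file == from_rows)
```

The only requirement on the data source is that it yields PDB-format lines,
which keeps file handling and database access outside the parser.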

I hope this mail contributes something.
Sorry for my English mistakes.


[1]
http://github.com/rodrigofaccioli/ContributeToBioPython/blob/master/scripts/check_structure.py
[2]
http://github.com/rodrigofaccioli/ContributeToBioPython/blob/master/fcfrp/PDBParser.py


Thanks in advance,

--
Rodrigo Antonio Faccioli
Ph.D Student in Electrical Engineering
University of Sao Paulo - USP
Engineering School of Sao Carlos - EESC
Department of Electrical Engineering - SEL
Intelligent System in Structure Bioinformatics
http://laips.sel.eesc.usp.br
Phone: 55 (16) 3373-9366 Ext 229
Curriculum Lattes - http://lattes.cnpq.br/1025157978990218
Public Profile - http://br.linkedin.com/pub/rodrigo-faccioli/7/589/a5


2010/5/31 Peter <biopython at maubp.freeserve.co.uk>

> On Mon, May 31, 2010 at 4:38 PM, Eric Talevich <eric.talevich at gmail.com>
> wrote:
> > Hi all,
> >
> > This summer our GSoC student João Rodrigues will be implementing a
> > number of enhancements to Biopython's structural biology modules. Since
> > Bio.PDB is one of the most widely used parts of Biopython, I'd like to
> > find a way to let João add major new features without breaking existing
> > code and documentation.
> >
> > There are a few issues I'd like to address:
> >
> > 1. The I/O conventions of parse/read/write/convert seem to work very
> > well in SeqIO, AlignIO, Phylo, and other Biopython sub-packages. Bio.PDB
> > supports I/O in several formats, but the API is lower-level and isn't
> > unified in the same way (yet).
>
> Currently Bio.PDB supports the plain text PDB format, and has partial
> support for mmCIF. It lacks support for the XML PDB format, PDBML -
> Protein Data Bank Markup Language.
>
> Under this proposed scheme, what would you see as the basic record type
> (analogous to a SeqRecord, alignment or tree in Bio.SeqIO, Bio.AlignIO and
> Bio.Phylo)? It would be nice to say a protein chain, but there is the
> issue of multiple models (e.g. from NMR). I presume you'd go with the
> model as the basic unit (where each model may contain multiple chains).
>
> > 2. PDB headers seem to have become better structured in recent years, in
> > both the wwPDB spec and submitted files. But header info isn't well
> > integrated with PDB Structure object, and parse_pdb_header needs some
> > attention as well.
>
> Agreed.
>
> > 3. Kristian asked on this list awhile ago about the proper location for
> > his new code that works with RNA structures. While RCSB's PDB contains
> > some RNA structures, the RNA world doesn't revolve around it. Similarly,
> > João needs a place to put code for structure prediction/validation
> > servers, command-line wrappers, secondary structures, etc.
> >
> >
> > I propose a new sub-package called Bio.Struct for these enhancements:
> >
> > from Bio import Struct
> > mystruct = Struct.read("1MOT.pdb", "pdb")
> > # Or, letting the format argument default to "pdb":
> > mystruct = Struct.read("1MOT.pdb")
> > # Eventually this will work too:
> > Struct.convert("1MOT.pdb", "pdb", "1MOT.xml", "pdbxml")
>
> I'd probably go with "pdbml" rather than "pdbxml" since that seems to be
> what the PDB themselves call it:
> http://www.pdb.org/pdb/static.do?p=file_formats/index.jsp
>
> > from Bio.Struct.Applications import DSSP
> > # Like the other command-line wrappers
> > # (I'm curious about Peter's cunning new scheme...)
>
> See:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-May/007773.html
>
> > from Bio.Struct import WHATIF, Jpred
> > # Servers each get their own module
>
> Hmm - perhaps we may need to have another level here, Bio.Struct.Servers
> or Bio.Struct.WWW or something. How many of these do you expect?
>
> > from Bio.Struct import RNA
> > # Would this work for you, Kristian?
> >
> >
> > Alternatively, we could do all of this within the PDB module -- so
> > picture the above examples with "PDB" in place of "Struct". This raises
> > the chance of naming collisions, though, and doesn't solve issue #3
> > above.
> >
> >
> > We'll leave the existing PDB module layout alone, in general. I think it
> > will be necessary to add a few more attributes to the
> > Bio.PDB.Structure.Structure class, but we can do this without breaking
> > compatibility. Since fewer people depend on the exact formatting of the
> > Structure.header data (we believe), it's safer to change this dictionary,
> > moving the more essential entries to a separate attribute, or whatever
> > seems reasonable when we dig into it.
> >
> > Comments?
>
> I don't want us to break backwards compatibility in Bio.PDB (given how
> widely used it seems to be based on citations at least), but would like
> us to continue making small fixes or enhancements to it. Therefore a
> new Bio.Struct module may be the safer option.
>
> Peter
