[Biopython] About Google Summer Code Project PDB-tidy

Fuxiao Xin fuxin at umail.iu.edu
Thu Apr 8 01:40:02 UTC 2010


Dear all,

I am a third-year PhD student in Bioinformatics at Indiana University
Bloomington. I am very interested in the Biopython Google Summer of Code
project "PDB-Tidy: command-line tools for manipulating PDB files".

My own research requires extensive manipulation of PDB files, and I think the
idea of adding more features to Bio.PDB and more command-line options for
analyzing and presenting PDB data is excellent. This project is of strong
interest to me since it will benefit my own research as well.

Programming skills: I use Perl and Python in my daily research. I am
currently developing a new functional site predictor based on protein
structure information. The code will be open source, but the work is under
review, so the code has not been released yet.

My project plan:

week1
1. Renumber residues starting from 1 (or N)
function name: renumberPDB; given a PDB file, renumber the residues in the
ATOM records so that gaps left by missing amino acids are removed
communicate with mentors to set the coding standards to follow for the rest
of the functions
create a work log to keep track of progress
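
As a rough illustration of item 1, here is a minimal stdlib-only sketch that
renumbers residues by editing the fixed-width columns of ATOM/HETATM lines
directly. It is not the Bio.PDB implementation; the name renumber_residues
and its start parameter are hypothetical, and it assumes the standard PDB
column layout (chain ID in column 22, residue number in columns 23-26,
insertion code in column 27). TER and other records that carry residue
numbers are left untouched in this sketch.

```python
def renumber_residues(lines, start=1):
    """Return PDB lines with residues renumbered consecutively per chain.

    Hypothetical helper: numbering restarts at `start` for every chain,
    and insertion codes are blanked because the new numbers are unique.
    """
    out = []
    last_key = None      # (chain, original resSeq + iCode) of previous atom
    last_chain = None
    number = start - 1
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 27:
            chain = line[21]
            key = (chain, line[22:27])
            if chain != last_chain:
                number = start - 1   # restart numbering for a new chain
                last_chain = chain
            if key != last_key:
                number += 1          # first atom of a new residue
                last_key = key
            # New residue number right-aligned in columns 23-26,
            # blank insertion code in column 27.
            line = line[:22] + "%4d" % number + " " + line[27:]
        out.append(line)
    return out
```

Passing start=N gives the "starting from N" variant mentioned above.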

week2-3
2. Select a portion of the structure -- models, chains, etc. -- and write it
to a new file (PDB, FASTA, and other formats)
function name: rewritePDB; inputs will be the portion of a PDB file to
write out (supporting 'chain', 'model', 'atom'), an output format (PDB,
FASTA), and the output file name.
3. Perform some basic, well-established measures of model quality/validity
function name: PDBquality
the function will report RESOLUTION and ? of the structure
4. Extract disordered regions in a PDB structure
function name: PDBdisorder
report residues missing from the ATOM records of the structure
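
To make items 2 and 4 more concrete, the sketch below pulls one chain out of
the ATOM records, writes its sequence as FASTA, and reports gaps in the
residue numbering as candidate missing regions. This is only an illustration
under stated assumptions: the function names are hypothetical, only the 20
standard amino acids are mapped (anything else becomes X), and a jump in
numbering is treated as a missing stretch, which will not be true for every
file.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(lines, chain_id):
    """Return [(resSeq, resName), ...] for one chain, in file order.

    Each residue is reported once, so repeated models do not duplicate it.
    """
    residues = []
    seen = set()
    for line in lines:
        if line.startswith("ATOM  ") and len(line) >= 27 and line[21] == chain_id:
            key = line[22:27]            # residue number + insertion code
            if key not in seen:
                seen.add(key)
                residues.append((int(line[22:26]), line[17:20]))
    return residues


def chain_to_fasta(lines, chain_id, name="structure"):
    """Return a FASTA record for the residues observed in one chain."""
    seq = "".join(THREE_TO_ONE.get(res, "X")
                  for _, res in chain_residues(lines, chain_id))
    return ">%s_%s\n%s\n" % (name, chain_id, seq)


def numbering_gaps(lines, chain_id):
    """Return (first_missing, last_missing) pairs where numbering jumps."""
    numbers = [num for num, _ in chain_residues(lines, chain_id)]
    return [(a + 1, b - 1) for a, b in zip(numbers, numbers[1:]) if b - a > 1]
```

The FASTA output contains only residues with coordinates, so it can be
shorter than the SEQRES sequence of the same chain.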

week3-4
5. Make a function to draw a Ramachandran plot
function name: ramaPLOT
combine the two steps (calculating torsion angles and drawing the plot) into
one function, with an option to draw the plot or not
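
The torsion-angle half of item 5 can be sketched without any plotting
library. The code below computes a signed dihedral angle from four points and
then phi/psi for a list of backbone coordinates. The names dihedral and
phi_psi and the input layout (one dict per residue with 'N', 'CA' and 'C'
coordinates) are assumptions made for this illustration; drawing the plot is
left out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(backbone):
    """Return [(phi, psi), ...]; None where the angle is undefined.

    `backbone` is a list of dicts with 'N', 'CA', 'C' coordinate tuples,
    one per residue, in chain order and without gaps.
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = psi = None
        if i > 0:
            phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(backbone) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The resulting (phi, psi) pairs are what the plotting step would consume, so
the "draw or not" option reduces to whether that second step is called.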

week5
6. Open PDB files in a window for visualization, visualize PDBsuperpose
results, output RMSD
function name: superposePDB
the function will resemble the PDBsuperpose function in MATLAB; use
Bio.PDB.Superimposer() to perform the superposition, and use Jmol or another
visualization tool to view the results

week6
7. Write a function to extract the experimental conditions of a PDB file,
including pH, temperature, and salt
function name: PDBconditon
pH and temperature will be easy to get, but salt will be hard to parse
because there is no general rule for how such information appears in the
PDB file; parse the REMARK 200 records
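
For item 7, a possible starting point is to collect the "LABEL : VALUE" lines
of REMARK 200 into a dictionary and then pick out pH and temperature. This is
a sketch only: the function names are hypothetical, and the labels "PH" and
"TEMPERATURE (KELVIN)" are assumed from the way REMARK 200 is typically laid
out in X-ray entries. Values that are not plain numbers (for example NULL)
come back as None. Salt is not handled.

```python
def parse_remark_200(lines):
    """Collect 'LABEL : VALUE' pairs from REMARK 200 lines into a dict."""
    fields = {}
    for line in lines:
        if not line.startswith("REMARK 200"):
            continue
        body = line[10:].rstrip()
        if ":" not in body:
            continue                      # headings and free-text lines
        label, value = body.split(":", 1)
        label = " ".join(label.split())   # collapse padding inside the label
        fields.setdefault(label, value.strip())  # keep the first occurrence
    return fields


def experimental_conditions(lines):
    """Return pH and temperature (K) as floats, or None when unavailable."""
    fields = parse_remark_200(lines)

    def as_float(text):
        try:
            return float(text)
        except (TypeError, ValueError):   # missing label or non-numeric value
            return None

    return {
        "pH": as_float(fields.get("PH")),
        "temperature_K": as_float(fields.get("TEMPERATURE (KELVIN)")),
    }
```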

week7-8
8. Extract PTMs
function name: PDBptm
difficulty: post-translational modification annotation in PDB files is not
consistent, so I need to make a list of PTMs to work on
parse the MODRES records
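
A first pass at item 8 could simply read the fixed-width MODRES records and
tally which modified residues occur, as a basis for the PTM list. The sketch
below assumes the standard MODRES column layout (residue name in columns
13-15, chain in 17, residue number in 19-22, standard residue in 25-27,
comment from column 30); parse_modres and count_modifications are
hypothetical names, and the free-text comment is returned as-is because its
wording is not consistent across entries.

```python
from collections import Counter


def parse_modres(lines):
    """Return one dict per MODRES record, read from fixed columns."""
    records = []
    for line in lines:
        if line.startswith("MODRES"):
            line = line.rstrip("\n").ljust(70)   # pad short lines
            records.append({
                "res_name": line[12:15].strip(),  # modified residue name
                "chain": line[16],
                "res_seq": int(line[18:22]),
                "std_res": line[24:27].strip(),   # parent standard residue
                "comment": line[29:70].strip(),   # free-text description
            })
    return records


def count_modifications(records):
    """Count occurrences of each (modified residue, standard residue) pair."""
    return Counter((r["res_name"], r["std_res"]) for r in records)
```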

week9-10
9. extract ligand binding information
function name: PDBligand
parse HETNAM field
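
For item 9, the HETNAM records give the chemical name of each hetero group,
and the HETATM records show where those groups sit in the structure. The
sketch below reads both. It assumes the standard column layout (HETNAM:
continuation in columns 9-10, het ID in 12-14, name from column 16), joins
continuation lines with a single space (which may not reproduce names that
are split mid-word), and skips water by default. The function names and the
skip parameter are hypothetical.

```python
def parse_hetnam(lines):
    """Map each het ID to its chemical name from HETNAM records."""
    names = {}
    for line in lines:
        if line.startswith("HETNAM"):
            line = line.rstrip("\n").ljust(70)   # pad short lines
            het_id = line[11:14].strip()
            text = line[15:70].strip()
            if het_id in names:
                names[het_id] += " " + text      # continuation line
            else:
                names[het_id] = text
    return names


def ligand_residues(lines, skip=("HOH",)):
    """Return sorted (het ID, chain, residue number) for HETATM groups."""
    found = set()
    for line in lines:
        if line.startswith("HETATM") and len(line) >= 26:
            het_id = line[17:20].strip()
            if het_id not in skip:
                found.add((het_id, line[21], int(line[22:26])))
    return sorted(found)
```

Combining the two results gives a list of named ligands with their positions,
from which nearby protein residues could later be identified.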


Other obligations: I am aware that Google Summer of Code starts on May 24th,
but I have a review paper with my advisor due on June 1st. I hope it will be
OK for me to start after June 1st and make up the first week in August.

Best,
Fuxiao


