[Biopython-dev] [Biopython - Bug #3379] PDBParser fails to parse PDBs produced by PatchDock

redmine at redmine.open-bio.org
Mon Aug 27 04:24:16 UTC 2012


Issue #3379 has been updated by David Cain.


Regarding "pure concatenation," I wasn't exaggerating when I said really ugly Perl scripts. =)

I created a "pull request on the Biopython GitHub repository":https://github.com/biopython/biopython/pull/60. Could you give me some feedback on my solution? If the devs agree on a certain behavior, I'll start writing some unit tests.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379

Author: David Cain
Status: New
Priority: Low
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.57
URL: 


I apologize in advance if this technically doesn't count as a bug, as the problem arises from improperly formatted PDBs.


h3. Background

Protein docking utilities can generally create a complex PDB from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates modified in-place), then the two files are concatenated to create a protein complex file.

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, creating a poorly formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is problematic when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to consider anything after an @END@ or @CONECT@ to be trailer data, and to cease parsing when one is encountered. This means that many complexes parse cleanly, but completely exclude the ligand.
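
The effect can be illustrated with a minimal sketch. This is a simplified model of the behavior described above, not Biopython's actual code: the function name @coordinates_before_trailer@ and the two-chain sample data are hypothetical, chosen only to show how stopping at the first @END@ drops the ligand.

```python
# Simplified model of the behavior described above; this is NOT Biopython's code.
COORD_RECORDS = ("ATOM", "HETATM")
TRAILER_RECORDS = ("END", "CONECT")


def coordinates_before_trailer(lines):
    """Collect ATOM/HETATM lines, stopping at the first END or CONECT record."""
    coords = []
    for line in lines:
        record = line[:6].strip()  # PDB record names occupy columns 1-6
        if record in TRAILER_RECORDS:
            break  # everything after this point is treated as trailer
        if record in COORD_RECORDS:
            coords.append(line)
    return coords


# Hypothetical complex: receptor (chain A), END, ligand (chain B), END.
complex_lines = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C",
    "END",
    "ATOM      1  N   GLY B   1      21.104  16.134  -1.504  1.00  0.00           N",
    "ATOM      2  CA  GLY B   1      21.639  16.071  -0.147  1.00  0.00           C",
    "END",
]

parsed = coordinates_before_trailer(complex_lines)
chains = sorted({line[21] for line in parsed})  # chain ID is column 22
print(len(parsed), chains)  # only the receptor atoms (chain A) survive
```

No error is reported in this model; the ligand lines are simply never read, which matches the "parses cleanly but excludes the ligand" symptom.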

h3. How to fix the problem

In an ideal world, the responsibility would be on the creators of the docking utilities to create well-formed complex PDB files. However, this quick concatenation seems to be pretty common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my thought would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but make an attempt to parse extra coordinate data from this trailer before returning (probably through a recursive call). If coordinate records are found in the trailer, a PDBConstructionWarning is issued, but the records are still added to the structure.
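
A rough sketch of that idea follows, operating on raw lines rather than on PDBParser internals. It uses a single loop with a flag instead of the recursive call suggested above, so a long run of @CONECT@ records cannot exhaust the recursion limit. The names @collect_coordinate_lines@ and @TrailerCoordinatesWarning@ are hypothetical; the latter is only a stand-in for PDBConstructionWarning so the example runs without Biopython installed.

```python
import warnings


class TrailerCoordinatesWarning(UserWarning):
    """Hypothetical stand-in for Biopython's PDBConstructionWarning."""


def collect_coordinate_lines(lines):
    """Return every ATOM/HETATM line, including those found after END/CONECT.

    Coordinates found after a trailer record are kept, but a single
    warning is issued so the caller knows the file is not well formed.
    """
    coords = []
    in_trailer = False
    warned = False
    for line in lines:
        record = line[:6].strip()  # PDB record names occupy columns 1-6
        if record in ("END", "CONECT"):
            in_trailer = True
        elif record in ("ATOM", "HETATM"):
            if in_trailer and not warned:
                warnings.warn(
                    "Coordinate records found after END/CONECT; adding them anyway.",
                    TrailerCoordinatesWarning,
                )
                warned = True
            coords.append(line)
    return coords


# Hypothetical concatenated complex: receptor, CONECT + END, then ligand.
sample = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "CONECT    1    2",
    "END",
    "ATOM      1  N   GLY B   1      21.104  16.134  -1.504  1.00  0.00           N",
    "END",
]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    all_coords = collect_coordinate_lines(sample)

print(len(all_coords), len(caught))
```

Whether the real parser should behave this way by default or only in a permissive mode is a design question this sketch does not settle.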

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean ugly complexes for parsing.
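
For comparison, a cleaning script of the kind mentioned above could be as small as the sketch below. It assumes that connectivity data is not needed downstream, so it drops every @CONECT@ record (atom serial numbers may collide once two files are concatenated) along with every intermediate @END@, then appends a single final @END@. The function name @clean_complex@ is hypothetical.

```python
def clean_complex(lines):
    """Drop all END and CONECT records, then terminate with a single END.

    Assumption: CONECT data can be discarded. Lines are expected
    without trailing newlines, although stray ones are tolerated
    when matching record names.
    """
    cleaned = [
        line for line in lines
        if line[:6].strip() not in ("END", "CONECT")  # record name: columns 1-6
    ]
    cleaned.append("END")
    return cleaned


# Hypothetical concatenated complex: receptor, CONECT + END, then ligand.
dirty = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "CONECT    1    2",
    "END",
    "ATOM      1  N   GLY B   1      21.104  16.134  -1.504  1.00  0.00           N",
    "END",
]

cleaned = clean_complex(dirty)
print("\n".join(cleaned))
```

The cleaned text can then be written to a new file and handed to PDBParser as usual.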

My only concern is that most users of docking software are probably not able or willing to write such a script, and thus can't use Biopython to parse the PDB output.

