[Biopython-dev] bug-request

Tue Dec 10 08:22:24 EST 2002

> Trying to apply the Biopython PDB module to all the PDB structures available
> to me, I noticed that the job is sometimes killed while processing
> a file, especially a large PDB file.
> The cause seems to be a lack of memory.

Well, yes. PDB file 1HTQ is a monster of 70 MB, containing almost a million 
atoms. If you want to use the PDB module for this you'll have to buy some 
more memory, I guess. :-) 
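To find such oversized entries before handing them to the parser, you can count coordinate records in a single streaming pass. This is cheap because memory use does not grow with the size of the file. The function below is a minimal sketch and not part of the Biopython PDB module. The name `count_atom_records` is hypothetical, and the only format fact it relies on is that the ATOM/HETATM record name occupies columns 1-6.

```python
import io
from typing import TextIO


def count_atom_records(handle: TextIO) -> int:
    """Count ATOM/HETATM records in one pass without building a structure."""
    count = 0
    for line in handle:  # iterating a file handle reads one line at a time
        # The record name occupies columns 1-6 of a PDB line.
        if line.startswith(("ATOM  ", "HETATM")):
            count += 1
    return count


# Tiny inline example; these records are illustrative, not taken from 1HTQ.
example = io.StringIO(
    "HEADER    EXAMPLE\n"
    "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00  0.00           C\n"
    "HETATM    3  K     K A 100      -8.986  34.229 -48.036  1.00 54.69           K\n"
    "END\n"
)
print(count_atom_records(example))  # → 3
```

On a real file, something like `with open(path) as fh: n = count_atom_records(fh)` would work. The threshold above which you skip a file is up to you and depends on how much memory your machine has.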

> Seemingly the problem files are
> read several times (perhaps due to an error in the header-reading routine?
> I noticed this because the program prints the same discontinuity warnings
> to the screen several times.)

No, that happens because the chains in the file are discontinuous, and each 
discontinuity produces its own warning. It is not the cause of the crash.
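A simplified scan illustrates why the warnings repeat. A discontinuity occurs whenever a chain identifier that has already been seen reappears after a different chain. A chain that is interrupted several times is therefore reported several times. This is a hypothetical sketch of the idea only, not the code in the PDB module. `find_chain_reentries` and `atom_line` are invented names, and the scan ignores MODEL records.

```python
from typing import Iterable, List, Tuple


def find_chain_reentries(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (chain_id, line_number) for each point where an already-seen
    chain resumes after a different chain. Ignores MODEL records."""
    seen = set()
    current = None
    reentries = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        chain_id = line[21:22]  # chain identifier, PDB column 22
        if chain_id != current:
            if chain_id in seen:
                reentries.append((chain_id, lineno))
            seen.add(chain_id)
            current = chain_id
    return reentries


def atom_line(chain_id: str) -> str:
    """Build a minimal fake ATOM line with the chain ID in column 22."""
    return "ATOM  " + " " * 15 + chain_id + "\n"


lines = [atom_line(c) for c in "ABABA"]
print(find_chain_reentries(lines))  # → [('A', 3), ('B', 4), ('A', 5)]
```

Chain A appears twice in the output because it is interrupted twice. This is the same kind of repetition described in the quoted message.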

> Again, the core problem: for example, "1htq" and "1bxr" are not
> processed correctly; the job is killed after some time.

1BXR contains an error: it has two residues with the same identifier, two 
potassium ions (K) that are both numbered 3985.

HETATM23384  K     K  3985      -8.986  34.229 -48.036  1.00 54.69           K
HETATM47621  K     K  3985     -19.641 -25.353 -32.655  1.00 39.94           K
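To see why these two records collide, decode them with the fixed PDB column layout. The residue name is in columns 18-20, the chain ID in column 22, the residue number in columns 23-26 and the insertion code in column 27. The `residue_key` helper below is a hypothetical illustration and is not Bio.PDB's own residue identifier.

```python
from typing import Tuple

ResidueKey = Tuple[str, str, int, str]


def residue_key(record: str) -> ResidueKey:
    """Extract (chain, residue name, residue number, insertion code)
    from a fixed-column ATOM/HETATM record."""
    chain_id = record[21:22]          # column 22
    res_name = record[17:20].strip()  # columns 18-20
    res_seq = int(record[22:26])      # columns 23-26
    i_code = record[26:27]            # column 27
    return chain_id, res_name, res_seq, i_code


records = [
    "HETATM23384  K     K  3985      -8.986  34.229 -48.036  1.00 54.69           K",
    "HETATM47621  K     K  3985     -19.641 -25.353 -32.655  1.00 39.94           K",
]
first, second = (residue_key(r) for r in records)
print(first == second)  # → True
print(first)            # → (' ', 'K', 3985, ' ')
```

Both records have a blank chain ID, the same residue name and number, and no insertion code. A parser cannot tell them apart, even though their coordinates are far apart.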

Normally, this should be handled by PDBParser(PERMISSIVE=1), which leaves 
out the duplicated atoms, but there was a bug in the error-handling code: an 
old assert statement stood where a "raise PDBConstructionError" statement 
belonged. That has been corrected now, so you can try out the new version. 
1BXR should now parse OK with PDBParser(PERMISSIVE=1).
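The assert matters because permissive parsing works by catching one specific exception type and turning it into a warning. An AssertionError is a different type, so it escapes the handler and aborts the run. The sketch below illustrates this pattern. `ConstructionError`, `add_residue` and `build` are hypothetical stand-ins, not the PDB module's internals. With Biopython itself, the call is along the lines of `PDBParser(PERMISSIVE=1).get_structure("1bxr", filename)`.

```python
import warnings


class ConstructionError(Exception):
    """Hypothetical stand-in for the parser's construction exception."""


def add_residue(residues: dict, key: tuple, use_assert: bool = False) -> None:
    """Register a residue; a duplicate key is a construction error."""
    if use_assert:
        # Old style: a failing assert raises AssertionError, which is not a
        # ConstructionError, so the permissive handler in build() never sees it.
        assert key not in residues, "duplicate residue"
    elif key in residues:
        raise ConstructionError(f"duplicate residue {key}")
    residues[key] = []


def build(keys, permissive: bool, use_assert: bool = False) -> dict:
    """Add every residue key; in permissive mode, skip duplicates with a warning."""
    residues = {}
    for key in keys:
        try:
            add_residue(residues, key, use_assert)
        except ConstructionError as exc:
            if not permissive:
                raise  # strict mode: the first error aborts the parse
            warnings.warn(f"Ignoring {exc}")  # permissive: warn and skip
    return residues


keys = [(" ", "K", 3985), (" ", "K", 3985)]  # a 1BXR-style collision
print(len(build(keys, permissive=True)))  # → 1, plus a warning on stderr
```

If you call `build(keys, permissive=True, use_assert=True)`, it raises AssertionError instead of warning. This reproduces the old bug. Running Python with `-O` strips asserts entirely, which is another reason not to use them for input validation.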

Cheers, 

-Thomas