[Biopython] Reading large files, Biopython cookbook example

Sun Jul 14 11:21:49 UTC 2013

On Sat, Jul 13, 2013 at 7:50 AM, Katrina Lexa <klexa at umich.edu> wrote:
> Hi everyone,
>
> I'm trying to do something that seems like it ought to be super simple,
> since it is on the Biopython wiki cookbook
> (http://biopython.org/wiki/Reading_large_PDB_files), but for some reason
> that script will not work for me.
>
> When I try to run it as it is, on a pdb file that has more than 10000
> residues, I get the "NameError: global name 'Residue' is not defined" at
> line 77. My assumption was that maybe the script needed to import some other
> module from Biopython, so I added from Bio.PDB import * to the top of the
> script, but then it failed with "TypeError: 'str' object is not callable" at
> line 73 (residue = Residue(res_id, resname, self.segid)). I tried to
> circumvent this by just changing the name of the variable being created,
> from residue = Residue to foobar = Residue (and then carrying that naming
> through), but I continued to get the TypeError. Has anyone seen this before
> and/or can anyone help me out getting this to run?
>
> I have a file where all of the residues after 9999 are numbered starting
> with A000, and that causes the normal Bio.PDB.PDBParser to crash with
> invalid literal for int() with base 10: 'A000', so if there is an easier
> work around for that, that would also be a solution.
>
> Thank you so much for your help!

It seems that the wiki example assumes the residue numbers
wrap round at 9999 and restart from 0, 1, 2, ..., whereas your file
goes from 9999 to A000, A001, etc., which I've not seen before.
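
As a rough illustration of the difference, here is a minimal sketch of
how such a field could be turned into an integer. It is only a guess at
the numbering scheme: it assumes a single upper-case letter followed by
three decimal digits, with A000 = 10000, A999 = 10999, B000 = 11000 and
so on. The function name decode_resseq is hypothetical and is not part
of Biopython; if the file uses some other encoding (for example a
base-36 style scheme), the arithmetic would need to change.

```python
import string


def decode_resseq(field):
    """Hypothetical helper: convert a 4-character residue number field to an int.

    ASSUMPTION: values past 9999 are written as one upper-case letter plus
    three decimal digits (A000 = 10000, A001 = 10001, ..., B000 = 11000).
    This is a guess at the scheme, not something any PDB tool guarantees.
    """
    text = field.strip()
    try:
        # Ordinary case: the field is a plain (possibly padded) integer.
        return int(text)
    except ValueError:
        pass
    if len(text) == 4 and text[0] in string.ascii_uppercase and text[1:].isdigit():
        # Each letter is assumed to cover a block of 1000 residue numbers.
        block = string.ascii_uppercase.index(text[0])
        return 10000 + 1000 * block + int(text[1:])
    raise ValueError("Unrecognised residue number field: %r" % field)
```

Something along these lines could be called wherever the residue number
field (columns 23-26 of an ATOM record) is currently passed to int(),
but it is untested against your file and depends entirely on the
assumption above.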

Where did your PDB file come from? A public database?
Another tool?

Peter