[Biopython] Reading large files, Biopython cookbook example

Sun Jul 14 16:40:32 UTC 2013

Hi Peter,

My PDB file came from Maestro, so that is the numbering it follows after 9999. I tried to modify the parser script to account for the different format of my PDB file, just by changing line 166 to something like:

try:
    resseq=str(line[22:26].split()[0]) # sequence identifier
except ValueError:
    resseq=10000 # sequence identifier

But my Python is not great, and I think I'm missing something with that, because I get the same error.
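
One detail worth noting about the snippet above: str() does not raise ValueError for a value like 'A000', so the except branch is never reached, and a fixed fallback of 10000 would give every overflow residue the same number anyway. A minimal sketch of a fallback that keeps the numbers distinct is below. It assumes the overflow labels are a base-36 continuation (hybrid-36 style) in which A000 follows 9999; whether Maestro's numbering really works that way is an assumption, and parse_resseq is a hypothetical helper name, not part of Biopython or the wiki script.

```python
def parse_resseq(field):
    """Convert a 4-character PDB residue-number field to an int.

    Hypothetical helper, not part of Biopython.
    """
    text = field.strip()
    try:
        # Ordinary decimal numbering, e.g. "  12" or "9999".
        return int(text)
    except ValueError:
        # Assumption: labels past 9999 are a base-36 continuation,
        # so "A000" maps to 10000, "A001" to 10001, and so on.
        return int(text, 36) - 10 * 36**3 + 10000
```

In the parser script this would be called on the same slice, for example resseq = parse_resseq(line[22:26]).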

Thank you for your help,

Katrina

On Jul 14, 2013, at 4:21 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> 
> It seems that the wiki example assumes the residue numbers
> wrap around at 9999 and restart at 0, 1, 2, ... whereas your file
> goes from 9999 to A000, A001, etc., which I've not seen before.
> 
> Where did your PDB file come from? A public database?
> Another tool?
> 
> Peter

> On Sat, Jul 13, 2013 at 7:50 AM, Katrina Lexa <klexa at umich.edu> wrote:
>> Hi everyone,
>> 
>> I'm trying to do something that seems like it ought to be super simple,
>> since it is on the Biopython wiki cookbook
>> (http://biopython.org/wiki/Reading_large_PDB_files), but for some reason
>> that script will not work for me.
>> 
>> When I try to run it as it is, on a PDB file that has more than 10000
>> residues, I get "NameError: global name 'Residue' is not defined" at
>> line 77. My assumption was that maybe the script needed to import some other
>> module from Biopython, so I added "from Bio.PDB import *" to the top of the
>> script, but then it failed with "TypeError: 'str' object is not callable" at
>> line 73 (residue = Residue(res_id, resname, self.segid)). I tried to
>> circumvent this by just changing the name of the variable being created,
>> from residue = Residue to foobar = Residue (and then carrying that naming
>> through), but I continued to get the TypeError. Has anyone seen this before,
>> and/or can anyone help me get this to run?
>> 
>> I have a file where all of the residues after 9999 are numbered starting
>> with A000, and that causes the normal Bio.PDB.PDBParser to crash with
>> "invalid literal for int() with base 10: 'A000'", so if there is an easier
>> workaround for that, that would also be a solution.
>> 
>> Thank you so much for your help!