[Biopython] Reading large files, Biopython cookbook example

Katrina Lexa klexa at umich.edu
Mon Jul 15 04:38:37 UTC 2013


Thank you both! I wasn't able to get that to work within the PDBParser script itself from Biopython (I kept getting the same int error, even though I was trying to catch it), but I just wrote my own little wrapper, and it's working as intended. I appreciate the help.

On Jul 14, 2013, at 9:42 AM, Nick Lindberg <nlindberg at mkei.org> wrote:

> It's interesting that it would roll over into hex after 9999.  (Maybe it's
> a matter of keeping the residue number within 4 digits without wrapping.)
> Either way, conversion from hex to decimal in Python is super easy.
> 
> If your hex string is in a variable "residue" then:
> 
> decimal_conversion = int(residue, 16)
> 
> will turn A000 into 40960, A001 into 40961, etc. (0xA000 is 40960, not
> 10000).  In your case, since you know it doesn't go to hex until after
> 9999 (and so that it will start with a letter) you could check whether
> the first character is a letter, then convert it.
> 
> From there, you could either subtract 40960 to have it wrap to 0,
> subtract 30960 to continue from 10000, or fix Biopython to read the
> correct values.  (You could either do this on the fly in Biopython, or
> write a script to convert your residue file.)
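
A minimal sketch of the check-then-convert idea described above. The
function name decode_resseq and its base parameter are hypothetical, not
part of Biopython. It assumes that "A000" is the first letter-coded value
and stands for residue 10000; int("A000", 16) by itself gives 40960, so the
sketch subtracts the value of "A000" and adds 10000. If the file actually
follows a base-36 scheme (such as hybrid-36) rather than hex, passing
base=36 is the matching assumption.

```python
def decode_resseq(field, base=16):
    """Return the residue number held in a PDB residue-number field.

    Hypothetical helper: fields that start with a letter are treated as
    letter-coded values in the given base, everything else as decimal.
    """
    text = field.strip()
    if text and text[0].isalpha():
        # Assumption: "A000" is the first letter-coded value and means 10000.
        return int(text, base) - int("A000", base) + 10000
    return int(text)  # plain decimal field, e.g. "9999"


for raw in ("9999", "A000", "A001"):
    print(raw, "->", decode_resseq(raw))
```

Decoded values above 9999 no longer fit in the four-column residue field,
so they are better kept in memory than written back into the same columns.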
> 
> Let me know if you'd like some help.
> 
> Thanks--
> 
> Nick Lindberg
> Sr. Consulting Engineer, HPC
> Milwaukee Institute
> 414.727.6413 (W)
> http://www.mkei.org
> 
> On 7/14/13 6:21 AM, "Peter Cock" <p.j.a.cock at googlemail.com> wrote:
> 
>> On Sat, Jul 13, 2013 at 7:50 AM, Katrina Lexa <klexa at umich.edu> wrote:
>>> Hi everyone,
>>> 
>>> I'm trying to do something that seems like it ought to be super simple,
>>> since it is on the Biopython wiki cookbook
>>> (http://biopython.org/wiki/Reading_large_PDB_files), but for some reason
>>> that script will not work for me.
>>> 
>>> When I try to run it as it is, on a PDB file that has more than 10000
>>> residues, I get "NameError: global name 'Residue' is not defined" at
>>> line 77. My assumption was that maybe the script needed to import some
>>> other module from Biopython, so I added "from Bio.PDB import *" to the
>>> top of the script, but then it failed with "TypeError: 'str' object is
>>> not callable" at line 73 (residue = Residue(res_id, resname, self.segid)).
>>> I tried to circumvent this by just changing the name of the variable
>>> being created, from residue = Residue to foobar = Residue (and then
>>> carrying that naming through), but I continued to get the TypeError.
>>> Has anyone seen this before, and/or can anyone help me get this to run?
>>> 
>>> I have a file where all of the residues after 9999 are numbered starting
>>> with A000, and that causes the normal Bio.PDB.PDBParser to crash with
>>> invalid literal for int() with base 10: 'A000', so if there is an easier
>>> work around for that, that would also be a solution.
>>> 
>>> Thank you so much for your help!
>> 
>> It seems that the wiki example assumes the residue numbers
>> wrap round at 9999 to restart at 0, 1, 2, ... whereas your file
>> is going from 9999 to A000, A001, etc., which I've not seen before.
>> 
>> Where did your PDB file come from? A public database?
>> Another tool?
>> 
>> Peter
>> _______________________________________________
>> Biopython mailing list  -  Biopython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
> 




