[Biopython] Reading large files, Biopython cookbook example

Sun Jul 14 16:42:27 UTC 2013

It's interesting that it would roll over into hex after 9999.  (Maybe it's
a way of keeping the residue number within 4 characters without wrapping.)
Either way, conversion from hex to decimal in Python is super easy.

If your hex string is in a variable "residue", then:

decimal_conversion = int(residue, 16)

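will read the whole string as hex, which turns A000 into 40960 (0xA000)
and A001 into 40961 rather than 10000 and 10001.  To carry on from 9999,
use int(residue, 16) - 0xA000 + 10000 instead.  In your case, since you
know it doesn't switch to hex until after 9999 (so those values always
start with a letter), you could check whether the first character is a
letter (e.g. with residue[0].isalpha()) and only convert those.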
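Here is a minimal sketch of that check, assuming (as above) that the
letter-prefixed values are hex numbers continuing on from 9999, so
A000 = 10000, A001 = 10001, and so on.  The function name residue_number
is just made up for illustration.

```python
def residue_number(field):
    """Convert a 4-character PDB residue number field to an int.

    Assumes (hypothetically) that values above 9999 are written in hex
    starting at A000, so A000 -> 10000, A001 -> 10001, and so on.
    """
    field = field.strip()
    if field and field[0].isalpha():
        # int(field, 16) on its own gives 0xA000 == 40960 for "A000", so
        # shift the hex value so that A000 lines up with 10000.
        return int(field, 16) - 0xA000 + 10000
    # Ordinary decimal residue numbers, possibly negative (e.g. "-3").
    return int(field)


for field in ["9999", "A000", "A001", "A00F", "  12"]:
    print(repr(field), "->", residue_number(field))
```

That converts the five sample fields to 9999, 10000, 10001, 10015 and 12.
One way to test the hex guess against your file: if it goes A009 -> A00A,
it's hex; if it goes A009 -> A010, the letter is more likely standing in
for a leading "10", and the conversion would need to change.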

From there, you could either subtract 10000 so the numbers wrap round to 0
as the cookbook example expects, or fix Biopython to read the larger values
correctly.  (You could do this either on the fly in Biopython, or by
writing a script to convert your PDB file beforehand.)
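A rough sketch of the "write a script to convert your file" option,
assuming the same hypothetical hex scheme and standard fixed-column PDB
records, where the residue sequence number sits in columns 23-26 of ATOM,
HETATM, ANISOU and TER lines.  It uses the "subtract 10000" option
(A000 -> 0), both because that is the wrap-around the cookbook example
expects and because 10000 would not fit in the four-character field.  The
script name fix_resseq.py is made up.

```python
import sys

# Record types whose residue sequence number sits in columns 23-26 (1-based).
RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def wrapped_residue_number(field):
    """Map a 4-character residue number field to a wrapped value.

    Hypothetical scheme: values after 9999 are hex starting at A000,
    so A000 -> 10000, which is then wrapped to 0 (A001 -> 1, ...).
    """
    field = field.strip()
    if field and field[0].isalpha():
        true_number = int(field, 16) - 0xA000 + 10000
        return true_number - 10000  # wrap round: A000 -> 0
    return int(field)


def convert_line(line):
    """Rewrite columns 23-26 of a coordinate record; leave other lines alone."""
    if line[:6] in RECORDS and len(line) >= 26:
        field = line[22:26]
        if not field.strip():
            return line  # e.g. a TER record with no residue number
        number = wrapped_residue_number(field)
        return line[:22] + "%4d" % number + line[26:]
    return line


def main(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python fix_resseq.py input.pdb output.pdb")
```

Lines that aren't coordinate records (HEADER, REMARK, etc.) pass through
unchanged, so everything else in the file stays as it was.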

Let me know if you'd like some help.

Thanks--

Nick Lindberg
Sr. Consulting Engineer, HPC
Milwaukee Institute
414.727.6413 (W)
http://www.mkei.org

On 7/14/13 6:21 AM, "Peter Cock" <p.j.a.cock at googlemail.com> wrote:

>On Sat, Jul 13, 2013 at 7:50 AM, Katrina Lexa <klexa at umich.edu> wrote:
>> Hi everyone,
>>
>> I'm trying to do something that seems like it ought to be super simple,
>> since it is on the Biopython wiki cookbook
>> (http://biopython.org/wiki/Reading_large_PDB_files), but for some reason
>> that script will not work for me.
>>
>> When I try to run it as it is, on a PDB file that has more than 10000
>> residues, I get "NameError: global name 'Residue' is not defined" at
>> line 77. My assumption was that the script needed to import some other
>> module from Biopython, so I added "from Bio.PDB import *" to the top of
>> the script, but then it failed with "TypeError: 'str' object is not
>> callable" at line 73 (residue = Residue(res_id, resname, self.segid)).
>> I tried to circumvent this by changing the name of the variable being
>> created from residue = Residue to foobar = Residue (and carrying that
>> naming through), but I continued to get the TypeError. Has anyone seen
>> this before, and can anyone help me get this to run?
>>
>> I have a file where all of the residues after 9999 are numbered starting
>> with A000, and that causes the normal Bio.PDB.PDBParser to crash with
>> "invalid literal for int() with base 10: 'A000'", so if there is an
>> easier workaround for that, that would also be a solution.
>>
>> Thank you so much for your help!
>
>It seems that the wiki example assumes the residue numbers
>wrap round at 9999 to restart at 0, 1, 2, ... whereas your file
>goes from 9999 to A000, A001, etc., which I've not seen before.
>
>Where did your PDB file come from? A public database?
>Another tool?
>
>Peter
>_______________________________________________
>Biopython mailing list  -  Biopython at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biopython