[Biopython-dev] Numpy/Scipy and Biopython

Diego Zea diego_zea at yahoo.com.ar
Wed Nov 28 03:09:58 UTC 2012

Hi João (and others)!!! Thanks :)

I think someone with more Numpy knowledgement can do this better, but this is my idea:

1- Load the PDB direct to numpy (I do this fast and bad, don't trust in this parser)
2- Use a matrix nx3 for xyz and one matriz with named columns for other information. ( I dont know how )
[ The indice is the same, and you can use one for slice the other with boolean arrays ;) ]
3- Define methods for the most commons operations

This is and example of my idea (work on 1AB0 from PDB)...


import numpy

xyz = []

# The example structure is 
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0 
with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
    """ Very naive parser.I write this in a couple of minutes.
    It's bad, but it's only for show the idea """
    for line in fh:
        if line[0:4]=='ATOM':
            temp =[]
            temp2 =[]
            temp.append(line[4:11].replace(" ",""))
            temp2.append(line[11:16].replace(" ",""))
            temp2.append(line[17:21].replace(" ",""))
            temp.append(line[22:27].replace(" ",""))
            temp.append(line[55:60].replace(" ",""))
            temp.append(line[60:67].replace(" ",""))
            temp2.append(line[-5:].replace(" ","").replace("\n",""))

# I don't good for using different dtypes 
# In different columns
# But can be better columns with names instead of this:
names_array = numpy.array(names,numpy.character)             
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)

def select_atom(names,xyz,descript,atom='CA'):
    xyz_s = xyz[names[:,0]==atom,:]
    names_s = names[names[:,0]==atom,:]
    descript_s = descript[names[:,0]==atom,:]
    return names_s,xyz_s,descript_s

def delete_res_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,1]!=num,:]
    names_s = names[descript[:,1]!=num,:]
    descript_s = descript[descript[:,1]!=num,:]
    return names_s,xyz_s,descript_s

def delete_atom_num(names,xyz,descript,num=20):
    xyz_s = xyz[descript[:,0]!=num,:]
    names_s = names[descript[:,0]!=num,:]
    descript_s = descript[descript[:,0]!=num,:]
    return names_s,xyz_s,descript_s

def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
    # Using vstack ;)
    new_name = numpy.array(new_name,numpy.character)
    new_descript = numpy.array(new_descript,numpy.float16)
    new_xyz = numpy.array(new_xyz,numpy.float16)
    xyz_s = numpy.vstack((xyz,new_xyz))
    names_s = numpy.vstack((names,new_name))
    descript_s = numpy.vstack((descript,new_descript))
    return names_s,xyz_s,descript_s

## Example (works!!!)

if ((dx*dp)>=(h/(2*pi)))
printf("Diego Javier Zea\n");

> De: João Rodrigues <anaryin at gmail.com>
>Para: Diego Zea <diego_zea at yahoo.com.ar> 
>CC: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org> 
>Enviado: martes, 27 de noviembre de 2012 12:40
>Asunto: Re: [Biopython-dev] Numpy/Scipy and Biopython
>Hi Diego,
>Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.
>2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>

More information about the Biopython-dev mailing list