[Biopython-dev] Numpy/Scipy and Biopython
Diego Zea
diego_zea at yahoo.com.ar
Tue Nov 27 22:09:58 EST 2012
""""
Hi João (and others)!!! Thanks :)
I think someone with more Numpy knowledgement can do this better, but this is my idea:
1- Load the PDB direct to numpy (I do this fast and bad, don't trust in this parser)
2- Use a matrix nx3 for xyz and one matriz with named columns for other information. ( I dont know how )
[ The indice is the same, and you can use one for slice the other with boolean arrays ;) ]
3- Define methods for the most commons operations
This is and example of my idea (work on 1AB0 from PDB)...
""""
import numpy
names=[]
descript=[]
xyz = []
# The example structure is
# http://www.rcsb.org/pdb/explore.do?structureId=1ab0
with open("/home/dzea/databases/PDB/1ab0.pdb","r") as fh:
""" Very naive parser.I write this in a couple of minutes.
It's bad, but it's only for show the idea """
for line in fh:
if line[0:4]=='ATOM':
temp =[]
temp2 =[]
temp.append(line[4:11].replace(" ",""))
temp2.append(line[11:16].replace(" ",""))
temp2.append(line[17:21].replace(" ",""))
temp.append(line[22:27].replace(" ",""))
xyz.append(line[31:56].split())
temp.append(line[55:60].replace(" ",""))
temp.append(line[60:67].replace(" ",""))
temp2.append(line[-5:].replace(" ","").replace("\n",""))
descript.append(temp)
names.append(temp2)
# I don't good for using different dtypes
# In different columns
# But can be better columns with names instead of this:
names_array = numpy.array(names,numpy.character)
descript_array = numpy.array(descript,numpy.float16)
xyz_array = numpy.array(xyz,numpy.float16)
def select_atom(names,xyz,descript,atom='CA'):
xyz_s = xyz[names[:,0]==atom,:]
names_s = names[names[:,0]==atom,:]
descript_s = descript[names[:,0]==atom,:]
return names_s,xyz_s,descript_s
def delete_res_num(names,xyz,descript,num=20):
xyz_s = xyz[descript[:,1]!=num,:]
names_s = names[descript[:,1]!=num,:]
descript_s = descript[descript[:,1]!=num,:]
return names_s,xyz_s,descript_s
def delete_atom_num(names,xyz,descript,num=20):
xyz_s = xyz[descript[:,0]!=num,:]
names_s = names[descript[:,0]!=num,:]
descript_s = descript[descript[:,0]!=num,:]
return names_s,xyz_s,descript_s
def add_atom(new_name,new_xyz,new_descript,names,xyz,descript):
# Using vstack ;)
new_name = numpy.array(new_name,numpy.character)
new_descript = numpy.array(new_descript,numpy.float16)
new_xyz = numpy.array(new_xyz,numpy.float16)
xyz_s = numpy.vstack((xyz,new_xyz))
names_s = numpy.vstack((names,new_name))
descript_s = numpy.vstack((descript,new_descript))
return names_s,xyz_s,descript_s
## Example (works!!!)
xyz_array.shape
delete_atom_num(names_array,xyz_array,descript_array)[1].shape
add_atom(['H','H','H'],[0,0,0],[0,0,0,0],names_array,xyz_array,descript_array)[1].shape
if ((dx*dp)>=(h/(2*pi)))
{
printf("Diego Javier Zea\n");
}
>________________________________
> De: João Rodrigues <anaryin at gmail.com>
>Para: Diego Zea <diego_zea at yahoo.com.ar>
>CC: "biopython-dev at lists.open-bio.org" <biopython-dev at lists.open-bio.org>
>Enviado: martes, 27 de noviembre de 2012 12:40
>Asunto: Re: [Biopython-dev] Numpy/Scipy and Biopython
>
>
>Hi Diego,
>
>
>Nice post and nice ideas. As for Bio.PDB, indeed representing the entire structure as a Nx3 matrix of coordinates is super attractive, but would require a deep change in the current framework. Also, manipulation of the structure (removing atoms, adding atoms, etc) would become a bit more complicated.. If you have good ideas to do this, please do share them. I know for example ProDy and csb use a similar approach.
>
>
>Cheers,
>
>
>João
>
>
>2012/11/27 Diego Zea <diego_zea at yahoo.com.ar>
>
>http://stackoverflow.com/questions/13552916/numpy-and-biopython-must-be-integrated
>
>
>
More information about the Biopython-dev
mailing list