[Biopython] BioPython MMCIFParser.py chain.id

Riccardo mitma07 at gmail.com
Mon Jan 26 18:36:09 UTC 2015


Hello to the BioPython mailing-list,
I'm using BioPython to calculate the dihedral angles in a protein together
with the total number of residues for each chain; I made use of this
construct for the total number of residues:

    resseq_list = []
    for residue in chain:
        # full id is (structure id, model id, chain id, residue id), where
        # residue id is (hetero flag, resseq, insertion code)
        residue_full_id = residue.get_full_id()
        resseq = residue_full_id[3][1]
        resseq_list.append(resseq)
    print("\nThe first residue of chain %s is %s" % (chain.id, resseq_list[0]))
    print("The last residue of chain %s is %s" % (chain.id, resseq_list[-1]))
    print("The total number of residues in chain %s is %s\n" % (chain.id, len(resseq_list)))

but the IDs for the chains differ from those shown, for example, in PyMOL.

Trying to figure out the cause, and comparing a PDB file with a CIF file for
the same macromolecule, I realized that the cause lies in the variables
"_atom_site.label_asym_id" and "_atom_site.auth_asym_id" of the CIF file,
which correspond to columns [27:28] and [88:89] in the ATOM rows of the CIF file.
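
To make the difference concrete, here is a minimal sketch in plain Python
(no BioPython) that looks the two items up by name in the atom_site loop
rather than by character position, since as far as I understand mmCIF rows
are whitespace-separated tokens. The three-atom excerpt and the helper name
read_atom_site are made up for illustration, and the naive split() assumes
there are no quoted values containing spaces.

```python
# Hypothetical, hand-written atom_site excerpt: the author-assigned chain is
# "A" for every atom, while label_asym_id puts the water in its own chain "C".
CIF_SNIPPET = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
_atom_site.auth_asym_id
ATOM   1 N  MET A 1 1   A
ATOM   2 CA MET A 1 1   A
HETATM 3 O  HOH C . 201 A
"""


def read_atom_site(text):
    """Return one dict per atom row, keyed by the loop's column names."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_atom_site."):
            names.append(line)
        else:
            # Naive tokenizer: assumes no quoted values with embedded spaces.
            rows.append(dict(zip(names, line.split())))
    return rows


atoms = read_atom_site(CIF_SNIPPET)
label_chains = sorted({a["_atom_site.label_asym_id"] for a in atoms})
auth_chains = sorted({a["_atom_site.auth_asym_id"] for a in atoms})
print("label_asym_id chains:", label_chains)
print("auth_asym_id chains: ", auth_chains)
```

In this excerpt the label_asym_id column yields chains A and C, while
auth_asym_id yields only A, which is the kind of mismatch I see against PyMOL.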

Reading here <http://www.openstructure.org/docs/1.3/io/mmcif/>, and in
particular "AddMMCifPDBChainTr (cif_chain_id, pdb_chain_id)", I suspected
that the BioPython CIF parser uses "label_asym_id" instead of
"auth_asym_id". So I opened the file MMCIFParser.py, and indeed I found,
at line 37:

    chain_id_list = mmcif_dict["_atom_site.label_asym_id"]

I tried to replace it with:

    chain_id_list = mmcif_dict["_atom_site.auth_asym_id"]

After rerunning my script, the output matched the one reported by PyMOL
for some test CIF files, but not for all.

Is there an option in BioPython that produces the output directly in that
format? If not, might it be a good idea to implement one, along the lines of
that web page <http://www.openstructure.org/docs/1.3/io/mmcif/>?
Is there also a better way to get the total number of residues for each
chain than the one I use?
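
As a possible starting point for that last question, here is a shorter
sketch of the counting loop (Python 3). It relies on two things I believe
hold for Bio.PDB: residue.id is the same (hetero flag, resseq, insertion
code) tuple as residue.get_full_id()[3], and the hetero flag is a blank " "
for standard residues. The function name summarize_chain and the small
stand-in classes are hypothetical; they exist only so the snippet runs
without a structure file, and a real chain from the parser would be passed
in instead.

```python
from collections import namedtuple

# Hypothetical stand-ins mimicking only the parts of Bio.PDB used below.
Residue = namedtuple("Residue", "id")  # id = (hetero flag, resseq, icode)


class Chain(list):
    def __init__(self, chain_id, residues):
        super().__init__(residues)
        self.id = chain_id


def summarize_chain(chain, skip_hetero=True):
    """Return (first resseq, last resseq, residue count) for one chain."""
    resseqs = [
        res.id[1]
        for res in chain
        if not skip_hetero or res.id[0] == " "
    ]
    if not resseqs:
        return None, None, 0
    return resseqs[0], resseqs[-1], len(resseqs)


# Two standard residues plus one water ("W" hetero flag).
chain = Chain("A", [
    Residue((" ", 5, " ")),
    Residue((" ", 6, " ")),
    Residue(("W", 201, " ")),
])
first, last, total = summarize_chain(chain)
print("Chain %s: first %s, last %s, %s residues" % (chain.id, first, last, total))
```

With skip_hetero=False the water is counted as well, which is what my
original loop does; whether waters and ligands should count as residues
is a choice to make explicitly.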

Thanks a lot, and many greetings to the BioPython mailing-list: this is my
first time here!

Riccardo Volpe

X3D PyMOL Molecule Viewer (WebGL-powered) <http://chembioscripting.hol.es>

ChemBioScripting | Gioacchino Riccardo Volpe