[Biopython] BioPython MMCIFParser.py chain.id

Wed Jan 28 12:48:37 UTC 2015

I solved it by parsing the CIF file to get the two chain IDs and saving them
in two different variables.
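
A minimal sketch of that approach, assuming Biopython's MMCIF2Dict is used to
read the file and that both _atom_site columns come back as lists of equal
length. The helper names map_chain_ids and chain_id_map_from_cif are
hypothetical and not part of Biopython:

```python
def map_chain_ids(label_ids, auth_ids):
    """Map each label_asym_id to the auth_asym_id found on the same atom row."""
    if len(label_ids) != len(auth_ids):
        raise ValueError("both _atom_site columns must have the same length")
    mapping = {}
    for label_id, auth_id in zip(label_ids, auth_ids):
        # Keep the first pairing seen for each label ID.
        mapping.setdefault(label_id, auth_id)
    return mapping


def chain_id_map_from_cif(path):
    """Read both chain-ID columns from an mmCIF file (needs Biopython)."""
    # Imported here so map_chain_ids stays usable without Biopython installed.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    mmcif_dict = MMCIF2Dict(path)
    # Assumption: each column is a list with one entry per atom row.
    label_ids = mmcif_dict["_atom_site.label_asym_id"]
    auth_ids = mmcif_dict["_atom_site.auth_asym_id"]
    return map_chain_ids(label_ids, auth_ids)
```

With such a mapping, the chain IDs reported by the parser (label_asym_id) can
be translated to the ones shown by PyMOL (auth_asym_id) without editing
MMCIFParser.py. Several label IDs may map to the same author ID.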


*X3D PyMOL Molecule Viewer (WebGL-powered) <http://chembioscripting.hol.es>*

*ChemBioScripting | Gioacchino Riccardo Volpe*

2015-01-26 21:32 GMT+01:00 Riccardo <mitma07 at gmail.com>:

> Ok, reading the MMCIFParser.py file, I found out that "len(resseq_list)"
> is equivalent to "len(chain.get_list())".
>
> What remains to be understood is how to get, from a CIF file, chain IDs
> that match those used in the PDB file, that is, "auth_asym_id" instead of
> "label_asym_id". Is there a built-in option in BioPython?
>
> Thanks,
> Riccardo
>
>
> 2015-01-26 19:36 GMT+01:00 Riccardo <mitma07 at gmail.com>:
>
>> Hello to the BioPython mailing-list,
>> I'm using BioPython to calculate the dihedral angles in a protein
>> together with the total number of residues for each chain; I made use of
>> this construct for the total number of residues:
>>
>>     resseq_list = []
>>     for residue in chain:
>>         # full id is (structure, model, chain, (hetflag, resseq, icode))
>>         residue_full_id = residue.get_full_id()
>>         resseq = residue_full_id[3][1]
>>         resseq_list.append(resseq)
>>     print("\nThe first residue of chain %s is %s"
>>           % (chain.id, resseq_list[0]))
>>     print("The last residue of chain %s is %s"
>>           % (chain.id, resseq_list[-1]))
>>     print("The total number of residues in chain %s is %s\n"
>>           % (chain.id, len(resseq_list)))
>>
>> but the IDs for the chains differ from those shown, for example, in PyMOL.
>>
>> Trying to figure out the cause, and comparing a PDB file with a CIF file
>> for the same macromolecule, I realized that the cause lies in the variables
>> "*_atom_site.label_asym_id*" and "*_atom_site.auth_asym_id*" of the CIF
>> file, which correspond to columns [27:28] and [88:89] in the ATOM rows of
>> the CIF file.
>>
>> Reading here <http://www.openstructure.org/docs/1.3/io/mmcif/>, and in
>> particular about "*AddMMCifPDBChainTr (cif_chain_id, pdb_chain_id)*", I
>> thought that in practice the BioPython CIF parser uses
>> "*label_asym_id*" instead of "*auth_asym_id*". So I opened the file
>> *MMCIFParser.py*, and indeed I found, at line 37:
>>
>>     chain_id_list=mmcif_dict["_atom_site.label_asym_id"]
>>
>> I tried to replace it with:
>>
>>     chain_id_list=mmcif_dict["_atom_site.auth_asym_id"]
>>
>> and after reloading my script, the output matched the one reported by
>> PyMOL for some test CIF files, but not for all.
>>
>> Is there an option in BioPython that produces the output directly in
>> that format? If not, might it be a good idea to implement one, as seen
>> on that web page <http://www.openstructure.org/docs/1.3/io/mmcif/>?
>> Is there also a better way than mine to get the total number of residues
>> for each chain?
>>
>> Thanks a lot, and many greetings to the BioPython mailing-list: this is
>> my first time here!
>>
>> Riccardo Volpe
>>
>>
>
>
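
On the residue-count question quoted above, a shorter version of the loop can
be sketched as follows. It assumes only that iterating over a chain yields
residues whose id attribute is the (hetero flag, sequence number, insertion
code) tuple used by Bio.PDB, which is the same tuple as get_full_id()[3]. The
function name chain_summary is hypothetical:

```python
def chain_summary(chain):
    """Return (first_resseq, last_resseq, residue_count) for a chain."""
    # residue.id[1] is the sequence number, same as get_full_id()[3][1].
    resseqs = [residue.id[1] for residue in chain]
    if not resseqs:
        # An empty chain has no first or last residue.
        return None, None, 0
    return resseqs[0], resseqs[-1], len(resseqs)
```

Like the original loop, this counts every residue object in the chain, so
hetero residues such as waters are included in the total.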