[Biopython] Best way to change the chain identifiers of a set of residues

Fri Nov 27 11:48:20 UTC 2015

Thank you all :) I have found a solution now, but I am testing some of the
suggestions you gave me :)

Best,

Claudia

2015-11-26 18:38 GMT+00:00 João Rodrigues <j.p.g.l.m.rodrigues at gmail.com>:

> Also, instead of unfold_entities, use structure.get_residues(). It returns a
> generator, so wrap it in list() to get a copy of the residues that you can
> safely iterate over while calling detach_child() on the parent chain. Again,
> I'm on my phone, so it might not be entirely true...
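
The advice above, iterating over a copy so that residues can be detached
safely, could be sketched roughly as follows. This is only a sketch: it assumes
that get_residues() yields the residues lazily (hence the explicit list()
snapshot) and that each residue's parent chain offers detach_child(). The
names detach_residues and should_remove are hypothetical, not part of
Biopython.

```python
def detach_residues(structure, should_remove):
    """Detach every residue for which should_remove(residue) is true.

    Returns the ids of the detached residues, in iteration order.
    """
    removed = []
    # list(...) takes a snapshot first, so detaching a residue cannot
    # disturb the iteration over the chains' children.
    for residue in list(structure.get_residues()):
        if should_remove(residue):
            chain = residue.get_parent()
            chain.detach_child(residue.id)
            removed.append(residue.id)
    return removed
```

A call could look like detach_residues(structure, lambda r: len(r) == 0).
Residue ids are only unique within a chain, so a test based on the id alone
may match residues in several chains.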
>
> On Thu, 26/11/2015, 18:36, João Rodrigues <j.p.g.l.m.rodrigues at gmail.com>
> wrote:
>
>> Hi Claudia, you can use the build_peptides() method of the PPBuilder class
>> in the Bio.PDB.Polypeptide module. This should directly give you the
>> fragments of the structure, based on a distance criterion. For the chains,
>> you have to create new ones.
>>
>> I'm on my phone so I can't verify it, but I'd first create a new empty
>> structure (and model), get the fragments of the parsed structure, and then,
>> based on the fragment length, add them to the new structure with a
>> sequential chain ID. I think this is the optimal way.
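
The steps described above (new structure and model, fragments from
build_peptides(), one new chain per sufficiently long fragment) might look
something like this. It is a sketch under assumptions, not a verified
solution: assign_chain_ids and rebuild_by_fragment are hypothetical names,
the chain ID alphabet is copied from list_id in the original function, and
build_peptides() is used with its defaults, which only consider standard
amino acids and the first model.

```python
import string

# Same 62 one-character IDs as list_id in the original function.
CHAIN_IDS = string.ascii_uppercase + string.ascii_lowercase + "1234567890"


def assign_chain_ids(fragments, min_size):
    """Drop fragments shorter than min_size; pair the rest with chain IDs."""
    kept = [frag for frag in fragments if len(frag) >= min_size]
    if len(kept) > len(CHAIN_IDS):
        raise ValueError("more fragments than one-character chain IDs")
    return list(zip(CHAIN_IDS, kept))


def rebuild_by_fragment(structure, min_size):
    """Return a new structure with one chain per continuous fragment."""
    # Imported here so that assign_chain_ids() works without Biopython.
    from Bio.PDB.Chain import Chain
    from Bio.PDB.Model import Model
    from Bio.PDB.Polypeptide import PPBuilder
    from Bio.PDB.Structure import Structure

    # build_peptides() splits chains using a distance criterion.
    fragments = PPBuilder().build_peptides(structure)

    new_structure = Structure(structure.id)
    new_model = Model(0)
    new_structure.add(new_model)
    for chain_id, fragment in assign_chain_ids(fragments, min_size):
        new_chain = Chain(chain_id)
        new_model.add(new_chain)
        for residue in fragment:
            # copy() leaves the parsed structure untouched.
            new_chain.add(residue.copy())
    return new_structure
```

The returned structure can then be passed to PDBIO().set_structure() and
saved as usual. Because nothing is removed from the parsed structure, the
problem of modifying a chain while iterating over it does not arise.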
>>
>> What Jordan suggested works wonders if you want to filter a given
>> structure. If you just want to remove small fragments, you could tag
>> these residues with a negative B-factor, for example, and then use this to
>> filter with a Select subclass and PDBIO().
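
The tagging idea above could be sketched like this, assuming atoms expose
set_bfactor()/get_bfactor() and that PDBIO.save() accepts a select argument.
The helper names (tag_residue, is_tagged, save_untagged, KeepUntagged) are
hypothetical. Note that tagging overwrites the original B-factors of the
tagged atoms.

```python
def tag_residue(residue, value=-1.0):
    """Mark a residue for removal by giving all its atoms a negative B-factor."""
    for atom in residue:
        atom.set_bfactor(value)


def is_tagged(residue):
    """Return True if any atom of the residue carries a negative B-factor."""
    return any(atom.get_bfactor() < 0 for atom in residue)


def save_untagged(structure, out_path):
    """Write structure to out_path, skipping residues marked by tag_residue()."""
    # Imported here so the two helpers above work without Biopython.
    from Bio.PDB import PDBIO, Select

    class KeepUntagged(Select):
        # Only residue-level filtering is overridden; models, chains and
        # atoms keep the accept-everything default of Select.
        def accept_residue(self, residue):
            return not is_tagged(residue)

    io = PDBIO()
    io.set_structure(structure)
    io.save(out_path, select=KeepUntagged())
```

With this approach the structure itself is never modified while it is being
traversed; the filtering happens only when the file is written.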
>>
>> Cheers,
>>
>> João
>>
>> On Thu, 26/11/2015, 14:44, Claudia Millán Nebot <cmncri at ibmb.csic.es>
>> wrote:
>>
>>> Hi Jordan, the removal of the residues is working; it is the renaming of
>>> the chains that is causing trouble. I attach an example of input/output
>>> produced by the function.
>>>
>>>
>>> 2015-11-26 9:09 GMT+00:00 Jordan Willis <jwillis0720 at gmail.com>:
>>>
>>>> Also, the detach_child method should work in place, so I’m not sure why
>>>> this is not working. Can you give an example PDB?
>>>>
>>>> Jordan
>>>>
>>>> On Nov 25, 2015, at 12:16 PM, Claudia Millán Nebot <cmncri at ibmb.csic.es>
>>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> I am writing a function that examines a structure, and if there are
>>>> discontinuous regions that are smaller than a certain size, they will be
>>>> removed from the structure. Then, I would like to write the structure as a
>>>> pdb in which the chain identifiers are different for each discontinuous
>>>> fragment. For that purpose, I want to change the chain id of certain
>>>> residues. What would be the best way to do it? Right now it is not
>>>> working, of course, because I am iterating over something that I am trying
>>>> to change at the same time. Maybe I am missing something very obvious or
>>>> straightforward, but I do not see what the best way to do it would be...
>>>> Maybe creating an empty chain and using the set_parent method?
>>>>
>>>> The current code looks like this:
>>>> from Bio.PDB import PDBIO, PDBParser, Selection
>>>>
>>>> def trimByContinuityLimit(pdb_file,min_size):
>>>>     parser=PDBParser()
>>>>     structure=parser.get_structure(pdb_file[:-4],pdb_file)
>>>>     residues=Selection.unfold_entities(structure,'R')
>>>>     list_id="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890"
>>>>     dictio_chainid={}
>>>>     residues_to_remove=[]
>>>>     current_listres=[]
>>>>     index=0
>>>>     for i in range(len(residues)-1):
>>>>         res1=residues[i]
>>>>         res2=residues[i+1]
>>>>         id1=res1.id
>>>>         id2=res2.id
>>>>         check=Bioinformatics.checkContinuity(res1,res2)
>>>>         #print 'check',check
>>>>         #print 'list_id[index]',list_id[index]
>>>>         if check==True:
>>>>             #print "These two residues are consecutive",res1,res2
>>>>             if id1 not in current_listres:
>>>>                 current_listres.append(id1)
>>>>             dictio_chainid[id1]=list_id[index]
>>>>             if id2 not in current_listres:
>>>>                 current_listres.append(id2)
>>>>             dictio_chainid[id2]=list_id[index]
>>>>             #print 'list_id[index]',list_id[index]
>>>>             #print 'id1,dictio_chainid[id1]',dictio_chainid[id1],id1
>>>>             #print 'id2,dictio_chainid[id2]',dictio_chainid[id2],id2
>>>>         elif check==False:
>>>>             #print "These two residues are not consecutive",res1,res2
>>>>             if id1 not in current_listres:
>>>>                current_listres.append(id1)
>>>>                dictio_chainid[id1]=list_id[index]
>>>>             if len(current_listres)<min_size:
>>>>                residues_to_remove.extend(current_listres)
>>>>             # If we reach this point, then the last residue is not
>>>>             # continuous, so it is single:
>>>>             if i==len(residues)-2 and min_size>1:
>>>>                residues_to_remove.append(id2)
>>>>             else:
>>>>                current_listres=[]
>>>>                current_listres.append(id2)
>>>>                index=index+1
>>>>                dictio_chainid[id2]=list_id[index]
>>>>     # Remove the residues and write the pdb
>>>>     for model in structure:
>>>>        for chain in model:
>>>>            for residue in chain:
>>>>                id_res=residue.id
>>>>                if id_res in residues_to_remove:
>>>>                    chain.detach_child(id_res)
>>>>                else:
>>>>                    chain.id=dictio_chainid[id_res]
>>>>     io=PDBIO()
>>>>     io.set_structure(structure)
>>>>     io.save(pdb_file[:-4]+'_trimmed.pdb',write_end=False)
>>>>
>>>> Thanks in advance :)
>>>>
>>>> Claudia
>>>> _______________________________________________
>>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biopython