[Biopython] Extracting a PDB list

Fred Ludlow fred.ludlow at gmail.com
Wed Feb 11 10:06:08 UTC 2015


Doesn't the Biopython polypeptide builder handle this sort of thing? (I
think PPBuilder uses C-N distances and CaPPBuilder uses CA distances.)

For example, for PDB entry 3beg (which has a chain break in both chain A
and chain B), something like this:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder

structure = PDBParser().get_structure('3beg', '/path/to/3beg.pdb')
ppb = PPBuilder()
for model in structure:
    for chain in model:
        # One polypeptide per unbroken stretch of the chain
        pp = ppb.build_peptides(chain)
        if len(pp) > 1:
            # More than one piece means a chain break: do something!
            print(chain, len(pp))

This prints
<Chain id=A> 2
<Chain id=B> 2


I'd take a look at the PPBuilder source first, just to check it's really
doing what you want. If it's not quite right, you can probably write your
own subclass.
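
If you want to see what a distance-based check amounts to before reading the
PPBuilder source, here is a minimal sketch in plain Python. It is not the
Biopython implementation: the function name find_chain_breaks and the 4.3
angstrom cutoff are my own assumptions for illustration, and it works on a
bare list of CA coordinates rather than on Biopython objects.

```python
import math

# Assumed cutoff in angstroms; consecutive CA atoms in an unbroken chain
# normally sit about 3.8 A apart, so anything well above that is suspicious.
CA_CUTOFF = 4.3


def find_chain_breaks(ca_coords, cutoff=CA_CUTOFF):
    """Return each index i where CA i and CA i+1 are further apart than cutoff."""
    breaks = []
    for i in range(len(ca_coords) - 1):
        if math.dist(ca_coords[i], ca_coords[i + 1]) > cutoff:
            breaks.append(i)
    return breaks


# Toy coordinates: three CA atoms spaced 3.8 A apart, then a 10 A jump.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (17.6, 0.0, 0.0)]
print(find_chain_breaks(coords))  # [2]
```

Because this compares coordinates and not residue numbers, insertion codes
and numbering gaps don't affect it. math.dist needs Python 3.8 or later.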

Cheers,

Fred

P.S. I'd avoid writing your own parser unless there's a good reason not to
use the Biopython (or another) one. The PDB file format has lots of
quirks, as Lenna pointed out.
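
To illustrate one of those quirks, here is a small sketch of why splitting
on whitespace goes wrong and fixed columns don't. The function name
parse_atom_line and the example record are made up for illustration; the
column positions follow the ATOM record layout in the PDB format
specification. It is not a substitute for a real parser.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record by fixed columns (0-based Python slices)."""
    return {
        "name": line[12:16].strip(),     # atom name, columns 13-16
        "altloc": line[16].strip(),      # alternate location, column 17
        "resname": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],               # chain ID, column 22
        "resseq": int(line[22:26]),      # residue number, columns 23-26
        "icode": line[26].strip(),       # insertion code, column 27
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Hypothetical record built with a format string so the columns line up.
# Residue 1000 fills its field, so it touches the chain ID ("A1000").
example = "ATOM  {:>5} {:<4}{}{:>3} {}{:>4}{}   {:8.3f}{:8.3f}{:8.3f}".format(
    1, " CA ", " ", "ALA", "A", 1000, " ", 12.345, -3.21, 7.0)

print(example.split()[4])  # A1000: whitespace splitting merges two fields
record = parse_atom_line(example)
print(record["chain"], record["resseq"])  # A 1000
```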



On 11 February 2015 at 02:21, Lenna Peterson <arklenna at gmail.com> wrote:

> I want to point out that this approach relies on consecutive residue
> numbering as a proxy for "no chain break." I've run into cases where this
> is not true - the main place this breaks down is with insertion codes.
> Alternate locations may also present challenges.
>
> A more robust method would be to check that the coordinates of sequential
> CA atoms are within the expected distance.
>
> For rolling one's own PDB parser, I'd recommend looking at the source code
> of Biopython's PDB parser for the column numbers that correspond to
> specific fields.
>
> Cheers,
>
> Lenna
>
> On Tue, Feb 10, 2015 at 8:46 PM, David Shin <davidsshin at lbl.gov> wrote:
>
>> Hi Patrick,
>>
>> You should be able to write a script to do this (shell script with some
>> python or awk).
>>
>> Off the top of my head, for each file you would:
>>
>> for each file:
>>    extract the lines with ^ATOM into a new file to make things easier
>>    read each line into some list
>>    subtract the residue number from each line from the next line in the
>> list
>>       if that value is > 1
>>           print something ( the file name, or some flag)
>>       else there are no breaks... can do something else if you want
>> end
>>
>> The only tough parts are using spaces to separate items. If say a protein
>> had 1000 residues, then the 1000 will run into the chain ID. So that's
>> something to consider. Using specific column numbers would be the better
>> way.
>>
>> That and I'm not sure about the uniformity of PDB files that are really
>> old.
>>
>> Let me know if that helps, if not, I can maybe help out further.
>>
>> Dave
>>
>>
>>
>>
>> On Tue, Feb 10, 2015 at 2:24 PM, João Rodrigues <
>> j.p.g.l.m.rodrigues at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Without manually checking every single one, there is no such list, at
>>> least that I know of. Your best bet could be to restrict your set to the
>>> highest-resolution structures; those are usually of very good quality.
>>>
>>> Cheers,
>>>
>>> João
>>>
>>> 2015-02-10 22:35 GMT+01:00 PC <patrick.cossins at inbox.com>:
>>>
>>>> Hi,
>>>>
>>>> I do know about PISCES lists but I want a list of PDB's without any
>>>> chain breaks.
>>>>
>>>> Is there such a list or a way to obtain such a list?
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> David Shin, Ph.D
>> Lawrence Berkeley National Labs
>> 1 Cyclotron Road
>> MS 83-R0101
>> Berkeley, CA 94720
>> USA
>>
>>
>
>
>