[Biopython] Extracting a PDB list

Wed Feb 11 02:21:33 UTC 2015

I want to point out that this approach relies on consecutive residue
numbering as a proxy for "no chain break." I've run into cases where this
is not true - the main place this breaks down is with insertion codes.
Alternate locations may also present challenges.

A more robust method would be to check that the coordinates of sequential
CA atoms are within the expected distance.
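
A minimal sketch of that distance check, under stated assumptions: the
4.2 angstrom cutoff is an illustrative choice (consecutive CA atoms
normally sit about 3.8 angstroms apart), and the function names
find_chain_breaks and ca_coords_by_chain are hypothetical, not part of
Biopython. Only the first model is read, and hetero residues are skipped.

```python
import math


def find_chain_breaks(ca_coords, max_dist=4.2):
    """Return each index i where CA i and CA i+1 are more than max_dist apart.

    ca_coords is a list of (x, y, z) tuples in chain order, in angstroms.
    max_dist is an assumed cutoff, not a value taken from any standard.
    """
    breaks = []
    for i in range(len(ca_coords) - 1):
        if math.dist(ca_coords[i], ca_coords[i + 1]) > max_dist:
            breaks.append(i)
    return breaks


def ca_coords_by_chain(pdb_path):
    """Collect CA coordinates per chain from the first model of a PDB file."""
    from Bio.PDB import PDBParser  # imported here so the check above needs no Biopython

    structure = PDBParser(QUIET=True).get_structure("query", pdb_path)
    coords = {}
    for chain in structure[0]:
        coords[chain.id] = [
            tuple(float(v) for v in residue["CA"].coord)
            for residue in chain
            # residue.id is (hetero flag, sequence number, insertion code)
            if residue.id[0] == " " and "CA" in residue
        ]
    return coords
```

Because the check uses only coordinates in file order, insertion codes and
unusual numbering schemes do not affect it. A chain would be reported as
unbroken when find_chain_breaks returns an empty list for it.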

For rolling one's own PDB parser, I'd recommend looking at the source code
of Biopython's PDB parser for the column numbers that correspond to
specific fields.
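
As a rough illustration of column-based parsing, the sketch below slices
one ATOM record by fixed positions. The column ranges follow the ATOM
record layout of the PDB file format as I understand it (chain ID in
column 22, residue number in columns 23-26, insertion code in column 27,
coordinates in columns 31-54); please verify them against the format
specification or the Biopython source before relying on them. The
function name parse_atom_line is hypothetical.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into fields using fixed columns.

    Python slices are 0-based and end-exclusive, so PDB columns 23-26
    become line[22:26].
    """
    return {
        "name": line[12:16].strip(),     # atom name, columns 13-16
        "altloc": line[16].strip(),      # alternate location, column 17
        "resname": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],               # chain ID, column 22
        "resseq": int(line[22:26]),      # residue number, columns 23-26
        "icode": line[26].strip(),       # insertion code, column 27
        "coord": (
            float(line[30:38]),          # x, columns 31-38
            float(line[38:46]),          # y, columns 39-46
            float(line[46:54]),          # z, columns 47-54
        ),
    }
```

Slicing by column keeps the chain ID and a four-digit residue number
apart even when no space separates them, which whitespace splitting
cannot do.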

Cheers,

Lenna

On Tue, Feb 10, 2015 at 8:46 PM, David Shin <davidsshin at lbl.gov> wrote:

> Hi Patrick,
>
> You should be able to write a script to do this (shell script with some
> python or awk).
>
> Off the top of my head, you would:
>
> for each file:
>    extract the lines starting with ATOM into a new file to make things easier
>    read each line into a list
>    subtract each line's residue number from that of the next line in the
> list
>       if that value is > 1
>           print something (the file name, or some flag)
>       else there is no break at that position... can do something else if
> you want
> end
>
> The only tough part is using spaces to separate items. If, say, a protein
> has 1000 or more residues, the four-digit residue number will run into
> the chain ID with no space between them. So that's something to
> consider. Using specific column numbers would be the better way.
>
> That and I'm not sure about the uniformity of PDB files that are really
> old.
>
> Let me know if that helps, if not, I can maybe help out further.
>
> Dave
>
>
>
>
> On Tue, Feb 10, 2015 at 2:24 PM, João Rodrigues <
> j.p.g.l.m.rodrigues at gmail.com> wrote:
>
>> Hi,
>>
>> Without manually checking every single one, there is no such list, at
>> least that I know of. Your best bet could be to restrict yourself to the
>> highest-resolution structures (the lowest resolution values in Å); those
>> are usually of very good quality.
>>
>> Cheers,
>>
>> João
>>
>> 2015-02-10 22:35 GMT+01:00 PC <patrick.cossins at inbox.com>:
>>
>>> Hi,
>>>
>>> I do know about PISCES lists but I want a list of PDB's without any
>>> chain breaks.
>>>
>>> Is there such a list or a way to obtain such a list?
>>>
>>> Thank you.
>>>
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>>
>
>
>
> --
> David Shin, Ph.D
> Lawrence Berkeley National Labs
> 1 Cyclotron Road
> MS 83-R0101
> Berkeley, CA 94720
> USA