[Biopython-dev] Benchmarking PDBParser

Peter Cock p.j.a.cock at googlemail.com
Wed May 4 10:39:19 UTC 2011


On Wed, May 4, 2011 at 11:21 AM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all,
>
> Following a few discussions, I'm tempted to benchmark the current
> implementation of the PDBParser and see how it fares against an old
> implementation (I think I'll use 1.48 since older versions need Numerical
> Python). The main objective is to see if the recent developments have a
> significant impact on its speed.
>
> I thought of downloading the entire PDB but since it would take several
> days, I downloaded the CATH domain list instead. Those are just protein ATOM
> records, without any header, but since all modifications were essentially
> dealing with ATOM records, etc, I think it might be as valid.
>
> I'll be running tests today and tomorrow and I'll put the results up
> somewhere later on. I'm also making the scripts available so it is easy to
> benchmark it later on.
>
> Thoughts or suggestions?
>
> Cheers,
>
> João

That sounds like a good idea. While you are at it, you could try
both the strict and permissive modes - I wonder what proportion
of the current PDB has problems in the data?
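
A rough sketch of how the two modes could be compared, in case it is
useful as a starting point. It assumes a list of PDB file paths; the
helper names (benchmark, make_biopython_parser) are made up for
illustration and are not part of Bio.PDB. Only PDBParser(PERMISSIVE=...)
and get_structure() are real Biopython calls. Counting any exception as a
"problem file" is a simplification.

```python
import time
from pathlib import Path


def benchmark(parse, paths):
    """Call parse(path) for every path.

    Returns (elapsed_seconds, failed_paths). Any exception counts as a
    failure, which is a deliberately coarse assumption.
    """
    failed = []
    start = time.perf_counter()
    for path in paths:
        try:
            parse(path)
        except Exception:
            failed.append(path)
    return time.perf_counter() - start, failed


def make_biopython_parser(permissive):
    """Hypothetical helper: wrap Bio.PDB.PDBParser as a parse(path) callable."""
    from Bio.PDB import PDBParser  # imported lazily; requires Biopython

    parser = PDBParser(PERMISSIVE=permissive)
    return lambda path: parser.get_structure(Path(path).stem, str(path))


def compare_modes(paths):
    """Run the same files through permissive and strict parsing."""
    for label, permissive in (("permissive", True), ("strict", False)):
        elapsed, failed = benchmark(make_biopython_parser(permissive), paths)
        print("%s: %.2f s, %d of %d files failed"
              % (label, elapsed, len(failed), len(paths)))
```

Calling compare_modes(sorted(Path("some_dir").glob("*.pdb"))) would then
give the timing for each mode, and the number of strict-mode failures
would be a first estimate of how many files have problems.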

Peter



