[Biopython-dev] Benchmarking PDBParser

João Rodrigues anaryin at gmail.com
Fri May 6 07:45:53 UTC 2011


Hello all,

I'd love to come up with results, but I ran into some problems. After a
while the parser consumes too much memory (>2GB), and once the machine
starts swapping I can't get reliable timings. Therefore, I'll just take a
random sample of 8000 structures and use it as a benchmark.
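
A minimal sketch of such a sampled benchmark, in Python. It assumes the
divided pdb/xx/pdbXXXX.ent.gz layout that the rsync command quoted below
produces. `sample_structures`, `benchmark` and `make_biopython_parse` are
hypothetical helper names, not part of Biopython. A fixed random seed keeps
the 8000-structure sample identical across parser versions.

```python
import random
import time
from pathlib import Path


def sample_structures(pdb_dir, n=8000, seed=42):
    """Return a reproducible random sample of up to n structure files.

    Assumes the rsync "divided" layout, e.g. pdb/ab/pdb1abc.ent.gz.
    """
    files = sorted(Path(pdb_dir).glob("*/pdb*.ent.gz"))
    rng = random.Random(seed)  # fixed seed: every parser version gets the same sample
    return rng.sample(files, min(n, len(files)))


def benchmark(parse, files):
    """Time parse(path) for each file; return (total seconds, average ms)."""
    total = 0.0
    for path in files:
        start = time.perf_counter()
        parse(path)  # result is discarded, so structures are not kept around
        total += time.perf_counter() - start
    average_ms = 1000.0 * total / len(files) if files else 0.0
    return total, average_ms


def make_biopython_parse():
    """Build a parse(path) callable on Bio.PDB.PDBParser (needs Biopython).

    gzip decompression happens inside the timed call.
    """
    import gzip
    from Bio.PDB import PDBParser

    parser = PDBParser(QUIET=True)

    def parse(path):
        with gzip.open(path, "rt") as handle:
            parser.get_structure(path.name, handle)

    return parse
```

Called as benchmark(make_biopython_parse(), sample_structures("./pdb")), it
returns the total seconds and the average milliseconds per structure, the
same two figures as in the .time reports. Note that gzip decompression is
counted in each timing.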

I'll post the results today. Shall I put them up on the wiki? This could be
interesting for both users and future development.

Best,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao



On Wed, May 4, 2011 at 3:57 PM, João Rodrigues <anaryin at gmail.com> wrote:

> Hey Chad,
>
> That's exactly what I ended up doing, and it's done ;) Pretty quick; I was
> hoping for a day or so!
>
> Best,
>
>
> João [...] Rodrigues
> http://nmr.chem.uu.nl/~joao
>
>
>
> On Wed, May 4, 2011 at 3:55 PM, Chad Davis <chad.a.davis at gmail.com> wrote:
>
>> I'd be very interested in this as well.
>> I'm working on some modifications (still in the alpha stage) to the
>> BioPerl PDB parser (based on the Perl Data Language, analogous to
>> NumPy) and would be interested in comparing all of them (Biopython old
>> and new, BioPerl old and new).
>>
>> In my experience, rsync works best for downloading the PDB (just the
>> divided structures), and I believe the first download should take only
>> several hours, not several days. It should be as easy as:
>>
>> rsync -a rsync.wwpdb.org::ftp_data/structures/divided/pdb/ ./pdb
>>
>> Other options:
>> http://www.wwpdb.org/downloads.html
>>
>> Chad
>>
>>
>> On Wed, May 4, 2011 at 15:23, João Rodrigues <anaryin at gmail.com> wrote:
>> > Just a word of advice: I tried to download the whole PDB with PDBList.py
>> > and ran into an error. The server cut me off because of too many
>> > connections. Perhaps adding an exception handler like the one we have
>> > for the NCBI servers would be useful?
>> >
>> > Preliminary results show some speed degradation:
>> >
>> > ==> benchmark_CATH-biopython_149.time <==
>> > Total time spent: 530.686s
>> > Average time per structure: 46.839ms
>> >
>> > ==> benchmark_CATH-biopython_current.time <==
>> > Total time spent: 686.176s
>> > Average time per structure: 60.563ms
>> >
>> > I'll write a full summary when I finish downloading the PDB and
>> > testing it.
>> > _______________________________________________
>> > Biopython-dev mailing list
>> > Biopython-dev at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
>> >
>>
>
>
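
On the PDBList.py connection problem quoted above: a generic sketch of
retrying with exponential backoff. It does not reproduce Biopython's NCBI
handling or PDBList's API. `download_with_retry` and its `fetch` argument
are hypothetical, and treating every OSError as retryable is an assumption.
Downloading files one at a time, rather than opening many connections at
once, may matter as much as the retries.

```python
import random
import time


def download_with_retry(fetch, url, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with exponential backoff.

    fetch is any callable that downloads one file and raises OSError on
    failure (urllib.error.URLError is a subclass of OSError).
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            # wait base_delay * 1, 2, 4, ... plus a little jitter, then retry
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The sleep parameter only exists so the waits can be replaced in tests.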

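A quick check of the two preliminary timings quoted above, reading "149" in
the file name as Biopython 1.49 (an assumption). Dividing the total time by
the average time per structure recovers how many structures each run
parsed, and the ratio of the averages gives the relative slowdown.

```python
def structures_in_run(total_s, avg_ms):
    """Structures parsed = total time / average time per structure."""
    return total_s / (avg_ms / 1000.0)


# Figures copied from the two .time reports in the quoted message.
runs = {
    "1.49": (530.686, 46.839),  # (total seconds, average ms per structure)
    "current": (686.176, 60.563),
}

for name, (total_s, avg_ms) in runs.items():
    print(f"{name}: ~{structures_in_run(total_s, avg_ms):.0f} structures")

slowdown = runs["current"][1] / runs["1.49"][1]
print(f"current / 1.49 = {slowdown:.3f} (~{(slowdown - 1) * 100:.0f}% slower)")
```

Both runs work out to about 11,330 structures, consistent with the same
CATH set, and the current code takes about 1.29 times as long per structure
(roughly 29% slower).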


