[Biopython-dev] [biopython] Feature: Python implementation of MMCIF parser (#33)

Peter Cock p.j.a.cock at googlemail.com
Sat Apr 21 10:32:33 UTC 2012


On Saturday, April 21, 2012, Lenna Peterson wrote:

>
> > ### What it needs ###
> > Addition of PLY dependency to setup.py.
> > I'm not quite sure how to handle this, as PLY wouldn't be necessary on
> > a platform with C Python. Thoughts? Which non-CPython implementations
> > are worth testing?


Basically Jython (which we've tried to support for a while) and PyPy
(which I would like to officially support in future). A pure Python
setup can also be useful in other settings, e.g. Windows development
without the compilers otherwise needed.

However, neither of those has NumPy (yet), which we need for
the PDB module that would use the MMCIF parser.
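
One common way to make a compiled extension optional is to try it first
and fall back to a pure Python module when it cannot be imported. The
following is only a minimal sketch of that pattern, not the code in this
branch; the helper name load_first_available is hypothetical.

```python
import importlib


def load_first_available(candidates):
    """Return the first module in `candidates` that imports cleanly.

    `candidates` is an ordered list of module names, fastest first
    (e.g. a C extension followed by a pure Python fallback).
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not built or not installed on this platform; try the next one.
            continue
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(candidates)
    )
```

A caller would pass its own module names, C module first; with this
approach PLY would only need to be installed where the fallback is
actually used.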

>
> > New C module tested on Python 2.6 on Mac OS X and Debian. I hope it
> > still works on Windows.
> > On my machine, the C module processes a 30,000 line test file in 10-15
> > ms; the Python module takes ~150 ms.


That's about a factor of ten slower, but it still sounds fast enough
that we perhaps don't really need the C code for usability.
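
For anyone wanting to repeat this kind of comparison, a rough sketch
using the standard library's timeit module is below. The helper name
best_time_ms is hypothetical, and the function being timed is whatever
callable parses the test file; nothing here reproduces the figures
quoted above.

```python
import timeit


def best_time_ms(func, repeats=5):
    """Return the fastest of `repeats` single calls to `func`, in ms.

    Taking the minimum reduces noise from other processes on the machine.
    """
    timings = timeit.repeat(func, repeat=repeats, number=1)
    return min(timings) * 1000.0
```

Calling it once with the C-backed parser and once with the pure Python
parser, on the same input file, gives two numbers that can be compared
directly.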

>
> I've started testing the PLY lexer on PyPy. NumPyPy now implements
> more functions needed by PDB; the only things I found to be missing
> are random and linalg. This eliminates Superimposer, FragmentMapper,
> and Vector.
>
> I played around with trying to spoof "import numpy" to automatically
> import numpypy (code here: https://gist.github.com/2432815) but I
> don't think that's wise yet.
>
> My last commit to this branch was a few changes to allow the MMCIF
> parser to work on NumPy. PyPy won't run `setup.py test` due to global
> numpy failure, but if I install this branch and `pypy test_MMCIF.py`,
> it passes.
>
> Anybody with more PyPy and/or package structuring experience have
> thoughts?


I filed a few bugs on missing code in PyPy's NumPy re-implementation
(now called numpypy), good to hear they are getting closer to being
enough for us to run Bio.PDB on it. Thank you for exploring this.
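
To illustrate the "import numpy" spoofing idea mentioned in the quoted
text, here is a minimal sketch of aliasing one module name to another
through sys.modules. It is not the code from the linked gist; the helper
name alias_module is hypothetical.

```python
import importlib
import sys


def alias_module(alias, target_name):
    """Make `import alias` resolve to the module named `target_name`.

    If `alias` is already importable it is returned unchanged, so a real
    installation is never shadowed.
    """
    try:
        return importlib.import_module(alias)
    except ImportError:
        target = importlib.import_module(target_name)
        # Later `import alias` statements find this entry first.
        sys.modules[alias] = target
        return target
```

Under PyPy the call would be something like alias_module("numpy",
"numpypy"). Note that this only covers the top-level name: submodule
imports such as numpy.linalg still fail unless the target provides them,
which is one reason to treat the trick with caution.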

Right now, in your shoes, for MMCIF parsing I would focus on
the parser failures with certain input files (there is an open bug
on RedMine, https://redmine.open-bio.org/issues/2626) and the
issue of multiple models (Eric can probably advise here),
https://redmine.open-bio.org/issues/2943

And I must close this bug now that your earlier work has been
checked in - https://redmine.open-bio.org/issues/2619

Thanks!

Peter
