[Biopython-dev] pypaml

Fri Jan 14 13:50:08 UTC 2011

On Fri, Jan 14, 2011 at 1:36 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi Peter,
> Thanks for the welcome!
>
>> Assuming you wrote all the code (or have your co-authors' agreement),
>> then yes, you can just change the licence. If you want to you can
>> update the code in your repository and website, maybe make a new
>> release while you are at it. Alternatively, you could just leave the
>> standalone pypaml code as it is (under the GPL), but base your
>> Biopython contributions on it (under the Biopython MIT/BSD licence).
>
> I wrote all the code myself so changing it shouldn't be a problem. I
> tend to license tools with the GPL by habit but I'm not opposed to
> relicensing it.

For standalone projects I also like the GPL, but for libraries LGPL
is better. However, in the scientific Python community people have
generally followed the Python licence convention and gone with
the more flexible MIT/BSD style licence.

>> I would suggest that you don't make API changes to standalone
>> pypaml, so as not to disrupt your existing users. However some of
>> the work like Python 2.5 support might be worth doing there (before
>> looking at Biopython integration). As a bonus, that should also mean
>> you can use pypaml under Jython (Python on the JVM).
>>
>>> - check coding standards as described in the Contributing to Biopython wiki
>>> - make some changes to be compatible with Python 2.5: I use @property
>>> and @x.setter decorator tags which are only 2.6+. I think that's the
>>> only incompatibility
>>
>> If so that doesn't sound too hard to update.
>
> I think, as it stands, the CODEML API is complete so no real changes
> need to be made there. As for the decorators, that was actually added
> in the last commit I made, so rolling back is quite simple.

You can of course define properties, setters, getters, etc. without using
decorators, by calling property() directly (this is what we do in Biopython).
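
As a rough sketch of what I mean (the class and attribute names here are
made up for illustration and are not taken from pypaml), the built-in
property() function can be called directly. The @x.setter form needs
Python 2.6, whereas this style also works on Python 2.5:

```python
class Codeml(object):
    """Hypothetical stand-in class; not the actual pypaml code."""

    def __init__(self):
        self._alignment = None

    def _get_alignment(self):
        return self._alignment

    def _set_alignment(self, path):
        # Illustrative validation only; the real checks may differ.
        if not path:
            raise ValueError("alignment path must be non-empty")
        self._alignment = path

    # Build the property by calling property() directly instead of
    # using the @property / @alignment.setter decorators.
    alignment = property(_get_alignment, _set_alignment,
                         doc="Path to the alignment file.")
```

Callers still just write cml.alignment = "aln.phy", so the public API
looks the same as with the decorator version.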

>>> - double-check the CODEML output parsing for many PAML versions; the
>>> output is notoriously non-standard from release to release. I may have
>>> to build some version-checking into the parser. I wrote it based on
>>> the output of PAML 4.3
>>
>> From Chris Fields' comments last year, that may be a lot of work for
>> relatively little gain. I don't use PAML and have no idea what versions
>> are typically used though.
>> http://lists.open-bio.org/pipermail/biopython/2010-September/006760.html
>
> I would suggest that we don't support very old versions. Perhaps from
> 4.x up (currently it's at 4.4c). Most of the parsing is done via
> regular expressions, so changes in the order of the outputs shouldn't
> matter. Changes in the wording will. This is something to work on.

You may be able to get some comments from any PAML users on the
main Biopython discussion list to guide you here.
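
To illustrate the point about regular expressions (a minimal sketch
only: the patterns and the sample lines in the comments are my
assumptions about what the output looks like, not copied from pypaml or
checked against any particular PAML release), searching the whole output
for each pattern makes the parser indifferent to ordering, while a
change in wording simply leaves that key out of the results:

```python
import re

# Assumed line formats, for illustration only, e.g.
#   lnL(ntime: 11  np: 13):  -1234.567890      +0.000000
#   kappa (ts/tv) =  2.12345
#   omega (dN/dS) =  0.12345
PATTERNS = {
    "lnL": re.compile(r"lnL\(.*\):\s+(-?\d+\.\d+)"),
    "kappa": re.compile(r"kappa \(ts/tv\)\s*=\s*(\d+\.\d+)"),
    "omega": re.compile(r"omega \(dN/dS\)\s*=\s*(\d+\.\d+)"),
}


def parse_results(text):
    """Return a dict of the values found anywhere in the output text."""
    results = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Missing or reworded lines are skipped rather than raising.
            results[name] = float(match.group(1))
    return results
```

A missing key is then an easy signal that the wording changed in that
PAML version, which could be a starting point for the version checks.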

>>> - build some unit tests (I'm new to this in Python so I need to learn
>>> a bit about that)
>>
>> We've tried to cover the basics in a chapter in our tutorial,
>> http://biopython.org/DIST/docs/tutorial/Tutorial.html
>> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
>
> Thanks, I'll check them out.
>
>> Does that mean you have wrappers for calling the PAML command
>> line tools? Can you point me at the code for that - I'd like a quick
>> look to see if it makes sense to switch over to the Bio.Application
>> based system we're trying to standardise on in Biopython. On the
>> other hand, if you have a much higher level wrapper maybe it is
>> fine as it is (e.g. the Bio.PopGen wrappers follow their own route,
>> although they use Bio.Application for the low level API inside).
>
> I use Python's subprocess module to call the command line tool.
> PAML programs work by calling the tool with a control file as its
> argument. The control file specifies all of the run arguments,
> including the data files, output files, and other variables.
> Basically, pypaml works by dynamically building a control file via
> properties for the data files and a dictionary for the other
> variables, running the command line tool with that control file as its
> parameter, and then grabbing the output file, parsing it and storing
> the results in a dictionary object.
>
> The run() function, line 217, does this:
> http://code.google.com/p/pypaml/source/browse/trunk/src/pypaml/codeml.py
> with the actual subprocess call happening at 239/241 (verbose/silent).
>
> So, much of the code is dedicated to building the control file and
> parsing the output. I'm not as familiar with the other PAML programs,
> but a look through the manual indicates that they operate in a similar
> manner. (sorry that the code isn't fully commented yet)

Having looked at that briefly, since this is a command line tool
driven by a configuration input file, rather than command line
switches and arguments, I see no reason to bother with using
our Bio.Application framework.
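
For anyone following along, the workflow described above boils down to
something like the sketch below. This is not the pypaml code: the
function names, the default "codeml" command name and the option values
are assumptions for illustration, and only the general "key = value"
control file layout with seqfile, treefile and outfile entries is taken
from PAML itself.

```python
import os
import subprocess


def write_control_file(path, alignment, tree, out_file, options):
    """Write a minimal control file of 'key = value' lines."""
    lines = [
        "seqfile = %s" % alignment,
        "treefile = %s" % tree,
        "outfile = %s" % out_file,
    ]
    # Remaining run variables come from a plain dictionary.
    for key in sorted(options):
        lines.append("%s = %s" % (key, options[key]))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


def run_codeml(ctl_path, command="codeml", verbose=False):
    """Run the tool with the control file as its only argument."""
    if verbose:
        # Let the program print to the terminal.
        return subprocess.call([command, ctl_path])
    # Silent mode: discard whatever the program prints.
    with open(os.devnull, "w") as devnull:
        return subprocess.call([command, ctl_path],
                               stdout=devnull, stderr=devnull)
```

Since everything is passed via the control file, the command line
itself has only one argument, which is why a Bio.Application wrapper
would add very little here.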

By the way, have you ever tried using this under Windows?

> Ok, well, time to get cracking then. I'll add the Bugzilla item and
> make some changes in the standalone. I'll then inform the dev-list
> when things are in better condition for integration!

That sounds like a plan.

Peter