[Biopython-dev] pypaml

Fri Jan 14 08:36:48 EST 2011

Hi Peter,
Thanks for the welcome!

> Assuming you wrote all the code (or have your co-authors agreement),
> then yes, you can just change the licence. If you want to you can
> update the code in your repository and website, maybe make a new
> release while you are at it. Alternatively, you could just leave the
> standalone pypaml code as it is (under the GPL), but base your
> Biopython contributions on it (under the Biopython MIT/BSD licence).

I wrote all the code myself, so changing the licence shouldn't be a
problem. I tend to release tools under the GPL by habit, but I'm not
opposed to relicensing pypaml.

> I would suggest that you don't make API changes to standalone
> pypaml, so as not to disrupt your existing users. However some of
> the work like Python 2.5 support might be worth doing there (before
> looking at Biopython integration). As a bonus, that should also mean
> you can use pypaml under Jython (Python on the JVM).
>
>> - check coding standards as described in the Contributing to Biopython wiki
>> - make some changes to be compatible with Python 2.5: I use @property
>> and @x.setter decorator tags which are only 2.6+. I think that's the
>> only incompatibility
>
> If so that doesn't sound too hard to update.

I think, as it stands, the CODEML API is complete, so no real changes
need to be made there. As for the decorators, those were actually added
in the last commit I made, so rolling back is quite simple.
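
For reference, a minimal sketch of the Python 2.5-compatible form:
calling the built-in property() directly instead of using the
@x.setter decorator. The class and attribute names below are
hypothetical and only illustrate the pattern; they are not taken from
the pypaml source.

```python
class Codeml(object):
    """Hypothetical stand-in for a pypaml-style class (illustrative names)."""

    def __init__(self):
        self._alignment = None

    def _get_alignment(self):
        return self._alignment

    def _set_alignment(self, path):
        self._alignment = path

    # property() as a plain call avoids the @x.setter syntax,
    # which needs Python 2.6 or later.
    alignment = property(_get_alignment, _set_alignment)
```

Callers still write cml.alignment = "file.phy" exactly as before, so
this change should not affect the public API.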

>> - double-check the CODEML output parsing for many PAML versions; the
>> output is notoriously non-standard from release to release. I may have
>> to build some version-checking into the parser. I wrote it based on
>> the output of PAML 4.3
>
> From Chris Fields' comments last year, that may be a lot of work for
> relatively little gain. I don't use PAML and have no idea what versions
> are typically used though.
> http://lists.open-bio.org/pipermail/biopython/2010-September/006760.html

I would suggest that we don't support very old versions, perhaps only
4.x and up (the current release is 4.4c). Most of the parsing is done
via regular expressions, so changes in the order of the outputs
shouldn't matter, but changes in the wording will. This is something
to work on.
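
To illustrate why ordering doesn't matter but wording does, here is a
rough sketch of the approach. The patterns and the sample text are
made up for the example; they are not verbatim CODEML output and not
the actual pypaml expressions.

```python
import re

# One pattern per value of interest. Each is searched over the whole
# text, so the order in which the lines appear is irrelevant.
# (Illustrative patterns only, not the real pypaml ones.)
PATTERNS = {
    "lnL": re.compile(r"lnL\s*=\s*(-?\d+\.\d+)"),
    "kappa": re.compile(r"kappa\s*=\s*(\d+\.\d+)"),
}


def parse_results(text):
    """Return a dict of the values whose patterns match somewhere in text."""
    results = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[name] = float(match.group(1))
    return results


# Invented sample text: same values, different order.
sample_a = "lnL = -1234.56\nkappa = 2.10\n"
sample_b = "kappa = 2.10\nlnL = -1234.56\n"

# Invented sample text where the label wording has changed.
sample_reworded = "log-likelihood = -1234.56\nkappa = 2.10\n"
```

Here sample_a and sample_b parse to the same dictionary, while
sample_reworded silently loses the lnL entry. That second case is
where some version checking (or alternative patterns) would be needed.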

>> - build some unit tests (I'm new to this in Python so I need to learn
>> a bit about that)
>
> We've tried to cover the basics in a chapter in our tutorial,
> http://biopython.org/DIST/docs/tutorial/Tutorial.html
> http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Thanks, I'll check them out.
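
As a starting point, something along these lines is probably the
general shape of a test using the standard unittest module. The
function under test is a hypothetical helper written just for this
example; it is not part of pypaml or Biopython.

```python
import unittest


def parse_key_value(line):
    """Split a 'key = value' line into a (key, value) tuple of strings.

    Hypothetical helper, used here only to have something to test.
    """
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError("no '=' in line: %r" % line)
    return key.strip(), value.strip()


class ParseKeyValueTest(unittest.TestCase):
    def test_splits_on_equals(self):
        self.assertEqual(parse_key_value("model = 0"), ("model", "0"))

    def test_rejects_missing_equals(self):
        self.assertRaises(ValueError, parse_key_value, "model 0")
```

Each test_ method is picked up automatically by the unittest loader,
so adding a case for a new PAML version would just mean adding
another method.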

> Does that mean you have wrappers for calling the PAML command
> line tools? Can you point me at the code for that - I'd like a quick
> look to see if it makes sense to switch over to the Bio.Application
> based system we're trying to standardise on in Biopython. On the
> other hand, if you have a much higher level wrapper maybe it is
> fine as it is (e.g. the Bio.PopGen wrappers follow their own route,
> although they use Bio.Application for the low level API inside).

I use Python's subprocess module to call the command line tool.
PAML programs are run with a control file as their only argument. The
control file specifies all of the run settings, including the data
files, output files, and other variables. Basically, pypaml works in
three steps: it dynamically builds a control file (using properties
for the data files and a dictionary for the other variables), runs the
command line tool with that control file as its parameter, and then
reads the output file, parses it and stores the results in a
dictionary object.
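
A stripped-down sketch of the build-and-run steps is below. This is
not the actual pypaml code: the function names and arguments are
invented for the example, and the "key = value" layout and the
"codeml" command name are assumptions based on the usual PAML control
file conventions.

```python
import os
import subprocess


def build_control_file(alignment, tree, out_file, options):
    """Return control file text from the data files plus an options dict."""
    lines = [
        "seqfile = %s" % alignment,
        "treefile = %s" % tree,
        "outfile = %s" % out_file,
    ]
    # Sorted only so that the generated file is reproducible.
    for key in sorted(options):
        lines.append("%s = %s" % (key, options[key]))
    return "\n".join(lines) + "\n"


def run(ctl_path, command="codeml", verbose=False):
    """Call the tool with the control file as its only argument.

    Returns the exit code. In silent mode the tool's screen output is
    discarded.
    """
    if verbose:
        return subprocess.call([command, ctl_path])
    devnull = open(os.devnull, "w")
    try:
        return subprocess.call([command, ctl_path],
                               stdout=devnull, stderr=devnull)
    finally:
        devnull.close()
```

The text from build_control_file() would be written to disk, passed
to run(), and the file named in outfile parsed afterwards.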

The run() function, line 217, does this:
http://code.google.com/p/pypaml/source/browse/trunk/src/pypaml/codeml.py
with the actual subprocess call happening at 239/241 (verbose/silent).

So, much of the code is dedicated to building the control file and
parsing the output. I'm not as familiar with the other PAML programs,
but a look through the manual indicates that they operate in a similar
manner. (Sorry that the code isn't fully commented yet.)

OK, well, time to get cracking then. I'll add the Bugzilla item and
make some changes in the standalone version. I'll then inform the
dev-list when things are in better condition for integration!

Cheers,
Brandon

>> So, as I understand it, I should file an enhancement bug over at the
>> Bugzilla site.
>
> That would be useful to give us a reference number for tracking it.
> A lot of your email would make a good introduction to the issue to
> put in the comment.
>
>> In the meantime I can start working on some of the
>> points listed above. I also need to refresh my memory of using git
>> since I've gotten in the dirty habit of using svn (assuming this is
>> all approved)! Is there anything else I need to do for now?
>
> Doing your work on a github fork of the Biopython repository
> would be great (although you may want to start with adding unit
> tests or doing Python 2.5 changes within standalone pypaml).
>
> Peter.
>