[Biopython] Codeml parser in Biopython?

Tue Sep 14 16:15:18 UTC 2010

Thanks Chris, 
Good to know. I am dealing with PAML results for the first time, but somehow 
thought that the outputs were standard. Apparently not...
Now that I have started writing my own Python parser, I see that even among 
models of the same run, the labels change without any obvious reason (from 
'omega' to 'w', etc.). Frustrating indeed!
Does the BioPerl solution include different parsers for different types of 
analysis, e.g. one for the Branch analysis models, another for the Site analysis 
models, etc.? It would be good to have one for all, but I am not sure this is 
feasible... I will start with separate parsers and see how they can be generalized.
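
To illustrate the 'omega' versus 'w' problem, here is a minimal sketch of a
label-tolerant match in Python. The sample lines are made up for illustration
and are not copied from real codeml output, and the function name find_omegas
is hypothetical; the real labels would have to be checked against the actual
files.

```python
import re

# Accept either 'omega' or 'w' as the label, followed by '=' or ':'.
# The label variants are an assumption; extend the alternation as needed.
OMEGA_RE = re.compile(
    r"^\s*(?:omega|w)\b[^=:\n]*[=:]\s*([-+]?\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def find_omegas(text):
    """Return every value that follows an 'omega' or 'w' label, as floats."""
    return [float(m.group(1)) for m in OMEGA_RE.finditer(text)]


# Made-up sample lines, not real codeml output.
sample = """\
omega (dN/dS) =  0.2514
w:   0.0312
kappa (ts/tv) =  2.1000
"""

print(find_omegas(sample))
```

Keeping the label variants in one pattern means a new spelling only needs one
change, whichever model produced the line.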
Thanks, 
Anastasia

________________________________
From: Chris Fields <cjfields at illinois.edu>
To: Peter <biopython at maubp.freeserve.co.uk>
Cc: natassa <natassa_g_2000 at yahoo.com>; biopython at biopython.org
Sent: Tue, September 14, 2010 4:59:49 PM
Subject: Re: [Biopython] Codeml parser in Biopython?

On Sep 14, 2010, at 4:04 AM, Peter wrote:

> Hi Anastasia,
> 
> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>> Hi Peter,
>> 
>>> 
>>> Could you post a short example of the kind of output you are looking at?
>>> 
>> 
>> Here is an example output, but this can differ depending on the model used
>> (there are several models for Branch, Site, BranchSite, but all are pretty
>> standard)
>> 
> 
> Thanks - that looks possible to parse, but not very easy (especially if the
> codeml output changes slightly between versions).
> 
>>> 
>>> Can you get codeml to output what you need in another format, such as NEXUS?
>>> 
>> 
>> Haven't tried that, but as you can see, this is a very verbose output and
>> NEXUS does not seem an option.
> 
> At first glance, the NEXUS format could hold a lot of that information.
> Another possibility might be phyloXML. However, you are at the mercy
> of the codeml tool and what it supports. It might be worth politely asking
> the author(s) about supporting one of these more standard formats as
> an optional output.
> 
>> Ultimately, I want to parse this to get all the information I need in a
>> tabulated file. I am still working out what exactly I need (there are
>> standard values to get out, such as lnL, branch length, dN/dS, but it also
>> depends on the type of downstream analysis). I will now work on the pypaml
>> class and modify the original code to make it more generic (it seems that
>> it only works for Site Models).
> 
> Note that Ziheng Yang's pypaml code is licensed under the GPL v3, so
> unless he agrees to re-license it we cannot include it in Biopython.
> 
>> Will let you know, was just wondering if there was already a solution. There
>> is one in Bioperl, but heard it is very slow and in any case, I don't
>> understand much of perl....
> 
> I don't know much Perl either ;)
> 
> Peter

Just a warning from those experienced with PAML parsers (BioPerl): the output is 
notoriously shifty even between minor releases (sections get reordered, etc.), so 
pretty much any parser needs to accommodate that.  It's extremely frustrating.
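
One way to cope with reordered sections, sketched here in Python, is to test
every line against a table of field patterns instead of expecting fields in a
fixed sequence. The patterns, the sample text and the name parse_fields are all
hypothetical and would need adjusting to the codeml version actually in use.

```python
import re

# Hypothetical field patterns keyed by a canonical name.
FIELDS = {
    "lnL": re.compile(r"^lnL\b.*?:\s*([-+]?\d+\.\d+)"),
    "kappa": re.compile(r"^kappa\b.*?=\s*([-+]?\d+\.\d+)"),
    "omega": re.compile(r"^(?:omega|w)\b.*?[=:]\s*([-+]?\d+\.\d+)"),
}


def parse_fields(text):
    """Collect the first value seen for each field, in whatever order it appears."""
    found = {}
    for line in text.splitlines():
        stripped = line.strip()
        for name, pattern in FIELDS.items():
            match = pattern.match(stripped)
            if match and name not in found:
                found[name] = float(match.group(1))
    return found


# Two made-up outputs with the same values in a different order.
run_a = (
    "lnL(ntime: 3  np: 5):  -1234.5678\n"
    "kappa (ts/tv) =  2.1000\n"
    "omega (dN/dS) =  0.2514\n"
)
run_b = (
    "w:  0.2514\n"
    "kappa (ts/tv) =  2.1000\n"
    "lnL(ntime: 3  np: 5):  -1234.5678\n"
)

print(parse_fields(run_a) == parse_fields(run_b))
```

Because the result is a dictionary keyed by field name, it can be written out
as one row of a tabulated file regardless of where each value was found.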

chris