[Biopython] Codeml parser in Biopython?

Chris Fields cjfields at illinois.edu
Tue Sep 14 10:59:49 EDT 2010


On Sep 14, 2010, at 4:04 AM, Peter wrote:

> Hi Anastasia,
> 
> On Tue, Sep 14, 2010 at 9:02 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>> Hi Peter,
>> 
>>> 
>>> Could you post a short example of the kind of output you are looking at?
>>> 
>> 
>> Here is an example output, but this can differ depending on the model used
>> (there are several models for Branch, Site, BranchSite, but all are pretty
>> standard)
>> 
> 
> Thanks - that looks possible to parse, but not very easy (especially if the
> codeml output changes slightly between versions).
> 
>>> 
>>> Can you get codeml to output what you need in another format, such as NEXUS?
>>> 
>> 
>> Haven't tried that, but as you can see, this is a very verbose output and
>> NEXUS does not seem an option.
> 
> At first glance, the NEXUS format could hold a lot of that information.
> Another possibility might be phyloXML. However, you are at the mercy
> of the codeml tool and what it supports. It might be worth politely asking
> the author(s) about supporting one of these more standard formats as
> an optional output.
> 
>> Ultimately, I want to parse this to get all the information I need in a
>> tabulated file. I am still working out what exactly I need (there are standard
>> values to get out, as LnL, branch length, Dn/Ds, but it also depends on the type
>> of downstream analysis). I will now work on the pypaml class and modify the
>> original code to make it more generic (it seems that it only works for Site
>> Models).
> 
> Note that Brandon Invergo's pypaml code is licensed under the GPL v3, so
> unless he agrees to re-license it we cannot include it in Biopython.
> 
>> Will let you know, was just wondering if there was already a solution. There is
>> one in Bioperl, but heard it is very slow and in any case, I don't understand
>> much of perl....
> 
> I don't know much Perl either ;)
> 
> Peter

Just a warning from those experienced with paml parsers (bioperl): the output is notoriously shifty even between minor releases (sections get reordered, etc.), so pretty much any parser needs to accommodate that.  It's extremely frustrating.
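
To make that concrete, here is a minimal Python sketch of an order-independent approach: each value is located by its own label anywhere in the text, rather than by position or by the section that precedes it. The label patterns, the sample text, and the helper names (extract_values, to_tsv_row) are all assumptions made up for illustration, only loosely modelled on codeml output; they would need checking against whatever PAML version you actually run.

```python
import re

# Hypothetical label patterns: assumptions for illustration only, to be
# verified against the real output of the PAML release in use.
PATTERNS = {
    "lnL": re.compile(r"lnL\(.*?\):\s*(-?\d+\.\d+)"),
    "kappa": re.compile(r"kappa \(ts/tv\)\s*=\s*(-?\d+\.\d+)"),
    "omega": re.compile(r"omega \(dN/dS\)\s*=\s*(-?\d+\.\d+)"),
}


def extract_values(text):
    """Search the whole text for each label, so section order does not matter.

    Labels that are not found are reported as None instead of raising.
    """
    results = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        results[name] = float(match.group(1)) if match else None
    return results


def to_tsv_row(values, columns=("lnL", "kappa", "omega")):
    """Format extracted values as one tab-separated row, 'NA' for missing."""
    return "\t".join(
        "NA" if values[col] is None else str(values[col]) for col in columns
    )


# Made-up sample text, not real codeml output.
SAMPLE_A = """\
lnL(ntime: 11  np: 13):  -1234.567890      +0.000000
kappa (ts/tv) =  2.10000
omega (dN/dS) =  0.25000
"""

# Same made-up values with the lines reordered and kappa missing.
SAMPLE_B = """\
omega (dN/dS) =  0.25000
lnL(ntime: 11  np: 13):  -1234.567890      +0.000000
"""

if __name__ == "__main__":
    for sample in (SAMPLE_A, SAMPLE_B):
        print(to_tsv_row(extract_values(sample)))
```

Returning None for a missing label (written out as NA) means a reordered or dropped section degrades one column instead of breaking the whole parse, which is about the best you can hope for when the format is not stable.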

chris



