[Biopython-dev] Parsing PAML supplementary output

Tue Oct 11 10:13:03 UTC 2011

On Tue, Oct 11, 2011 at 11:01 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
>
>> Some of those examples don't really look like PHYLIP anymore to me.
>>
>> If there is any simple change to allow the current parser to cope
>> with (but ignore) any extra meta data like this, that sounds sensible
>> (with unit tests of course - grin).
>
> Agreed, it can get quite messy, though look at the link I provided; even
> the PHYLIP-specific example that they give includes some supplementary
> info at the top, as well as a tree at the bottom:
>
>  4   40   W
> W         0101001111 0101110101 0101110011
>          1101010110
> dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG
>
> spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
> scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
> scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
>          TCCGCGCTCA
>          AGTGCTTTGA
>          TCTGCTTTAA
>          TCTGCTTTGA
> 1
> ((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));
>

I would consider that to be a meta file containing a PHYLIP
alignment and a tree, but in itself it isn't a PHYLIP alignment.

That looks like exactly the kind of issue NEXUS was designed
to solve: how to embed alignments, trees and other stuff into
a single plain text file for input into a phylogenetic tool.
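For comparison, the sketch below shows roughly what the same content looks like as NEXUS, with an alignment and trees in separate named blocks. It is a hand-rolled writer to show the file layout only. The function `to_nexus` is hypothetical, and in practice Biopython's own NEXUS support would be the natural route. It assumes DNA data, equal-length sequences, and taxon names that need no quoting.

```python
from typing import Dict, List


def to_nexus(seqs: Dict[str, str], trees: List[str]) -> str:
    """Hypothetical helper: write one DNA alignment plus trees as a minimal NEXUS file."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    out = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(seqs)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    # Names are written unquoted; names with spaces or punctuation would need quoting.
    out += [f"    {name}  {seq}" for name, seq in seqs.items()]
    out += ["  ;", "END;"]
    if trees:
        out.append("BEGIN TREES;")
        # Input Newick strings may already end in ';', so strip it before re-adding.
        out += [f"  TREE tree{i} = {t.rstrip(';')};" for i, t in enumerate(trees, 1)]
        out.append("END;")
    return "\n".join(out) + "\n"
```

Each kind of data sits in its own named block, so a reader can skip blocks it does not understand instead of guessing where the alignment ends and the trees begin.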

Doesn't PHYLIP have an XML format these days? Trying
to parse something like that text (without a formal standard)
seems like a painful exercise and a long-term maintenance
headache.

Peter