[Biopython-dev] Parsing PAML supplementary output
Brandon Invergo
b.invergo at gmail.com
Tue Oct 11 10:01:59 UTC 2011
> Some of those examples don't really look like PHYLIP anymore to me.
>
> If there is any simple change to allow the current parser to cope
> with (but ignore) any extra meta data like this, that sounds sensible
> (with unit tests of course - grin).
Agreed, it can get quite messy. Look at the link I provided, though: even
the PHYLIP-specific example given there includes some supplementary
info at the top, as well as a tree at the bottom:
4 40 W
W 0101001111 0101110101 0101110011
1101010110
dmras1 GTCGTCGTTG GACCTGGAGG CGTGGGCAAG
spras GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
scras1 GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
scras2 GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
TCCGCGCTCA
AGTGCTTTGA
TCTGCTTTAA
TCTGCTTTGA
1
((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));
I agree that trying to shoehorn that functionality into Biopython as
written would be a mess. Another option, however, would be to shift such
extra formatting duties to the Biopython application interface that
needs them, since that's the only place they're relevant. For example, I
could make a PAML-specific subclass of PhylipWriter that handles all
these weird PAML-specific options. Or, if there were a PHYLIP interface
and the program took the example above as input, it would be that
interface's job to write a file containing the options, the alignment
and the tree together.
Just a thought.
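To make the "let the interface write everything" idea concrete, here is a minimal sketch. It is a plain function rather than a PhylipWriter subclass, so that it does not depend on PhylipWriter's internals; `format_paml_input` and its parameters are hypothetical, not Biopython API. It assumes sequential layout with whitespace-separated names, one supplementary line per header flag beginning with that flag, and (as the trailing `1` in the example suggests) a tree count before the Newick trees.

```python
from typing import Dict, List, Sequence, Tuple


def format_paml_input(
    records: Sequence[Tuple[str, str]],
    supplementary: Dict[str, str],
    trees: Sequence[str] = (),
) -> str:
    """Hypothetical helper: header flags, supplementary lines, alignment, trees.

    records       -- (name, sequence) pairs, all sequences the same length
    supplementary -- option flag -> one value per site (e.g. {"W": "1101"})
    trees         -- Newick strings appended after a tree count
    """
    ntax = len(records)
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("all sequences must have the same length")

    # Header: taxon count, site count, then the option flags (if any).
    flags = " ".join(sorted(supplementary))
    lines: List[str] = [f"{ntax} {nchar} {flags}".rstrip()]

    # Each supplementary line begins with the flag that requires it.
    for flag in sorted(supplementary):
        values = supplementary[flag]
        if len(values) != nchar:
            raise ValueError(f"{flag} line needs one value per site")
        lines.append(f"{flag} {values}")

    # Sequential alignment: name, whitespace, sequence (relaxed names assumed).
    for name, seq in records:
        lines.append(f"{name} {seq}")

    # Assumed layout: number of trees, then one Newick tree per line.
    if trees:
        lines.append(str(len(trees)))
        lines.extend(trees)
    return "\n".join(lines) + "\n"
```

For instance, `format_paml_input([("a", "ACGT"), ("b", "ACGA")], {"W": "1101"}, ["(a,b);"])` produces a header of `2 4 W`, a `W 1101` line, the two sequences, then `1` and the tree. Keeping this in the PAML-facing layer leaves the generic PHYLIP writer untouched.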
For the short term, though, when I implement the sequential format, I'll
go ahead and update the code to at least handle flags in the header
line. Handling the supplementary info should be straightforward, since I
believe that each supplementary line must begin with the option flag
that requires it: if an option flag appears in the header, ignore any
following lines that begin with that flag character.
Unit tests will abound.
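The header-and-skip rule above might look roughly like the sketch below; `parse_phylip_header`, `split_supplementary` and the `supp_flags` parameter are hypothetical names, not Biopython code. One wrinkle shows up in the example: the continuation line `1101010110` does not start with `W`, so a pure "starts with the flag" test would leave it behind. The sketch therefore assumes each supplementary block holds one value per site and that its continuation lines follow it directly, and keeps reading until `nchar` values are collected. That also stops the skipping once the block is complete, so a taxon whose name happens to start with the flag letter is not swallowed.

```python
from typing import Dict, List, Set, Tuple


def parse_phylip_header(line: str) -> Tuple[int, int, Set[str]]:
    """Split a header such as '4 40 W' into taxon count, site count, flags."""
    fields = line.split()
    ntax, nchar = int(fields[0]), int(fields[1])
    # Anything after the two counts is treated as an option flag.
    flags = {f.upper() for f in fields[2:]}
    return ntax, nchar, flags


def split_supplementary(
    lines: List[str], supp_flags: frozenset = frozenset({"W"})
) -> Tuple[int, int, Set[str], Dict[str, str], List[str]]:
    """Hypothetical helper: peel off the header and supplementary blocks.

    supp_flags -- flags assumed to be followed by a per-site data line.
    Returns (ntax, nchar, flags, supplementary, remaining_lines).
    """
    ntax, nchar, flags = parse_phylip_header(lines[0])
    supplementary: Dict[str, str] = {}
    i = 1
    while i < len(lines):
        line = lines[i]
        flag = line[:1].upper()
        # Only a header flag that needs extra data, and has not been seen
        # yet, starts a supplementary block; otherwise the alignment begins.
        if flag not in flags or flag not in supp_flags or flag in supplementary:
            break
        symbols = "".join(line[1:].split())
        i += 1
        # Continuation lines carry no flag: read until one value per site.
        while len(symbols) < nchar and i < len(lines):
            symbols += "".join(lines[i].split())
            i += 1
        supplementary[flag] = symbols
    return ntax, nchar, flags, supplementary, lines[i:]
```

Run on the first lines of the example, this returns `(4, 40, {"W"}, ...)` with the 40 weight-line characters collected under `"W"`, and the remaining lines start at `dmras1`. The alignment lines can then go to the ordinary PHYLIP parsing code, and the trailing tree block would need similar treatment.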
-brandon