[Biopython-dev] Parsing PAML supplementary output

Brandon Invergo b.invergo at gmail.com
Tue Oct 11 06:01:59 EDT 2011


> Some of those examples don't really look like PHYLIP anymore to me.
> 
> If there is any simple change to allow the current parser to cope
> with (but ignore) any extra meta data like this, that sounds sensible
> (with unit tests of course - grin).

Agreed, it can get quite messy, though look at the link I provided; even
the PHYLIP-specific example that they give includes some supplementary
info at the top, as well as a tree at the bottom:

 4   40   W					
W         0101001111 0101110101 0101110011	
	  1101010110
dmras1    GTCGTCGTTG GACCTGGAGG CGTGGGCAAG	

spras     GTAGTTGTAG GAGATGGTGG TGTTGGTAAA
scras1    GTAGTTGTCG GTGGAGGTGG CGTTGGTAAA
scras2    GTCGTCGTTG GTGGTGGTGG TGTTGGTAAA
	  TCCGCGCTCA
	  AGTGCTTTGA
	  TCTGCTTTAA
	  TCTGCTTTGA
1						
((dmras1,ddrasa),((hschras,spras),(scras1,scras2)));


I agree that trying to shoehorn that functionality into Biopython as
written would be a mess. Another option that I can think of, however,
would be to shift such extra formatting duties to whichever Biopython
application interface needs them, since that's the only place they're
relevant. So I could, for example, make a PAML-specific subclass of
PhylipWriter which handles all these weird PAML-specific options. Or,
if there were to be a PHYLIP interface and the program took the above
example as input, it would be the duty of that interface to write a file
with those options, the alignment and the tree all together.
Just a thought.
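
To make that idea concrete, here is a rough sketch of what such an
interface-side writer might do. It is a standalone function, not a
subclass of PhylipWriter, so it makes no claims about Biopython's
internals. The function name, its parameters and the exact layout
(copied from the example above rather than from any format spec) are
all assumptions for illustration.

```python
def format_phylip_with_extras(records, flags="", supplementary=None, trees=()):
    """Build sequential PHYLIP-like text with option flags, extra lines and trees.

    Hypothetical helper: the layout mirrors the example in this thread.
    records       -- list of (name, sequence) pairs, all the same length
    flags         -- option flag characters for the header line, e.g. "W"
    supplementary -- dict mapping a flag character to its extra data string
    trees         -- iterable of Newick strings appended after the alignment
    """
    supplementary = supplementary or {}
    trees = list(trees)
    n_sites = len(records[0][1])
    if any(len(seq) != n_sites for _, seq in records):
        raise ValueError("all sequences must have the same length")

    # Header: number of taxa, number of sites, then any option flags.
    header = f" {len(records)} {n_sites}"
    if flags:
        header += " " + flags
    out = [header]

    # Each supplementary line starts with the flag that requires it,
    # padded to the 10-character name column.
    for flag in flags:
        if flag in supplementary:
            out.append(f"{flag:<10}{supplementary[flag]}")

    # Sequences, one per line, names truncated/padded to 10 characters.
    for name, seq in records:
        out.append(f"{name[:10]:<10}{seq}")

    # Optional tree block: a count line followed by the trees themselves.
    if trees:
        out.append(str(len(trees)))
        out.extend(trees)
    return "\n".join(out) + "\n"
```

The point of keeping this outside the core writer is that the general
PHYLIP code never has to know about program-specific options.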

For the short term, though, when I implement the sequential format, I'll
go ahead and update the code to at least handle flags in the header
line. Handling the supplementary info should be straightforward, since I
believe that each supplementary line must begin with the option flag
that requires it: if an option flag is present in the header, the parser
can ignore any following lines which begin with that flag character.
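
The rule described above could look roughly like the following. This is
only a sketch under stated assumptions, not the actual Biopython parser:
both function names are made up, and I assume a supplementary line is one
whose first whitespace-delimited field is exactly a flag character from
the header (so a taxon named e.g. "Wolf" is not skipped by mistake).

```python
def split_header(line):
    """Split a PHYLIP header line into (n_taxa, n_sites, flags).

    Hypothetical helper: anything after the two counts is treated as
    option flags, so both "4 40 W" and "4 40 I W" are accepted.
    """
    fields = line.split()
    n_taxa, n_sites = int(fields[0]), int(fields[1])
    flags = set("".join(fields[2:]).upper())
    return n_taxa, n_sites, flags


def strip_supplementary(lines):
    """Return (n_taxa, n_sites, flags, remaining_lines).

    Lines whose first field is a single character matching one of the
    header flags are dropped; everything else is passed through as-is.
    """
    lines = iter(lines)
    n_taxa, n_sites, flags = split_header(next(lines))
    kept = []
    for line in lines:
        fields = line.split()
        # Assumption: a supplementary line starts with the bare flag
        # character in the name column.
        if fields and len(fields[0]) == 1 and fields[0].upper() in flags:
            continue
        kept.append(line)
    return n_taxa, n_sites, flags, kept
```

Note that this does not deal with supplementary info that wraps onto a
continuation line without repeating the flag, as the weights do in the
example above, and it would also drop a taxon whose whole name is the
flag character. Those cases would need extra handling.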

Unit tests will abound.

-brandon


