[Biopython-dev] Parsing PAML supplementary output

Brandon Invergo b.invergo at gmail.com
Tue Oct 11 07:51:26 UTC 2011

> If you can extend the current PHYLIP parser (strict or relaxed)
> to cover interleaved and sequential, that would be nice. For
> strict mode at least, we can in principle follow whatever the
> original PHYLIP tools do to detect this automatically. It may
> be safer to make it explicit though - from what I recall without
> seeing the PHYLIP implementation's source code it was not
> obvious how to do this reliably.
I checked the PHYLIP source code and, yes, it's not really obvious how
the mode is detected. In fact, many of the programs seem to ask the
user to specify the format of the alignment.

So, regarding making it explicit: I'm not sure if this is what you
meant, but I was thinking it might be simplest to add another
Iterator/Writer pair to the PhylipIO module for SequentialPhylip. These
would inherit from the basic Phylip classes, overriding the next()
method in the iterator and the write_alignment() method in the writer,
much as the RelaxedPhylip classes do.
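
A rough sketch of that subclassing idea follows. It is self-contained
and does not use the real PhylipIO classes: PhylipIteratorSketch and
SequentialPhylipIteratorSketch are hypothetical stand-ins, the stand-in
base class only supplies shared helpers, and records are plain
(name, sequence) tuples rather than Biopython objects. Only the
record-reading step is overridden for the sequential layout.

```python
import io


class PhylipIteratorSketch:
    """Hypothetical stand-in for a basic PHYLIP iterator (not PhylipIO)."""

    id_width = 10  # strict PHYLIP: the name occupies the first 10 columns

    def __init__(self, handle):
        self.handle = handle

    def __iter__(self):
        return self

    def _read_header(self):
        # Skip blank lines; return None once the file is exhausted.
        line = self.handle.readline()
        while line and not line.strip():
            line = self.handle.readline()
        if not line:
            return None
        fields = line.split()
        return int(fields[0]), int(fields[1])

    def _split_id(self, line):
        # Fixed-width split into (name, first chunk of sequence).
        name = line[: self.id_width].strip()
        chunk = "".join(line[self.id_width:].split())
        return name, chunk


class SequentialPhylipIteratorSketch(PhylipIteratorSketch):
    """Overrides only the step that reads one alignment."""

    def __next__(self):
        header = self._read_header()
        if header is None:
            raise StopIteration
        n_seqs, seq_len = header
        records = []
        for _ in range(n_seqs):
            # Sequential layout: each sequence is read to completion
            # before the next name line appears.
            name, seq = self._split_id(self.handle.readline().rstrip("\n"))
            while len(seq) < seq_len:
                line = self.handle.readline()
                if not line:
                    raise ValueError("file ended in the middle of a sequence")
                seq += "".join(line.split())
            records.append((name, seq))
        return records


example = io.StringIO(
    " 2 12\n"
    "Alpha     ACGTAC\n"
    "GTACGT\n"
    "Beta      TTGGCC\n"
    "AATTGG\n"
)
records = next(SequentialPhylipIteratorSketch(example))
```

A writer counterpart would override its alignment-writing method in the
same way, emitting each sequence in full before moving on to the next.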

This would mean no flexibility in the naming rules (i.e., relaxed vs.
strict) for the SequentialPhylip format, unless I were to also make a
RelaxedSequentialPhylip pair of classes. PAML relaxes the sequence name
length restriction to 30 characters. Since the whole reason for
embarking on this exercise was to support PAML's output of PHYLIP
alignments, I think it would be best to default to the relaxed rules if
only one naming convention is to be implemented.
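
The difference between the two naming rules can be illustrated with a
small sketch. The function split_name and the constants are
hypothetical, and the relaxed rule is assumed here to mean "the name
ends at the first whitespace", capped at the 30 characters mentioned
above; the real parsers may differ in detail.

```python
STRICT_WIDTH = 10    # strict PHYLIP: fixed 10-column name field
RELAXED_MAX_NAME = 30  # assumed cap, taken from the PAML limit above


def split_name(line, relaxed=True):
    """Split one PHYLIP line into (name, sequence chunk)."""
    if relaxed:
        # Assumed relaxed rule: the name is the first whitespace-free token.
        parts = line.split(None, 1)
        if not parts:
            raise ValueError("empty line where a sequence name was expected")
        name = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if len(name) > RELAXED_MAX_NAME:
            raise ValueError("name longer than %d characters" % RELAXED_MAX_NAME)
    else:
        # Strict rule: the name is whatever sits in the first 10 columns.
        name = line[:STRICT_WIDTH].strip()
        rest = line[STRICT_WIDTH:]
    # Drop any spacing inside the sequence chunk.
    return name, "".join(rest.split())


line = "Homo_sapiens_ENSG0001  ACGT ACGT"
relaxed_result = split_name(line, relaxed=True)
strict_result = split_name(line, relaxed=False)
```

With a long name like the one above, the strict split cuts the name at
column 10 and treats the remainder as sequence, which is why the
relaxed rules are the safer default for PAML output.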

Slightly unrelated musings: with Biopython's support for reading PHYLIP
alignments and Newick trees into objects, at some point it would
probably be convenient to make the Bio.Phylo.PAML package more
integrated by letting the user pass in such objects as arguments rather
than writing them to files first; the PAML module could write them to
temp files itself. Some minor changes might be needed in places (for
example, for PAML to accept interleaved alignments, the header line
must contain an 'I' flag after the sequence count and sequence length
integers). I'd also have to think about how best to allow passing such
objects while still retaining the ability to specify filenames without
resorting to kludgy, non-Pythonic type-checking. Anyway, another task
for another day, but I thought I'd throw it out there.
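
One possible way to accept either a filename or an in-memory object
without explicit type checks is sketched below. All names here
(phylip_header, write_single_block, as_filename) are hypothetical and
not part of Bio.Phylo.PAML, and alignments are modelled as lists of
(name, sequence) tuples. The sketch tries os.fspath() first and falls
back to writing a temp file; it also shows the 'I' flag being appended
to the header line.

```python
import os
import tempfile


def phylip_header(n_seqs, seq_len, interleaved=False):
    """Build the header line; the 'I' flag marks an interleaved file."""
    header = " %d %d" % (n_seqs, seq_len)
    if interleaved:
        header += " I"
    return header


def write_single_block(records, handle):
    """Hypothetical writer: one block, flagged as interleaved."""
    seq_len = len(records[0][1])
    handle.write(phylip_header(len(records), seq_len, interleaved=True) + "\n")
    for name, seq in records:
        handle.write("%s  %s\n" % (name, seq))


def as_filename(alignment_or_path, writer=write_single_block):
    """Return (path, is_temp) for a path-like value or an alignment."""
    try:
        # Path-like values (str, pathlib.Path, ...) pass straight through.
        return os.fspath(alignment_or_path), False
    except TypeError:
        pass
    # Anything else is assumed to be an alignment the writer understands.
    tmp = tempfile.NamedTemporaryFile("w", suffix=".phy", delete=False)
    with tmp:
        writer(alignment_or_path, tmp)
    return tmp.name, True


path, is_temp = as_filename("alignment.phy")
tmp_path, tmp_is_temp = as_filename([("Alpha", "ACGT"), ("Beta", "TTGG")])
with open(tmp_path) as handle:
    first_line = handle.readline().rstrip("\n")
os.remove(tmp_path)  # the caller cleans up temp files it was handed
```

Returning the is_temp flag leaves cleanup to the caller, so a
user-supplied file is never deleted by mistake.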

