[Biopython-dev] New Newick parser in Bio.Phylo
Ben Morris
ben at bendmorris.com
Mon Feb 11 02:39:24 UTC 2013
On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
> Hi Ben,
>
> I've noticed a couple new characteristics of the Newick parser that I had
> questions about.
>
> 1. There is no longer a way to tell the parser to treat internal node labels
> as confidence values. Lots of files in the wild do record the support values
> here, including those generated by RAxML, PhyML, FastTree and MrBayes, so
> I'd like to restore this option, and perhaps make it the default. I think
> the condition is:
>
> if not (self.values_are_confidence or self.comments_are_confidence or
> current_clade.is_terminal()): # parse confidence from node label
>
> Is there an easy way to add this option to the parser? I'm trying to get
> this to work in the "else" clause in parse_tree, where unquoted node labels
> are handled.
>
>
> 2. Confidence values are required to be between 0.0 and 1.0. Also, support
> values recorded as integers are treated as percentages and divided by 100
> automatically. The phyloXML spec doesn't have this range requirement. RAxML
> scales bootstraps to 100, but PhyML records the raw number of supporting
> bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap
> replicates). So, I'd prefer to leave the confidence values as they are,
> requiring only that they be numeric. Thoughts?
>
>
> Thanks,
> Eric
1. One issue is that current_clade.is_terminal() will always be true
at that point, because current_clade's children haven't been parsed
yet. Moving the check into the "process_clade" function (which is
called when the closing paren is hit, by which point all children
should have been parsed) should fix this.
So, if values_are_confidence and comments_are_confidence are both
false and an internal node's label is numeric, the label should be
treated as confidence and clade.name should be set to None - is that
correct?
2. This should be as simple as removing current lines 123-127.
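To make the two points concrete, here is a rough sketch of the check as it might look once the children are attached. It is not the actual Bio.Phylo code: the Clade class below is a hypothetical stand-in, and label_to_confidence is an invented helper name. It assumes the check runs at the closing paren, and it stores the value as-is, with no 0.0-1.0 range check and no division by 100.

```python
class Clade:
    """Hypothetical stand-in for a tree node; not the Bio.Phylo class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.confidence = None
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades


def label_to_confidence(clade, values_are_confidence=False,
                        comments_are_confidence=False):
    """Move a numeric internal-node label into clade.confidence.

    Meant to run after the clade's children are attached (at the
    closing paren), so that is_terminal() gives a meaningful answer.
    """
    if values_are_confidence or comments_are_confidence:
        return clade  # confidence comes from elsewhere; leave the label alone
    if clade.is_terminal() or clade.name is None:
        return clade  # tips keep their names, even numeric ones
    try:
        # Note: float() also accepts strings such as "nan" and "inf".
        value = float(clade.name)
    except ValueError:
        return clade  # non-numeric label: keep it as the name
    clade.confidence = value  # kept as-is: no range check, no rescaling
    clade.name = None
    return clade
```

With this, an internal node labelled "95" ends up with confidence 95.0 and no name, while a tip named "1000" is left untouched:

```python
inner = label_to_confidence(Clade("95", [Clade("A"), Clade("B")]))
tip = label_to_confidence(Clade("1000"))
```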
~Ben