[Biopython] help with confidence values on PhyloXML tree objects?

Jon Sanders jsanders at oeb.harvard.edu
Tue Dec 13 18:17:09 UTC 2011


Thanks Eric! I got the hang of the PhyloXML confidence objects now, so
that's straightened out.

Still having issues with the tree parsing. I tried throwing in extra colons
with a regex, both before and after the tip/edge label, but that didn't
change the behavior of the parser, and all the tip/edge labels were still
imported as confidence values. Poking around some documentation on the
newick format, it seems like the edge labels might be tricking the parser
into thinking there are confidence values present, since there's no clear
way to distinguish between them. I'll try playing around with suppressing
the edge labels in PyCogent and see if I can't pass a decent tree to the
Biopython side for proper PhyloXML output.
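
If suppressing the labels on the PyCogent side doesn't pan out, another option is to fix up the tree after parsing. The following is only a sketch under assumptions: it expects clade-like objects with a `name` attribute and a numeric (or None) `confidence` attribute, as on a tree parsed from Newick, and the function name `labels_from_confidences` is made up for illustration. The demo uses `SimpleNamespace` stand-ins rather than real Bio.Phylo clades; with a real tree one would presumably pass `tree.get_terminals()`.

```python
from types import SimpleNamespace


def labels_from_confidences(terminals):
    """Move numeric tip labels that were misread as confidences into .name.

    Returns the number of clades that were changed.
    """
    fixed = 0
    for clade in terminals:
        # Only touch tips that have no name but do carry a "confidence".
        if not clade.name and clade.confidence is not None:
            number = float(clade.confidence)
            # 41.0 becomes "41"; non-integer values keep their decimal form.
            clade.name = str(int(number)) if number.is_integer() else str(number)
            clade.confidence = None
            fixed += 1
    return fixed


# Hypothetical stand-ins for parsed tip clades (not real Bio.Phylo objects).
tips = [
    SimpleNamespace(name="", confidence=41.0),
    SimpleNamespace(name=None, confidence=44.0),
    SimpleNamespace(name="taxonA", confidence=None),
]
changed = labels_from_confidences(tips)
```

Note that labels with leading zeros (e.g. "041") can't be recovered this way, since the parser has already converted them to numbers.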

Ugh.

-j

On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich <eric.talevich at gmail.com> wrote:

> Hi Jon,
>
> On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders <jsanders at oeb.harvard.edu> wrote:
>
>> So I have two problems.
>>
>>
>> Problem 1: when importing my newick-formatted trees, which were generated
>> in PyCogent, the terminal labels and branch labels are read in as
>> confidence values because they're numerical. So
>>
>>    ((((41:0.01494,44:0.00014)0.604:0...
>>
>> is read in with blank name='' values and 41, 44, 0.604, etc. as
>> 'confidence' values.
>>
>
> Hmm, I'll take a look at the Newick parser. I think I've used numeric
> taxon labels before without a problem, but PyCogent wasn't involved.
>
> It might work if you can coax PyCogent into writing the Newick files with
> an extra colon:
> ((((:41:0.01494,:44:0.00014):0.604:0...
>
>
>
>> Problem 2: I would like to store multiple confidence values per node, but I
>> can't figure out how to do it.
>>
>> I can get the plain old 'confidence' attribute set by:
>>
>>   clade.confidence = .05
>>
>> but can't figure out how to add and set new confidence types. Any
>> suggestions?
>>
>
> The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence
> class.
>
> In PhyloXML trees, the attribute "clade.confidence" is actually a Python
> property pointing to the first element of "clade.confidences", a list of
> Confidence objects. It's syntax sugar to keep compatibility with Newick,
> which just has a numeric value there.
>
> You can use it like this:
>
> from Bio.Phylo import PhyloXML
>
> # Create new Confidence instances
> a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap")
> # The second argument is optional
> a_posterior_probability = PhyloXML.Confidence(0.99)
>
> # Select a clade from your tree to modify
> a_clade = mytree.clade[...]
>
> # Modify the list of Confidences directly
> a_clade.confidences.append(a_bootstrap_value)
> a_clade.confidences.append(a_posterior_probability)
>
>
> If you've assigned multiple confidence values to a clade, using the
> PhyloXML class, then the "clade.confidence" shortcut won't work anymore
> because it's not clear which confidence you mean. So you'll have to use
> e.g. clade.confidences[0] or clade.confidences[1], and save the tree in
> PhyloXML format to preserve the extra data.
>
> Hope that helps.
>
> Best regards,
> Eric
>
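
Since indexing with clade.confidences[0] or clade.confidences[1] depends on the order the values were appended, looking them up by type may be less fragile. This is a minimal sketch, not part of Bio.Phylo: the helper name `confidence_of_type` is hypothetical, and it assumes each item in the list exposes `.type` and `.value` attributes, as the Confidence objects described above are expected to. The demo list uses `SimpleNamespace` stand-ins so it runs without a tree.

```python
from types import SimpleNamespace


def confidence_of_type(confidences, wanted_type):
    """Return the value of the first confidence whose type matches, else None."""
    for conf in confidences:
        if conf.type == wanted_type:
            return conf.value
    return None


# Hypothetical stand-ins for the entries of a_clade.confidences.
demo_confidences = [
    SimpleNamespace(value=83, type="bootstrap"),
    SimpleNamespace(value=0.99, type="posterior"),
]
bootstrap = confidence_of_type(demo_confidences, "bootstrap")
missing = confidence_of_type(demo_confidences, "jackknife")
```

With a real PhyloXML clade, the call would presumably be confidence_of_type(a_clade.confidences, "bootstrap").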



-- 
"If you hold a cat by the tail you learn things you cannot learn any other
way."
                         --Mark Twain


