[Biopython] help with confidence values on PhyloXML tree objects?
Jon Sanders
jsanders at oeb.harvard.edu
Tue Dec 13 13:55:20 EST 2011
Update: yup, seems to be a problem with numeric tip names.
1) getting rid of internal edge name doesn't help
2) appending 'a' to tip names fixes it
3) this tree: (((1,2),(3,4)),5); loads the numeric tip names as branch
lengths
4) this tree: (((1:0.01,2:0.01):0.01,(3:0.01,4:0.01):0.01):0.01,5:0.03);
loads the numeric tip names as confidence values, and the branch lengths
correctly
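
The workaround in (2) can be scripted before the tree is handed to the parser. Below is a minimal sketch using only the standard library. It assumes tip labels are purely numeric and sit directly after "(" or ",". The function name `suffix_numeric_tips` and the regex are illustrative only and are not part of Biopython or PyCogent.

```python
import re

# A tip label directly follows "(" or "," and ends at ":", "," or ")".
# Labels that follow ")" (internal edge labels) and branch lengths that
# follow ":" are left untouched.
NUMERIC_TIP = re.compile(r"(?<=[(,])(\d+(?:\.\d+)?)(?=[:,)])")


def suffix_numeric_tips(newick, suffix="a"):
    """Append `suffix` to every purely numeric tip label in a Newick string."""
    return NUMERIC_TIP.sub(lambda m: m.group(1) + suffix, newick)


print(suffix_numeric_tips("(((1,2),(3,4)),5);"))
# (((1a,2a),(3a,4a)),5a);
```

If the original labels matter downstream, the suffix would have to be stripped from the clade names again after parsing.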
I might try poking around the parser too, although my python foo has little
bar.
-j
On Tue, Dec 13, 2011 at 1:17 PM, Jon Sanders <jsanders at oeb.harvard.edu> wrote:
> Thanks Eric! I got the hang of the PhyloXML confidence objects now, so
> that's straightened out.
>
> Still having issues with the tree parsing. I tried throwing in extra
> colons with a regex, both before and after the tip/edge label, but that
> didn't change the behavior of the parser, and all the tip/edge labels were
> still imported as confidence values. Poking around some documentation on
> the newick format, it seems like the edge labels might be tricking the
> parser into thinking there are confidence values present, since there's no
> clear way to distinguish between them. I'll try playing around with
> suppressing the edge labels in PyCogent and see if I can't pass a decent
> tree to the Biopython side for proper PhyloXML output.
>
> Ugh.
>
> -j
>
>
> On Fri, Dec 9, 2011 at 6:26 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>
>> Hi Jon,
>>
>> On Fri, Dec 9, 2011 at 4:53 PM, Jon Sanders <jsanders at oeb.harvard.edu> wrote:
>>
>>> So I have two problems.
>>>
>>>
>>> Problem 1: when importing my newick-formatted trees, which were generated
>>> in PyCogent, the terminal labels and branch labels are read in as
>>> confidence values because they're numerical. So
>>>
>>> ((((41:0.01494,44:0.00014)0.604:0...
>>>
>>> is read in with blank name='' values and 41, 44, 0.604, etc. as
>>> 'confidence' values.
>>>
>>
>> Hmm, I'll take a look at the Newick parser. I think I've used numeric
>> taxon labels before without a problem, but PyCogent wasn't involved.
>>
>> It might work if you can coax PyCogent into writing the Newick files with
>> an extra colon:
>> ((((:41:0.01494,:44:0.00014):0.604:0...
>>
>>
>>
>>> Problem 2: I would like to store multiple confidence values per node,
>>> but I can't figure out how to do it.
>>>
>>> I can get the plain old 'confidence' attribute set by:
>>>
>>> clade.confidence = .05
>>>
>>> but can't figure out how to add and set new confidence types. Any
>>> suggestions?
>>>
>>
>> The confidence types are instances of the Bio.Phylo.PhyloXML.Confidence
>> class.
>>
>> In PhyloXML trees, the attribute "clade.confidence" is actually a Python
>> property pointing to the first element of "clade.confidences", a list of
>> Confidence objects. It's syntax sugar to keep compatibility with Newick,
>> which just has a numeric value there.
>>
>> You can use it like this:
>>
>> from Bio.Phylo import PhyloXML
>>
>> # Create new Confidence instances
>> a_bootstrap_value = PhyloXML.Confidence(83, type="bootstrap")
>> # The second argument is optional
>> a_posterior_probability = PhyloXML.Confidence(0.99)
>>
>> # Select a clade from your tree to modify
>> a_clade = mytree.clade[...]
>>
>> # Modify the list of Confidences directly
>> a_clade.confidences.append(a_bootstrap_value)
>> a_clade.confidences.append(a_posterior_probability)
>>
>>
>> If you've assigned multiple confidence values to a clade, using the
>> PhyloXML class, then the "clade.confidence" shortcut won't work anymore
>> because it's not clear which confidence you mean. So you'll have to use
>> e.g. clade.confidences[0] or clade.confidences[1], and save the tree in
>> PhyloXML format to preserve the extra data.
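
The property behaviour described above can be made concrete with a toy model in plain Python. This sketch does not import Biopython and is not Biopython's actual code. `ToyClade` and `ToyConfidence` are hypothetical stand-ins, and raising AttributeError when several values are present is an assumption chosen to mirror "won't work anymore".

```python
class ToyConfidence:
    """Hypothetical stand-in for a typed confidence value."""

    def __init__(self, value, type="unknown"):
        self.value = value
        self.type = type


class ToyClade:
    """Hypothetical stand-in: `confidence` is a view onto `confidences`."""

    def __init__(self):
        self.confidences = []

    @property
    def confidence(self):
        if not self.confidences:
            return None
        if len(self.confidences) > 1:
            # Assumption: refuse to guess which value is meant.
            raise AttributeError("several confidences; index clade.confidences")
        return self.confidences[0]

    @confidence.setter
    def confidence(self, value):
        # Assumption: a bare number is wrapped in one ToyConfidence,
        # replacing whatever was stored before.
        self.confidences = [ToyConfidence(value)]


clade = ToyClade()
clade.confidence = 0.05  # the shortcut works while there is one value
clade.confidences.append(ToyConfidence(83, type="bootstrap"))

# With two values stored, look them up through the list instead.
by_type = {c.type: c.value for c in clade.confidences}
```

Keying the values by type, as in `by_type`, is one way to avoid depending on list positions such as [0] and [1].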
>>
>> Hope that helps.
>>
>> Best regards,
>> Eric
>>
--
"If you hold a cat by the tail you learn things you cannot learn any other
way."
--Mark Twain