[Biopython-dev] [Bug 3134] to_networkx returns weird stuff

bugzilla-daemon at portal.open-bio.org
Tue Aug 31 01:43:17 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3134





------- Comment #3 from eric.talevich at gmail.com  2010-08-30 21:43 EST -------
(In reply to comment #2)
> Thanks for the quick response!
> 
> The problem is that the standard way using pylab produces ugly squares instead
> of arrowheads in the final layout.

True. Do you know a way to fix that from NetworkX/matplotlib, or is that the
whole reason you're exporting to Graphviz?


> but more importantly, I want to perform
> complex graph operations on the tree using networkx and use Bio.Phylo really
> just as a means of parsing ;-)

Great, that's what it's there for. :)


> I think that when providing a function like to_networkx, it should behave in a
> manner the user of networkx expects. Why not just use a unique hashable
> identifier like integers as standard string representation for ALL nodes, and
> use graphviz'/networkx' label attribute for any name label the node might have?

OK, but wouldn't you want to be able to retrieve all of the original clade's
data from any node in a networkx graph?

Currently, the arrangement is:

- Clade objects themselves are the hashable objects used as node keys
- Given a node in a networkx graph produced by to_networkx, you can uniquely
locate that clade in the original tree using the tree.find_* methods -- it's
still a valid target, and duplicate names aren't a problem
- Other clade attributes, like taxonomy and bootstrap values, are also still
available on the node
- Serializing the graph nodes for Graphviz goes haywire (the node strings are
not guaranteed to be unique or valid Graphviz identifiers), so we provide
draw_graphviz as a workaround
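
To illustrate why duplicate names aren't a problem in the current arrangement,
here is a rough sketch using only plain Python. The Clade class below is a
simplified stand-in (an assumption for illustration, not the real Bio.Phylo
class), and a plain dict stands in for the networkx graph:

```python
# Hypothetical, simplified stand-in for a Bio.Phylo clade.
class Clade:
    def __init__(self, name=None, confidence=None, clades=()):
        self.name = name
        self.confidence = confidence
        self.clades = list(clades)


def to_adjacency(root):
    """Build {parent: [children]} keyed on the clade objects themselves."""
    graph = {}
    stack = [root]
    while stack:
        clade = stack.pop()
        graph[clade] = list(clade.clades)
        stack.extend(clade.clades)
    return graph


# Two leaves share the name "A". They remain distinct keys because default
# object hashing is identity-based, not name-based.
leaf1 = Clade(name="A")
leaf2 = Clade(name="A")
inner = Clade(confidence=95, clades=[leaf1, leaf2])
graph = to_adjacency(inner)

# Every key is still the original object, so its attributes stay reachable.
bootstraps = [node.confidence for node in graph if node.confidence is not None]
```

The trade-off is the one listed above: the keys carry all the clade data, but
their string form is what ends up in any serialized output.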

I think you're suggesting:

- Use id(clade) or some arbitrary unique integer as keys
- Attach the clade name, if available, to the networkx node as a label...
right? How would I do this?
- To keep other clade attributes with the node, maybe add them to the optional
dictionary associated with each node, like we already do for branch colors and
widths
- At some point, generate a lookup table to associate the graph nodes' unique
integer identifiers with the original clade objects -- or at least make this
possible through another function
- Serializing for Graphviz will work cleanly
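
A minimal sketch of that suggestion, again in plain Python so it runs without
networkx installed. The Clade class, the to_int_graph function and the
"label"/"confidence" attribute names are all assumptions made up for this
example, not existing Bio.Phylo API:

```python
import itertools


# Hypothetical, simplified stand-in for a Bio.Phylo clade.
class Clade:
    def __init__(self, name=None, confidence=None, clades=()):
        self.name = name
        self.confidence = confidence
        self.clades = list(clades)


def to_int_graph(root):
    """Return (nodes, edges, lookup) using small integers as node keys.

    nodes:  {node_id: attribute dict}, with the name stored under "label"
    edges:  list of (parent_id, child_id) pairs
    lookup: {node_id: original clade object}
    """
    counter = itertools.count()
    nodes, edges, lookup = {}, [], {}

    def visit(clade):
        node_id = next(counter)
        lookup[node_id] = clade
        attrs = {}
        if clade.name is not None:
            attrs["label"] = clade.name
        if clade.confidence is not None:
            attrs["confidence"] = clade.confidence
        nodes[node_id] = attrs
        for child in clade.clades:
            edges.append((node_id, visit(child)))
        return node_id

    visit(root)
    return nodes, edges, lookup


tree = Clade(clades=[Clade(name="A"), Clade(name="A"),
                     Clade(name="B", confidence=80)])
nodes, edges, lookup = to_int_graph(tree)
```

With networkx available, the same data could presumably be loaded with
G.add_node(node_id, **attrs) and G.add_edge(parent_id, child_id), which would
put the name in each node's attribute dictionary as its label.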

> Using the string representation of labeled leaves as identifiers in networkx is
> also dangerous, since they will be used as identifiers in graphviz and are
> subject to a number of restrictions (no whitespace etc.)

Indeed, and as you've seen, the strings need to be unique. One alternative is
to mimic Python's default repr() style for representing complex classes:
'<PhyloXML.Clade instance at 0xb753b52c>'

But then, switching to the string name where clades do have the 'name'
attribute set would be inconsistent.


> I'd propose the following: in Clade, __repr__() should return the name of the
> node, if it has one, or a unique identifier like id() (the memory address) with
> additional quotes ("...") around it to make it a valid graphviz identifier.
> 
> def __repr__(self):
>     if self.name is not None:
>         return self.name
>     else:
>         return '"' + str(id(self)) + '"'

Remember that the NetworkX labels don't necessarily need to be the same as the
string representation of clades in Bio.Phylo -- it's just convenient if they
match.

So __repr__ could be:
<Clade at 0x655321>
<Clade "A. thaliana" at 0x655444>

Your function could then be used to create the node labels in to_networkx.
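
Roughly, keeping the two concerns separate might look like the sketch below.
The Clade class is a bare stand-in and node_label is a hypothetical helper
name; neither is existing Bio.Phylo code:

```python
# Hypothetical, simplified stand-in for a Bio.Phylo clade.
class Clade:
    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        # Debugging-oriented representation, in the style shown above.
        if self.name is not None:
            return '<Clade "%s" at 0x%x>' % (self.name, id(self))
        return "<Clade at 0x%x>" % id(self)


def node_label(clade):
    """Label for graph export: the name if set, else the quoted id()."""
    if clade.name is not None:
        return clade.name
    return '"%d"' % id(clade)


named = Clade("A. thaliana")
unnamed = Clade()
labels = [node_label(named), node_label(unnamed)]
```

This way repr() stays consistent for every clade, named or not, and the
export code alone decides what a node is called in the graph.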


Thanks for your help,
Eric




