[Biopython-dev] [Bug 3134] to_networkx returns weird stuff
bugzilla-daemon at portal.open-bio.org
Tue Aug 31 01:43:17 UTC 2010
http://bugzilla.open-bio.org/show_bug.cgi?id=3134
------- Comment #3 from eric.talevich at gmail.com 2010-08-30 21:43 EST -------
(In reply to comment #2)
> Thanks for the quick response!
>
> The problem is that the standard way using pylab produces ugly squares instead
> of arrowheads in the final layout.
True. Do you know a way to fix that from NetworkX/matplotlib, or is that the
whole reason you're exporting to Graphviz?
> But more importantly, I want to perform
> complex graph operations on the tree using networkx and use Bio.Phylo really
> just as a means of parsing ;-)
Great, that's what it's there for. :)
> I think that when providing a function like to_networkx, it should behave in a
> manner the user of networkx expects. Why not just use a unique hashable
> identifier like integers as standard string representation for ALL nodes, and
> use graphviz'/networkx' label attribute for any name label the node might have?
OK, but wouldn't you want to be able to retrieve all of the original clade's
data from any node in a networkx graph?
Currently, the arrangement is:
- Clade objects are the hashable objects used for keys
- Given a node in a networkx graph produced by to_networkx, you can uniquely
  locate that clade in the original tree using the tree.find_* methods -- it's
  still a valid target, and duplicate names aren't a problem
- Other clade attributes, like taxonomy and bootstrap values, are also still
  available on the node
- Serializing the graph nodes for Graphviz goes haywire, so we provide
  draw_graphviz as a workaround
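
To illustrate why object keys work in memory but break on serialization, here
is a minimal sketch. The Clade class is a hypothetical stand-in, not the real
Bio.Phylo class, and its __str__ (returning the name) is an assumption made
only for this example:

```python
class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade (illustration only)."""

    def __init__(self, name=None):
        self.name = name

    def __str__(self):
        # Assumption: the string form is just the name, or a generic word.
        return self.name or "Clade"


# Two distinct clades that happen to share a name.
a = Clade("O. sativa")
b = Clade("O. sativa")

# As hashable objects they are distinct keys, so a graph keeps both nodes.
adjacency = {a: [], b: []}

# Once nodes are written out as strings, the two collapse into one identifier.
serialized = {str(node) for node in adjacency}
```

Here adjacency holds two entries while serialized holds only one, which is the
kind of collision a string-based exporter runs into.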
I think you're suggesting:
- Use id(clade) or some arbitrary unique integer as keys
- Attach the clade name, if available, to the networkx node as a label...
  right? How would I do this?
- To keep other clade attributes with the node, maybe add them to the optional
  dictionary associated with each node, like we already do for branch colors and
  widths
- At some point, generate a lookup table to associate the graph nodes' unique
  integer identifiers with the original clade objects -- or at least make this
  possible through another function
- Serializing for Graphviz will work cleanly
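
A rough sketch of that integer-keyed arrangement, using only the standard
library. The Clade class and the tree_to_int_graph function are hypothetical
names invented for this example; this is not what to_networkx currently does:

```python
from itertools import count


class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade: a name plus child clades."""

    def __init__(self, name=None, clades=()):
        self.name = name
        self.clades = list(clades)


def tree_to_int_graph(root):
    """Return (nodes, edges, lookup) using small integers as node keys."""
    ids = count()
    nodes = {}   # int key -> attribute dict ("label" only if the clade is named)
    edges = []   # (parent key, child key) pairs
    lookup = {}  # int key -> original clade object

    def visit(clade):
        key = next(ids)
        lookup[key] = clade
        nodes[key] = {"label": clade.name} if clade.name is not None else {}
        for child in clade.clades:
            edges.append((key, visit(child)))
        return key

    visit(root)
    return nodes, edges, lookup


# Duplicate names are harmless because the keys, not the labels, are unique.
tree = Clade(clades=[
    Clade("A. thaliana"),
    Clade(clades=[Clade("O. sativa"), Clade("O. sativa")]),
])
nodes, edges, lookup = tree_to_int_graph(tree)
```

If NetworkX is installed, G.add_nodes_from(nodes.items()) and
G.add_edges_from(edges) on a networkx.DiGraph should load the result, and
lookup maps any node key back to its clade.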
> Using the string representation of labeled leaves as identifiers in networkx is
> also dangerous, since they will be used as identifiers in graphviz and are
> subject to a number of restrictions (no whitespace etc.)
Indeed, and as you've seen, the strings need to be unique. One alternative is
to mimic Python's default repr() style for representing complex classes:
'<PhyloXML.Clade instance at 0xb753b52c>'
But then, switching to the string name where clades do have the 'name'
attribute set would be inconsistent.
> I'd propose the following: in Clade, __repr__() should return the name of the
> node, if it has one, or a unique identifier like id() (the memory address)
> wrapped in double quotes to make it a valid graphviz identifier.
>
> def __repr__(self):
>     if self.name is not None:
>         return self.name
>     else:
>         return "\"" + str(id(self)) + "\""
Remember that the NetworkX labels don't necessarily need to be the same as the
string representation of clades in Bio.Phylo -- it's just convenient if they
match.
So __repr__ could be:
<Clade at 0x655321>
<Clade "A. thaliana" at 0x655444>
While your function could be used to create labels in to_networkx.
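
Roughly, that split could look like the sketch below. Clade is again a
hypothetical stand-in and graph_label is a made-up name for your function, so
treat this as an illustration of the idea rather than the final code:

```python
class Clade:
    """Hypothetical stand-in for a Bio.Phylo clade (illustration only)."""

    def __init__(self, name=None):
        self.name = name

    def __repr__(self):
        # Always unique, and always recognizably a Clade.
        if self.name is not None:
            return '<Clade "%s" at %#x>' % (self.name, id(self))
        return "<Clade at %#x>" % id(self)


def graph_label(clade):
    """Label for graph export: the name if set, else the quoted id()."""
    if clade.name is not None:
        return clade.name
    return '"%d"' % id(clade)


named = Clade("A. thaliana")
unnamed = Clade()
labels = [graph_label(named), graph_label(unnamed)]
```

The repr stays consistent whether or not a name is set, and only the label
function needs to care about what Graphviz accepts.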
Thanks for your help,
Eric