[Biopython-dev] Code review request for phyloxml branch

Eric Talevich eric.talevich at gmail.com
Fri Sep 25 15:51:15 UTC 2009

Hi Jaime,

Just working on bindings would certainly be easier. The best way to transfer
tree information from Biopython to ETE would be serializing the trees in
phyloXML format (to preserve the annotations) and loading that file in ETE.
I see that ETE allows rich annotation of tree objects, but I don't see
phyloXML or NeXML listed as supported file formats -- is there another
standard format you're using to store this information? If not, I think ETE
would benefit from a phyloXML parser. Since Biopython's license is
GPL-compatible (I believe), you could borrow Bio.TreeIO.PhyloXMLIO directly
and just port the Phylogeny and Clade classes to ETE's base classes instead
of Bio.Tree.BaseTree's Tree and Node classes.
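
To give a rough idea of the serialization step, here is a minimal sketch
using only the standard library. It is not the Bio.TreeIO.PhyloXMLIO code;
the Node class and the function names are hypothetical stand-ins, and it
writes only the name and branch_length of each clade, assuming the usual
phyloxml > phylogeny > clade nesting and the http://www.phyloxml.org
namespace.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


class Node:
    """Hypothetical tree node; not ETE's or Biopython's actual class."""

    def __init__(self, name=None, branch_length=None, children=None):
        self.name = name
        self.branch_length = branch_length
        self.children = children or []


def clade_element(node):
    """Recursively convert a Node into a phyloXML <clade> element."""
    elem = ET.Element("clade")
    if node.name is not None:
        ET.SubElement(elem, "name").text = node.name
    if node.branch_length is not None:
        ET.SubElement(elem, "branch_length").text = str(node.branch_length)
    for child in node.children:
        elem.append(clade_element(child))
    return elem


def to_phyloxml(root, rooted=True):
    """Wrap the clades in <phyloxml><phylogeny> and return an XML string."""
    doc = ET.Element("phyloxml", {"xmlns": PHYLOXML_NS})
    rooted_attr = {"rooted": "true" if rooted else "false"}
    phylogeny = ET.SubElement(doc, "phylogeny", rooted_attr)
    phylogeny.append(clade_element(root))
    return ET.tostring(doc, encoding="unicode")


# Two leaves under an unnamed root.
tree = Node(children=[Node("A", 0.1), Node("B", 0.2)])
xml_text = to_phyloxml(tree)
```

A real port would also have to carry the richer annotations (taxonomy,
sequences, confidences and so on), which is where the existing parser
classes would save the most work.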

Beyond that, some support for BioSQL to store sequences etc. would also help
link ETE to any of the other Bio* projects. There's some example code in
Biopython's top-level BioSQL directory, if you're interested.


On Fri, Sep 25, 2009 at 11:28 AM, Jaime Huerta Cepas <jhuerta at crg.es> wrote:

> Hi Eric,
> Thanks for your comments,
> I really see many parts of ETE that could potentially be used from
> biopython. However, for the moment, we would prefer not to change ETE's
> current GPL license. As far as I know, the main difference between
> GPL and BSD-like licenses is that, with the second, you could relicense the
> code at any moment under any other policy, including private and closed
> licenses. GPL includes a protection for this by ensuring that any code based
> on GPL sources must always be GPL-compatible, and that's why we have chosen
> it. Moreover, the use of a BSD-like license would prevent us from using a
> lot of great GPL code out there.
> It is not my purpose to open a debate about licenses. I just wonder if
> biopython could provide some way to link/bind external software, perhaps as
> addons or plugins. This would be great, since many extra features (not only
> from ETE but from other sources) could be added on demand. This
> would also mitigate the problem of very specific dependencies, since many of
> them would be optional. From my side, I could work on providing bindings
> between biopython and ETE's tree graphical rendering features, inline
> visualization GUI, extended newick support, tree manipulation and the
> methods within the ETE package.
> I will be out of the office for several weeks, but if you see any way to
> collaborate I will be happy to discuss this in a bit more detail...
> Cheers!
> Jaime
> On Fri, Sep 25, 2009 at 5:54 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>> Hello, Jaime,
>> Sorry I didn't respond directly to your earlier post -- I wrote half of an
>> e-mail, then realized I had no good suggestions on what to do so I scrapped
>> it.
>> My Tree and TreeIO code is basically a complete parser for the phyloXML
>> format, plus a few base classes extracted out in hopes of eventually
>> creating a unified set of format-independent objects, as in SeqIO and
>> AlignIO. Your code for working with trees looks much more complete than
>> mine, so if some of it can be incorporated into Biopython, I think that
>> would be great.
>> I see these issues with integration:
>> 1. It's GPL, while Biopython uses a more permissive custom license
>> resembling the BSD and MIT licenses. Would you be willing and able to
>> relicense parts of your work for Biopython?
>> 2. Python 2.5 dependency: Biopython still supports Py2.4, so this will
>> require some compatibility fixes -- not a huge problem.
>> 3. Scipy and numpy dependencies: Numpy is considered a semi-optional
>> dependency in Biopython, so if it can be imported on the fly by just the
>> functions that need it (hopefully no core ones), that would be best. If
>> not... we can discuss. Scipy isn't used anywhere else in Biopython yet, so
>> it would be better to make that an optional, on-the-fly import, too.
>> 4. PyQt4 is a big package and I'm not sure it's as common in scientists'
>> Python installations as numpy and scipy, so if the underlying algorithms for
>> tree layout could be ported to Reportlab, matplotlib or PIL, that would be
>> ideal. I personally would like to be able to pair sequence snippets with the
>> leaves of a standard phylogram, so if you need me to do some additional work
>> to get this section ported to Biopython, I'd consider it time well spent.
>> 5. Presumably, the tree object type in ETE is different from Bio.Tree or
>> Bio.Nexus, so porting the core tree manipulation code to Biopython would
>> require a substantial effort somewhere.
>> 6. The PhylomeDB connector is cool, and from browsing the source, it looks
>> like it wouldn't require much effort at all to drop into Biopython.
>> Thanks for letting us know about this.
>> Cheers,
>> Eric
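
For point 3 in the list above, an on-the-fly import could look something
like the sketch below. This is only an illustration of the pattern, not code
from Biopython or ETE: optional_import and mean_branch_length are
hypothetical names, and the sketch uses importlib, which is not available on
Python 2.4, so older interpreters would need a plain try/except around the
import statement instead.

```python
import importlib


def optional_import(module_name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def mean_branch_length(lengths):
    """Hypothetical helper: use numpy if present, else plain Python."""
    # The import happens inside the function, so merely importing the
    # enclosing module never requires numpy.
    numpy = optional_import("numpy")
    if numpy is not None:
        return float(numpy.mean(lengths))
    return sum(lengths) / float(len(lengths))
```

Functions that cannot work without the dependency would raise an
informative error at call time instead of falling back.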
>> On Thu, Sep 24, 2009 at 6:45 AM, Jaime Huerta Cepas <jhuerta at crg.es> wrote:
>>> Hi,
>>> ( I'm the developer of ETE. )
>>> I agree that PyQt4 is an important dependency. I chose it because the
>>> Qt4-QGraphicsScene environment offers many possibilities like openGL
>>> rendering, unlimited image size, performance, and good bindings to python.
>>> However, I am working on my code to allow the rendering algorithm to use any
>>> other graphical library. So, you could render the same tree images using
>>> different backends. If you think this is useful for you, please let me know
>>> and we can think about how to integrate it with biopython.
>>> Regarding the GUI, it is not a standalone application but one more method
>>> within the Tree objects. The GUI can be started at any point of the
>>> execution and the main program will continue after you close it. I did it
>>> like this because I think it is quite useful for working within interactive
>>> python sessions.
>>> I develop a lot of code around tree handling, so if you think I can
>>> help, please tell me.
>>> jaime.
>>>> > *Graphics*
>>>> > I finally fixed the networkx/graphviz/matplotlib drawing to leave unlabeled
>>>> > nodes inconspicuous, so the resulting graphic is much cleaner, perhaps even
>>>> > usable. Plus, the nodes are now a pretty shade of blue. Still, it would be
>>>> > nice to have a Reportlab-based module in Bio.Graphics to print phylogenies
>>>> > in the way biologists are used to seeing them. Does anyone know of existing
>>>> > code that could be borrowed for this? I looked at ETE (announced on the main
>>>> > biopython list last week) and liked the examples, but it uses PyQt4 and a
>>>> > standalone GUI for display, which is a substantial departure from the
>>>> > Biopython way of doing things.
>>>> I still haven't tracked down my old report lab code, but it wasn't object
>>>> orientated and would need a lot of work to bring up to standard...
>>>> Peter
>>>> _______________________________________________
>>>> Biopython-dev mailing list
>>>> Biopython-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>> --
>>> =========================
>>> Jaime Huerta-Cepas, Ph.D.
>>> CRG-Centre for Genomic Regulation
>>> Doctor Aiguader, 88
>>> PRBB Building
>>> 08003 Barcelona, Spain
>>> http://www.crg.es/comparative_genomics
>>> =========================
> --
> =========================
> Jaime Huerta-Cepas, Ph.D.
> CRG-Centre for Genomic Regulation
> Doctor Aiguader, 88
> PRBB Building
> 08003 Barcelona, Spain
> http://www.crg.es/comparative_genomics
> =========================
