[Biopython-dev] Code review request for phyloxml branch

Jaime Huerta Cepas jhuerta at crg.es
Fri Sep 25 12:13:44 EDT 2009


Hi,


> Just working on bindings would certainly be easier. The best way to
> transfer tree information from Biopython to ETE would be serializing the
> trees in phyloXML format (to preserve the annotations) and loading that file
> in ETE. I see that ETE allows rich annotation of tree objects, but I don't
> see phyloXML or NeXML listed as supported file formats -- is there another
> standard format you're using to store this information?

Extended Newick (NHX, http://www.phylosoft.org/NHX/) is the only rich format
currently supported by ETE. However, this standard only allows text-string
representations of tree node annotations. Beyond that, you have to pickle
(cPickle) complex annotated trees to save them. I'm certainly interested
in phyloXML and NeXML support, so this could be a nice starting
point.
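
To make the difference concrete, here is a minimal sketch. It assumes a
hypothetical Node class and a hypothetical to_nhx() helper; both are
stand-ins for illustration, not ETE's actual classes or API. The NHX-style
string flattens every annotation to text, while a pickle round trip keeps
the original Python types.

```python
import pickle


class Node:
    """Hypothetical annotated tree node; a stand-in, not ETE's actual class."""

    def __init__(self, name="", dist=0.0, children=(), **features):
        self.name = name
        self.dist = dist
        self.children = list(children)
        self.features = features  # arbitrary Python objects


def to_nhx(node):
    """Return an NHX-style string; every annotation value is flattened to text."""
    subtree = ""
    if node.children:
        subtree = "(" + ",".join(to_nhx(c) for c in node.children) + ")"
    # NHX tags are written as :key=value pairs inside a [&&NHX...] comment.
    tags = "".join(":%s=%s" % (k, v) for k, v in sorted(node.features.items()))
    comment = "[&&NHX%s]" % tags if tags else ""
    return "%s%s:%s%s" % (subtree, node.name, node.dist, comment)


leaf_a = Node("A", 0.1, S="human", support=0.95)
leaf_b = Node("B", 0.2, S="mouse")
root = Node("", 0.0, [leaf_a, leaf_b], D="Y")

# Text route: the float 0.95 ends up as the characters "0.95".
nhx_text = to_nhx(root) + ";"

# Pickle route: the float comes back as a float.
restored = pickle.loads(pickle.dumps(root))
```

Here nhx_text carries support only as text, whereas
restored.children[0].features["support"] is still a float. The price of
pickling is that the file is only readable from Python.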

> If not, I think ETE would benefit from a phyloXML parser. Since Biopython
> license is GPL-compatible (I believe), you could borrow
> Bio.TreeIO.PhyloXMLIO directly and just port the Phylogeny and Clade classes
> to ETE's base classes instead of Bio.Tree.BaseTree's Tree and Node classes.
>
I think there is no problem in using BSD-licensed code from GPL sources; the
problem would be the other way around. I will take a look at your
phyloXML code to find the best way to bind both packages through phyloXML
serialization.


> Beyond that, some support for BioSQL to store sequences etc. would also
> help link ETE to any of the other Bio* projects. There's some example code
> in Biopython's top-level BioSQL directory, if you're interested.
>
OK, I'll take a look at that as well. Thanks.

cheers,
Jaime.



>
> Cheers,
> Eric
>
>
> On Fri, Sep 25, 2009 at 11:28 AM, Jaime Huerta Cepas <jhuerta at crg.es> wrote:
>
>> Hi Eric,
>>
>> Thanks for your comments,
>> I really see a lot of parts of ETE that could potentially be used from
>> Biopython; however, for the moment, we would prefer not to modify
>> ETE's current GPL license. As far as I know, the main difference between
>> GPL and BSD-like licenses is that, with the latter, you can relicense the
>> code at any moment under any other policy, including private and closed
>> licenses. The GPL protects against this by ensuring that any code based
>> on GPL sources must always be GPL-compatible, and that's why we have chosen
>> it. Moreover, using a BSD-like license would prevent us from using a lot of
>> great GPL code out there.
>>
>> It is not my purpose to open a debate about licenses. I just wonder if
>> Biopython could provide some way to link/bind external software, perhaps as
>> add-ons or plugins. This would be great, since many extra features (not only
>> from ETE but from other sources) could be added on demand. This
>> would also mitigate the problem of very specific dependencies, since many of
>> them would be optional. On my side, I could work on providing bindings
>> between Biopython and ETE's graphical tree rendering features, inline
>> visualization GUI, extended Newick support, tree manipulation and the
>> methods within the ETE package.
>>
>> I will be out of the office for several weeks, but if you see any way to
>> collaborate I will be happy to discuss this a bit more in detail...
>>
>> Cheers!
>> Jaime
>>
>>
>> On Fri, Sep 25, 2009 at 5:54 AM, Eric Talevich <eric.talevich at gmail.com> wrote:
>>
>>> Hello, Jaime,
>>>
>>> Sorry I didn't respond directly to your earlier post -- I wrote half of
>>> an e-mail, then realized I had no good suggestions on what to do so I
>>> scrapped it.
>>>
>>> My Tree and TreeIO code is basically a complete parser for the phyloXML
>>> format, plus a few base classes extracted out in hopes of eventually
>>> creating a unified set of format-independent objects, as in SeqIO and
>>> AlignIO. Your code for working with trees looks much more complete than
>>> mine, so if some of it can be incorporated into Biopython, I think that
>>> would be great.
>>>
>>> I see these issues with integration:
>>> 1. It's GPL, while Biopython uses a more permissive custom license
>>> resembling the BSD and MIT licenses. Would you be willing and able to
>>> relicense parts of your work for Biopython?
>>>
>>> 2. Python 2.5 dependency: Biopython still supports Py2.4, so this will
>>> require some compatibility fixes -- not a huge problem.
>>>
>>> 3. Scipy and numpy dependencies: Numpy is considered a semi-optional
>>> dependency in Biopython, so if it can be imported on the fly by just the
>>> functions that need it (hopefully no core ones), that would be best. If
>>> not... we can discuss. Scipy isn't used anywhere else in Biopython yet, so
>>> it would be better to make that an optional, on-the-fly import, too.
>>>
>>> 4. PyQt4 is a big package and I'm not sure it's as common in scientists'
>>> Python installations as numpy and scipy, so if the underlying algorithms for
>>> tree layout could be ported to Reportlab, matplotlib or PIL, that would be
>>> ideal. I personally would like to be able to pair sequence snippets with the
>>> leaves of a standard phylogram, so if you need me to do some additional work
>>> to get this section ported to Biopython, I'd consider it time well spent.
>>>
>>> 5. Presumably, the tree object type in ETE is different from Bio.Tree or
>>> Bio.Nexus, so porting the core tree manipulation code to Biopython would
>>> require a substantial effort somewhere.
>>>
>>> 6. The PhylomeDB connector is cool, and browsing the source, looks like
>>> it wouldn't require much effort at all to drop into Biopython.
>>>
>>> Thanks for letting us know about this.
>>>
>>> Cheers,
>>> Eric
>>>
>>>
>>>
>>> On Thu, Sep 24, 2009 at 6:45 AM, Jaime Huerta Cepas <jhuerta at crg.es> wrote:
>>>
>>>> Hi,
>>>>
>>>> ( I'm the developer of ETE. )
>>>> I agree that PyQt4 is a significant dependency. I chose it because the
>>>> Qt4 QGraphicsScene environment offers many possibilities, such as OpenGL
>>>> rendering, unlimited image size, good performance, and good Python bindings.
>>>> However, I am working on my code to allow the rendering algorithm to use any
>>>> other graphical library, so you could render the same tree images using
>>>> different backends. If you think this is useful for you, please let me know
>>>> and we can think about how to integrate it with Biopython.
>>>> Regarding the GUI, it is not a standalone application but one more
>>>> method of the Tree objects. The GUI can be started at any point of the
>>>> execution and the main program will continue after you close it. I did it
>>>> like this because I think it is quite useful for working within interactive
>>>> Python sessions.
>>>>
>>>> I develop a lot of code around tree handling, so if you think I can
>>>> help, please tell me.
>>>> jaime.
>>>>
>>>>
>>>>
>>>>> > *Graphics*
>>>>> > I finally fixed the networkx/graphviz/matplotlib drawing to leave unlabeled
>>>>> > nodes inconspicuous, so the resulting graphic is much cleaner, perhaps even
>>>>> > usable. Plus, the nodes are now a pretty shade of blue. Still, it would be
>>>>> > nice to have a Reportlab-based module in Bio.Graphics to print phylogenies
>>>>> > in the way biologists are used to seeing them. Does anyone know of existing
>>>>> > code that could be borrowed for this? I looked at ETE (announced on the main
>>>>> > biopython list last week) and liked the examples, but it uses PyQt4 and a
>>>>> > standalone GUI for display, which is a substantial departure from the
>>>>> > Biopython way of doing things.
>>>>>
>>>>> I still haven't tracked down my old report lab code, but it wasn't object
>>>>> orientated and would need a lot of work to bring up to standard...
>>>>>
>>>>> Peter
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> =========================
>>>> Jaime Huerta-Cepas, Ph.D.
>>>> CRG-Centre for Genomic Regulation
>>>> Doctor Aiguader, 88
>>>> PRBB Building
>>>> 08003 Barcelona, Spain
>>>> http://www.crg.es/comparative_genomics
>>>> =========================
>>>>
>>>>
>>>
>>
>>
>


-- 
=========================
Jaime Huerta-Cepas, Ph.D.
CRG-Centre for Genomic Regulation
Doctor Aiguader, 88
PRBB Building
08003 Barcelona, Spain
http://www.crg.es/comparative_genomics
=========================

