[Biopython-dev] [Wg-phyloinformatics] BioGeography update

Brad Chapman chapmanb at 50mail.com
Tue Jul 7 09:02:48 EDT 2009


Hi Stephen;

> In reference to the lagrange code, I see the concern with the  
> licensing. I think that this could be corrected however with a simple  
> rewrite when conforming to the BioPython standards. 

We can require lagrange to be installed and use imports to
grab the needed code. The other option is that y'all can explicitly
relicense a subset of the code under the Biopython license.
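
A minimal sketch of the first option, treating lagrange as an optional
dependency that is imported only if present. The module name "lagrange"
and the helper load_optional are assumptions for illustration; I have not
checked how lagrange is actually packaged.

```python
import importlib


def load_optional(module_name):
    """Return the named module if it is installed, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "lagrange" as an importable module name is an assumption for illustration.
lagrange = load_optional("lagrange")

if lagrange is None:
    print("lagrange not installed; features that depend on it are disabled")
```

This way no GPL code has to be copied into the Biopython source tree; the
dependent features simply switch off when the import fails.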

> I can see however  
> where the Bio.Nexus functionality might not be sufficient for tree  
> manipulation. I am not a contributor to the BioPython dev group so I  
> cannot speak to those specifics, but as a user I can see separating  
> out the tree functions from the Nexus package (and tree I/O in  
> general) as logically a phylogenetic tree structure has little to do  
> with the nexus file format. It can be somewhat awkward to deal with in  
> the current form. A more general implementation might be a Bio.Tree  
> package with I/O readers in Nexus and Newick and XML, etc.

Definitely. Eric has been discussing this with regard to the
PhyloXML project and we had been looking at other Tree
representations: in PyCogent and Thomas Mailund's Newick module.
Considering the lagrange tree model makes a lot of sense as well.
What I'd like to see is a stab at a generalized Tree object that
supports the operations you need and that the Bio.Nexus parser can
produce, exactly as you describe. Eric and Nick, what do you think
about coordinating on this?
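
To make the idea concrete, here is a rough sketch of a format-independent
node class plus a small Newick reader that produces it. Everything here
(Node, parse_newick, the attribute names) is hypothetical and is not the
existing Bio.Nexus or lagrange API; it only handles plain labels and
branch lengths, with no quoted labels or comments.

```python
class Node:
    """One node of a rooted tree; names are illustrative, not Biopython API."""

    def __init__(self, label=None, length=None):
        self.label = label
        self.length = length  # branch length to the parent, if given
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def is_leaf(self):
        return not self.children

    def leaves(self):
        """Return the leaf nodes at or below this node, left to right."""
        if self.is_leaf():
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def parse_newick(text):
    """Parse a simple Newick string into a Node (the root)."""
    text = text.strip().rstrip(";")
    node, pos = _parse_subtree(text, 0)
    if pos != len(text):
        raise ValueError("unexpected text at position %d" % pos)
    return node


def _parse_subtree(text, pos):
    """Internal helper: parse one subtree starting at pos."""
    node = Node()
    if pos < len(text) and text[pos] == "(":
        pos += 1  # skip the opening parenthesis
        while True:
            child, pos = _parse_subtree(text, pos)
            node.add_child(child)
            if pos >= len(text):
                raise ValueError("unbalanced parentheses")
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError("unexpected character %r" % text[pos])
    # Optional label, followed by an optional ":length".
    start = pos
    while pos < len(text) and text[pos] not in ",():":
        pos += 1
    node.label = text[start:pos].strip() or None
    if pos < len(text) and text[pos] == ":":
        pos += 1
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        node.length = float(text[start:pos])
    return node, pos


tree = parse_newick("((A:1.0,B:2.0)AB:0.5,C:3.0)root;")
print([leaf.label for leaf in tree.leaves()])
```

A Nexus or PhyloXML reader could build the same Node objects, which is
the separation of tree structure from file format being discussed.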

> Just a thought and I am happy to work on the tree code in whatever  
> capacity it would be helpful to Nick.

Awesome. We're very open to generalizing the Tree representation in
Biopython. What I'm trying to avoid is having multiple Nexus/Newick
parsers; this is confusing to users and too much duplicated effort.
It sounds like we're on the same page in coming together on
something that will work for everyone.

Brad


> Take care,
> Stephen
> ==================
> Stephen A. Smith
> Postdoctoral Researcher
> NESCent: National Evolutionary Synthesis Center
> page: http://blackrim.org
> blog: http://blackrim.net/semaphoront
> sasmith at nescent.org
> 
> 
> 
> On Jul 4, 2009, at 4:11 PM, Brad Chapman wrote:
> 
> > Hi Nick;
> > Thanks much for the update. I'm cc'ing in the Biopython dev list to
> > keep everyone there in the loop as well.
> >
> >> I have worked out a number of better functions for searching xml
> >> database results, i.e. finding all elements with tags y that exist
> >> somewhere inside elements with tags x.  This is much more flexible in
> >> the event that data of interest resides at different levels of a
> >> hierarchy, which I have found in some cases.
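
For readers following along, a sketch of this kind of nested search using
the standard library's ElementTree. The function name find_nested and the
tag names in the sample document are made up for illustration; this is
not Nick's actual code.

```python
import xml.etree.ElementTree as ET


def find_nested(root, outer_tag, inner_tag):
    """Return inner_tag elements found at any depth inside outer_tag elements."""
    found = []
    seen = set()
    for outer in root.iter(outer_tag):
        for inner in outer.iter(inner_tag):
            # iter() also yields the starting element itself; skip it, and
            # skip anything already collected through a nested outer element.
            if inner is outer or id(inner) in seen:
                continue
            seen.add(id(inner))
            found.append(inner)
    return found


# Made-up document: "name" appears at two different depths inside "record".
xml_text = """
<results>
  <record><name>Puma concolor</name></record>
  <record><taxon><name>Lynx rufus</name></taxon></record>
  <name>not inside a record</name>
</results>
""".strip()

root = ET.fromstring(xml_text)
for element in find_nested(root, "record", "name"):
    print(element.text)
```
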
> >
> > Awesome. Echoing what Hilmar mentioned, it would be good to step back
> > at this point and talk about integration with Biopython. A couple
> > of thoughts and suggestions along those lines:
> >
> > - You've included code from Lagrange which worries me for two
> >  reasons. First, this overlaps with existing Biopython functionality
> >  in Bio.Nexus; we want to eliminate that as it's confusing for
> >  users of the package to find different non-compatible
> >  implementations. If the existing code doesn't work for you in some
> >  way, could you flesh out those issues on the Biopython dev list so we
> >  can work to resolve them? Secondly, lagrange is licensed under the
> >  GPL so practically it is not compatible with Biopython, which is
> >  licensed much more freely.
> >
> > - You've settled on a flat system of coding with functions and no
> >  nesting inside of classes. This makes it difficult to distinguish the
> >  public API from internal functions. We could help make this more
> >  clear in a couple of ways:
> >
> >  - Organizing related functionality into classes.
> >  - Prefixing internal functions with underscores to indicate they
> >    are not meant to be called by users.
> >  - Starting to provide some user documentation, ideally centered
> >    around use cases. Often these help provide a way to think about
> >    the usability of the code and hint at ways to improve it.
> >
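
A small sketch of the first two suggestions together: related functions
grouped in a class, with a leading underscore marking the helper that
users are not meant to call. The class and method names are hypothetical
and do not come from the BioGeography code.

```python
class TreeSummary:
    """Hypothetical example class; not part of Biopython or lagrange."""

    def __init__(self, branch_lengths):
        self._branch_lengths = list(branch_lengths)

    def total_length(self):
        """Public API: sum of all branch lengths."""
        return sum(self._clean(value) for value in self._branch_lengths)

    @staticmethod
    def _clean(value):
        """Internal helper: treat a missing branch length as zero."""
        return 0.0 if value is None else float(value)


summary = TreeSummary([1.0, None, 2.5])
print(summary.total_length())
```
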
> > Hope this is helpful and I'm happy to offer more specific
> > suggestions as you dig into it. Have a great 4th of July weekend,
> >
> > Brad
> >
> >
> >> Stephen Smith wrote:
> >>> These look really great. Glad the lagrange tree code is working out. I
> >>> am very excited for the merging of the Biopython and the lagrange tree
> >>> classes. More details to come.
> >>> Stephen
> >>>
> >>>
> >>>
> >>> On Jun 24, 2009, at 12:47 AM, Nick Matzke wrote:
> >>>
> >>>> OK, here's the latest...
> >>>>
> >>>> New functions: a bunch of stuff dealing with phylogenetic trees, making
> >>>> use of the tree/node class in Stephen Smith's lagrange (GNU public
> >>>> license), which was superior to the half-baked (and not GPL) tree/node
> >>>> class I was using before GSoC started.
> >>>>
> >>>> =============
> >>>> read_ultrametric_Newick(newickstr):
> >>>> Read a Newick file into a tree object (a series of node objects linked
> >>>> to parent and daughter nodes), also reading node ages and node labels
> >>>> if any.
> >>>>
> >>>> list_leaves(phylo_obj):
> >>>> Print out all of the leaves above a node object
> >>>>
> >>>> treelength(node):
> >>>> Gets the total branchlength above a given node by recursively adding
> >>>> through tree.
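
A sketch of what such a recursive sum might look like, reading "above a
node" as "in all of its descendants". The Node class here is a minimal
stand-in and not the lagrange class, so attribute names are assumptions.

```python
class Node:
    """Minimal illustrative node; not the lagrange or Biopython class."""

    def __init__(self, label=None, length=0.0):
        self.label = label
        self.length = length  # length of the branch leading to this node
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child


def treelength(node):
    """Sum the branch lengths of every descendant of node, recursively."""
    total = 0.0
    for child in node.children:
        total += child.length + treelength(child)
    return total


# Hand-built equivalent of ((A:1.0,B:2.0)AB:0.5,C:3.0)root;
root = Node("root")
inner = root.add_child(Node("AB", 0.5))
inner.add_child(Node("A", 1.0))
inner.add_child(Node("B", 2.0))
root.add_child(Node("C", 3.0))
print(treelength(root))
```
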
> >>>>
> >>>> phylodistance(node1, node2):
> >>>> Get the phylogenetic distance (branch length) between two nodes.
> >>>>
> >>>> get_distance_matrix(phylo_obj):
> >>>> Get a matrix of all of the pairwise distances between the tips of a tree.
> >>>>
> >>>> get_mrca_array(phylo_obj):
> >>>> Get a square list of lists (array) listing the mrca of each pair of
> >>>> leaves (half-diagonal matrix)
> >>>>
> >>>> subset_tree(phylo_obj, list_to_keep):
> >>>> Given a list of tips and a tree, remove all other tips and resulting
> >>>> redundant nodes to produce a new smaller tree.
> >>>>
> >>>> prune_single_desc_nodes(node):
> >>>> Follow a tree from the bottom up, pruning any nodes with only one
> >>>> descendant
> >>>>
> >>>> find_new_root(node):
> >>>> Search up tree from root and make new root at first divergence
> >>>>
> >>>> make_None_list_array(xdim, ydim):
> >>>> Make a list of lists ("array") with the specified dimensions
> >>>>
> >>>> get_PD_to_mrca(node, mrca, PD):
> >>>> Add up the phylogenetic distance from a node to the specified ancestor
> >>>> (mrca).  Find mrca with find_1st_match.
> >>>>
> >>>> find_1st_match(list1, list2):
> >>>> Find the first match in two ordered lists.
> >>>>
> >>>> get_ancestors_list(node, anc_list):
> >>>> Get the list of ancestors of a given node
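
The three helpers above fit together roughly as follows: build each
node's ancestor list, take the first shared entry as the mrca, then add
the two path lengths. This is a simplified sketch with different
signatures from the ones listed (no accumulator arguments), and the Node
class is a stand-in, so treat the names as assumptions.

```python
class Node:
    """Minimal illustrative node with a parent link."""

    def __init__(self, label, length=0.0, parent=None):
        self.label = label
        self.length = length  # length of the branch leading to this node
        self.parent = parent


def get_ancestors_list(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def find_first_match(list1, list2):
    """Return the first item of list1 that is also in list2, else None."""
    in_list2 = set(id(item) for item in list2)
    for item in list1:
        if id(item) in in_list2:
            return item
    return None


def distance_to_ancestor(node, ancestor):
    """Add up branch lengths walking from node up to ancestor."""
    total = 0.0
    while node is not ancestor:
        total += node.length
        node = node.parent
    return total


def phylodistance(node1, node2):
    """Branch-length distance between two nodes of the same tree."""
    mrca = find_first_match(get_ancestors_list(node1), get_ancestors_list(node2))
    if mrca is None:
        raise ValueError("nodes are not in the same tree")
    return distance_to_ancestor(node1, mrca) + distance_to_ancestor(node2, mrca)


# Hand-built equivalent of ((A:1.0,B:2.0)AB:0.5,C:3.0)root;
root = Node("root")
ab = Node("AB", 0.5, root)
a = Node("A", 1.0, ab)
b = Node("B", 2.0, ab)
c = Node("C", 3.0, root)
print(phylodistance(a, b), phylodistance(a, c))
```
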
> >>>>
> >>>> addup_PD(node, PD):
> >>>> Adds the branchlength of the current node to the total PD measure.
> >>>>
> >>>> print_tree_outline_format(phylo_obj):
> >>>> Prints the tree out in "outline" format (daughter clades are indented,
> >>>> etc.)
> >>>>
> >>>> print_Node(node, rank):
> >>>> Prints the node in question, and recursively all daughter nodes,
> >>>> maintaining rank as it goes.
> >>>>
> >>>> lagrange_disclaimer():
> >>>> Just prints lagrange citation etc. in code using lagrange libraries.
> >>>> =============
> >>>>
> >>>>
> >>>>
> >>>> What's next:
> >>>>
> >>>> I'm going to spend the rest of this week following up on Brad's
> >>>> suggestions to make the code more standard, with the priority of
> >>>> figuring out how I can revise the current BioPython phylogeny class, to
> >>>> resemble the better version in lagrange, so that there is a generic
> >>>> flexible phylogeny/newick parser that can be used generally as well as
> >>>> by my BioGeography package specifically.
> >>>>
> >>>> updated wiki/git:
> >>>> http://biopython.org/wiki/BioGeography#June.2C_week_3:_Functions_to_read_user-specified_Newick_files_.28with_ages_and_internal_node_labels.29_and_generate_basic_summary_information.
> >>>>
> >>>> http://github.com/nmatzke/biopython/commits/Geography
> >>>>
> >>>> Cheers!
> >>>> Nick
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Nick Matzke wrote:
> >>>>> Sorry my update is slow, it is coming in a bit!  Thanks, Nick
> >>>>>
> >>>>> Brad Chapman wrote:
> >>>>>> Nick;
> >>>>>> Thanks for the update -- hope y'all are having fun at the Evolution
> >>>>>> meeting and have managed to meet up.
> >>>>>>
> >>>>>>> Basically this week I added functions to download & parse large
> >>>>>>> numbers of records, get TaxonOccurrence gbifKeys, and search with
> >>>>>>> those keys.  Main functions:
> >>>>>>
> >>>>>> Good stuff. My main comment echoes a couple of things we discussed
> >>>>>> earlier:
> >>>>>>
> >>>>>> - It is not clear to a user which functions are API functions to
> >>>>>> call and which are used internally. Prefixing the internal
> >>>>>> functions with underscores (_) and organizing these into classes
> >>>>>> will help with this.
> >>>>>>
> >>>>>> - I still noticed some tempfile writing from what we discussed last
> >>>>>> week. If you have problems using in-memory file handles let us
> >>>>>> know and we can discuss more.
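
For reference, a sketch of the in-memory approach: wrap the downloaded
text in io.StringIO and hand that to the parser in place of a temporary
file. The sample XML and the function name count_elements are made up
for illustration and are not the real GBIF response format.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(handle, tag):
    """Parse XML from any file-like handle and count elements named tag."""
    tree = ET.parse(handle)
    return sum(1 for _ in tree.iter(tag))


# Pretend this string came back from a web query (made-up content).
downloaded = "<results><occurrence key='1'/><occurrence key='2'/></results>"

# No tempfile needed: StringIO behaves like an open text file.
handle = io.StringIO(downloaded)
print(count_elements(handle, "occurrence"))
```

Because count_elements only needs a file-like object, the same function
works unchanged on a real file opened with open().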
> >>>>>>
> >>>>>> In general if your coding style is to get it out there and then
> >>>>>> re-factor, that is cool. But please put some time into the
> >>>>>> schedule for this so I know not to bug you before you've actually
> >>>>>> had a chance to go through things a second time. Also, it's a good
> >>>>>> idea to do this in segments as we go along. From experience, if you
> >>>>>> build up too much code that needs rework it becomes more mentally
> >>>>>> difficult to get into the rewriting.
> >>>>>>
> >>>>>>> An issue:
> >>>>>>>
> >>>>>>> Next week come functions to process phylogenetic trees.  I have had
> >>>>>>> issues with the current BioPython newick parser etc.; basically what
> >>>>>>> exists appears to not accept node label information which is
> >>>>>>> required to store e.g. branchlengths which are crucial for the sorts
> >>>>>>> of things I have to do in the future.  So unless there is a better
> >>>>>>> suggestion I plan to modify & upload my own tree parsing/using
> >>>>>>> functions.  I am open to suggestions in this matter.
> >>>>>>
> >>>>>> We do not want to introduce duplicated code for Newick tree parsing
> >>>>>> in Biopython. This is a good opportunity to engage the development
> >>>>>> list to help figure out how to fix the current parser to do what you
> >>>>>> need. If you are not sure how to get started, the best way is to get
> >>>>>> together a small test file that demonstrates your problems, and post
> >>>>>> it to the list. It would be more useful to everyone to have your
> >>>>>> fixes in the main parser.
> >>>>>>
> >>>>>> Brad
> >>>>>>
> >>>>>
> >>>>
> >>>> -- 
> >>>> ====================================================
> >>>> Nicholas J. Matzke
> >>>> Ph.D. Candidate, Graduate Student Researcher
> >>>> Huelsenbeck Lab
> >>>> Center for Theoretical Evolutionary Genomics
> >>>> 4151 VLSB (Valley Life Sciences Building)
> >>>> Department of Integrative Biology
> >>>> University of California, Berkeley
> >>>>
> >>>> Lab websites:
> >>>> http://ib.berkeley.edu/people/lab_detail.php?lab=54
> >>>> http://fisher.berkeley.edu/cteg/hlab.html
> >>>> Dept. personal page:
> >>>> http://ib.berkeley.edu/people/students/person_detail.php?person=370
> >>>> Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
> >>>> Lab phone: 510-643-6299
> >>>> Dept. fax: 510-643-6264
> >>>> Cell phone: 510-301-0179
> >>>> Email: matzke at berkeley.edu
> >>>>
> >>>> Mailing address:
> >>>> Department of Integrative Biology
> >>>> 3060 VLSB #3140
> >>>> Berkeley, CA 94720-3140
> >>>>
> >>>> -----------------------------------------------------
> >>>> "[W]hen people thought the earth was flat, they were wrong. When people
> >>>> thought the earth was spherical, they were wrong. But if you think that
> >>>> thinking the earth is spherical is just as wrong as thinking the earth
> >>>> is flat, then your view is wronger than both of them put together."
> >>>>
> >>>> Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer,
> >>>> 14(1), 35-44. Fall 1989.
> >>>> http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
> >>>> ====================================================
> >>>> _______________________________________________
> >>>> Wg-phyloinformatics mailing list
> >>>> Wg-phyloinformatics at nescent.org
> >>>> https://lists.nescent.org/mailman/listinfo/wg-phyloinformatics
> >>>
> >>>
> >>
> 

