From idoerg at gmail.com  Mon Jan  6 14:56:17 2014
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 6 Jan 2014 14:56:17 -0500
Subject: [Biopython-dev] OBO parser & DAG
Message-ID:

Hi all,

Is there any effort going on for developing the OBO parser & Bio-ontology
DAG? If not, my lab wants to push this. We already have a basic
representation using digraph from networkx, and a basic OBO parser. But I'm
checking to make sure there is no duplicate effort here.

All very initial development.

Parser:
https://github.com/idoerg/go-parser

(the relevant module is
https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py)

DAG:
https://github.com/osamajomaa/DAGON

Thanks,

Iddo

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

From bartek at rezolwenta.eu.org  Mon Jan  6 16:59:28 2014
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 6 Jan 2014 22:59:28 +0100
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:
Message-ID:

Hi Iddo,

My student has also recently implemented a module for ontologies. Maybe we
can somehow merge these efforts. His code can be found here:
https://github.com/tosterovic/biopython

The relevant part is Bio/Ontology.

best
Bartek

On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg wrote:
> [...]

--
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek

From idoerg at gmail.com  Mon Jan  6 17:17:20 2014
From: idoerg at gmail.com (Iddo Friedberg)
Date: Mon, 6 Jan 2014 17:17:20 -0500
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:
Message-ID:

Hi Bartek,

Thanks. I looked at it a bit.

Any reason why you wrote your own digraphs instead of using networkx? See
also: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph

That said, it seems very mature. But networkx provides many functions for
plotting, interrogating and manipulating graphs.

Your OBO parser seems quite mature. Are you planning to add other edge
types (e.g. "part_of")?

What we are trying to do here is set up, besides the parser & DAG
implementation, a measure of DAG similarity. This is due to my
interest in assessing function similarity. That will be a separate module
(perhaps not even useful to Biopython).

So now I am not sure what to do :/ The IO modules seem complete & usable,
but I would rather have seen a DAG implementation using networkx.

Ideas?

On Mon, Jan 6, 2014 at 4:59 PM, Bartek Wilczynski wrote:
> [...]

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.
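
The questions raised so far (typed edges such as "is_a" and "part_of", and
inferring ancestors over them) can be illustrated independently of the graph
backend. The following is only a minimal sketch using plain dictionaries; it
is not code from go-parser, DAGON or Bio/Ontology. The term IDs, the TERMS
table and the build_dag() and ancestors() helpers are all hypothetical names
made up for the illustration.

```python
# Hypothetical toy ontology: each term lists (relation, parent) pairs.
# The IDs and relations are illustrative, not taken from a real OBO file.
TERMS = {
    "GO:A": [],                      # root term
    "GO:B": [("is_a", "GO:A")],
    "GO:C": [("part_of", "GO:B")],
    "GO:D": [("is_a", "GO:C")],
}


def build_dag(terms):
    """Return {child: {parent: relation}}, keeping the edge type on each edge."""
    dag = {}
    for child, links in terms.items():
        dag.setdefault(child, {})
        for relation, parent in links:
            dag[child][parent] = relation
            dag.setdefault(parent, {})
    return dag


def ancestors(dag, term, relations=("is_a", "part_of")):
    """Collect all terms reachable from `term` via edges of the given types."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent, relation in dag[node].items():
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


dag = build_dag(TERMS)
all_ancestors = ancestors(dag, "GO:D")
is_a_only = ancestors(dag, "GO:D", relations=("is_a",))
```

With networkx the same edge type could presumably be stored as an edge
attribute, for example via DiGraph.add_edge(child, parent, relation="is_a"),
so the choice of backend need not change what information the parser keeps.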

From bartek at rezolwenta.eu.org  Mon Jan  6 17:26:35 2014
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 6 Jan 2014 23:26:35 +0100
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:
Message-ID:

Hi,

I will meet with Kamil sometime this week and we will discuss options for
switching to networkx or at least adding some compatibility layer for it. I
think the information about the edge type is preserved in the DAG after
parsing, so I'm not sure what you mean by "supporting" other types of
relationships.

Our interest was mostly in ontology term enrichment analysis, which Kamil
implemented, and his version is also usable for parsing. But I think we are
still quite open to changes at this point, and I'm sure we will be able to
come up with a good version merging the important features from both
versions.

best
Bartek

On Mon, Jan 6, 2014 at 11:17 PM, Iddo Friedberg wrote:
> [...]

--
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek

From marco.galardini at unifi.it  Tue Jan  7 08:51:44 2014
From: marco.galardini at unifi.it (Marco Galardini)
Date: Tue, 07 Jan 2014 14:51:44 +0100
Subject: [Biopython-dev] Interested in a Phenotype Microarray parser?
Message-ID: <52CC0670.5070909@unifi.it>

Dear biopythoners,

would you be interested in the addition of a Phenotype Microarray (PM) data
file parser in Biopython?
Such phenotypic data should be widely used in microbial experiments, even
though just a few tools/libraries are available to parse and analyze such
data (so far there is an R package called opm and a Python command-line
tool developed by me, called DuctApe).
If you think that the ability to parse such files and perform some basic
analysis (like curve parameter extraction) would be interesting as a new
feature for Biopython, I would very much like to contribute.

If you are interested in such an addition, could you please indicate
whether there is any style guide for the Biopython parsers?

Thanks for the amazing work,
Marco

--
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www:    http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------

From bartek at rezolwenta.eu.org  Tue Jan  7 15:51:19 2014
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 7 Jan 2014 21:51:19 +0100
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID:

Hi,

I've talked with Kamil today and we've looked through some of your code.

We actually evaluated networkx before implementing our library and
thought it was unnecessary for our purposes to make it a requirement for
Biopython, as the enrichment analysis uses a relatively small subset of
graph operations.

However, it would be very easy to write functions for converting our
ontologies to networkx digraphs, either with or without gene annotations as
additional attributes.

As for support for different types of transitivity in relations of
different type (as in your inference of ancestry for is_a and part_of
relations), we currently do not support it, but after thinking about it,
we will make a change to support this feature. Probably we will let the
user (optionally) define the transitivity between relationship types
(e.g. is_a + part_of becomes part_of, etc.).

In general, it would be very helpful if you could give us some rough idea
about your expected use cases, so that the changes we make will be as
useful to the community as possible. For example: are you expecting to
modify the graphs in the networkx objects? What will you use the inferred
ancestor lists for?

Naturally, if anyone else wants to contribute their ideas or use cases,
you are most welcome...

best
Bartek

On Mon, Jan 6, 2014 at 11:42 PM, Iddo Friedberg wrote:

> I will meet with my student tomorrow (cc'd) and we can continue this
> further.
>
> Osama: Bartek Wilczynski's group has been working on an OBO parser / GO
> module too. Their parser seems complete & useful. Their digraph
> implementation is not in networkx, so I'm not sure about adopting that as
> is. In any case, let's meet tomorrow and talk, and maybe the four of us can
> work out a collaborative plan if we feel it's useful.
>
> Cheers,
>
> Iddo
>
> [...]

--
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek

From mjldehoon at yahoo.com  Wed Jan  8 00:53:19 2014
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 7 Jan 2014 21:53:19 -0800 (PST)
Subject: [Biopython-dev] Interested in a Phenotype Microarray parser?
In-Reply-To: <52CC0670.5070909@unifi.it>
Message-ID: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com>

Hi Marco,

> will you be interested in the addition of a PM data files
> parser in Biopython?

Yes, it makes sense to me.

> any specification on the style guide for the biopython parsers?

There is no strict set of rules, but to get you started, many modules
follow this format:
- Assuming a PM data file contains only a single data set, the module
  should contain a function "read" that takes either a file name or a file
  handle as the argument.
- The module should contain a class (typically called "Record") that
  can store the data in the data file. The "read" function returns an
  object of this class.
- Try to avoid third-party dependencies if at all possible.

Would it make sense to have a single Bio.Microarray module that can
house the various microarray parsers (PM, Affy, others)?

Best,
-Michiel.

--------------------------------------------
On Tue, 1/7/14, Marco Galardini wrote:

 [...]

From marco.galardini at unifi.it  Wed Jan  8 05:32:40 2014
From: marco.galardini at unifi.it (Marco Galardini)
Date: Wed, 08 Jan 2014 11:32:40 +0100
Subject: [Biopython-dev] Interested in a Phenotype Microarray parser?
In-Reply-To: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com>
References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com>
Message-ID: <52CD2948.7050102@unifi.it>

Hi,

On 01/08/2014 06:53 AM, Michiel de Hoon wrote:
>> any specification on the style guide for the biopython parsers?
> There is no strict set of rules, but to get you started, many modules
> follow this format:
> - Assuming a PM data file contains only a single data set, the module
> should contain a function "read" that takes either a file name or a file
> handle as the argument.

Unfortunately, the situation is a bit mixed up: there are basically three
file formats for PM data: CSV files (which can contain one or more data
sets, or 'plates'), and YAML and JSON, which can also contain some metadata.
I would therefore use an approach similar to the SeqIO module, with a
parse() function and a read() function that raises an exception if the
file contains more than one record.

> - The module should contain a class (typically called "Record") that
> can store the data in the data file. The "read" function returns an
> object of this class.
> - Try to avoid third-party dependencies if at all possible.

So far the dependencies would be PyYAML (for the yaml/json parsing, but
maybe I could use the stdlib json module) and numpy/scipy for the
extraction of curve parameters. Does this sound ok?

>
> Would it make sense to have a single Bio.Microarray module that can
> house the various microarray parsers (PM, Affy, others)?

I don't know if that would be a good strategy: Phenotype Microarrays are
very different from other, proper microarrays; how about a "phenomics"
module?

>
> Best,
> -Michiel.

Kind regards,
Marco

From idoerg at gmail.com  Wed Jan  8 11:55:16 2014
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 8 Jan 2014 11:55:16 -0500
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID:

Hi Bartek. See inlined responses below.

Does anyone else have ideas on how to do this?

On Tue, Jan 7, 2014 at 3:51 PM, Bartek Wilczynski wrote:

> Hi,
>
> I've talked with Kamil today and we've looked through some of your code.
>
> We have actually evaluated networkx before implementing our library and
> thought it was unnecessary for our purposes to make it a requirement for
> biopython as the enrichment analysis is using a relatively small subset of
> graph operations.
>

I understand your rationale, but I disagree with it, mainly for design
reasons.

1. Enrichment analysis is only one of many different applications that can
be performed with GO. Therefore, saying that features are unnecessary
because a particular use case does not require them should not be a design
consideration for a module that is intended for general use. Rather, a
generic package handling ontologies should be just that: generic, and
disengaged from any kind of application. Therefore, if your package is
intended for Biopython, the use case (enrichment analysis) should be
decoupled from the parser + data structure.

2. The graph features that you wrote in Digraph exist in networkx anyway,
or am I missing something? So why not take advantage of nx instead of
redoing it, even if it does have many redundant (for you) graph manipulation
& diagnostic features? Someone else may want to use these features,
including the graphics nx provides, etc.

> However it would be very easy to make functions for converting our
> ontologies to networkx digraphs, either with or without gene annotations as
> additional attributes.
>

Well, the idea is actually to maintain ontologies as nx digraphs. Yes, I
agree there.

> As for support for different types of transitivity in relations of
> different type (as in your inference of ancestry for is_a and part_of
> relations) we are currently not supporting it, but after thinking about it,
> we will make a change to support this feature. Probably we will let the
> user to (optionally) define the transitivity between relationship types
> (i.e. is_a + part_of becomes part_of, etc).
>
> In general, it would be very helpful if you could give us some rough idea
> about your expected use cases. For example: are you expecting to modify the
> graphs in the networkx objects? What will you use the inferred ancestor
> lists for? So that the changes we make will be as useful to the community
> as possible.
>

The idea is that expected use cases should not impact the design of a basic
parser + data structure. In my lab, we are looking at inferred ancestor
lists to calculate semantic similarity, but it really doesn't matter what
we (or anyone) will end up using the GO module for. If you provide
enrichment analysis *on top* of the parser + data structure (as a separate
module), and we provide semantic similarity (again as a separate module *on
top* of the parser + data structure), those are nice bonuses. But the parser
+ data structure should be as general as possible. That is: include all the
information in the OBO file, placed in a digraph structure that can be
comprehensively interrogated, visualized and manipulated (which is what nx
offers).

> Naturally, if anyone else wants to contribute their ideas or use-cases,
> you are most welcome...
>
> best
> Bartek
>
> [...]

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

From eric.talevich at gmail.com  Wed Jan  8 14:18:52 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 8 Jan 2014 11:18:52 -0800
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID:

On Tue, Jan 7, 2014 at 12:51 PM, Bartek Wilczynski wrote:

> [...]
> We have actually evaluated networkx before implementing our library and
> thought it was unnecessary for our purposes to make it a requirement for
> biopython as the enrichment analysis is using a relatively small subset of
> graph operations.
>
> However it would be very easy to make functions for converting our
> ontologies to networkx digraphs, either with or without gene annotations as
> additional attributes.
>
> [...]
>

Chiming in briefly -- While it does make sense to avoid new dependencies in
Biopython modules, the Bio.Phylo module already uses networkx for a couple
of non-core functions. Since networkx is portable, pure Python (as far as I
can tell) and has no additional hard dependencies of its own, it would not
necessarily be a major inconvenience to the user if this new module depends
on networkx.

From bartek at rezolwenta.eu.org  Wed Jan  8 16:05:47 2014
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Wed, 8 Jan 2014 22:05:47 +0100
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID: Hi, I'll answer below (even though I do have a bad habit of top-posting my answers, sorry). On Wed, Jan 8, 2014 at 5:55 PM, Iddo Friedberg wrote: >I wrote about not using networkx as the main data structure for ontologies > > I understand your rationale, but I disagree with it, mainly for design > reasons. > > Ok. I guess I was a bit too brief with my explanation. We have considered using networkx and decided not to use it mainly because it was not very useful, and implementing what was necessary for parsing was not an issue for Kamil. Networkx is currently not a dependency for biopython or for bio.phylo and it is not even listed as "optional software" along with reportlab and such (http://biopython.org/wiki/Download; I guess it should be, after Eric's comment). My understanding is that this is a policy of thinking twice before adding something additional as a dependency, because we would need to care for compatibility with different networkx versions. Taking Bio.Phylo as an example I do think it is a good policy to keep such libraries as optional if possible. Besides, we did not particularly like the way digraphs are implemented in Networkx with a heavy use of dictionaries, as this might get slow for large dictionaries. Now to your specific points: > 1. Enrichment analysis is only one of many different applications that can > be performed with GO. Therefore, saying that features are unnecessary > because a particular use case does not require them should not be a design > consideration for a module that is intended for general use. Rather, a > generic package handling ontologies should be just that: generic, and > disengaged from any kind of application. Therefore, if your package is > intended for biopython the use-case (enrichment analysis) should be > decoupled from the parser + data structure. > > We were obviously tailoring this to our needs, but I have to disagree with your argument.
Because of the reasons above, I think that we should use an external digraph library _only_ if it is _necessary_ for the parsing and storing and it clearly isn't. For the separation of parsing and enrichment, we do want to keep the parser separate from the enrichment analysis and I thought it was quite clear with the use of separate classes, but we are absolutely open to discuss how to organize these modules. If you take the parser as a separate module - using networkx is even less needed (really no need for a big graph manipulation lib here). > 2. The graph features that you wrote in Digraph exist in networkx anyway, > or am I missing something? So why not take advantage of nx instead of > redoing it even if it does have many redundant (for you) graph manipulation > & diagnostic features? Someone else may want to use these features, > including the graphics nx provides, etc. > Yes, the point is that parsing and storing a digraph is a simple thing and there is no need to add a large library for that. If there was a digraph library in biopython, it would be stupid not to use it, but I don't feel we need to add a dependency here. > >> However it would be very easy to make functions for converting our >> ontologies to networkx digraphs, either with or without gene annotations as >> additional attributes. >> >> > Well, the idea is actually to maintain ontologies as nx digraphs. Yes, I > agree there. > > That's also exactly what is done in Bio.Phylo. We are planning to write a function analogous to Bio.Phylo._util.to_networkx() which would take a simple digraph obtained from parsing an OBO file and give you a networkx digraph with all the data for manipulation. > As for support for different types of transitivity in relations of >> different type (as in your inference of ancestry for is_a and part_of >> relations) we are currently not supporting it, but after thinking about it, >> we will make a change to support this feature.
Probably we will let the >> user to (optionally) define the transitivity between relationship types >> (i.e. is_a + part_of becomes part_of, etc). >> >> In general, it would be very helpful if you could give us some rough idea >> about your expected use cases. For example: are you expecting to modify the >> graphs in the networkx objects? What will you use the inferred ancestor >> lists for? So that the changes we make will be as useful to the community >> as possible. >> > > > The idea is that expected use cases should not impact the design of a > basic parser + data structure. In my lab, we are looking at inferred > ancestors lists to calculate semantic similarity, but it really doesn't > matter what we (or anyone) will end up using the GO module for. If you > provide enrichment analysis *on top* of the parser + data structure (as a > separate module), and we provide semantic similarity (again as a separate > module *on top* of the parser + data structure) those are nice bonuses. But > the parser + data structure should be as general as possible. That is: > include all the information in the OBO file, placed in a digraph structure > that can be comprehensively interrogated, visualized and manipulated (which > is what nx offers). > > I was unfortunately not very clear here. What I meant was that we were considering what is necessary for typical uses of ontologies, namely parsing and accessing the terms. And I think that is valid in the sense that the majority of users treat ontologies as read-only data (not that many biopython users are making their own ontologies, otherwise, it would have been implemented ages ago...). As for the second argument: I do fully agree that there should be some separation between ontology and annotation reading and any functionality "on-top" of it. But I think that it would not be reasonable to include networkx as the main data structure. Currently there is only one library that biopython depends on and it is numpy.
I do not see networkx as equally important. I think that we should go the way paved by Bio.Phylo and use the simple digraph (which already holds all the information from the OBO files afaik) for parsing output and convert it to networkx where necessary. best Bartek -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From eric.talevich at gmail.com Wed Jan 8 16:47:12 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 8 Jan 2014 13:47:12 -0800 Subject: [Biopython-dev] List of Biopython dependencies Message-ID: Folks, It appears that there are two lists of Biopython dependencies on the wiki: http://biopython.org/wiki/List_of_applications_executed_via_Biopython http://biopython.org/wiki/Download#Required_Software There is potential for confusion here. Shall we merge these lists on the Download page? We could follow Debian's approach of splitting the list into Depends (strict requirements, e.g. Python), Recommended (non-essential but important, e.g. numpy and reportlab), and Suggested (everything else). Regards, Eric From p.j.a.cock at googlemail.com Wed Jan 8 16:55:13 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 8 Jan 2014 21:55:13 +0000 Subject: [Biopython-dev] List of Biopython dependencies In-Reply-To: References: Message-ID: On Wednesday, January 8, 2014, Eric Talevich wrote: > Folks, > > It appears that there are two lists of Biopython dependencies on the wiki: > http://biopython.org/wiki/List_of_applications_executed_via_Biopython > http://biopython.org/wiki/Download#Required_Software > > There is potential for confusion here. Shall we merge these lists on the > Download page? Tiago started the list of applications page while updating the buildbots - it is as noted a work in progress. Maybe it should be just command line applications, which can then be linked to from the downloads page?
> We could follow Debian's approach of splitting the list into Depends > (strict requirements, e.g. Python), Recommended (non-essential but > important, e.g. numpy and reportlab), and Suggested (everything else). > That split makes sense for libraries, but is it also sensible for command line applications Biopython can call? Regards, Peter From idoerg at gmail.com Wed Jan 8 16:59:00 2014 From: idoerg at gmail.com (Iddo Friedberg) Date: Wed, 8 Jan 2014 16:59:00 -0500 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID: So it seems like we are debating minimal external dependencies vs. maximizing functionality. How about the following: the OBO file will be read into an independent, basic digraph like Bartek's team has already constructed. But we will also have the ability to transfer the biopython DAG into a networkx DAG, so that anyone wishing to play elaborate games with the ontology structure (as we do), can do so without re-inventing the wheel. How does that sound? One thing about networkx: I still really, really like it :), and we started writing the digraph based on it because of this page: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph But the fact that this spec using networkx has been written does not have to commit us to this particular design. -- Iddo Friedberg http://iddo-friedberg.net/contact.html From eric.talevich at gmail.com Wed Jan 8 17:51:45 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 8 Jan 2014 14:51:45 -0800 Subject: [Biopython-dev] List of Biopython dependencies In-Reply-To: References:

Message-ID: On Wed, Jan 8, 2014 at 1:55 PM, Peter Cock wrote: > > > On Wednesday, January 8, 2014, Eric Talevich wrote: > >> Folks, >> >> It appears that there are two lists of Biopython dependencies on the wiki: >> http://biopython.org/wiki/List_of_applications_executed_via_Biopython >> http://biopython.org/wiki/Download#Required_Software >> >> There is potential for confusion here. Shall we merge these lists on the >> Download page? > > > Tiago started the list of applications page while updating the > buildbots - it is as noted a work in progress. Maybe it should > be just command line applications, which can then be linked > to from the downloads page? > So the list of command line applications on the Download page would be moved to the separate Wiki page? Seems reasonable if the list is long. >> We could follow Debian's approach of splitting the list into Depends >> (strict requirements, e.g. Python), Recommended (non-essential but >> important, e.g. numpy and reportlab), and Suggested (everything else). >> > > That split makes sense for libraries, but is it also sensible > for command line applications Biopython can call? > The sub-headings could be: - Required software - Recommended Python libraries [needed to enable significant functionality] - Additional Python libraries [used by individual Biopython modules for additional features] "In addition Biopython includes wrapper code for calling a number of third party command line tools [link]" I imagine this format would be useful to sysadmins (and maybe distro package maintainers, too). But it may be just as well to merge the "required" and "recommended" lists and just note caveats inline as is currently done. From tiagoantao at gmail.com Thu Jan 9 04:02:25 2014 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 9 Jan 2014 09:02:25 +0000 Subject: [Biopython-dev] List of Biopython dependencies In-Reply-To: References:

Message-ID: Hi all, What I am trying to do is to get a list of stuff that ensures that biopython is tested top to bottom. This is in order to assure QA for releases and a complete and stable test framework. I am also trying to create a VM for testing (based on CloudBioLinux - with Brad's help), so that anyone can do a full test and there is a complete VM with everything needed out-of-the-box. It so happens that I have been travelling since mid-December (back in office on Monday) and things have been mostly frozen (restarting next week). The format for the list is really not very important, but it has a completely different slant from the requirements page: it is meant for developers and testing, etc. Whatever you guys prefer is fine by me. I just plan to review the list top-to-bottom and ensure that it is complete and up-to-date. Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From arklenna at gmail.com Thu Jan 9 09:50:34 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 9 Jan 2014 09:50:34 -0500 Subject: [Biopython-dev] List of Biopython dependencies In-Reply-To: References:

Message-ID: On Thu, Jan 9, 2014 at 4:02 AM, Tiago Antão wrote: > Hi all, > > What I am trying to do is to get a list of stuff that assures that > biopython is tested top to bottom. This in order to assure QA for releases > and a complete and stable test framework. > I am also trying to create a VM machine for testing (based on CloudBioLinux > - with Brad's help), so that anyone can do a full test and there is a > complete VM with everything needed out-of-the-box. > It so happens that I have been travelling since mid-December (back in > office on Monday) and things have been mostly frozen (restarting next > week). > > The format for the list is really not much important, but it has a > completely different slant from the requirements page: it is meant for > developers and testing, etc. Whatever you guys prefer is fine by me. I just > plan to review the list top-to-bottom and assure that it is complete and > up-to-date. > > The difference of intent is logical, but it should be possible to make a single list that fulfills both goals. The user audience list divided by requirements, recommended, etc. can also be a top-to-bottom testing list - to test, install everything from every list. It would be more error-prone to maintain two separate lists. Lenna From bartek at rezolwenta.eu.org Thu Jan 9 18:06:30 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 10 Jan 2014 00:06:30 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID: Hi, I think that's a great solution. In fact Kamil has now implemented such a conversion. Take a look and see if it suits your needs: https://github.com/tosterovic/biopython/commit/904e51303391411b42697205b09181378662807d We would be very happy to contribute this module to the biopython repo, so it would be great if more people would take a look and suggest the changes needed for accepting it as a part of biopython. best Bartek -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From mjldehoon at yahoo.com Fri Jan 10 03:36:50 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 10 Jan 2014 00:36:50 -0800 (PST) Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> Message-ID: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> Hi Marco, > So far the dependencies would be pyYaml (for the yaml/json > parsing, but maybe i could use the stdlib json module) and > numpy/scipy for the extraction of curve parameters. Does > this sound ok? It would be better to avoid scipy, as currently it is not used in any Biopython module. Same goes for pyYaml; keep in mind that the number of users may decrease if there are more dependencies (and it puts a burden on developers also for testing). Numpy should be OK; a quick count suggests that 12 biopython modules are already using it. > how about a "phenomics" module? In principle that sounds fine to me. Best, -Michiel. From marco.galardini at unifi.it Sat Jan 11 07:49:53 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Sat, 11 Jan 2014 13:49:53 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser?
In-Reply-To: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> References: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> Message-ID: <52D13DF1.50308@unifi.it> On 10/01/2014 09:36, Michiel de Hoon wrote: > Hi Marco, > >> So far the dependencies would be pyYaml (for the yaml/json > > parsing, but maybe i could use the stdlib json module) and > > numpy/scipy for the extraction of curve parameters. Does > > this sound ok? > It would be better to avoid scipy, as currently it is not used in any Biopython module. > Same goes for pyYaml; keep in mind that the number of users may decrease if there are more dependencies (and it puts a burden on developers also for testing). > Numpy should be OK; a quick count suggests that 12 biopython modules are already using it. Ok, I'll try to remove all these dependencies: if scipy will not be included, then most probably also numpy won't be needed. The drawback is that it would be difficult to fit the sigmoid functions to the PM curves (so the lag, plateau and slope parameters won't be computed). > >> how about a "phenomics" module? > In principle that sounds fine to me. > > Best, > -Michiel. > I'll try to draft a first tentative version and let you know how it goes. Best, Marco From mjldehoon at yahoo.com Sun Jan 12 21:07:33 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 12 Jan 2014 18:07:33 -0800 (PST) Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52D13DF1.50308@unifi.it> Message-ID: <1389578853.40150.YahooMailBasic@web164003.mail.gq1.yahoo.com> Hi Marco, -------------------------------------------- > if scipy will not be included, then most probably also numpy won't be needed. The > drawback is that it would be difficult to fit the sigmoid functions to the PM curves > (so the lag, plateau and slope parameters won't be computed).
If it is not too complicated, you may consider implementing a routine (using numpy if it helps) to fit the sigmoid functions to the PM curves.

Best,
-Michiel.

From p.j.a.cock at googlemail.com Tue Jan 14 18:31:55 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 14 Jan 2014 23:31:55 +0000
Subject: [Biopython-dev] Fwd: Call for Org Admins for OBF's 2014 Google Summer of Code participation
Message-ID:

Would anyone like to help organise our GSoC efforts? See below...

Thanks!

Peter

---------- Forwarded message ----------
From: Hilmar Lapp

The 2014 Google Summer of Code (GSoC) is coming up soon. The published timeline [1] puts the mentoring organization applications from Feb 3 to 14.

OBF participated on behalf of our member projects from 2010-2012, and those participations were both important and successful. Through them, our projects gained new contributors, new features, and new community members. The mentors involved from our projects learned as much from the experience as the students, and formed bonds. The mentoring organization payment allowed OBF to sponsor community events and infrastructure.

To participate this year, we have to designate 2-3 people as primary and backup organization administrators. This is an important role, and we are looking for people from our community to step forward to serve.

An org admin's role is in many ways that of a cat herder. The whole team of mentors and admins creates the experience for the students, but it falls on the admin to "keep it together." Google holds the mentoring organization, not its mentors, accountable for the actions (or non-actions) of its mentors or community, and it falls on the org admin to carry that accountability through to the org's mentors.

The org admin's responsibilities include:

* Representing our online face to GSoC, in particular to GSoC students.
* Shepherding our mentoring organization application, and submitting it.
* Working out processes and rules for mentors as well as students that promote transparency, fairness, and protect from late-in-the-game surprises.
* Knowing GSoC rules and processes, and making sure ours are consistent with them.
* Reminding participants of rules, and enforcing them in the event it is necessary.
* Mediating, and sometimes arbitrating between students and mentors when needed.
* Ensuring that GSoC timelines are met by everyone.

The person we are looking for will genuinely care about the well-being of our communities, is well organized, stays calm in email storms, communicates clearly, has good people skills, and generally is known as a good listener.

If you are interested in helping us out in this role, please email us (by Jan 21, 2014) a statement at board at open-bio.org explaining how you would fit well in this role, and what your vision for our GSoC participation is. You need not be a developer or programmer to respond, but for now we do require that you have been active in some capacity in at least one of our project's communities. Please include in your email a brief summary of such activities even if you are a core developer for one of our projects.

We are looking forward to hearing from you!

Hilmar Lapp, OBF President, on behalf of the OBF Board of Directors

[1] http://www.google-melange.com/gsoc/events/google/gsoc2014

From saketkc at gmail.com Wed Jan 22 16:59:54 2014
From: saketkc at gmail.com (Saket Choudhary)
Date: Thu, 23 Jan 2014 03:29:54 +0530
Subject: [Biopython-dev] Samtools Wrapper
Message-ID:

I had written a Samtools (http://samtools.sourceforge.net/) wrapper for Biopython. The initial pull request had a lot of issues. I have tried to incorporate most of the changes as suggested by Peter. It however requires more testing.

If some people on the list can help me in testing, it would help me get it merged onto the master.
Pull request: https://github.com/biopython/biopython/pull/180 Repo: https://github.com/saketkc/biopython/tree/samtools_wrapper Thanks, Saket From zruan1991 at gmail.com Thu Jan 23 16:50:07 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Thu, 23 Jan 2014 16:50:07 -0500 Subject: [Biopython-dev] Pull Request for Codon Alignment GSoC project In-Reply-To: References:

Message-ID: Sorry for the late update. But I fixed the compatibility issues of my code and it's now able to run under different python environment. It seems I messed up my commit in the summer, so I manually rebased all the code. The individual commit message may not be informative now. The Travis CI result can be found at https://travis-ci.org/biopython/biopython/builds/17506590. Failure in python3.3 is in the Consensus test and I assume it's not my problem hopefully. My pull request: https://github.com/biopython/biopython/pull/259 Thanks, Ruan On Fri, Nov 15, 2013 at 6:16 AM, Peter Cock wrote: > On Fri, Nov 15, 2013 at 2:57 AM, Zheng Ruan wrote: > > Hi all, > > > > Since a beta version of biopython has been released, it might be safe to > > merge my GSoC work. > > We'll normally only do minor bug fixes and documentation improvements > between a beta and final release, but this is perfect timing for getting > your > work merged immediately after Biopython 1.63 is out. > > (So far there have been no problems reported with the beta, and > we've fixed some newline issues when testing under Windows) > > > I just made a pull request and there seems to be some > > problems when running the code in different platforms. In > > python2.7 and python3.3, the error message complains they > > cannot find CodonAlign module. I'm not sure why this happens. > > Some discussion is on the pull request itself: > https://github.com/biopython/biopython/pull/259 > > > The dependency of Scipy and Numpy might also > > introduce some problems for the testing. Thanks! > > We've got a mechanism for skipping tests when a dependency > is missing - the MissingExternalDependencyError and > MissingPythonDependencyError exceptions. You need > to use this in test_CodonAlign.py, see test_PDB.py etc. > > Related to this we would need to install new dependencies > on some of the buildbot slaves and perhaps TravisCI. 
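[Editorial sketch] The skip mechanism mentioned above can be illustrated as follows. A test module raises MissingPythonDependencyError at import time when an optional dependency is absent, and the Biopython test runner reports that test as skipped rather than failed. The helper name require() is hypothetical and not part of Biopython. The fallback class exists only so that this sketch runs without Biopython installed.

```python
import importlib

try:
    from Bio import MissingPythonDependencyError
except ImportError:
    # Stand-in so that the sketch runs without Biopython installed.
    class MissingPythonDependencyError(ImportError):
        pass


def require(module_name, feature):
    """Import module_name, or raise the error that marks the test as skipped.

    Hypothetical helper: existing Biopython tests typically inline this
    try/except at the top of the test module instead.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingPythonDependencyError(
            "Install %s if you want to use %s." % (module_name, feature)
        )


# Demonstration with a standard-library module that is always present.
json_module = require("json", "this demonstration")
print(json_module.__name__)
```

At the top of test_CodonAlign.py the same pattern would be applied to numpy and scipy before importing the module under test, so that a machine without those packages skips the test.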
> > Regards, > > Peter > From saketkc at gmail.com Tue Jan 28 16:13:05 2014 From: saketkc at gmail.com (Saket Choudhary) Date: Wed, 29 Jan 2014 02:43:05 +0530 Subject: [Biopython-dev] Samtools Wrapper In-Reply-To: References: Message-ID: Anyone interested? On 23 January 2014 03:29, Saket Choudhary wrote: > I had written a Samtools(http://samtools.sourceforge.net/) wrapper for > Biopython. The initial pull request > had a lot of of issues. I have tried to incorporate most of the > changes as suggested by Peter. It however requires more testing. > > If some people on the list can help me in testing, it would help me > get it merged onto the master. > > Pull request: https://github.com/biopython/biopython/pull/180 > Repo: https://github.com/saketkc/biopython/tree/samtools_wrapper > > > Thanks, > Saket From christian at brueffer.de Tue Jan 28 16:20:05 2014 From: christian at brueffer.de (Christian Brueffer) Date: Tue, 28 Jan 2014 22:20:05 +0100 Subject: [Biopython-dev] Samtools Wrapper In-Reply-To: References: Message-ID: <52E81F05.7070609@brueffer.de> Hi Saket, absolutely. Hopefully I can get around to testing the wrapper this week. Chris On 1/28/14 10:13 PM, Saket Choudhary wrote: > Anyone interested? > > On 23 January 2014 03:29, Saket Choudhary wrote: >> I had written a Samtools(http://samtools.sourceforge.net/) wrapper for >> Biopython. The initial pull request >> had a lot of of issues. I have tried to incorporate most of the >> changes as suggested by Peter. It however requires more testing. >> >> If some people on the list can help me in testing, it would help me >> get it merged onto the master. 
>> >> Pull request: https://github.com/biopython/biopython/pull/180 >> Repo: https://github.com/saketkc/biopython/tree/samtools_wrapper >> >> >> Thanks, >> Saket > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From saketkc at gmail.com Tue Jan 28 16:50:07 2014 From: saketkc at gmail.com (Saket Choudhary) Date: Wed, 29 Jan 2014 03:20:07 +0530 Subject: [Biopython-dev] Samtools Wrapper In-Reply-To: <52E81F05.7070609@brueffer.de> References: <52E81F05.7070609@brueffer.de> Message-ID: Thanks Christian, look forward to your comments :) On 29 January 2014 02:50, Christian Brueffer wrote: > Hi Saket, > > absolutely. Hopefully I can get around to testing the wrapper this > week. > > Chris > > > On 1/28/14 10:13 PM, Saket Choudhary wrote: >> Anyone interested? >> >> On 23 January 2014 03:29, Saket Choudhary wrote: >>> I had written a Samtools(http://samtools.sourceforge.net/) wrapper for >>> Biopython. The initial pull request >>> had a lot of of issues. I have tried to incorporate most of the >>> changes as suggested by Peter. It however requires more testing. >>> >>> If some people on the list can help me in testing, it would help me >>> get it merged onto the master. 
>>> >>> Pull request: https://github.com/biopython/biopython/pull/180 >>> Repo: https://github.com/saketkc/biopython/tree/samtools_wrapper >>> >>> >>> Thanks, >>> Saket >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Fri Jan 31 05:55:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 31 Jan 2014 10:55:55 +0000 Subject: [Biopython-dev] [Biopython] Google Summer of Code 2014 - Call for project ideas In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 9:29 PM, Eric Talevich wrote: > Hi folks, > > Google Summer of Code is on again for 2014, and the Open Bioinformatics > Foundation (OBF) is once again applying as a mentoring organization. > Participating in GSoC as an organization is very competitive, and we will > need your help in gathering a good set of ideas and potential mentors for > Biopython's role in GSoC this year. > > If you have an idea for a Summer of Code project, please post your idea > here on the Biopython mailing list for discussion and start an outline on > this wiki page: > http://biopython.org/wiki/Google_Summer_of_Code > > We also welcome ideas that fit with OBF's mission but are not part of a > single Bio* project, or span multiple projects -- these ideas can be posted > on the OBF wiki and discussed on the OBF mailing list: > http://www.open-bio.org/wiki/Google_Summer_of_Code#Project_ideas > http://lists.open-bio.org/mailman/listinfo/open-bio-l > > Here's to another fun and productive Summer of Code! 
> > Cheers, > Eric & Raoul Thanks Eric & Raoul, Remember that the ideas don't have to come from potential mentors - if as a student there is something you'd particularly like to work on please ask, and perhaps we can find a suitable (Biopython) mentor. Regards, Peter From idoerg at gmail.com Mon Jan 6 19:56:17 2014 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 6 Jan 2014 14:56:17 -0500 Subject: [Biopython-dev] OBO parser & DAG Message-ID: Hi all, Is there any effort going on for developing the OBO parser & Bio-ontology DAG? If not, my lab wants to push this. We already have a basic representation using digraph from networkx, and a basic OBO parser. But i'm checking to see if there is no duplicate effort here. All very initial development. Parser: https://github.com/idoerg/go-parser (the relevan module is https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py DAG: https://github.com/osamajomaa/DAGON Thanks, Iddo -- Iddo Friedberg http://iddo-friedberg.net/contact.html ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>----.<--.>++++++.<<<<------------------------------------. From bartek at rezolwenta.eu.org Mon Jan 6 21:59:28 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 6 Jan 2014 22:59:28 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References: Message-ID: Hi Iddo, My student has also recently implemented a module for ontologies. Maybe we can somehow merge these efforts. His code can be found here: https://github.com/tosterovic/biopython the relevant part is Bio/Ontology best Bartek On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg wrote: > Hi all, > > Is there any effort going on for developing the OBO parser & Bio-ontology > DAG? If not, my lab wants to push this. 
We already have a basic > representation using digraph from networkx, and a basic OBO parser. But i'm > checking to see if there is no duplicate effort here. > > All very initial development. > > Parser: > https://github.com/idoerg/go-parser > > (the relevan module is > https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py > > > DAG: > https://github.com/osamajomaa/DAGON > > Thanks, > > Iddo > > -- > > Iddo Friedberg > http://iddo-friedberg.net/contact.html > ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> > ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. > .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> > >>----.<--.>++++++.<<<<------------------------------------. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From idoerg at gmail.com Mon Jan 6 22:17:20 2014 From: idoerg at gmail.com (Iddo Friedberg) Date: Mon, 6 Jan 2014 17:17:20 -0500 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References: Message-ID: Hi Bartek, Thanks. I looked at it a bit. Any reason why you did your own digraphs instead of using networkx? See also: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph That said, it seems very mature. But the networkx does many functions for plotting, interrogating and manipulating graphs. Your OBO parser seems quite mature. Are you planning to add other edge types? (E.g. "part_of"). What we are trying to do here, is set up (besides the parser & DAG implementation) also a measure of DAG similarities. This is due to my interest in assessing function similarity. That will be a separate module (perhaps not even useful to Biopython). 
So now I am not sure what to do :/ The IO modules seem complete & usable, but I would have rather seen a DAG implementation using networkx. Ideas? On Mon, Jan 6, 2014 at 4:59 PM, Bartek Wilczynski wrote: > Hi Iddo, > > My student has also recently implemented a module for ontologies. Maybe we > can somehow merge these efforts. His code can be found here: > https://github.com/tosterovic/biopython > > the relevant part is Bio/Ontology > > best > Bartek > > On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg wrote: > >> Hi all, >> >> Is there any effort going on for developing the OBO parser & Bio-ontology >> DAG? If not, my lab wants to push this. We already have a basic >> representation using digraph from networkx, and a basic OBO parser. But >> i'm >> checking to see if there is no duplicate effort here. >> >> All very initial development. >> >> Parser: >> https://github.com/idoerg/go-parser >> >> (the relevan module is >> https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py >> >> >> DAG: >> https://github.com/osamajomaa/DAGON >> >> Thanks, >> >> Iddo >> >> -- >> >> Iddo Friedberg >> http://iddo-friedberg.net/contact.html >> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> >> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. >> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >> >>----.<--.>++++++.<<<<------------------------------------. >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > > > -- > Bartek Wilczynski > ================== > Institute of Informatics > University of Warsaw > http://www.mimuw.edu.pl/~bartek > -- Iddo Friedberg http://iddo-friedberg.net/contact.html ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. 
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>----.<--.>++++++.<<<<------------------------------------. From bartek at rezolwenta.eu.org Mon Jan 6 22:26:35 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 6 Jan 2014 23:26:35 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID: Hi, I will meet with Kamil sometime this week and we will discuss options for switching to networkx or at least adding some compatibility layer for it. I think the information about the edge type is preserved in the DAG after parsing, so I'm not sure what you mean by "supporting" other types of relationships. Our interest was mostly in ontology term enrichment analysis, which Kamil implemented, and his version is also usable for parsing, but I think we are quite open to changes still at this point and I'm sure we will be able to come up with a good version merging the important features from both versions. best Bartek On Mon, Jan 6, 2014 at 11:17 PM, Iddo Friedberg wrote: > Hi Bartek, > > Thanks. I looked at it a bit. > > Any reason why you did your own digraphs instead of using networkx? See > also: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph > > That said, it seems very mature. But the networkx does many functions for > plotting, interrogating and manipulating graphs. > > Your OBO parser seems quite mature. Are you planning to add other edge > types? (E.g. "part_of"). > > What we are trying to do here, is set up (besides the parser & DAG > implementation) also a measure of DAG similarities. This is due to my > interest in assessing function similarity. That will be a separate module > (perhaps not even useful to Biopython). > > So now I am not sure what to do :/ The IO modules seem complete & usable, > but I would have rather seen a DAG implementation using networkx. > > Ideas? > > > > On Mon, Jan 6, 2014 at 4:59 PM, Bartek Wilczynski < > bartek at rezolwenta.eu.org> wrote: > >> Hi Iddo, >> >> My student has also recently implemented a module for ontologies. Maybe >> we can somehow merge these efforts. 
His code can be found here: >> https://github.com/tosterovic/biopython >> >> the relevant part is Bio/Ontology >> >> best >> Bartek >> >> On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg wrote: >> >>> Hi all, >>> >>> Is there any effort going on for developing the OBO parser & Bio-ontology >>> DAG? If not, my lab wants to push this. We already have a basic >>> representation using digraph from networkx, and a basic OBO parser. But >>> i'm >>> checking to see if there is no duplicate effort here. >>> >>> All very initial development. >>> >>> Parser: >>> https://github.com/idoerg/go-parser >>> >>> (the relevan module is >>> https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py >>> >>> >>> DAG: >>> https://github.com/osamajomaa/DAGON >>> >>> Thanks, >>> >>> Iddo >>> >>> -- >>> >>> Iddo Friedberg >>> http://iddo-friedberg.net/contact.html >>> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> >>> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. >>> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>> >>----.<--.>++++++.<<<<------------------------------------. >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >>> >>> >> >> >> -- >> Bartek Wilczynski >> ================== >> Institute of Informatics >> University of Warsaw >> http://www.mimuw.edu.pl/~bartek >> > > > > -- > Iddo Friedberg > http://iddo-friedberg.net/contact.html > ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> > ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. > .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> > >>----.<--.>++++++.<<<<------------------------------------. 
> -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From marco.galardini at unifi.it Tue Jan 7 13:51:44 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 07 Jan 2014 14:51:44 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? Message-ID: <52CC0670.5070909@unifi.it> Dear biopythoners, would you be interested in the addition of a PM data file parser in Biopython? Such phenotypic data should be widely used in microbial experiments, even though just a few tools/libraries are available to parse and analyze such data (so far there is an R package called opm and a python command line tool developed by me, called DuctApe). If you think that the ability to parse such files and perform some basic analysis (like curve parameters extraction) would be interesting as a new feature for biopython, I would very much like to contribute. If you are interested in such an addition, could you please indicate if there is any specification on the style guide for the biopython parsers? Thanks for the amazing work, Marco -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From bartek at rezolwenta.eu.org Tue Jan 7 20:51:19 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 7 Jan 2014 21:51:19 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID: Hi, I've talked with Kamil today and we've looked through some of your code. We have actually evaluated networkx before implementing our library and thought it was unnecessary for our purposes to make it a requirement for biopython as the enrichment analysis is using a relatively small subset of graph operations. However it would be very easy to make functions for converting our ontologies to networkx digraphs, either with or without gene annotations as additional attributes. As for support for different types of transitivity in relations of different type (as in your inference of ancestry for is_a and part_of relations) we are currently not supporting it, but after thinking about it, we will make a change to support this feature. Probably we will let the user to (optionally) define the transitivity between relationship types (i.e. is_a + part_of becomes part_of, etc). In general, it would be very helpful if you could give us some rough idea about your expected use cases. For example: are you expecting to modify the graphs in the networkx objects? What will you use the inferred ancestor lists for? So that the changes we make will be as useful to the community as possible. Naturally, if anyone else wants to contribute their ideas or use-cases, you are most welcome... best Bartek On Mon, Jan 6, 2014 at 11:42 PM, Iddo Friedberg wrote: > I will meet with my student tomorrow (cc'd) and we can continue this > further. > > Osama: Bartek Wilczynski's group has been working on a OBO parser / GO > module too. Their parser seems complete & useful. Their digraph > implementation is not in networkx, so I'm not sure about adopting that as > is. In any case, let's meet tomorrow and talk, and maybe the four of us can > work out a collaborative plan if we feel it's useful. 
> > Cheers, > > Iddo > > > On Mon, Jan 6, 2014 at 5:26 PM, Bartek Wilczynski < > bartek at rezolwenta.eu.org> wrote: > >> Hi, >> >> I will meet with Kamil sometime this week and we will discuss options for >> switching to networkx or at least adding some compatibility layer for it. I >> think the information about the edge type is preserved in the DAG after >> parsing, so I'm not sure what you mean by "supporting" other types of >> relationships. Our interest was mostly in ontology term enrichment >> analysis, which Kamil implemented, and his version is also usable for >> parsing, but I think we are quite open to changes still at this point and >> I'm sure we will be able to come up with a good version merging the >> important features from both versions. >> >> best >> Bartek >> >> >> On Mon, Jan 6, 2014 at 11:17 PM, Iddo Friedberg wrote: >> >>> Hi Bartek, >>> >>> Thanks. I looked at it a bit. >>> >>> Any reason why you did your own digraphs instead of using networkx? See >>> also: http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph >>> >>> That said, it seems very mature. But the networkx does many functions >>> for plotting, interrogating and manipulating graphs. >>> >>> Your OBO parser seems quite mature. Are you planning to add other edge >>> types? (E.g. "part_of"). >>> >>> What we are trying to do here, is set up (besides the parser & DAG >>> implementation) also a measure of DAG similarities. This is due to my >>> interest in assessing function similarity. That will be a separate module >>> (perhaps not even useful to Biopython). >>> >>> So now I am not sure what to do :/ The IO modules seem complete & >>> usable, but I would have rather seen a DAG implementation using networkx. >>> >>> Ideas? >>> >>> >>> >>> On Mon, Jan 6, 2014 at 4:59 PM, Bartek Wilczynski < >>> bartek at rezolwenta.eu.org> wrote: >>> >>>> Hi Iddo, >>>> >>>> My student has also recently implemented a module for ontologies. Maybe >>>> we can somehow merge these efforts. 
His code can be found here: >>>> https://github.com/tosterovic/biopython >>>> >>>> the relevant part is Bio/Ontology >>>> >>>> best >>>> Bartek >>>> >>>> On Mon, Jan 6, 2014 at 8:56 PM, Iddo Friedberg wrote: >>>> >>>>> Hi all, >>>>> >>>>> Is there any effort going on for developing the OBO parser & >>>>> Bio-ontology >>>>> DAG? If not, my lab wants to push this. We already have a basic >>>>> representation using digraph from networkx, and a basic OBO parser. >>>>> But i'm >>>>> checking to see if there is no duplicate effort here. >>>>> >>>>> All very initial development. >>>>> >>>>> Parser: >>>>> https://github.com/idoerg/go-parser >>>>> >>>>> (the relevan module is >>>>> https://github.com/idoerg/go-parser/blob/master/go_obo_parser.py >>>>> >>>>> >>>>> DAG: >>>>> https://github.com/osamajomaa/DAGON >>>>> >>>>> Thanks, >>>>> >>>>> Iddo >>>>> >>>>> -- >>>>> >>>>> Iddo Friedberg >>>>> http://iddo-friedberg.net/contact.html >>>>> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> >>>>> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. >>>>> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>>>> >>----.<--.>++++++.<<<<------------------------------------. >>>>> _______________________________________________ >>>>> Biopython-dev mailing list >>>>> Biopython-dev at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >>>>> >>>>> >>>> >>>> >>>> -- >>>> Bartek Wilczynski >>>> ================== >>>> Institute of Informatics >>>> University of Warsaw >>>> http://www.mimuw.edu.pl/~bartek >>>> >>> >>> >>> >>> -- >>> Iddo Friedberg >>> http://iddo-friedberg.net/contact.html >>> ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> >>> ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. >>> .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> >>> >>----.<--.>++++++.<<<<------------------------------------. 
>>> >> >> >> >> -- >> Bartek Wilczynski >> ================== >> Institute of Informatics >> University of Warsaw >> http://www.mimuw.edu.pl/~bartek >> > > > > -- > Iddo Friedberg > http://iddo-friedberg.net/contact.html > ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> > ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. > .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> > >>----.<--.>++++++.<<<<------------------------------------. > -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From mjldehoon at yahoo.com Wed Jan 8 05:53:19 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 7 Jan 2014 21:53:19 -0800 (PST) Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CC0670.5070909@unifi.it> Message-ID: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> Hi Marco, > will you be interested in the addition of a PM data files > parser in Biopython? Yes it makes sense to me. > any specification on the style guide for the biopython parsers? There is no strict set of rules, but to get you started, many modules follow this format: - Assuming a PM data file contains only a single data set, the module should contain a function "read" that takes either a file name or a file handle as the argument. - The module should contain a class (typically called "Record") that can store the data in the data file. The "read" function returns an object of this class. - Try to avoid third-party dependencies if at all possible. Would it make sense to have a single Bio.Microarray module that can house the various microarray parsers (PM, Affy, others)? Best, -Michiel. -------------------------------------------- On Tue, 1/7/14, Marco Galardini wrote: Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
To: "Biopython-Dev Mailing List" Date: Tuesday, January 7, 2014, 8:51 AM Dear biopythoners, would you be interested in the addition of a PM data file parser in Biopython? Such phenotypic data should be widely used in microbial experiments, even though just a few tools/libraries are available to parse and analyze such data (so far there is an R package called opm and a python command line tool developed by me, called DuctApe). If you think that the ability to parse such files and perform some basic analysis (like curve parameters extraction) would be interesting as a new feature for biopython, I would very much like to contribute. If you are interested in such an addition, could you please indicate if there is any specification on the style guide for the biopython parsers? Thanks for the amazing work, Marco -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From marco.galardini at unifi.it Wed Jan 8 10:32:40 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Wed, 08 Jan 2014 11:32:40 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: <52CD2948.7050102@unifi.it> Hi, On 01/08/2014 06:53 AM, Michiel de Hoon wrote: >> any specification on the style guide for the biopython parsers?
> There is no strict set of rules, but to get you started, many modules follow this format:
> - Assuming a PM data file contains only a single data set, the module should contain a function "read" that takes either a file name or a file handle as the argument.

Unfortunately, the situation is a bit mixed up: there are basically three file formats for PM data: csv files (which can contain one or more data sets or 'plates'), and yaml and json files, which can also contain some metadata. I would therefore use an approach similar to the SeqIO module, with a parse() function and a read() function that raises an exception if the file contains more than one record.

> - The module should contain a class (typically called "Record") that can store the data in the data file. The "read" function returns an object of this class.
> - Try to avoid third-party dependencies if at all possible.

So far the dependencies would be pyYaml (for the yaml/json parsing, but maybe I could use the stdlib json module) and numpy/scipy for the extraction of curve parameters. Does this sound ok?

> Would it make sense to have a single Bio.Microarray module that can house the various microarray parsers (PM, Affy, others)?

I don't know if that would be a good strategy: the Phenotype Microarrays are very different from the other proper microarrays; how about a "phenomics" module?

> Best,
> -Michiel.

Kind regards,
Marco

From idoerg at gmail.com Wed Jan 8 16:55:16 2014
From: idoerg at gmail.com (Iddo Friedberg)
Date: Wed, 8 Jan 2014 11:55:16 -0500
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID: Hi Bartek. See inlined responses below. Anyone else has ideas on how to do this? On Tue, Jan 7, 2014 at 3:51 PM, Bartek Wilczynski wrote: > Hi, > > I've talked with Kamil today and we've looked through some of your code. > > We have actually evaluated networkx before implementing our library and > thought it was unnecessary for our purposes to make it a requirement for > biopython as the enrichment analysis is using a relatively small subset of > graph operations. > > I understand your rationale, but I disagree with it, mainly for design reasons. 1. Enrichment analysis is only one of many different applications that can be performed with GO. Therefore, saying that features are unnecessary because a particular use case does not require them should not be a design consideration for a module that is intended for general use. Rather, a generic package handling ontologies should be just that: generic, and disengaged from any kind of application. Therefore, if your package is intended for biopython the use-case (enrichment analysis) should be decoupled from the parser + data structure. 2. The graph features that you wrote in Digraph exist in networkx anyway, or am I missing something? So why not take advantage of nx instead of redoing it even if it does have many redundant (for you) graph manipulation & diagnostic features? Someone else may want to use these features, including the graphics nx provides, etc. > However it would be very easy to make functions for converting our > ontologies to networkx digraphs, either with or without gene annotations as > additional attributes. > > Well, the idea is actually to maintain ontologies as nx digraphs. Yes, I agree there. > As for support for different types of transitivity in relations of > different type (as in your inference of ancestry for is_a and part_of > relations) we are currently not supporting it, but after thinking about it, > we will make a change to support this feature. 
Probably we will let the user to (optionally) define the transitivity between relationship types (i.e. is_a + part_of becomes part_of, etc).
>
> In general, it would be very helpful if you could give us some rough idea about your expected use cases. For example: are you expecting to modify the graphs in the networkx objects? What will you use the inferred ancestor lists for? So that the changes we make will be as useful to the community as possible.

The idea is that expected use cases should not impact the design of a basic parser + data structure. In my lab, we are looking at inferred ancestors lists to calculate semantic similarity, but it really doesn't matter what we (or anyone) will end up using the GO module for. If you provide enrichment analysis *on top* of the parser + data structure (as a separate module), and we provide semantic similarity (again as a separate module *on top* of the parser + data structure) those are nice bonuses. But the parser + data structure should be as general as possible. That is: include all the information in the OBO file, placed in a digraph structure that can be comprehensively interrogated, visualized and manipulated (which is what nx offers).

> Naturally, if anyone else wants to contribute their ideas or use-cases, you are most welcome...
>
> best
> Bartek

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

From eric.talevich at gmail.com Wed Jan 8 19:18:52 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 8 Jan 2014 11:18:52 -0800
Subject: [Biopython-dev] OBO parser & DAG
In-Reply-To:
References:

Message-ID: On Tue, Jan 7, 2014 at 12:51 PM, Bartek Wilczynski wrote: > [...] > We have actually evaluated networkx before implementing our library and > thought it was unnecessary for our purposes to make it a requirement for > biopython as the enrichment analysis is using a relatively small subset of > graph operations. > > However it would be very easy to make functions for converting our > ontologies to networkx digraphs, either with or without gene annotations as > additional attributes. > > [...] > Chiming in briefly -- While it does make sense to avoid new dependencies in Biopython modules, the Bio.Phylo module already uses networkx for a couple of non-core functions. Since networkx is portable, pure Python (as far as I can tell) and has no additional hard dependencies of its own, it would not necessarily be a major inconvenience to the user if this new module depends on networkx. From bartek at rezolwenta.eu.org Wed Jan 8 21:05:47 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 8 Jan 2014 22:05:47 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID:

Hi,

I'll answer below (even though I do have a bad habit of top-posting my answers, sorry).

On Wed, Jan 8, 2014 at 5:55 PM, Iddo Friedberg wrote:

> I wrote about not using networkx as the main data structure for ontologies
>
> I understand your rationale, but I disagree with it, mainly for design reasons.

Ok. I guess I was a bit too brief with my explanation. We have considered using networkx and decided not to use it mainly because it was not very useful, and implementing what was necessary for parsing was not an issue for Kamil. Networkx is currently not a dependency for biopython or for bio.phylo and it is not even listed as "optional software" along with reportlab and such (http://biopython.org/wiki/Download; I guess it should be, after Eric's comment). My understanding is that this is a policy of thinking twice before adding something additional as a dependency, because we would need to care for compatibility with different networkx versions. Taking Bio.Phylo as an example I do think it is a good policy to keep such libraries as optional if possible. Besides, we did not particularly like the way digraphs are implemented in Networkx with a heavy use of dictionaries, as this might get slow for large dictionaries.

Now to your specific points:

> 1. Enrichment analysis is only one of many different applications that can be performed with GO. Therefore, saying that features are unnecessary because a particular use case does not require them should not be a design consideration for a module that is intended for general use. Rather, a generic package handling ontologies should be just that: generic, and disengaged from any kind of application. Therefore, if your package is intended for biopython the use-case (enrichment analysis) should be decoupled from the parser + data structure.

We were obviously tailoring this to our needs, but I have to disagree with your argument.
Because of the reasons above, I think that we should use external digraph library _only_ if it is _necessary_ for the parsing and storing and it clearly isn't. For the separation of parsing and enrichment, we do want to keep the parser separate from the enrichment analysis and I thought it was quite clear with the use of separate classes, but we are absolutely open to discuss how to organize these modules. If you take the parser as a separate module - using networkx is even less needed(really no need for a big graph manipulation lib here). > 2. The graph features that you wrote in Digraph exist in networkx anyway, > or am I missing something? So why not take advantage of nx instead of > redoing it even if it does have many redundant (for you) graph manipulation > & diagnostic features? Someone else may want to use these features, > including the graphics nx provides, etc. > Yes, the point is that parsing, storing a digraph is a simple thing and there is no need to add a large library for that. If there was a digraph library in biopython, it would be stupid not to use it, but I don't feel we need to add a dependency here. > >> However it would be very easy to make functions for converting our >> ontologies to networkx digraphs, either with or without gene annotations as >> additional attributes. >> >> > Well, the idea is actually to maintain ontologies as nx digraphs. Yes, I > agree there. > > That's also exactly what is done in the Bio.Phylo. We are planning to write a function analogous to Bio.Phylo._util.to_networkx() which would take a simple digraph obtained from parsing an OBO file and give you a networkx digraph with all the data for manipulation. > As for support for different types of transitivity in relations of >> different type (as in your inference of ancestry for is_a and part_of >> relations) we are currently not supporting it, but after thinking about it, >> we will make a change to support this feature. 
Probably we will let the user to (optionally) define the transitivity between relationship types (i.e. is_a + part_of becomes part_of, etc).
>>
>> In general, it would be very helpful if you could give us some rough idea about your expected use cases. For example: are you expecting to modify the graphs in the networkx objects? What will you use the inferred ancestor lists for? So that the changes we make will be as useful to the community as possible.
>
> The idea is that expected use cases should not impact the design of a basic parser + data structure. In my lab, we are looking at inferred ancestors lists to calculate semantic similarity, but it really doesn't matter what we (or anyone) will end up using the GO module for. If you provide enrichment analysis *on top* of the parser + data structure (as a separate module), and we provide semantic similarity (again as a separate module *on top* of the parser + data structure) those are nice bonuses. But the parser + data structure should be as general as possible. That is: include all the information in the OBO file, placed in a digraph structure that can be comprehensively interrogated, visualized and manipulated (which is what nx offers).

I was unfortunately not very clear here. What I meant was that we were considering what is necessary for typical uses of ontologies, namely parsing and accessing the terms. And I think that is valid in the sense that the majority of users treat ontologies as read-only data (not that many biopython users are making their own ontologies, otherwise it would have been implemented ages ago...).

As for the second argument: I do fully agree that there should be some separation between ontology and annotation reading and any functionality "on-top" of it. But I think that it would not be reasonable to include networkx as the main data structure. Currently there is only one library that biopython depends on and it is numpy.
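
The design under discussion in this thread (a plain digraph produced by the parser, with networkx imported only when a conversion is requested) might be sketched roughly as follows. This is an illustration only, not the code in either repository: `SimpleDiGraph`, `add_edge`, `ancestors` and this `to_networkx` are hypothetical names, and the term IDs are made up.

```python
class SimpleDiGraph:
    """Hypothetical minimal digraph: child term -> [(relation, parent term)]."""

    def __init__(self):
        self.edges = {}

    def add_edge(self, child, parent, relation="is_a"):
        # Store the relation type on the edge so no OBO information is lost.
        self.edges.setdefault(child, []).append((relation, parent))
        # Make sure terms without parents are still known as nodes.
        self.edges.setdefault(parent, [])

    def ancestors(self, term):
        """Return every term reachable from `term`, following all edge types."""
        seen = set()
        stack = [term]
        while stack:
            for _relation, parent in self.edges.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


def to_networkx(graph):
    """Convert a SimpleDiGraph to a networkx.DiGraph.

    networkx is imported here, not at module level, so it stays optional.
    Note: a DiGraph keeps one edge per (child, parent) pair, so a second
    relation between the same two terms would overwrite the first.
    """
    try:
        import networkx as nx
    except ImportError:
        raise ImportError("networkx is needed only for this optional conversion")
    nx_graph = nx.DiGraph()
    nx_graph.add_nodes_from(graph.edges)
    for child, targets in graph.edges.items():
        for relation, parent in targets:
            nx_graph.add_edge(child, parent, relation=relation)
    return nx_graph


# Made-up term IDs, for illustration only.
ontology = SimpleDiGraph()
ontology.add_edge("T:0003", "T:0002", relation="is_a")
ontology.add_edge("T:0002", "T:0001", relation="part_of")
print(sorted(ontology.ancestors("T:0003")))
```

With this layout, parsing and ancestor lookups work with the standard library alone, and only callers who want plotting or graph algorithms pay for the extra dependency.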
I do not see networkx as equally important. I think that we should go the way paved by Bio.Phylo and use the simple digraph (which already holds all the information from the OBO files afaik) for parsing output and convert it to networkx where necessary.

best
Bartek

--
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek

From eric.talevich at gmail.com Wed Jan 8 21:47:12 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 8 Jan 2014 13:47:12 -0800
Subject: [Biopython-dev] List of Biopython dependencies
Message-ID:

Folks,

It appears that there are two lists of Biopython dependencies on the wiki:
http://biopython.org/wiki/List_of_applications_executed_via_Biopython
http://biopython.org/wiki/Download#Required_Software

There is potential for confusion here. Shall we merge these lists on the Download page?

We could follow Debian's approach of splitting the list into Depends (strict requirements, e.g. Python), Recommended (non-essential but important, e.g. numpy and reportlab), and Suggested (everything else).

Regards,
Eric

From p.j.a.cock at googlemail.com Wed Jan 8 21:55:13 2014
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 8 Jan 2014 21:55:13 +0000
Subject: [Biopython-dev] List of Biopython dependencies
In-Reply-To:
References:
Message-ID:

On Wednesday, January 8, 2014, Eric Talevich wrote:

> Folks,
>
> It appears that there are two lists of Biopython dependencies on the wiki:
> http://biopython.org/wiki/List_of_applications_executed_via_Biopython
> http://biopython.org/wiki/Download#Required_Software
>
> There is potential for confusion here. Shall we merge these lists on the Download page?

Tiago started the list of applications page while updating the buildbots - it is as noted a work in progress. Maybe it should be just command line applications, which can then be linked to from the downloads page?
> We could follow Debian's approach of splitting the list into Depends > (strict requirements, e.g. Python), Recommended (non-essential but > important, e.g. numpy and reportlab), and Suggested (everything else). > That split makes sense for libraries, but is it also sensible for command line applications Biopython can call? Regards, Peter From idoerg at gmail.com Wed Jan 8 21:59:00 2014 From: idoerg at gmail.com (Iddo Friedberg) Date: Wed, 8 Jan 2014 16:59:00 -0500 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID:

So it seems like we are debating minimal external dependencies vs. maximizing functionality.

How about the following: the OBO file will be read into an independent, basic digraph like Bartek's team has already constructed.

But we will also have the ability to transfer the biopython DAG into a networkx DAG, so that anyone wishing to play elaborate games with the ontology structure (as we do), can do so without re-inventing the wheel.

How does that sound?

One thing about networkx: I still really, really like it :), and we started writing the digraph based on it because of this page:

http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph

But the fact that this spec using networkx has been written does not have to commit us to this particular design.

--
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

From eric.talevich at gmail.com Wed Jan 8 22:51:45 2014
From: eric.talevich at gmail.com (Eric Talevich)
Date: Wed, 8 Jan 2014 14:51:45 -0800
Subject: [Biopython-dev] List of Biopython dependencies
In-Reply-To:
References:

Message-ID:

On Wed, Jan 8, 2014 at 1:55 PM, Peter Cock wrote:

> On Wednesday, January 8, 2014, Eric Talevich wrote:
>
>> Folks,
>>
>> It appears that there are two lists of Biopython dependencies on the wiki:
>> http://biopython.org/wiki/List_of_applications_executed_via_Biopython
>> http://biopython.org/wiki/Download#Required_Software
>>
>> There is potential for confusion here. Shall we merge these lists on the Download page?
>
> Tiago started the list of applications page while updating the buildbots - it is as noted a work in progress. Maybe it should be just command line applications, which can then be linked to from the downloads page?

So the list of command line applications on the Download page would be moved to the separate Wiki page? Seems reasonable if the list is long.

>> We could follow Debian's approach of splitting the list into Depends (strict requirements, e.g. Python), Recommended (non-essential but important, e.g. numpy and reportlab), and Suggested (everything else).
>
> That split makes sense for libraries, but is it also sensible for command line applications Biopython can call?

The sub-headings could be:
- Required software
- Recommended Python libraries [needed to enable significant functionality]
- Additional Python libraries [used by individual Biopython modules for additional features]
"In addition Biopython includes wrapper code for calling a number of third party command line tools [link]"

I imagine this format would be useful to sysadmins (and maybe distro package maintainers, too). But it may be just as well to merge the "required" and "recommended" lists and just note caveats inline as is currently done.

From tiagoantao at gmail.com Thu Jan 9 09:02:25 2014
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 9 Jan 2014 09:02:25 +0000
Subject: [Biopython-dev] List of Biopython dependencies
In-Reply-To:
References:

Message-ID:

Hi all,

What I am trying to do is to get a list of stuff that ensures that biopython is tested top to bottom. This is in order to ensure QA for releases and a complete and stable test framework.
I am also trying to create a VM for testing (based on CloudBioLinux - with Brad's help), so that anyone can do a full test and there is a complete VM with everything needed out-of-the-box.
It so happens that I have been travelling since mid-December (back in office on Monday) and things have been mostly frozen (restarting next week).

The format for the list is really not very important, but it has a completely different slant from the requirements page: it is meant for developers and testing, etc. Whatever you guys prefer is fine by me. I just plan to review the list top-to-bottom and ensure that it is complete and up-to-date.

Tiago

--
"The truth may be out there, but the lies are already in your head" - Terry Pratchett

From arklenna at gmail.com Thu Jan 9 14:50:34 2014
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 9 Jan 2014 09:50:34 -0500
Subject: [Biopython-dev] List of Biopython dependencies
In-Reply-To:
References:

Message-ID: On Thu, Jan 9, 2014 at 4:02 AM, Tiago Antão wrote: > Hi all, > > What I am trying to do is to get a list of stuff that assures that > biopython is tested top to bottom. This in order to assure QA for releases > and a complete and stable test framework. > I am also trying to create a VM machine for testing (based on CloudBioLinux > - with Brad's help), so that anyone can do a full test and there is a > complete VM with everything needed out-of-the-box. > It so happens that I have been travelling since mid-December (back in > office on Monday) and things have been mostly frozen (restarting next > week). > > The format for the list is really not much important, but it has a > completely different slant from the requirements page: it is meant for > developers and testing, etc. Whatever you guys prefer is fine by me. I just > plan to review the list top-to-bottom and assure that it is complete and > up-to-date. > > The difference of intent is logical, but it should be possible to make a single list that fulfills both goals. The user audience list divided by requirements, recommended, etc. can also be a top-to-bottom testing list - to test, install everything from every list. It would be more error-prone to maintain two separate lists. Lenna From bartek at rezolwenta.eu.org Thu Jan 9 23:06:30 2014 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 10 Jan 2014 00:06:30 +0100 Subject: [Biopython-dev] OBO parser & DAG In-Reply-To: References:

Message-ID: Hi, I think that's a great solution. In fact Kamil has now implemented such a conversion. Take a look if this suits your needs: https://github.com/tosterovic/biopython/commit/904e51303391411b42697205b09181378662807d We would be very happy to contribute this module to the biopython repo, so it would be great if more people would take a look and suggest changes needed for accepting this as a part of biopython. best Bartek On Wed, Jan 8, 2014 at 10:59 PM, Iddo Friedberg wrote: > So it seems like we are debating minimal external dependencies vs. > maximizing functionality. > > How about the following: the OBO file will be read into an independent, > basic digraph like Bartek's team has already constructed. > > But we will also have the ability to transfer the biopython DAG into a > networkx DAG, so that anyone wishing to play elaborate games with the > ontology structure (as we do), can do so without re-inventing the wheel. > > How does that sound? > > One thing about networkx: I still really, really like it :), and we > started writing the digraph based on it because of this page: > > http://biopython.org/wiki/Gene_Ontology#GO_Directed_Acyclic_Graph > > But the fact that this spec using networkx has been written does not have > to commit us to this particular design. > > > > On Wed, Jan 8, 2014 at 4:05 PM, Bartek Wilczynski < > bartek at rezolwenta.eu.org> wrote: > >> Hi, >> >> I'll answer below (even though I do have a bad habit of top-posting my >> answers, sorry). >> >> >> On Wed, Jan 8, 2014 at 5:55 PM, Iddo Friedberg wrote: >> >> >I wrote about not using networkx as the main data structure for >> ontologies >> >>> >>> I understand your rationale, but I disagree with it, mainly for design >>> reasons. >>> >>> Ok. I guess I was a bit too brief with my explanation. We have >> considered using networkx and decided not to use it mainly because it was >> not very useful, and implementing what was necessary for parsing was not >> an issue for Kamil. 
Networkx is currently not a dependency for biopython or >> for bio.phylo and it is not even listed as an "optional software" along >> with reportlab and such (http://biopython.org/wiki/Download I guess it >> should after Eric's comment). My understanding is that this is a policy of >> thinking twice before adding something additional as a dependency, because >> we would need to care for compatibility with different networkx versions. >> Taking Bio.Phylo as an example I do think it is a good policy to keep such >> libraries as optional if possible. Besides, we did not particularly like >> the way digraphs are implemented in Networkx with a heavy use of >> dictionaries, as this might get slow for large dictionaries. >> Now to your specific points: >> >> >>> 1. Enrichment analysis is only one of many different applications that >>> can be performed with GO. Therefore, saying that features are unnecessary >>> because a particular use case does not require them should not be a design >>> consideration for a module that is intended for general use. Rather, a >>> generic package handling ontologies should be just that: generic, and >>> disengaged from any kind of application. Therefore, if your package is >>> intended for biopython the use-case (enrichment analysis) should be >>> decoupled from the parser + data structure. >>> >>> We were obviously tailoring this to our needs, but I have to disagree >> with your argument. Because of the reasons above, I think that we should >> use external digraph library _only_ if it is _necessary_ for the parsing >> and storing and it clearly isn't. >> >> For the separation of parsing and enrichment, we do want to keep the >> parser separate from the enrichment analysis and I thought it was quite >> clear with the use of separate classes, but we are absolutely open to >> discuss how to organize these modules. 
>> >> If you take the parser as a separate module - using networkx is even less >> needed(really no need for a big graph manipulation lib here). >> >> >>> 2. The graph features that you wrote in Digraph exist in networkx >>> anyway, or am I missing something? So why not take advantage of nx instead >>> of redoing it even if it does have many redundant (for you) graph >>> manipulation & diagnostic features? Someone else may want to use these >>> features, including the graphics nx provides, etc. >>> >> >> Yes, the point is that parsing, storing a digraph is a simple thing and >> there is no need to add a large library for that. If there was a digraph >> library in biopython, it would be stupid not to use it, but I don't feel we >> need to add a dependency here. >> >> >>> >>>> However it would be very easy to make functions for converting our >>>> ontologies to networkx digraphs, either with or without gene annotations as >>>> additional attributes. >>>> >>>> >>> Well, the idea is actually to maintain ontologies as nx digraphs. Yes, I >>> agree there. >>> >>> >> That's also exactly what is done in the Bio.Phylo. We are planning to >> write a function analogous to Bio.Phylo._util.to_networkx() which would >> take a simple digraph obtained from parsing an OBO file and give you a >> networkx digraph with all the data for manipulation. >> >> >> >>> As for support for different types of transitivity in relations of >>>> different type (as in your inference of ancestry for is_a and part_of >>>> relations) we are currently not supporting it, but after thinking about it, >>>> we will make a change to support this feature. Probably we will let the >>>> user to (optionally) define the transitivity between relationship types >>>> (i.e. is_a + part_of becomes part_of, etc). >>>> >>>> In general, it would be very helpful if you could give us some rough >>>> idea about your expected use cases. For example: are you expecting to >>>> modify the graphs in the networkx objects? 
What will you use the inferred >>>> ancestor lists for? So that the changes we make will be as useful to the >>>> community as possible. >>>> >>> >>> >>> The idea is that expected use cases should not impact the design of a >>> basic parser + data structure. In my lab, we are looking at inferred >>> ancestors lists to calculate semantic similarity, but it really doesn't >>> matter what we (or anyone) will end up using the GO module for. If you >>> provide enrichment analysis *on top* of the parser + data structure (as a >>> separate module), and we provide semantic similarity (again as a separate >>> module *on top* of the parser + data structure) those are nice bonuses. But >>> the parser + data structure should be as general as possible. That is: >>> include all the information in the OBO file, placed in a digraph structure >>> that can be comprehensively interrogated, visualized and manipulated (which >>> is what nx offers). >>> >>> I was unfortunately not very clear here. What I meant was that we were >> considering what is necessary for typical uses of ontologies were parsing, >> and accessing the terms. And I think that is valid in the sense that >> majority of users is treating Ontologies as read-only data (not that many >> biopython users are making their own ontologies, otherwise, it would have >> been implemented ages ago...). >> >> As for the second argument: I do fully agree that there should be some >> separation between ontology and annotation reading and any functionality >> "on-top" of it. But I think that this would be not a reasonable thing to do >> to include networkx as the main data structure. Currently there is only one >> library that biopython depends on and it is numpy. I do not see networkx as >> equally important. I think that we should go the way paved by the bio.phylo >> and use the simple digraph (which already holds all the information from >> the OBO files afaik) for parsing output and convert it to networkx where >> necessary. 
>> >> best >> Bartek >> -- >> Bartek Wilczynski >> ================== >> Institute of Informatics >> University of Warsaw >> http://www.mimuw.edu.pl/~bartek >> > > > > -- > Iddo Friedberg > http://iddo-friedberg.net/contact.html > ++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.> > ++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----. > .>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>> > >>----.<--.>++++++.<<<<------------------------------------. > -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From mjldehoon at yahoo.com Fri Jan 10 08:36:50 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 10 Jan 2014 00:36:50 -0800 (PST) Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> Message-ID: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> Hi Marco, > So far the dependencies would be pyYaml (for the yaml/json > parsing, but maybe i could use the stdlib json module) and > numpy/scipy for the extraction of curve parameters. Does > this sound ok? It would be better to avoid scipy, as currently it is not used in any Biopython module. Same goes for pyYaml; keep in mind that the number of users may decrease if there are more dependencies (and it puts a burden on developers also for testing). Numpy should be OK; a quick count suggests that 12 biopython modules are already using it. > how about a "phenomics" module? In principle that sounds fine to me. Best, -Michiel. From marco.galardini at unifi.it Sat Jan 11 12:49:53 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Sat, 11 Jan 2014 13:49:53 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
In-Reply-To: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> References: <1389343010.49219.YahooMailBasic@web164005.mail.gq1.yahoo.com> Message-ID: <52D13DF1.50308@unifi.it> On 10/01/2014 09:36, Michiel de Hoon wrote: > Hi Marco, > >> So far the dependencies would be pyYaml (for the yaml/json > > parsing, but maybe i could use the stdlib json module) and > > numpy/scipy for the extraction of curve parameters. Does > > this sound ok? > It would be better to avoid scipy, as currently it is not used in any Biopython module. > Same goes for pyYaml; keep in mind that the number of users may decrease if there are more dependencies (and it puts a burden on developers also for testing). > Numpy should be OK; a quick count suggests that 12 biopython modules are already using it. Ok, I'll try to remove all these dependencies: if scipy will not be included, then most probably also numpy won't be needed. The drawback is that it would be difficult to fit the sigmoid functions to the PM curves (so the lag, plateau and slope parameters won't be computed). > >> how about a "phenomics" module? > In principle that sounds fine to me. > > Best, > -Michiel. > I'll try to draft a first tentative version and let you know how it goes. Best, Marco From mjldehoon at yahoo.com Mon Jan 13 02:07:33 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 12 Jan 2014 18:07:33 -0800 (PST) Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52D13DF1.50308@unifi.it> Message-ID: <1389578853.40150.YahooMailBasic@web164003.mail.gq1.yahoo.com> Hi Marco, -------------------------------------------- > if scipy will not be included, then most probably also numpy won't be needed. The > drawback is that it would be difficult to fit the sigmoid functions to the PM curves > (so the lag, plateau and slope parameters won't be computed). 
If it is not too complicated, you may consider implementing a routine (using numpy if it helps) to fit the sigmoid functions to the PM curves. Best, -Michiel. From p.j.a.cock at googlemail.com Tue Jan 14 23:31:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 14 Jan 2014 23:31:55 +0000 Subject: [Biopython-dev] Fwd: Call for Org Admins for OBF's 2014 Google Summer of Code participation Message-ID: Would anyone like to help organise our GSoC efforts? See below... Thanks! Peter ---------- Forwarded message ---------- From: Hilmar Lapp The 2014 Google Summer of Code (GSoC) is coming up soon. The published timeline [1] puts the mentoring organization applications from Feb 3 to 14. OBF participated on behalf of our member projects from 2010-2012, and those participations were both important and successful. Through them, our projects gained new contributors, new features, and new community members. The mentors involved from our projects learned as much from the experience as the students, and formed bonds. The mentoring organization payment allowed OBF to sponsor community events and infrastructure. To participate this year, we have to designate 2-3 people as primary and backup organization administrators. This is an important role, and we are looking for people from our community to step forward to serve. An org admin's role is in many ways that of a cat herder. The whole team of mentors and admins creates the experience for the students, but it falls on the admin to "keep it together." Google holds the mentoring organization, not its mentors, accountable for the actions (or non-actions) of its mentors or community, and it falls on the org admin to carry that accountability through to the org's mentors. The org admin's responsibilities include:
- Representing our online face to GSoC, in particular to GSoC students.
- Shepherding our mentoring organization application, and submitting it.
- Working out processes and rules for mentors as well as students that promote transparency, fairness, and protect from late-in-the-game surprises.
- Knowing GSoC rules and processes, and making sure ours are consistent with them.
- Reminding participants of rules, and enforcing them in the event it is necessary.
- Mediating, and sometimes arbitrating between students and mentors when needed.
- Ensuring that GSoC timelines are met by everyone.
The person we are looking for will genuinely care about the well-being of our communities, is well organized, stays calm in email storms, communicates clearly, has good people skills, and generally is known as a good listener. If you are interested in helping us out in this role, please email us (by Jan 21, 2014) a statement at board at open-bio.org explaining how you would fit well in this role, and what your vision for our GSoC participation is. You need not be a developer or programmer to respond, but for now we do require that you have been active in some capacity in at least one of our project's communities. Please include in your email a brief summary of such activities even if you are a core developer for one of our projects. We are looking forward to hearing from you! Hilmar Lapp, OBF President, on behalf of the OBF Board of Directors [1] http://www.google-melange.com/gsoc/events/google/gsoc2014 From saketkc at gmail.com Wed Jan 22 21:59:54 2014 From: saketkc at gmail.com (Saket Choudhary) Date: Thu, 23 Jan 2014 03:29:54 +0530 Subject: [Biopython-dev] Samtools Wrapper Message-ID: I had written a Samtools (http://samtools.sourceforge.net/) wrapper for Biopython. The initial pull request had a lot of issues. I have tried to incorporate most of the changes as suggested by Peter. It however requires more testing. If some people on the list can help me in testing, it would help me get it merged onto the master. 
Pull request: https://github.com/biopython/biopython/pull/180 Repo: https://github.com/saketkc/biopython/tree/samtools_wrapper Thanks, Saket From zruan1991 at gmail.com Thu Jan 23 21:50:07 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Thu, 23 Jan 2014 16:50:07 -0500 Subject: [Biopython-dev] Pull Request for Codon Alignment GSoC project In-Reply-To: References: