From bonnalraoul at ingm.it Thu Jul 1 04:58:56 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 1 Jul 2010 10:58:56 +0200 Subject: [BioRuby] R: [GSoC][NeXML and RDF API] Update Message-ID: <2cf53f17-b591-4f36-8f86-5f190d6e6123@ingm.it> Dear All, I have probably missed something: do we plan to switch to RSpec, stick with Test::Unit, or use a mix? -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it > -----Original Message----- > From: bioruby-bounces at lists.open-bio.org [mailto:bioruby-bounces at lists.open-bio.org] On behalf of Anurag Priyam > Sent: Thursday, 1 July 2010 00:07 > To: bioruby at lists.open-bio.org > Subject: [BioRuby] [GSoC][NeXML and RDF API] Update > > Over the last week and the first half of this week I have: > * been able to work out an NeXML serializer - the code sits in the master branch[1]. On the API page[2] I have added a discussion of the implementation. > * started working on the RDF API - I should be able to come up with RSpecs by the end of this week. > > In the remaining part of the week I will: > * come up with an RDF API implementation > * work on refactoring some of the previous code (the matrix and sequences parts), as Pjotr pointed out in the last review. > > Perhaps we can have another round of code review for the NeXML serializer? This will help me allocate time in the coming weeks to fix the issues with the code. > > [1] http://github.com/yeban/bioruby > [2] https://www.nescent.org/wg_phyloinformatics/NeXML_and_RDF_API_for_BioRuby > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur.
> +91-9775550642 > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ngoto at gen-info.osaka-u.ac.jp Thu Jul 1 06:24:29 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Thu, 1 Jul 2010 19:24:29 +0900 Subject: [BioRuby] R: [GSoC][NeXML and RDF API] Update In-Reply-To: <2cf53f17-b591-4f36-8f86-5f190d6e6123@ingm.it> References: <2cf53f17-b591-4f36-8f86-5f190d6e6123@ingm.it> Message-ID: <20100701102430.1EEC81CBC401@idnmail.gen-info.osaka-u.ac.jp> Hi, We don't "switch to" RSpec. RSpec would be used in addition to Test::Unit in some cases. In the development of Matz's Ruby, both Test::Unit and RSpec are used. Test::Unit is mainly used to check the functionality of each component, to catch regressions, and to check portability by running on every platform. RSpec is mainly used to describe and guarantee Ruby's specifications. For BioRuby, ideally both would be needed, but resources are limited. The first priority is to write tests using Test::Unit, because it is bundled with Ruby, which makes it easy to check that all functions work correctly on a variety of platforms and Ruby versions. Apart from testing, RSpec may be helpful when designing an API, as working examples that double as documentation. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Thu, 1 Jul 2010 10:58:56 +0200 "Raoul Bonnal" wrote: > Dear All, > probably I missed something, > do we plan to switch to RSpec, stick to Unit::Test or a mix? From ngoto at gen-info.osaka-u.ac.jp Fri Jul 2 00:49:08 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 2 Jul 2010 13:49:08 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review.
In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> <20100628120005.61D751CBC32B@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100702044909.099EA1CBC4C1@idnmail.gen-info.osaka-u.ac.jp> Hi, I found good examples of alternatives to method_missing in the book "Refactoring: Ruby Edition", sections 6.18 and 6.19. (http://www.amazon.com/Refactoring-Ruby-Jay-Fields/dp/0321603508) Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Mon, 28 Jun 2010 19:52:37 +0530 Anurag Priyam wrote: > > Please never use method_missing. It breaks error reporting and makes it very hard to debug and maintain both library code and user scripts. > > > > Hmm, I have experienced that. But the way I have used it affects only the Bio::NeXML::Writer class, so isn't it safe in this case? Anyway, I will change it, as it does not offer much improvement in code readability in my case. I just find it exciting :). > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. > +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 08:27:11 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 17:57:11 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100625065539.GD22887@thebird.nl> References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > > The idea here was to implement a type system and stick close to the class > > hierarchy followed in the schema. However, looking back, I myself do not > > find the code for the Matrix class very elegant. > > Over 3000 lines of code for an XML parser sends out alarm bells. If > you have the right testing files it should be easy to refactor. Make > it simpler. Also, when parsing this type of XML some Ruby reflection > may come in handy - I did some of that in my BioRuby GEO parser, which > lives in my GEO branch on github.
You should look at each class and > see if you can refactor it down to a single solution. Just make sure > it is not at the expense of readability and understanding. > > Post us some ideas here, before you start hacking code. > > Perhaps it would be better to use Kernel.const_get and initialize the correct type. So, if I have a DnaMatrix, I would use DnaSequence or DnaToken for the matrix cells. It would make the code a *lot* shorter. I should also do away with the type hierarchy of Rows (DnaSeqRow, RnaSeqRow and others). -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 10:15:12 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 19:45:12 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > > Perhaps it would be better to use Kernel.const_get and initialize the > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > for matrix cells. It would make the code a *lot* shorter. I should also do > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > Is it advisable to use Ruby's Matrix class as the base class of Bio::NeXML::Matrix? I can define methods to make it mutable. What I do not like about the Matrix class is its use of Vectors in many places. I would like to redefine those methods to work with Arrays. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 10:45:33 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 20:15:33 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review.
In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > Perhaps it would be better to use Kernel.const_get and initialize the > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > for matrix cells. It would make the code a *lot* shorter. I should also do > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > I meant Module#const_get :P. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From ngoto at gen-info.osaka-u.ac.jp Fri Jul 2 16:26:58 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Sat, 03 Jul 2010 05:26:58 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

Message-ID: <20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Hi Anurag, I don't understand what you want to do, or how const_get would shorten your code. Please show example code. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > > Perhaps it would be better to use Kernel.const_get and initialize the > > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > > for matrix cells. It would make the code a *lot* shorter. I should also do > > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > > > > I meant Module#const_get :P. > > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. > +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 17:07:45 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 02:37:45 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: > > I don't understand what you want to do, and how the const_get shorten > your code. Please show example code. > > Taking the matrix as an example: in each type of Matrix I have defined an add_row method that accepts a row of the same kind (a DnaSeqMatrix takes a DnaSeqRow):

  def add_row( row )
    raise InvalidRowException, "DnaSeqRow expected." unless row.instance_of? DnaSeqRow
    row_set[ row.id ] = row
  end

If instead I define an add_row method in SeqMatrix like this:

  def add_row( row )
    # a DnaSeqMatrix will take a DnaSeqRow; use the unqualified class name
    # so that const_get receives a plain constant name
    type = self.class.name.split( '::' ).last.sub( /Matrix\z/, 'Row' )
    klass = NeXML.const_get( type )
    raise InvalidRowException, "#{type} expected." unless row.instance_of? klass
    row_set[ row.id ] = row
  end

then I won't have to define add_row for each subclass of SeqMatrix. The same goes for the others. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 17:16:28 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 02:46:28 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: Other possible solution I can come up with is to use "define_method" or "class_eval" to create methods on the lines of attr_accessor. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From ngoto at gen-info.osaka-u.ac.jp Sat Jul 3 06:40:22 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 3 Jul 2010 19:40:22 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Hi Anurag, On Fri, 2 Jul 2010 19:45:12 +0530 Anurag Priyam wrote: > Is it advisable to use Ruby's Matrix class as the base class of > Bio::NeXML::Matrix? I can define methods to make it mutable. What I do not > like about the matrix class is its use of Vectors in many places. I would > like to redefine those methods to work with Arrays. It seems it isn't. The Ruby's Matrix class is implemented as the mathematical matrix. It has many useful mathematical matrix operations, but all elements should be Numeric values. However, it seems that NeXML matrix stores not only numeric values but also sequences or characters. In addition, the use of the name Matrix might be a source of confusion or conflicts with Ruby standard Matrix, even if in the separate name space. It may be good to rename Bio::NeXML::Matrix to another name if it isn't hard to keep consistency with the specification of NeXML. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From anurag08priyam at gmail.com Sat Jul 3 06:58:30 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:28:30 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. 
In-Reply-To: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Message-ID: > In addition, the use of the name Matrix might be a source of confusion > or conflicts with Ruby standard Matrix, even if in the separate name > space. It may be good to rename Bio::NeXML::Matrix to another name > if it isn't hard to keep consistency with the specification of NeXML. > > I think they are called character state matrices in phylogenetics terminology. But something like CharactersStateMatrices would be too long. What about CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. This thread is getting too long, so I am starting a new one about some of my doubts regarding the NeXML sequence implementation. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 3 07:13:43 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:43:43 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Sequences( doubts ) Message-ID: This is going to be a long mail. NeXML's characters tag serves as a storage block for sequences. Sequences can be described in NeXML in two ways: raw (with the seq tag) and granular (with the cell tags). NeXML offers six kinds of sequences:
1. Protein (AA)
2. DNA
3. RNA
4. Restriction
5. Standard
6. Continuous
As of now, the NeXML parser just returns the sequence as a string. It should return a Bio::Sequence. BioRuby already has classes to work with AA and NA sequences. I was thinking of adding classes to represent Restriction, Standard and Continuous sequences. Should I work on adding support for these as core BioRuby classes or just as part of the NeXML library? I will have to adapt the Bio::Sequence class to recognize the new sequence types.
Why does the Bio::Sequence#guess method use a 90% threshold to distinguish between AA and NA? Why not use a regexp instead? -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 3 07:15:40 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:45:40 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: > > Other possible solution I can come up with is to use "define_method" or > "class_eval" to create methods on the lines of attr_accessor. > > I would like to know how you would have worked this out. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From pjotr.public14 at thebird.nl Sat Jul 3 10:25:41 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 3 Jul 2010 16:25:41 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20100703142541.GA3153@thebird.nl> On Sat, Jul 03, 2010 at 02:37:45AM +0530, Anurag Priyam wrote: > > > > I don't understand what you want to do, and how the const_get shorten > > your code. Please show example code. > > > > > Taking matrix for example, in each type of Matrix I have defined an add > add_row method which accepts a row which would be of the same kind( > DnaSeqMatrix will take DnaSeqRow ) : So, why do you need to test for the type? See below. > def add_row( row ) > raise InvalidRowException, "DnaSeqRow expected." unless > row.instance_of? DnaSeqRow > row_set[ row.id ] = row > end > > If instead I define a add_row method in SeqMatrix like this: > > def add_row( row ) > # a DnaSeqMatrix will take a DnaSeqRow > type = self.class.to_s.sub( /Matrix/, 'Row' ) > klass = NeXML.const_get( type ) > raise InvalidRowException, "#{type} expected." unless row.instance_of? > klass > end > > This way I won't have to define add_row for each sub type of SeqRow. > Similarly for others. This would be a factory, right? I think what you want to do is good in its objective - trying to shorten the implementation. But do you really need class inspection/reflection here? The need to ask for a class name is usually avoided by giving the classes proper attributes. That is, if you use OOP. The question is whether you really require this. The problem I had (and have) with the code is its really wide and deep use of OOP classes. That leads to duplication of code and little 'feel' for the correctness of what is in there. Deep OOP hierarchies are evil. Duplication is ugly. Inspection/reflection is pretty much evil too, as Naohisa's reaction suggests - it should only be used in exceptional cases when there is no other elegant way of resolving an issue. It can be powerful, but use it only when really required, as other people often fail to understand what it does - and code should be self-documenting.
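A minimal sketch of the kind of alternative described here: add_row is deduplicated through a class-level attribute instead of deriving a class name from a string and calling const_get. Everything in it is hypothetical. The NeXMLSketch module, the row_class declaration, InvalidRowException and the plain Hash used for row_set are stand-ins modelled on the names in this thread, not the actual Bio::NeXML code.

```ruby
# Hypothetical stand-ins modelled on the names in this thread; not the
# real Bio::NeXML implementation.
module NeXMLSketch
  class InvalidRowException < StandardError; end

  # A row only needs an id and its sequence data for this sketch.
  class SeqRow
    attr_reader :id, :seq

    def initialize(id, seq)
      @id = id
      @seq = seq
    end
  end

  class DnaSeqRow < SeqRow; end
  class RnaSeqRow < SeqRow; end

  class SeqMatrix
    # Class-level declaration: with an argument it records the row class a
    # subclass accepts; without one it returns that class.
    def self.row_class(klass = nil)
      @row_class = klass if klass
      @row_class
    end

    attr_reader :row_set

    def initialize
      @row_set = {}
    end

    # A single add_row shared by every subclass: no const_get and no
    # parsing of class names.
    def add_row(row)
      expected = self.class.row_class
      unless row.instance_of?(expected)
        raise InvalidRowException, "#{expected} expected."
      end
      row_set[row.id] = row
    end
  end

  class DnaSeqMatrix < SeqMatrix
    row_class DnaSeqRow
  end

  class RnaSeqMatrix < SeqMatrix
    row_class RnaSeqRow
  end
end

# Usage
matrix = NeXMLSketch::DnaSeqMatrix.new
matrix.add_row(NeXMLSketch::DnaSeqRow.new("row1", "ACGTACGT"))
begin
  matrix.add_row(NeXMLSketch::RnaSeqRow.new("row2", "ACGUACGU"))
rescue NeXMLSketch::InvalidRowException => e
  puts e.message # prints "NeXMLSketch::DnaSeqRow expected."
end
```

Each subclass states its row type in a single line. This keeps the intent visible to readers without any reflection on class names.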
I think you need to ask yourself more fundamental questions. Why not use BioRuby's basic types for most of the data represented by NeXML? Only use special objects when there is real added value. So DnaSeqRow would simply be a Sequence (or even a list of chars), and DnaSeqMatrix would be a list of Sequences. If you have further attributes, create a new composite object (like SequenceFeatures, or, if you think more functionally, again a tuple of sequence(s) and features?). This way you don't create a hierarchy that balloons into hundreds of specialized objects we won't use elsewhere. To differentiate between a DnaSequence and an RnaSequence you do not need different objects. Both are strings (in BioRuby). You could even settle for Ruby's primitive types and containers. Likewise, even if you need a Matrix, you don't need RnaMatrix and DnaMatrix. I am sure of that. They are specializations in name only; the code in them should be identical. If you go down the OOP route, make use of Ruby's mixins. Search Google for "ruby mixin deep oop hierarchy". My recommendation is to refactor the library to use as primitive a type as possible, at every point. When you run into functionality that requires a more complex type, because there is no other way - that is the moment to design and add it. I don't know the full depth of the NeXML format, but I can predict it consists of primitive types in ordered ways. This can be mirrored by the implementation. If you do it like this, you won't have to use inspection (as in the question above). OOP classes are for harnessing special functionality that goes with a certain type. Do not create a type unless you need something special. You can propose changes to existing BioRuby types - in particular with the RDF implementation. I know some people will balk at this rewrite - but to be honest, if you want your library to be useful to others, it needs rethinking. I would take a week out of your plan to experiment with different object models - just start with a small subset.
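One hedged reading of the "as primitive a type as possible" advice: a single matrix class tags its contents with a Symbol and stores rows as a Hash from row id to sequence String, so DNA and RNA differ only in their data, not in their class. CharMatrix, SEQ_TYPES and each_seq are illustrative names, not BioRuby or NeXML API. The symbols simply mirror the six NeXML sequence kinds listed in this thread.

```ruby
# Illustrative names only; not part of BioRuby or NeXML.
# The six NeXML sequence kinds from this thread, as plain Symbols.
SEQ_TYPES = [:protein, :dna, :rna, :restriction, :standard, :continuous].freeze

# One matrix class for every kind of sequence: a Symbol tag plus a Hash of
# row id => sequence String. Hashes keep insertion order in Ruby 1.9+.
class CharMatrix
  attr_reader :type, :rows

  def initialize(type)
    unless SEQ_TYPES.include?(type)
      raise ArgumentError, "unknown sequence type: #{type.inspect}"
    end
    @type = type
    @rows = {}
  end

  # Returns self so that calls can be chained.
  def add_row(id, seq)
    @rows[id] = seq
    self
  end

  # Yields (id, sequence) pairs without exposing any special row class.
  def each_seq(&block)
    @rows.each(&block)
  end
end

# Usage: DNA and RNA share one class and differ only in the type tag.
dna = CharMatrix.new(:dna)
dna.add_row("taxon1", "ACGTTG").add_row("taxon2", "ACGATG")
dna.each_seq { |id, seq| puts "#{id}\t#{seq}" }

rna = CharMatrix.new(:rna)
rna.add_row("taxon1", "ACGUUG")
```

If a kind of sequence later needs real behaviour of its own, a mixin or a dedicated class can be added at that point. Until then, the type tag is enough.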
When you think something works, roll it out all the way. That can be done quickly. Read, read, read on the Internet about object models. One thing you can consider is to use an intermediate object structure when parsing the XML into Ruby, and then fork it out into logical data structures. I do that regularly, as the XML 'model' does not normally map well to Ruby. One example of mine is here: http://github.com/pjotrp/swig2doc/blob/master/lib/input/doxyxmlparser.rb Doxy objects are stored in http://github.com/pjotrp/swig2doc/blob/master/lib/cobj/doxy/doxycobjs.rb Note that swig2doc also contains a convenience class for using libxml2 in http://github.com/pjotrp/swig2doc/blob/master/lib/input/xmleasyreader.rb And while you are refactoring, why not make sure the parser does not fill up memory? Pj. PS. Are you using another NeXML OOP implementation as a model - Perl, Python, Java? I would like to know, so I can have a look. From pjotr.public14 at thebird.nl Sat Jul 3 10:43:32 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 3 Jul 2010 16:43:32 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] Sequences( doubts ) In-Reply-To: References: Message-ID: <20100703144332.GB3153@thebird.nl> On Sat, Jul 03, 2010 at 04:43:43PM +0530, Anurag Priyam wrote: > This is going to be a long mail. > > NeXML's characters tag serves as a storage block for sequences. Sequences > can be described in NeXML in two ways, raw( with the seq tag ) and granular( > with the cell tags ). NeXML offers six kind of sequences : > 1. Protein( AA ) > 2. DNA > 3. RNA > 4. Restriction > 5. Standard > 6. Continuous How do these sequences differ? In name only? Could you store them as tuples: (:dna,sequence) (:rna,sequence) (:re,sequence) etc.? You could argue for a new SequenceType object, to store type + sequence. > As of now, the NeXML parser just returns the sequence as a string. It should > return Bio::Sequence. BioRuby already has classes to work with AA and NA > sequences.
I was thinking of adding classes to represent Restriction, > Standard and Continuous sequences. Should I work on adding support for these > as a core BioRuby classes or just as a part of NeXML lib? I will have to > adapt Bio::Sequence class to recognize the new sequences. I think your library needs to return the simplest type possible - even standard Ruby containers (simpler still than BioRuby's types). That makes for the most flexible implementation for others to use. BioRuby's types may change in the future too - I am working on that. Your library is not really in the business of creating new types - unless you create new functionality, like an alignment algorithm or some transformation to a new type. Better keep it simple. If I have a NeXML file containing an alignment of sequences, I simply expect to pull out those sequences with their IDs. Right? You could return a BioRuby Alignment object, but that is overkill. I can make one myself, of my own MyAlignment type, which is what I want to use. What I really want is a list of (id, list[nucleotide]), or (id, String) in BioRuby's case, if that is what is stored in NeXML. In pseudo code:
  seqlist = NeXML.read(fn).fetch_alignment
  print seqlist.first
  # => "id","agtct"
or in the form of an iterator:
  NeXML.read(fn).fetch_alignment.each_seq do | id, seq |
    # do something
  end
and likewise use cases for other scenarios. For RDF the use cases are similar, I would guess:
  NeXML.read(fn).fetch_alignment.to_rdf
Keep it simple, again. The thing is that most people overcomplicate things in OOP. All, and I mean all, Bio* projects overcomplicate things. > Why does the Bio::Sequence#guess method use the some 90% way of recognition > between AA and NA? Why not use regexp instead? I am not a great fan of guessing formats. It is always error-prone. Both amino acid sequences and nucleotide sequences can consist of a combination of shared letters. Still, I guess regexes are slower. Feel free to come up with an alternative and measure how well it does.
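To make the regexp-versus-threshold question concrete, here is a simplified sketch that compares a ratio heuristic, in the spirit of the 90% figure mentioned for Bio::Sequence#guess, with a strict regexp. It is not BioRuby's implementation. The names guess_by_ratio and guess_by_regexp, and the choice of counted letters (A, C, G, T, U, N), are assumptions made for illustration.

```ruby
# Simplified sketch for comparison; not BioRuby's Bio::Sequence#guess code.
NA_LETTERS = "ACGTUNacgtun"

# Drop anything that is not a letter (whitespace, digits, gaps).
def letters_only(seq)
  seq.gsub(/[^A-Za-z]/, "")
end

# Ratio heuristic: nucleic acid (:na) if at least `threshold` of the letters
# are A, C, G, T, U or N; otherwise amino acid (:aa).
def guess_by_ratio(seq, threshold = 0.9)
  letters = letters_only(seq)
  return nil if letters.empty?
  ratio = letters.count(NA_LETTERS).to_f / letters.length
  ratio >= threshold ? :na : :aa
end

# Strict regexp: nucleic acid only if every letter is A, C, G, T, U or N.
def guess_by_regexp(seq)
  letters = letters_only(seq)
  return nil if letters.empty?
  (letters =~ /\A[ACGTUN]+\z/i) ? :na : :aa
end

# Usage
ambiguous_dna = "ACGTACGTACGTACGTACGR" # 19 of 20 letters are A/C/G/T, plus one R
p guess_by_ratio(ambiguous_dna)  # prints :na
p guess_by_regexp(ambiguous_dna) # prints :aa
p guess_by_ratio("MKVLAAGIRSW")  # prints :aa
p guess_by_ratio("GATTACA")      # prints :na, even if it were a peptide
```

The threshold tolerates an occasional IUPAC ambiguity code such as R, which the strict regexp rejects. Neither approach can tell a short peptide such as GATTACA (Gly-Ala-Thr-Thr-Ala-Cys-Ala) from DNA. That is the error-proneness of guessing.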
But I have trouble seeing why you need it. Pj. From chmille4 at gmail.com Sun Jul 4 09:33:51 2010 From: chmille4 at gmail.com (Chase Miller) Date: Sun, 4 Jul 2010 09:33:51 -0400 Subject: [BioRuby] Bio::Assembly Message-ID: Hi all, I've worked with BioPerl in the past, but I'm considering using Ruby/BioRuby for a new project and have a few questions. - I can't seem to find a Bio::Assembly module for BioRuby. Has anyone done any work on this? - What's the current solution for working with assembly data in BioRuby? If nothing is out there, I may try to take a whack at this myself. Thanks, Chase From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 10:56:46 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 10:56:46 -0400 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: Message-ID: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> Hi Chase, As far as I know, no Ruby/BioRuby components like BioPerl's Bio::Assembly are available. Currently, sequences and qualities in FASTA, FASTQ, ABI, SCF and other file formats can be handled with BioRuby. However, I don't know of good ways to handle assembly output data. Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > Hi all > > > I've worked with BioPerl in the past, but I'm considering using ruby/bioruby > for a new project and have a few questions. > > > - I can't seem to find a Bio::Assembly module for BioRuby. Has anyone > done any work on this? > > > - What's the current solution for working with assembly data in BioRuby? > > > > If nothing is out there, I may try to take a whack at this myself.
> > Thanks, > > Chase From pjotr.public14 at thebird.nl Wed Jul 7 12:17:40 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 18:17:40 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> References: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20100707161740.GA12452@thebird.nl> Hi Chase, On Wed, Jul 07, 2010 at 10:56:46AM -0400, Naohisa Goto wrote: > As far as I know, no Ruby/BioRuby components like BioPerl's > Bio::Assembly are available. > > Currently, sequences and qualities formatted in Fasta, FASTQ, ABI, SCF > and other file formats can be treated with BioRuby. > However, I don't know good ways to handle assembly output data. Before rewriting from scratch, see if there are useful C/C++ libraries we can map to with SWIG (BioLib project). I can help with that. Alternatively, check what is written in Java - JRuby makes accessing anything on the JVM rather trivial these days. Or even interface with Perl libraries and map those to Ruby. I would start with that, then see what a useful feature set for BioRuby would be. Design it in such a way that external libraries can be replaced over time, when someone feels like writing the support. We are getting BioRuby plugin support, allowing for flexible approaches to adding functionality. Pj. From pjotr.public14 at thebird.nl Wed Jul 7 13:46:05 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 19:46:05 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> Message-ID: <20100707174605.GA14014@thebird.nl> On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: > C/C++?
Perl?? Really? > > Do you think it is a good idea to introduce so many dependencies? Not at the same time ;) > While these might not be a problem for expert users, I worry that the more complexity is introduced, the less likely the average biologist with little coding experience will be tempted to use BioRuby. It won't affect BioRuby core. Pure Ruby is great - and we should always aim for that with 'core' BioRuby. Still, we don't have enough developers to support every nook and cranny of bioinformatics. We need to get functionality in fast, when we can. If functionality exists elsewhere: use that. It does not make sense to rewrite everything from scratch. As long as we provide clear interfaces, we can always start replacing stuff with pure Ruby, if someone feels like recoding. By forcing dependencies into a 'plugin' we still keep BioRuby pure. People are free to create plugins, which may have dependencies. If you want the functionality badly enough, and you don't want to write it yourself, find a way to use the plugin. This is one of the major reasons for providing a plugin infrastructure, which, btw, is the same plugin system that Rails uses (thanks to Raoul and Toshiaki). A plugin is not core BioRuby. BioRuby itself does not get dependencies - other than highly common libraries like libxml. We simply don't have the people to achieve everything. Not to mention that many libraries, like EMBOSS, outperform Ruby in terms of processing speed and memory consumption. When we call BLAST we don't write BLAST ourselves in Ruby. Those are also dependencies. Apart from dealing with dependencies, one thing we may want to think about is incompatible plugins. For example, if I were to use a plugin for the JVM, it may not work together with a plugin for standard Ruby. My take is that it does not really matter. You have to choose one or the other ;).
The truth is, we have too small a community to provide a luxury edition of BioRuby that can handle everything (which is also true for the other Bio* projects). See mappings and dependencies as part of the development of the ultimate BioRuby: a process, a transition, an evolution. Plugins make it possible. Pj. From emanuele.orlando at gmail.com Wed Jul 7 14:00:58 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Wed, 7 Jul 2010 20:00:58 +0200 Subject: [BioRuby] contribute to Bioruby Message-ID: Dear BioRuby, This is my first mail to the group. My name is Emanuele Orlando. I graduated in Computational Chemistry and have been working as an IT consultant for three years. Like others, I love Ruby, and I would be happy to contribute to the development of BioRuby. How can I contribute? Thanks Emanuele -- Emanuele Orlando http://www.emanueleorlando.com http://it.linkedin.com/in/kooru From pjotr.public14 at thebird.nl Wed Jul 7 14:36:22 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 20:36:22 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: Message-ID: <20100707183622.GA16079@thebird.nl> Welcome Emanuele, On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > Dear Bioruby, > > This is my first mail to the group. > My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from > three years i'm working as IT consultant. > As others i love ruby and i would be happy to contribute to the development > of bioruby. How can i make some contributions? Choose a topic to work on. You can simply get a free account on github.com and clone the repository. Start coding - when it works, we can easily merge it into the main tree. If you want to discuss programming and design, just mail this list. Pj.
From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 14:41:25 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 14:41:25 -0400 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: Message-ID: <20100707144119.EC29.EEF6E030@gen-info.osaka-u.ac.jp> Hi Emanuele, Thanks. See a similar thread for a general introduction: http://lists.open-bio.org/pipermail/bioruby/2010-June/001319.html In addition, I'd like to suggest ChemRuby, a cheminformatics library written in Ruby and a project closely related to BioRuby. http://chemruby.org/ It was developed together with BioRuby, but has not been maintained recently. It would also be great if you could contribute to ChemRuby. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > Dear Bioruby, > > This is my first mail to the group. > My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from > three years i'm working as IT consultant. > As others i love ruby and i would be happy to contribute to the development > of bioruby. How can i make some contributions? > > Thanks > > Emanuele > > -- > Emanuele Orlando > http://www.emanueleorlando.com > http://it.linkedin.com/in/kooru > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Wed Jul 7 16:42:16 2010 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 8 Jul 2010 05:42:16 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100707183622.GA16079@thebird.nl> References: <20100707183622.GA16079@thebird.nl> Message-ID: Hi, This explanation by Jan may be useful when you try GitHub.
http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html Toshiaki On 2010/07/08, at 3:36, Pjotr Prins wrote: > Welcome Emanuele, > > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: >> Dear Bioruby, >> >> This is my first mail to the group. >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from >> three years i'm working as IT consultant. >> As others i love ruby and i would be happy to contribute to the development >> of bioruby. How can i make some contributions? > > Choose a topic to work on. You can simply get a free account on > github.com and clone the repository. Start coding - when it works we > can easily merge it in to the main tree. > > If you want to discuss programming and design, just mail this list. > > Pj. > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 18:40:21 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 18:40:21 -0400 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> Hi Anurag, > I think they are called character state matrices in the phylo terminology. > But something like CharactersStateMatrices would be two long. What about > CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. It is generally a bad idea to abbreviate a name only because it is long. Modifying the original upstream names might be a source of confusion. In this case, it is best to use CharactersStateMatrices as is. If the name is expected to be used frequently by library users, a short name could also be added. However, as Pjotr already mentioned, the class might not be needed, depending on the design of the classes.
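In Ruby, adding such a short name alongside the upstream one costs almost nothing: a class is an ordinary object referenced by a constant, so a second constant can point at the same class while the NeXML name stays canonical. A minimal sketch; the `Bio::NeXML` namespace, the class body and the `CharStateMatrix` alias are hypothetical names chosen only for illustration.

```ruby
module Bio
  module NeXML
    # Canonical name mirrors the upstream NeXML vocabulary (hypothetical class).
    class CharactersStateMatrices
      attr_reader :rows

      def initialize(rows = {})
        @rows = rows # e.g. { "taxon1" => "ACGT" }
      end
    end

    # Optional short alias: a second constant referring to the very same class.
    CharStateMatrix = CharactersStateMatrices
  end
end

m = Bio::NeXML::CharStateMatrix.new("taxon1" => "ACGT")
puts m.class # => Bio::NeXML::CharactersStateMatrices
puts Bio::NeXML::CharStateMatrix.equal?(Bio::NeXML::CharactersStateMatrices) # => true
```

Because the class takes its name from the first constant it was assigned to, error messages and `inspect` output keep showing the upstream name even when users type the short alias.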
Sorry for the late response. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org From emanuele.orlando at gmail.com Thu Jul 8 05:46:19 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Thu, 8 Jul 2010 11:46:19 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: Thanks to all for the valuable links. I didn't know ChemRuby, and now I have some questions :) Its mailing list is empty. If I want to discuss the programming/design of ChemRuby, where should I post? On the BioRuby list with a specific tag (for example [CHEMRUBY]), or on the ChemRuby list? Who contributed to ChemRuby at the beginning? Where were the priorities focused? Thanks Emanuele -- Emanuele Orlando http://www.emanueleorlando.com http://it.linkedin.com/in/kooru On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > Hi, > > This explanation by Jan may be useful when you try GitHub. > > http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html > > Toshiaki > > On 2010/07/08, at 3:36, Pjotr Prins wrote: > > > Welcome Emanuele, > > > > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> Dear Bioruby, > >> > >> This is my first mail to the group. > >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and > from > >> three years i'm working as IT consultant. > >> As others i love ruby and i would be happy to contribute to the > development > >> of bioruby. How can i make some contributions? > > > > Choose a topic to work on. You can simply get a free account on > > github.com and clone the repository. Start coding - when it works we > > can easily merge it in to the main tree. > > > > If you want to discuss programming and design, just mail this list. > > > > Pj.
> > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From chmille4 at gmail.com Thu Jul 8 07:29:41 2010 From: chmille4 at gmail.com (Chase Miller) Date: Thu, 8 Jul 2010 07:29:41 -0400 Subject: [BioRuby] Bio::Assembly In-Reply-To: <20100707174605.GA14014@thebird.nl> References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl> Message-ID: Pjotr, thanks for the great ideas! The BioLib project sounds interesting. I'll have to ask my old GSoC mentor Mark Jensen about it. However, for my purposes, it may be easier to just code it up in Ruby (also, I'm dying to write Ruby code after moving over from Perl :) ). What I need is an ACE parser and fairly simple scaffold and contig objects. Although slim in features, it could be a good starting point for a pure Ruby assembly module, which I wouldn't mind maintaining. How closely does BioRuby like to follow the BioPerl API? I've noticed that BioRuby seems to handle file formats differently than BioPerl, with most of them being in Bio::db/. Can anyone expand on this? Is there any place I can read about the best practices for BioRuby? For example, I haven't seen any instances of using hashes to pass parameters in method calls, e.g. a.parse( :file => file, :format => format ) Is this frowned upon in BioRuby? Thanks, Chase On Wed, Jul 7, 2010 at 1:46 PM, Pjotr Prins wrote: > On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: > > C/C++? Perl?? Really? > > > > Do you think it is a good idea to introduce so many dependencies?
> > Not at the same time ;) > > > While these might not be a problem for expert users, I worry that > > the to more complexities are introduced the less likely the avarage > > biologist with little coding experience might be tempted to use > > BioRuby. > > It won't affect BioRuby core. > > Pure Ruby is great - and we should always aim for that with 'core' > BioRuby. Still, we don't have enough developers to support every nook > and cranny of bioinformatics. > > We need to get functionality in fast, when we can. If functionality > exists elsewhere: use that. It does not make sense to rewrite > everything from scratch. As long as we provide clear interfaces, we > can always start replacing stuff with pure Ruby. If someone feels > like recoding. > > By forcing dependencies into a 'plugin' we still keep BioRuby pure. > People are free to create plugins, which may have dependencies. If you > want the functionality badly enough, and you don't want to write it > yourself, find the way of using the plugin. This is one of the major > reasons for providing a plugin infrastructure. Which, btw, is the > same plugin system that Rails uses (thanks to Raoul and Toshiaki). > > A plugin is not core BioRuby. BioRuby itself does not get dependencies > - other than highly common libraries like libxml. > > We simply don't have the people to achieve everything. Not to mention > that many libraries, like EMBOSS, outperform Ruby in terms of > processing speed and memory consumption. When we call BLAST we don't > write BLAST ourselves in Ruby. Those are also dependencies. > > Outside dealing with dependencies one thing we may want to think about > is incompatible plugins. For example, if I were to use a plugin > for the JVM, it may not work together with a plugin for standard Ruby. > > My take is that it does not really matter. You have to choose one or > the other ;). 
> > Truth is we have too small a community to provide the luxury edition > of BioRuby which can handle everything (which is also true for the > other Bio* projects). > > See mappings and dependencies as part of the development of the > ultimate BioRuby. A process, transition, evolution. Plugins make it > possible. > > Pj. > From daijiendoh at gmail.com Thu Jul 8 07:41:57 2010 From: daijiendoh at gmail.com (遠藤大二) Date: Thu, 8 Jul 2010 20:41:57 +0900 Subject: [BioRuby] KEGG API Message-ID: Dear All, I have a question about the KEGG API in BioRuby. When I use the "type" method, a "deprecated" message is returned, although according to the KEGG manual the "type" method should still work. The script I used is below:

  relations = get_element_relations_by_pathway('path:bsu00010')
  relations.each do |rel|
    puts rel.element_id1
    puts rel.element_id2
    puts rel.type
    rel.subtypes.each do |sub|
      puts sub.element_id
      puts sub.relation
      puts sub.type
    end
  end

Result for sub.type => "sub.type was deprecated use class" How can I get the type property of the objects returned by "get_element_relations_by_pathway"? With best wishes, Daiji Endoh From email2ants at gmail.com Thu Jul 8 08:24:41 2010 From: email2ants at gmail.com (Anthony Underwood) Date: Thu, 8 Jul 2010 13:24:41 +0100 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl> Message-ID: Hi Chase, I for one would love to have a Ruby implementation of a Bio::Assembly module. I wrote a Bio::Chromatogram module for BioRuby to replicate the BioPerl functionality since I cannot stand writing in Perl any longer!! I have also thought about writing an ACE parser but have not yet had time. Count this as a +1 from me! Anthony On 8 Jul 2010, at 12:29, Chase Miller wrote: > Pjtor > > Thanks for the great ideas! The BioLib project sounds interesting. I'll > have to ask my old GSOC mentor Mark Jensen about it.
> > However, for my purposes, It may be easier to just code it up in ruby (also > I'm dying to write ruby code after moving over from perl :) ). What I need > is an ace parser and fairly simple scaffold and contig objects. Although > slim in features, it could be a good starting point for a pure ruby assembly > module, which I wouldn't mind maintaining. > > How closely does BioRuby like to follow the BioPerl API? > > I've noticed that BioRuby seems to handle file formats differently than > BioPerl, with most of them being in Bio::db/. Can anyone expand on this? > > Is there any place I can read about the best practices for BioRuby. For > example, I haven't seen any instances of using hashes to pass parameters in > method calls e.g. > > > a.parse( :file => file, :format => format ) > > Is this frowned upon in BioRuby? > > > Thanks, > Chase > > > On Wed, Jul 7, 2010 at 1:46 PM, Pjotr Prins wrote: > >> On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: >>> C/C++? Perl?? Really? >>> >>> Do you think it is a good idea to introduce so many dependencies? >> >> Not at the same time ;) >> >>> While these might not be a problem for expert users, I worry that >>> the to more complexities are introduced the less likely the avarage >>> biologist with little coding experience might be tempted to use >>> BioRuby. >> >> It won't affect BioRuby core. >> >> Pure Ruby is great - and we should always aim for that with 'core' >> BioRuby. Still, we don't have enough developers to support every nook >> and cranny of bioinformatics. >> >> We need to get functionality in fast, when we can. If functionality >> exists elsewhere: use that. It does not make sense to rewrite >> everything from scratch. As long as we provide clear interfaces, we >> can always start replacing stuff with pure Ruby. If someone feels >> like recoding. >> >> By forcing dependencies into a 'plugin' we still keep BioRuby pure. >> People are free to create plugins, which may have dependencies. 
If you >> want the functionality badly enough, and you don't want to write it >> yourself, find the way of using the plugin. This is one of the major >> reasons for providing a plugin infrastructure. Which, btw, is the >> same plugin system that Rails uses (thanks to Raoul and Toshiaki). >> >> A plugin is not core BioRuby. BioRuby itself does not get dependencies >> - other than highly common libraries like libxml. >> >> We simply don't have the people to achieve everything. Not to mention >> that many libraries, like EMBOSS, outperform Ruby in terms of >> processing speed and memory consumption. When we call BLAST we don't >> write BLAST ourselves in Ruby. Those are also dependencies. >> >> Outside dealing with dependencies one thing we may want to think about >> is incompatible plugins. For example, if I were to use a plugin >> for the JVM, it may not work together with a plugin for standard Ruby. >> >> My take is that it does not really matter. You have to choose one or >> the other ;). >> >> Truth is we have too small a community to provide the luxury edition >> of BioRuby which can handle everything (which is also true for the >> other Bio* projects). >> >> See mappings and dependencies as part of the development of the >> ultimate BioRuby. A process, transition, evolution. Plugins make it >> possible. >> >> Pj. >> > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr.public14 at thebird.nl Thu Jul 8 09:13:03 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 8 Jul 2010 15:13:03 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl>

Message-ID: <20100708131303.GA22116@thebird.nl> Glad to see this excitement :). On Thu, Jul 08, 2010 at 01:24:41PM +0100, Anthony Underwood wrote: > > How closely does BioRuby like to follow the BioPerl API? It doesn't. Though you wouldn't be the first to copy stuff. > > I've noticed that BioRuby seems to handle file formats differently than > > BioPerl, with most of them being in Bio::db/. Can anyone expand on this? > > > > Is there any place I can read about the best practices for BioRuby. For > > example, I haven't seen any instances of using hashes to pass parameters in > > method calls e.g. > > a.parse( :file => file, :format => format ) I am sure it is used and it is fine when used for 'setting' options. The only risk is that you lose checking of the number of parameters passed, which can lead to bugs. Otherwise I like it for being explanatory in the calling code. Mind: it can lead to 'rich' interfaces - so common in R, where one method handles many circumstances. These rich methods tend to be really ugly and hard to test (=hard to prove correct). Don't use it to replace multiple methods. Methods are also explanatory in the calling code. > > Is this frowned upon in BioRuby? I don't think so. But use it when it makes sense. Pj. From bonnalraoul at ingm.it Thu Jul 8 11:46:52 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 8 Jul 2010 17:46:52 +0200 Subject: [BioRuby] Integration ruby-ffi Message-ID: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> I think this library http://github.com/ffi/ffi is quite interesting. Now it is possible, for example, to use Nokogiri from JRuby. Pjotr, have you tried using that library for your binding purposes? -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F.
Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it From ngoto at gen-info.osaka-u.ac.jp Thu Jul 8 15:34:19 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Thu, 08 Jul 2010 15:34:19 -0400 Subject: [BioRuby] KEGG API In-Reply-To: References: Message-ID: <20100708153417.38CC.EEF6E030@gen-info.osaka-u.ac.jp> Hi, The problem can be reproduced with SOAP4R 1.5.8, the latest released version of SOAP4R. It cannot be reproduced with SOAP4R 1.5.5, which is bundled with Ruby 1.8.7-p299. From revision 1683 on, SOAP4R does not add mappings for methods that are already defined. http://dev.ctor.org/soap4r/changeset/1683 In the Debian (and Ubuntu) packages of Ruby 1.8.7, this patch is applied as a security fix to prevent a memory exhaustion problem. The workaround is to use SOAP::Mapping::Object#[]. For example, use rel["type"] instead of rel.type.

  require 'bio'

  api = Bio::KEGG::API.new
  relations = api.get_element_relations_by_pathway('path:bsu00010')
  relations.each do |rel|
    puts rel.element_id1
    puts rel.element_id2
    puts rel["type"]   # rel.type would call the deprecated Object#type instead
    rel.subtypes.each do |sub|
      puts sub.element_id
      puts sub.relation
      puts sub["type"]
    end
  end

> Dear All > > I have a question about KEGG API on BioRuby. > When I use "type" method, "deprecated" message was returned. > While, the method "type" is still working, refer to KEGG manual. > > ## Script I used is below > > relations = get_element_relations_by_ > pathway('path:bsu00010') > relations.each do |rel| > puts rel.element_id1 > puts rel.element_id2 > puts rel.type > rel.subtypes.each do |sub| > puts sub.element_id > puts sub.relation > puts sub.type > end > end > > Result for sub.type => "sub.type was deprecated use class" > > > How can I receive type property in the class > "get_element_relations_by_pathway" ?
> > With best wishes, > > Daiji Endoh > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From pjotr.public14 at thebird.nl Thu Jul 8 16:22:47 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 8 Jul 2010 22:22:47 +0200 Subject: [BioRuby] Integration ruby-ffi In-Reply-To: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> References: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> Message-ID: <20100708202247.GA26254@thebird.nl> On Thu, Jul 08, 2010 at 05:46:52PM +0200, Raoul Bonnal wrote: > I think this library http://github.com/ffi/ffi is quite interesting. Now is possible for example use nokogiri from JRuby. > > Pjotr, have you tried to use that library for your binding purposes > ? Not yet. The purpose of BioLib is cross-language, really. Kill many birds with one stone ;). Still, ffi is cool, and initially easier than SWIG. Not sure how easy it is in different deployments. Pj. From rutgeraldo at gmail.com Sat Jul 10 10:58:00 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sat, 10 Jul 2010 15:58:00 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> References: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: Hi, sorry for the even later response on my end... The BioRuby Matrix class is not a suitable superclass for character state matrices, which are essentially the generalized form of multiple sequence alignments (but then also allowing for other types of homologized data).
I am tempted to suggest you make at least some (and maybe all) nexml character state matrices either inherit from an alignment class or be easily convertible to one: if people parse a nexml file with a dna alignment, there's a good chance they'll want to be able to use that as an alignment object elsewhere in their code. As an aside, I have no problem with using CharacterStateMatrix as a class name. I don't see people having to type it that frequently so it's not a big deal, right? Maybe I think this because my java work is starting to get to me, though :) Rutger On Wed, Jul 7, 2010 at 11:40 PM, Naohisa Goto wrote: > Hi Anurag, > > > I think they are called character state matrices in the phylo > terminology. > > But something like CharactersStateMatrices would be two long. What about > > CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. > > It is generally bad thing to abbreviate only because it is too long. > Modifying the upstream original names might be a source of confusion. > > In this case, using CharactersStateMatrices as is is the best. If the > name is expected to be frequently used by library users, short name > could also be added. > > However, as Pjotr already mentioned, the class might not be needed, > depending on the design of classes. > > Sorry for late response. > > -- > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > -- Dr. Rutger A.
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From anurag08priyam at gmail.com Sun Jul 11 02:51:03 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 11 Jul 2010 12:21:03 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100703142541.GA3153@thebird.nl> References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> <20100703142541.GA3153@thebird.nl> Message-ID: > > This would be a factory, right? > > I think what you want to do is good in its objective - trying to > shorten the implementation. But do you really need > class inspection/reflection here? Asking for the class name is usually > prevented by having proper attributes in the classes. That is if you > use OOP. Question is whether you really require this. > > The problem with the code I had (and have) was the really wide and > deep use of OOP classes. That led to duplication of code and little > 'feel' for correctness of what is in there. Deep OOP hierarchies are > evil. Duplication is ugly. > > Inspection/reflection is evil too - like Naohisa reacted, pretty much > - it is only used in exceptional cases when there is no other elegant > way of resolving issues. It can be powerful, but only use when really > required, as other people often fail to understand what it does - and > code should be self-documenting. > > I think you need to ask more fundamental questions to yourself. > > Why not use BioRuby basic types for most data represented by NeXML? > Only use special objects when there is real added value. So DnaSeqRow > would simply be a Sequence (or even list of char) and DnaSeqMatrix > would be a list of Sequence. If you have further attributes create a > new composite object (like SequenceFeatures, or if you think more > functionally again a tuple of sequence(s) and features?). This way > you don't create a hierarchy that booms into hundreds of specialized > object we won't use elsewhere. To differentiate between a DnaSequence > and RnaSequence you do not need different objects. Both are strings > (in BioRuby). You could even settle for Ruby's primitive types and > containers. > > Likewise, even if you need a Matrix, you don't need RnaMatrix and > DnaMatrix. I am sure of that. They are only specializations in name, > the code in there should be identical. 
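The review suggestion quoted here (no separate DnaMatrix and RnaMatrix classes, rows kept as plain strings, shared behaviour in mixins rather than a deep class hierarchy) can be sketched as follows. All names are hypothetical and the alphabets are simplified assumptions; the code is written for a current Ruby rather than the 1.8 of the time.

```ruby
# Shared behaviour as a mixin: any class with #rows (taxon => String) gets these.
module RowStatistics
  # Number of characters per row; assumes an aligned matrix (equal-length rows).
  def ncharacters
    lengths = rows.values.map(&:length).uniq
    raise ArgumentError, "rows have different lengths: #{lengths.inspect}" if lengths.size > 1
    lengths.first || 0
  end

  # Column i as an array of single-character states, in taxon order.
  def column(i)
    rows.values.map { |seq| seq[i] }
  end
end

# One class for DNA, RNA, ...: the data type is an attribute, not a subclass.
class CharacterMatrix
  include RowStatistics

  # Simplified alphabets: bases plus N, gap (-) and missing (?).
  ALPHABETS = {
    dna: /\A[ACGTN\-?]*\z/i,
    rna: /\A[ACGUN\-?]*\z/i
  }.freeze

  attr_reader :type, :rows

  def initialize(type, rows)
    pattern = ALPHABETS.fetch(type) { raise ArgumentError, "unknown type #{type}" }
    bad = rows.reject { |_, seq| seq =~ pattern }
    raise ArgumentError, "invalid #{type} rows: #{bad.keys.join(', ')}" unless bad.empty?
    @type = type
    @rows = rows # plain Hash of taxon label => plain String
  end
end

dna = CharacterMatrix.new(:dna, "taxon1" => "ACGT", "taxon2" => "ACGA")
rna = CharacterMatrix.new(:rna, "taxon1" => "ACGU", "taxon2" => "ACGU")
puts dna.ncharacters # => 4
p dna.column(3)      # => ["T", "A"]
p rna.type           # => :rna
```

Supporting a new data type here means adding one alphabet entry, not a new class, and any other container that exposes `#rows` can reuse `RowStatistics` by including it.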
> > If you go down the OOP route, make use of Ruby's mixin's. Search > Google for "ruby mixin deep oop hierarchy". > > My recommendation is to refactor the library to use as primitive a > type as possible, at every point. When you run into functionality that > requires a more complex type, because there is no other way - that is > the moment to design and add it. > Point noted :). > I don't know the full depth of the NeXML format, but I can predict > it consists of primitive types in ordered ways. This can be mirrored > by the implementation. If you do it like this you won't have to use > inspection (like above question). OOP classes are for harnessing > special functionality that go with a certain type. Do not create a type > unless you need something special. > The fact that I do not know anything about bio* and phylo* also leads to some amount of confusion :P. Due to some rotten luck I was not able to confer with Rutger. I will discuss with Rutger and refactor the code keeping your suggestions in mind. > You can propose changes to existing BioRuby types - in particular > with the RDF implementation. > > I know some people will balk at this rewrite - but to be honest, if > you want your library to be useful to others it needs rethinking. I > would take a week out of your plan to experiment with different object > models - just start with a small subset. When you think something > works, roll it out all the way. That can be done quickly. Read, read, > read on the Internet about object models. > > One thing you can consider is to use an intermediate object structure > for parsing the XML into Ruby - and next fork it out into logical > data structures. I do that regularly as the XML 'model' does not > normally map to Ruby well. 
One example of mine is here > > http://github.com/pjotrp/swig2doc/blob/master/lib/input/doxyxmlparser.rb > > Doxy objects are stored in > > http://github.com/pjotrp/swig2doc/blob/master/lib/cobj/doxy/doxycobjs.rb > > Note swig2doc also contains a convenience class for using libxml2 in > > http://github.com/pjotrp/swig2doc/blob/master/lib/input/xmleasyreader.rb > > And while you are at refactoring, why not make sure the parser does > not fill memory. > > Pj. > > PS. Are you using another NeXML OOP implementation as a model - Perl, > Python, Java? I would like to know, so I can have a look. > > Not using as a model but I sometimes refer to the python implementation :- http://nexml.org/nexml/python/ -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From rutgeraldo at gmail.com Sun Jul 11 06:13:42 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sun, 11 Jul 2010 11:13:42 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> <20100703142541.GA3153@thebird.nl> Message-ID: > > Not using as a model but I sometimes refer to the python > implementation :- http://nexml.org/nexml/python/ > There are also Perl and Java implementations on that same website to draw inspiration from. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From emanuele.orlando at gmail.com Thu Jul 15 08:29:57 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Thu, 15 Jul 2010 14:29:57 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: Hi, is there any news regarding my questions? :) Best regards, -- Emanuele Orlando http://www.emanueleorlando.com On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < emanuele.orlando at gmail.com> wrote: > Thanks to all for the precious links. > I didn't know chemruby and now i've some questions :) > Its mailing list is empty. If i want discuss about programming/design of > chemruby, where do i mail? On bioruby list with a specific tag (example > [CHEMRUBY]) or i use chemruby list? > At the beginning who has contributed for Chemruby? > Priority where focused? > > Thanks > Emanuele > > -- > Emanuele Orlando > http://www.emanueleorlando.com > http://it.linkedin.com/in/kooru > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > >> Hi, >> >> This explanation by Jan may be useful when you try GitHub. >> >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html >> >> Toshiaki >> >> On 2010/07/08, at 3:36, Pjotr Prins wrote: >> >> > Welcome Emanuele, >> > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: >> >> Dear Bioruby, >> >> >> >> This is my first mail to the group.
>> >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and >> from >> >> three years i'm working as IT consultant. >> >> As others i love ruby and i would be happy to contribute to the >> development >> >> of bioruby. How can i make some contributions? >> > >> > Choose a topic to work on. You can simply get a free account on >> > github.com and clone the repository. Start coding - when it works we >> > can easily merge it in to the main tree. >> > >> > If you want to discuss programming and design, just mail this list. >> > >> > Pj. >> > _______________________________________________ >> > BioRuby Project - http://www.bioruby.org/ >> > BioRuby mailing list >> > BioRuby at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioruby >> >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > > > From bonnalraoul at ingm.it Thu Jul 15 08:48:20 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 15 Jul 2010 14:48:20 +0200 Subject: [BioRuby] R: contribute to Bioruby References: <20100707183622.GA16079@thebird.nl> Message-ID: Dear Emanuele, We are currently working on:
1) creating a plugin system, something more dynamic than gems, for experimenting with new code and including external bindings like BioLib (a cross-OpenBio* project for high-performance algorithms in bioinformatics)
2) using the ActiveRDF library for SPARQL queries; I'd like to define a "smart" way to use the SPARQL language. I need some time to write down a prototype I have in mind. I think you have already heard something about the Semantic Web in your consulting experience
3) supporting the EMBOSS suite dynamically, by reading .acd configuration files
4) how to create charts or fancy graphics from data (we need a cross-platform way)
5) samtools?
6) updating BioSQL support to the latest revision
Please add other ideas below.
I'm currently working on 1) and 2). -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it > -----Original Message----- > From: bioruby-bounces at lists.open-bio.org [mailto:bioruby- > bounces at lists.open-bio.org] On Behalf Of Emanuele Orlando > Sent: Thursday, 15 July 2010 14:30 > To: BioRuby ML > Subject: Re: [BioRuby] contribute to Bioruby > > Hi, > > any news to my questions? :) > > Best regards, > > -- > Emanuele Orlando > http://www.emanueleorlando.com > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > emanuele.orlando at gmail.com> wrote: > > > Thanks to all for the precious links. > > I didn't know chemruby and now i've some questions :) > > Its mailing list is empty. If i want discuss about programming/design > of > > chemruby, where do i mail? On bioruby list with a specific tag > (example > > [CHEMRUBY]) or i use chemruby list? > > At the beginning who has contributed for Chemruby? > > Priority where focused? > > > > Thanks > > Emanuele > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > http://it.linkedin.com/in/kooru > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama > wrote: > > > >> Hi, > >> > >> This explanation by Jan may be useful when you try GitHub. > >> > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would- > that-work.html > >> > >> Toshiaki > >> > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > >> > >> > Welcome Emanuele, > >> > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> >> Dear Bioruby, > >> >> > >> >> This is my first mail to the group. > >> >> My name is Emanuele Orlando, i'm graduate in Computation > Chemistry and > >> from > >> >> three years i'm working as IT consultant. > >> >> As others i love ruby and i would be happy to contribute to the > >> development > >> >> of bioruby.
How can i make some contributions? > >> > > >> > Choose a topic to work on. You can simply get a free account on > >> > github.com and clone the repository. Start coding - when it works > we > >> > can easily merge it in to the main tree. > >> > > >> > If you want to discuss programming and design, just mail this > list. > >> > > >> > Pj. > >> > _______________________________________________ > >> > BioRuby Project - http://www.bioruby.org/ > >> > BioRuby mailing list > >> > BioRuby at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioruby > >> > >> > >> _______________________________________________ > >> BioRuby Project - http://www.bioruby.org/ > >> BioRuby mailing list > >> BioRuby at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioruby > >> > > > > > > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From anurag08priyam at gmail.com Thu Jul 15 10:57:45 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Thu, 15 Jul 2010 20:27:45 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API Message-ID: Hello all, I have worked out an initial set of specs for the RDF API. The code is in the 'rdf' branch: http://github.com/yeban/bioruby/tree/rdf. I am providing an overview here. To start with, I have put the specs in the bioruby/spec directory. I took the liberty of adding a rake task to execute all the specs. Most of the specs will fail as of now and some are pending. Running "rake spec SPEC_OPTS="--format nested"" should give a rough overview of the specs. The lib itself (currently only bare class definitions) resides in the bioruby/lib/bio/rdf directory and uses the Bio::RDF namespace. At the core are Literal, Node, URI and classes, which form the subject, predicate, object and context of any RDF statement.
An RDF statement can be created as an instance of the Statement class. A collection of Statements forms a Graph. An RDF graph can be queried for statements with a given subject, predicate or object. We can define new vocabularies with the Vocabulary class, which I explain in more detail below. RDF vocabularies are defined on a namespace URI. Take, for example, the XSD vocabulary, which defines datatypes for literals. XSD is defined on the "http://www.w3.org/2001/XMLSchema#" namespace with the 'xsd' prefix, so the actual URI for the CURIE "xsd:double" is "http://www.w3.org/2001/XMLSchema#double". The rationale is to have such URIs and CURIEs generated automatically:

    xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#"
    xsd[:double]

I was thinking of having commonly used vocabularies defined in the lib so someone could use them out of the box, like XSD[:double] or CDAO[:foo]. The rdf lib can be used by any component of BioRuby by using that object as the subject or object of an RDF statement. However, a cleaner solution would be to have an Annotatable module mixed into the classes that are likely to use the rdf lib. Annotatable would just provide a wrapper over the core rdf lib. To begin with I have added two methods, 'annotate' and 'annotation', which create and return an RDF graph for that object, respectively. The example for these methods is pending in the specs. However, I was thinking of something like:

    seq = Bio::Sequence.new("atgc")
    seq.annotate do |graph|
      graph << [seq, CDAO[:foo], 'moo']
    end

    seq.annotation.query :predicate => CDAO[:foo]

I think with this design we can maintain loose coupling between the rdf lib and BioRuby components. I have just begun creating the classes to realize the specs, so the design can still be changed completely if I am heading in the wrong direction. In thinking out the rdf lib, I have mostly referred to the RDF primer and Wikipedia, so I might have gone wrong on some RDF concepts too. Please correct me :).
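The statement, graph and vocabulary design described above can be made concrete with a small sketch. It is a hedged illustration, not the code in the 'rdf' branch: the RDFSketch module and every method body in it are hypothetical, URIs are plain strings instead of dedicated URI, Node and Literal classes, and Vocabulary#[] simply appends the term to the namespace, as in the xsd:double example.

```ruby
# Hypothetical sketch of the design described above; not the code in
# the yeban/bioruby 'rdf' branch. URIs are plain strings here.
module RDFSketch
  # Expands local names against a namespace URI, so that
  # vocab[:double] yields the full URI behind the CURIE "xsd:double".
  class Vocabulary
    attr_reader :namespace

    def initialize(namespace)
      @namespace = namespace
    end

    def [](term)
      namespace + term.to_s
    end
  end

  # A single RDF triple.
  Statement = Struct.new(:subject, :predicate, :object)

  # A collection of statements that can be enumerated and queried.
  class Graph
    include Enumerable

    def initialize
      @statements = []
    end

    # Accepts a Statement or a [subject, predicate, object] array.
    def <<(triple)
      triple = Statement.new(*triple) unless triple.is_a?(Statement)
      @statements << triple
      self
    end

    def each(&block)
      return enum_for(:each) unless block
      @statements.each(&block)
      self
    end

    # Returns the statements whose components equal every value in the
    # pattern; components left out of the pattern match anything.
    def query(pattern = {})
      select { |st| pattern.all? { |part, value| st[part] == value } }
    end
  end
end
```

With this sketch, RDFSketch::Vocabulary.new("http://www.w3.org/2001/XMLSchema#")[:double] returns "http://www.w3.org/2001/XMLSchema#double", and graph.query(:predicate => p) returns every statement whose predicate is p. Building query on top of Enumerable means other filters (select, count, map) come for free.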
-- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From rutgeraldo at gmail.com Thu Jul 15 11:58:44 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Thu, 15 Jul 2010 17:58:44 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > I was thinking of having commonly used vocabulary defined in the lib so > someone could use it out of box like: XSD[:double] or CDAO[:foo]. > It will probably turn out that this is useful. Perhaps also things such as Dublin Core, SKOS, DCTerms, Prism and DarwinCore. > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. > > In thinking out the rdf lib, I have mostly referred to the RDF primer and > Wikipedia. I might have gone wrong on some RDF concepts too. Please correct > :). > I like the design; I'm curious to hear what the experts think of it. You haven't mentioned this explicitly here, but I know you've been thinking about recursively nested statements, right? Rutger -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From anurag08priyam at gmail.com Fri Jul 16 22:51:23 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 17 Jul 2010 08:21:23 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > > I like the design, I'm curious to hear how the experts like it. You > haven't mentioned this explicitly here but I know you've been thinking about > recursively nested statements, right?
> That would be an RDF graph with a blank node as the object of the statement at which nesting starts, and the same blank node as the subject of the nested statements, right? I have considered that. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 17 02:54:15 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 17 Jul 2010 12:24:15 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > To start with I have put the specs in bioruby/spec directory. I took the > liberty of adding a rake task to execute all the specs. Most of the specs > will fail as of now and some are pending. "rake spec SPEC_OPTS="--format > nested" " should be good to get a rough overview of the specs. > I added SPEC_OPTS="--format nested" as the default option in the specs rake task, so plain 'rake spec' should be enough now. However, the 'format' option can be overridden on the command line if anyone prefers 'specdoc'. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From pjotr.public14 at thebird.nl Sat Jul 17 05:22:30 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 17 Jul 2010 11:22:30 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: <20100717092230.GA21914@thebird.nl> Hi Anurag, I'll go over the Specs shortly. The first thing that is interesting to note is that the elaborate class hierarchy does not show up in the unit tests or the Specs. This indicates it is not really needed. E.g. searching with 'grep -r ProteinCellRow *' and 'grep -r RestrictionSeqRow *' does not turn up any tests, which is to be expected. I know you haven't gotten round to refactoring, but do you see what I mean? A class hierarchy is best driven by the API.
That is one reason behaviour-driven testing, or even unit testing, is done upfront. There is also a second good reason to introduce an OOP hierarchy: when it makes 'internal' code easier to understand. You should judge every new class you introduce on those grounds: Does it add to, or simplify, my API? Or does it make my internal code organisation a lot easier to understand? Tick one of those two boxes to create class hierarchies. Otherwise you are best off with the *simplest* data representation. Simple, in general, is easy to understand and allows for more flexible approaches. Pj. On Thu, Jul 15, 2010 at 08:27:45PM +0530, Anurag Priyam wrote: > Hello all, > > I have worked out an initial set of specs for the RDF API. The code is in > 'rdf' branch - http://github.com/yeban/bioruby/tree/rdf. > > I am providing an overview here: > To start with I have put the specs in bioruby/spec directory. I took the > liberty of adding a rake task to execute all the specs. Most of the specs > will fail as of now and some are pending. "rake spec SPEC_OPTS="--format > nested" " should be good to get a rough overview of the specs. > > The lib itself( currently only bare class definition ) resides in > bioruby/lib/bio/rdf directory and uses Bio::RDF namespace. > > At the core are Literal, Node, URI and classes, which form the subject, > predicate, object and context of any RDF statement. An RDF statement can be > created as an instance of Statement class. A collection of Statements form a > Graph. An RDF graph can be queried for statements with a given subject, > predicate or object. We can define new Vocabularies with the Vocabulary > class. I am explaining the vocabulary class in more detail below. > > RDF vocabularies are defined on a namespace uri. Say, the XSD vocabulary > that defines datatypes for literals. XSD is defined on " > http://www.w3.org/2001/XMLSchema#" namespace with the 'xsd' prefix.
So the > actual URI for the curie "xsd:double" goes like " > http://www.w3.org/2001/XMLSchema#double". The rational is to have such URI > and curie automatically generated : > > xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#" > xsd[:double] > > I was thinking of having commonly used vocabulary defined in the lib so > someone could use it out of box like: XSD[:double] or CDAO[:foo]. > > The rdf lib can be used by any component of BioRuby by using that object as > the subject or object of an rdf statement. However, a cleaner solution would > be to have an Annotatable module mixed into the classes that are likely to > use the rdf lib. Annotatable would just provide a wrapper over the core rdf > lib to work with rdf. To begin with I have added two functions 'annotate' > and 'annotation' which create and return a rdf graph for that object > respectively. The example for these functions is pending in the specs. > However, I was thinking of something like: > > seq = Bio::Sequece.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. > > In thinking out the rdf lib, I have mostly referred to the RDF primer and > Wikipedia. I might have gone wrong on some RDF concepts too. Please correct > :). > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. 
> +91-9775550642 > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr.public14 at thebird.nl Sat Jul 17 06:04:17 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 17 Jul 2010 12:04:17 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: <20100717100417.GA23002@thebird.nl> Hi Anurag, On Thu, Jul 15, 2010 at 08:27:45PM +0530, Anurag Priyam wrote: > However, I was thinking of something like: > > seq = Bio::Sequece.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. I think this is the idea. The RDF generator should be generic and easy to use for extending existing objects. That is very good. In other words, the Sequence class should not *know* about RDF - we should not pollute existing classes (even further) with RDF knowledge, if we can avoid it. You can create a specialized RDF::Sequence, or RDF::Alignment, object, which would add certain features to a base Sequence class (without modifying the Sequence class itself, for sure). In my opinion, these classes should be agnostic as to whether we are dealing with nucleotides or amino acids. So the first thing to do is to write the Specs for such a system. The current Specs merely exercise the RDF objects themselves. So what I would like to see is Specs that do something real. Rather than the example.org URI, use something that is meaningful. Write Specs that use real BioRuby classes and/or NeXML classes. Others, Raoul for one, have ideas for RDF too, so the Specs will help out with ideas.
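One way to add RDF features without modifying the base class, as suggested here, is a delegating wrapper. The sketch below is a hypothetical illustration built on Ruby's standard `delegate` library: PlainSequence is a stand-in for Bio::Sequence so the example runs without the bioruby gem, RDFSequence is an invented name, and the example.org predicate URIs are placeholders that real Specs would replace with terms from an established vocabulary.

```ruby
require 'delegate'

# Stand-in for Bio::Sequence so the sketch runs without the bioruby
# gem; like the real class, it knows nothing about RDF.
class PlainSequence
  attr_reader :entry_id, :seq

  def initialize(entry_id, seq)
    @entry_id = entry_id
    @seq = seq
  end

  def length
    seq.length
  end
end

# Hypothetical RDF-aware wrapper: unknown methods are forwarded to the
# wrapped sequence and RDF output is added on top, so PlainSequence
# itself is never modified.
class RDFSequence < SimpleDelegator
  # Placeholder predicate namespace, not a real vocabulary.
  TERMS = "http://example.org/terms#"

  def initialize(sequence, base_uri)
    super(sequence)
    @base_uri = base_uri
  end

  def uri
    @base_uri + entry_id
  end

  # Describes the wrapped sequence as [subject, predicate, object]
  # triples; nothing here depends on nucleotides vs. amino acids.
  def to_triples
    [
      [uri, TERMS + "length", length],
      [uri, TERMS + "residues", seq]
    ]
  end
end
```

Because the wrapper only forwards calls, the same approach would apply to an alignment class, and the base classes stay free of RDF knowledge.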
Write more directed Specs and we will discuss them. Pj. From rutgeraldo at gmail.com Sat Jul 17 07:40:48 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sat, 17 Jul 2010 12:40:48 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: <20100717092230.GA21914@thebird.nl> References: <20100717092230.GA21914@thebird.nl> Message-ID: > > > You should judge every new class you introduce based on those > grounds: Does it add to, or simplify, my API? Or does it make my > internal code organisation a lot easier to understand. Tick one of > those two boxes to create class hierarchies. Otherwise you are best > off with the most *simple* data representation. Simple, in general, > is easy to understand and allows for more flexible approaches. > > In working in other languages it has turned out time and time again that the more closely the class hierarchy mirrors that of the NeXML schema types, the more easily instance documents can be represented, manipulated and round-tripped. This doesn't necessarily mean that the API for users needs to reflect all that, but internally it's useful. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From pjotr.public14 at thebird.nl Sun Jul 18 02:33:05 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sun, 18 Jul 2010 08:33:05 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: <20100717092230.GA21914@thebird.nl> Message-ID: <20100718063305.GB29780@thebird.nl> On Sat, Jul 17, 2010 at 12:40:48PM +0100, Rutger Vos wrote: > > > > > > You should judge every new class you introduce based on those > > grounds: Does it add to, or simplify, my API? Or does it make my > > internal code organisation a lot easier to understand. Tick one of > > those two boxes to create class hierarchies. 
> > Otherwise you are best > > off with the most *simple* data representation. Simple, in general, > > is easy to understand and allows for more flexible approaches. > > > > > In working in other languages it has turned out time and time again that the > more closely the class hierarchy mirrors that of the NeXML schema types, the > more easily instance documents can be represented, manipulated and > round-tripped. This doesn't necessarily mean that the API for users needs to > reflect all that, but internally it's useful. Not disputing that. Please read elements.rb. Pj. From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 08:35:40 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 20 Jul 2010 21:35:40 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: <20100720123541.6F2ED1CBC445@idnmail.gen-info.osaka-u.ac.jp> Hi, Sorry for responding so late. ChemRuby has been developed by Nobuya Tanaka. The contact address for ChemRuby is . The Subversion repository is http://tools.textdriven.com/svn/chemruby % svn co http://tools.textdriven.com/svn/chemruby I don't know whether he reads the bioruby mailing list. I tried to subscribe to the mailing list chemruby-list-jp on RubyForge, but there is currently no response from the server. It seems the list has stopped. So, for now, >> On bioruby list with a specific tag (example [CHEMRUBY]) with Cc: staff at chemruby.org seems good. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Thu, 15 Jul 2010 14:29:57 +0200 Emanuele Orlando wrote: > Hi, > > any news to my questions? :) > > Best regards, > > -- > Emanuele Orlando > http://www.emanueleorlando.com > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > emanuele.orlando at gmail.com> wrote: > > > Thanks to all for the precious links. > > I didn't know chemruby and now i've some questions :) > > Its mailing list is empty. If i want discuss about programming/design of > > chemruby, where do i mail?
On bioruby list with a specific tag (example > > [CHEMRUBY]) or i use chemruby list? > > At the beginning who has contributed for Chemruby? > > Priority where focused? > > > > Thanks > > Emanuele > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > http://it.linkedin.com/in/kooru > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > > > >> Hi, > >> > >> This explanation by Jan may be useful when you try GitHub. > >> > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html > >> > >> Toshiaki > >> > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > >> > >> > Welcome Emanuele, > >> > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> >> Dear Bioruby, > >> >> > >> >> This is my first mail to the group. > >> >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and > >> from > >> >> three years i'm working as IT consultant. > >> >> As others i love ruby and i would be happy to contribute to the > >> development > >> >> of bioruby. How can i make some contributions? > >> > > >> > Choose a topic to work on. You can simply get a free account on > >> > github.com and clone the repository. Start coding - when it works we > >> > can easily merge it in to the main tree. > >> > > >> > If you want to discuss programming and design, just mail this list. > >> > > >> > Pj. 
> >> > _______________________________________________ > >> > BioRuby Project - http://www.bioruby.org/ > >> > BioRuby mailing list > >> > BioRuby at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioruby > >> > >> > >> _______________________________________________ > >> BioRuby Project - http://www.bioruby.org/ > >> BioRuby mailing list > >> BioRuby at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioruby > >> > > > > > > > > From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 08:59:34 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 20 Jul 2010 21:59:34 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 20 Jul 2010 21:35:40 +0900 Naohisa GOTO wrote: > I tried to subscribe to the mailing list chemruby-list-jp > in RubyForge, but currently no response from the server. > It seems the list is stopped. It still works. I've just received the subscription confirmation mail, and I've subscribed to the list now. I think cross-posting is also good, because there may be few people on the chemruby list. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From bonnalraoul at ingm.it Tue Jul 20 08:57:46 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Tue, 20 Jul 2010 14:57:46 +0200 Subject: [BioRuby] R: contribute to Bioruby References: <20100707183622.GA16079@thebird.nl> <20100720123541.6F2ED1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <642659b2-c7b9-40cb-9e62-67cee2c49f4b@ingm.it> Dear Goto-san, Do you think there is some small task, just for starting, that could be assigned to Emanuele? Some time ago I posted a list of tasks. -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F.
Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it > -----Original Message----- > From: bioruby-bounces at lists.open-bio.org [mailto:bioruby- > bounces at lists.open-bio.org] On behalf of Naohisa GOTO > Sent: Tuesday, 20 July 2010 14:36 > To: Emanuele Orlando > Cc: BioRuby ML; staff at tools.textdriven.com > Subject: Re: [BioRuby] contribute to Bioruby > > Hi, > > Sorry responding too late. > > ChemRuby have been developed by Nobuya Tanaka. > Contact address for ChemRuby is . > Subversion repository is http://tools.textdriven.com/svn/chemruby > % svn co http://tools.textdriven.com/svn/chemruby > > I don't know he is looking at bioruby mailing list. > I tried to subscribe to the mailing list chemruby-list-jp > in RubyForge, but currently no response from the server. > It seems the list is stopped. > > So, currently, > >> On bioruby list with a specific tag (example [CHEMRUBY]) > with Cc: staff at chemruby.org seems good. > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Thu, 15 Jul 2010 14:29:57 +0200 > Emanuele Orlando wrote: > > > Hi, > > > > any news to my questions? :) > > > > Best regards, > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > > emanuele.orlando at gmail.com> wrote: > > > > > Thanks to all for the precious links. > > > I didn't know chemruby and now i've some questions :) > > > Its mailing list is empty. If i want discuss about > programming/design of > > > chemruby, where do i mail? On bioruby list with a specific tag > (example > > > [CHEMRUBY]) or i use chemruby list? > > > At the beginning who has contributed for Chemruby? > > > Priority where focused?
> > > > > > Thanks > > > Emanuele > > > > > > -- > > > Emanuele Orlando > > > http://www.emanueleorlando.com > > > http://it.linkedin.com/in/kooru > > > > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama > wrote: > > > > > >> Hi, > > >> > > >> This explanation by Jan may be useful when you try GitHub. > > >> > > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would- > that-work.html > > >> > > >> Toshiaki > > >> > > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > > >> > > >> > Welcome Emanuele, > > >> > > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando > wrote: > > >> >> Dear Bioruby, > > >> >> > > >> >> This is my first mail to the group. > > >> >> My name is Emanuele Orlando, i'm graduate in Computation > Chemistry and > > >> from > > >> >> three years i'm working as IT consultant. > > >> >> As others i love ruby and i would be happy to contribute to the > > >> development > > >> >> of bioruby. How can i make some contributions? > > >> > > > >> > Choose a topic to work on. You can simply get a free account on > > >> > github.com and clone the repository. Start coding - when it > works we > > >> > can easily merge it in to the main tree. > > >> > > > >> > If you want to discuss programming and design, just mail this > list. > > >> > > > >> > Pj. 
> > >> > _______________________________________________ > > >> > BioRuby Project - http://www.bioruby.org/ > > >> > BioRuby mailing list > > >> > BioRuby at lists.open-bio.org > > >> > http://lists.open-bio.org/mailman/listinfo/bioruby > > >> > > >> > > >> _______________________________________________ > > >> BioRuby Project - http://www.bioruby.org/ > > >> BioRuby mailing list > > >> BioRuby at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioruby > > >> > > > > > > > > > > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From emanuele.orlando at gmail.com Tue Jul 20 10:07:43 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Tue, 20 Jul 2010 16:07:43 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks Naohisa, cross-posting is certainly a good idea. In your opinion, is there some priority to focus on? Raoul, from your list I could be interested in points 4 and 6, but I'll wait for your feedback :) Regards, -- Emanuele Orlando http://www.emanueleorlando.com On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO wrote: > Hi, > > On Tue, 20 Jul 2010 21:35:40 +0900 > Naohisa GOTO wrote: > > > I tried to subscribe to the mailing list chemruby-list-jp > > in RubyForge, but currently no response from the server. > > It seems the list is stopped. > > It still works. > I've just received subscription confirmation mail, > and I've subscribed to the list now. > > I think cross-posting is also good, because there may be > few persons in the chemruby list.
> > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 11:23:59 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 21 Jul 2010 00:23:59 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> Hi Emanuele, There is no priority; please choose what you like and what you can do now. Many people are interested in "4) charts or fancy graphics from data", but there is still little code (Bio::Graphics etc.). There may be many different approaches. To write code that is useful for biologists, familiarity with visualization examples from biology/bioinformatics may be needed. For that purpose, reading recent research papers and/or studying what other projects do (BioConductor, BioJava, Biopython, BioPerl, etc.) would be good. If you like "6) update BioSQL support to latest revision", discuss it with Raoul, the maintainer of BioRuby's BioSQL support. About ChemRuby: first, please try to use it. It would be great if you could write good example scripts using BioRuby with ChemRuby. If you find a bug, please fix it. If you feel something is missing, please add new features. Thank you, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 20 Jul 2010 16:07:43 +0200 Emanuele Orlando wrote: > Thanks Naohisa, sure cross-posting is a good idea. > In your opinion, there's some priority where focused? > Raoul, about your list i could be interested for point 4 and 6. But i wait > for your feedback :) > Regards, > -- > Emanuele Orlando > http://www.emanueleorlando.com > > > On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO > wrote: > > > Hi, > > > > On Tue, 20 Jul 2010 21:35:40 +0900 > > Naohisa GOTO wrote: > > > > > I tried to subscribe to the mailing list chemruby-list-jp > > > in RubyForge, but currently no response from the server.
> > > It seems the list is stopped. > > > > It still works. > > I've just received subscription confirmation mail, > > and I've subscribed to the list now. > > > > I think cross-posting is also good, because there may be > > few persons in the chemruby list. > > > > Naohisa Goto > > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > From emanuele.orlando at gmail.com Tue Jul 20 17:16:24 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Tue, 20 Jul 2010 23:16:24 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks for your answers. Sure, I will try ChemRuby and work on it. Emanuele On Tue, Jul 20, 2010 at 5:23 PM, Naohisa GOTO wrote: > Hi Emanuele, > > No priority, and please choose what you like and what you > can do now. > > For "4) charts or fancy graphics from data", many people > are interested in, but still few codes (Bio::Graphics etc). > There may be many different approches. To write codes > useful for biologists, knowing visualization examples in > the field of biology/bioinformatics may be needed. > For the purpose, reading recent research papers and/or > studying what other projects do (BioConductor, BioJava, > Biopython, BioPerl, etc) would be good. > > If you like "6) update BioSQL support to latest revision", > discuss with Raoul, maintener of BioRuby BioSQL support. > > About ChemRuby, first, please try to use it. It is great if you > can write good example scripts using BioRuby with ChemRuby. > If you find bug, please fix. If you feel something missing, > please add new features.
> > Thank you, > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Tue, 20 Jul 2010 16:07:43 +0200 > Emanuele Orlando wrote: > > > Thanks Naohisa, sure cross-posting is a good idea. > > In your opinion, there's some priority where focused? > > Raoul, about your list i could be interested for point 4 and 6. But i > wait > > for your feedback :) > > Regards, > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > > > > > On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO > > wrote: > > > > > Hi, > > > > > > On Tue, 20 Jul 2010 21:35:40 +0900 > > > Naohisa GOTO wrote: > > > > > > > I tried to subscribe to the mailing list chemruby-list-jp > > > > in RubyForge, but currently no response from the server. > > > > It seems the list is stopped. > > > > > > It still works. > > > I've just received subscription confirmation mail, > > > and I've subscribed to the list now. > > > > > > I think cross-posting is also good, because there may be > > > few persons in the chemruby list. > > > > > > Naohisa Goto > > > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > > > > -- Emanuele Orlando http://www.emanueleorlando.com From yannick.wurm at unil.ch Fri Jul 23 06:39:10 2010 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Fri, 23 Jul 2010 12:39:10 +0200 Subject: [BioRuby] bioruby vs bioX In-Reply-To: References: Message-ID: Dear List, Here's a thought for a rainy morning. Thanks to new technologies, many biologists end up with large amounts of data & need to figure out a way to script things. They can be caricatured into a few categories: - many attempt PERL because it's the only language they (or their boss) have heard of - others attempt to use R because that's what they learned in their undergraduate biostatistics course - others yet figure out that Python or Ruby are modern alternatives to Perl.... but I think most end up using Python, mostly because they find many examples of biopython code online. 
Thus most newcomers to bioruby are not newbie biologists but computer geeks that know that ruby is great & need to tackle something biological. I think we're really missing out on the newbie "I'm a biologist & I need to script" market. Yes, there are a few resources, Eg: - Jan's article: http://www.biomedcentral.com/1471-2105/10/221 (are you planning a followup where you show some of bioruby... say to parse blast results & retrieve the corresponding sequences from genbank)? - a few wonderful but still randomly scattered blog posts - http://bioruby.open-bio.org/wiki/SampleCodes - http://bioruby.open-bio.org/wiki/Tutorial - and an almost pathetically empty http://bioruby.open-bio.org/wiki/HOWTOs But only few of these are "biologist non-programmer newbie-proof". And there is no central place to point a complete newbie. Contrast that with the amount of information and ***code that works right away even if you don't understand the details*** found: - in the Biopython cookbook (yes it's ugly, but it does contain example code for most newcomer's questions) http://www.biopython.org/DIST/docs/tutorial/Tutorial.html - on the Scriptome Perl "illegible one-liners that people use": http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/ It is clear that we have a lot of potential. I wonder if proposals for contributions (such as Emanuele's) could not be geared towards improving our newbie-accessibility? I don't like having to point people towards Python/Biopython instead of ruby/Bioruby. Yannick -------------------------------------------- yannick . wurm @ unil . 
ch Ant Genomics, Ecology & Evolution @ Lausanne http://www.unil.ch/dee/page28685_fr.html From sararayburn at gmail.com Sat Jul 24 23:55:10 2010 From: sararayburn at gmail.com (Sara Rayburn) Date: Sat, 24 Jul 2010 22:55:10 -0500 Subject: [BioRuby] [GSoC] Progress Update Message-ID: <67536FCE-BAC9-4406-ADA9-877FA070253F@gmail.com> Hello, Here is an update on the status of my project implementing speciation/duplication inference algorithms for BioRuby. The SDI algorithm is implemented, tested, and working for binary gene and species trees. This week I've ironed out some performance bottlenecks so that the algorithm executes almost as quickly as the Java implementation for very large trees. I have also completed the implementation of an extension to the algorithm that finds the gene tree rooting that minimizes the number of duplications inferred in the tree. Upcoming work will include extending the algorithm to support trees with more than two children per node. For more details, a full update is on the github wiki[1] for my branch along with a tutorial describing how to use the algorithms. [1] http://wiki.githubcom/srayburn/bioruby/ Thanks, Sara Rayburn From anurag08priyam at gmail.com Sun Jul 25 13:47:54 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 25 Jul 2010 23:17:54 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: <20100717100417.GA23002@thebird.nl> References: <20100717100417.GA23002@thebird.nl> Message-ID: > So the first thing to do is to write the Specs for such a system. The > current Specs are merely object invocations for RDF itself. > > So what I would like to see is Specs that do something real. Rather > than the example.org URI, use something that is meaningful. Write > Specs for using real BioRuby classes and/or NeXML classes. > > Pjotr, have a look at spec/rdf/graph.rb. It should look more meaningful now. 
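To make the graph idea discussed in this thread concrete (a graph of statements whose Enumerable and Queryable behaviour is built on a single `each` iterator), here is a minimal plain-Ruby sketch. It is hypothetical and does not use the RDF classes from the linked branch: `Statement`, `Graph` and a `query` method taking a hash pattern keyed by `:subject`, `:predicate` or `:object` are names assumed only for illustration.

```ruby
# Hypothetical sketch, not the code in lib/bio/rdf: a statement is a
# subject/predicate/object triple.
Statement = Struct.new(:subject, :predicate, :object)

# Pattern matching built only on Enumerable (and therefore on #each).
module Queryable
  # pattern: e.g. { :predicate => :label }; keys must be Statement members.
  # Returns the matching statements and yields each one if a block is given.
  def query(pattern = {})
    matches = select do |stmt|
      pattern.all? { |member, value| stmt[member] == value }
    end
    matches.each { |stmt| yield stmt } if block_given?
    matches
  end
end

class Graph
  include Enumerable
  include Queryable

  def initialize
    @statements = []
  end

  # Accepts a Statement or a [subject, predicate, object] array.
  def <<(triple)
    @statements << (triple.is_a?(Statement) ? triple : Statement.new(*triple))
    self
  end

  # The single primitive both mixins depend on.
  def each(&block)
    return enum_for(:each) unless block
    @statements.each(&block)
    self
  end
end

graph = Graph.new
graph << ['t1', :label, 'XXX'] << ['t1', :discoverer, 'Moo']
graph.query(:predicate => :label) { |s| puts s.subject }
```

Because `count`, `map`, `select` and `query` all go through `each`, a spec only needs to exercise `each` and a few patterns to cover most of the behaviour.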
However, the specs for the graph do not answer how the RDF API is to be integrated with the rest of BioRuby; more on that in my next mail. > Others, Raoul for one, have ideas too for RDF too. So the Specs will > help out with ideas. > I am interested in working with Raoul on his SPARQL project. -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sun Jul 25 14:10:38 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 25 Jul 2010 23:40:38 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > The rdf lib can be used by any component of BioRuby by using that object as > the subject or object of an rdf statement. However, a cleaner solution would > be to have an Annotatable module mixed into the classes that are likely to > use the rdf lib. Annotatable would just provide a wrapper over the core rdf > lib to work with rdf. To begin with I have added two functions 'annotate' > and 'annotation' which create and return a rdf graph for that object > respectively. The example for these functions is pending in the specs. > However, I was thinking of something like: > > seq = Bio::Sequence.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > Here is what I have done: I would like you all to have a rough overview of graph.rb, mixins/enumerable.rb and mixins/queryable.rb http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/graph.rb http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/enumerable.rb http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/queryable.rb So a graph contains RDF::Statements that can be enumerated and queried over in various movies. Then have a look at the Annotatable module: http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/annotatable.rb To make a class annotatable this module just needs to be included.
Say: class Bio::NeXML::Otu include Bio::RDF::Annotatable end The idea is to add an instance variable( @graph ) to Otu that stores a RDF::Graph object and delegate methods that begin with 'rdf_' to @graph. This way all the API defined for a Graph is available to an Annotatable object( by prefixing the method names with an 'rdf_' ) otu = Bio::NeXML::Otu.new( 't1') otu.annotate do |g| g << [ otu, CDAO[ :label ], "XXX" ] g << [ otu, CDAO[ :discoverer ], "Moo" ] end otu.rdf_query( :predicate => CDAO[ :label ] ) { |s| puts s.subject } is same as otu.annotation.query(:predicate => CDAO[ :label ]) { |s| puts s.subject } -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sun Jul 25 16:27:12 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Mon, 26 Jul 2010 01:57:12 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References:

Message-ID: > > > I would like you all to have a rough overview of graph.rb, > mixins/enumerable.rb and mixins/queryable.rb > That is, to get an idea of what the API has to offer. Just reading the examples in the doc or sifting through method names should be good enough. > http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/graph.rb > http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/enumerable.rb > http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/queryable.rb > > A thing to notice here would be that Enumerable and Queryable mixins provide most of the functionality to a Graph object. Both of these modules build on top of the 'each' iterator. > So a graph contains RDF::Statements that can be enumerated and queried over > in various movies. > I really do not know how the word *movies* got in when I wanted to write *ways*. Maybe, because I was discussing the Inception movie with my friend while composing the last mail :P. I am sorry. > > Then have a look at the Annotatable module: > http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/annotatable.rb > > To make a class annotatable this module just needs to be included. Say: > class Bio::NeXML::Otu > include Bio::RDF::Annotatable > end > > With the above snippet we have all the annotation functionality available to any class. > The idea is to add an instance variable( @graph ) to Otu that stores a > RDF::Graph object and delegate methods that begin with 'rdf_' to @graph.
> This way all the API defined for a Graph is available to an Annotatable > object( by prefixing the method names with an 'rdf_' ) > > otu = Bio::NeXML::Otu.new( 't1') > otu.annotate do |g| > g << [ otu, CDAO[ :label ], "XXX" ] > g << [ otu, CDAO[ :discoverer ], "Moo" ] > end > > otu.rdf_query( :predicate => CDAO[ :label ] ) { |s| puts s.subject } > > is same as > > otu.annotation.query(:predicate => CDAO[ :label ]) { |s| puts s.subject } > Another solution could be to define an 'each' iterator in Annotatable and mix in the Enumerable and Queryable( mentioned above ) modules. With the former solution the advantage is that all the RDF related methods for a class start with 'rdf_', which sounds more contextual. With the latter solution, on the other hand, chances are quite good that somebody would override the 'each' iterator or any other method defined by the RDF API. I find delegation to be quite an elegant solution in this case, though the method I have employed for delegation( method_missing ) might not be good. A problem that I am facing with delegation is testing. Should I write tests( specs, in this case ) for each possible 'rdf_*' function that can be called on Annotatable? it "should delegate Annotatable#query to Graph#query" These tests will become very redundant, essentially doing the same thing again and again. I was thinking in this direction: http://github.com/yeban/bioruby/blob/rdf/spec/rdf/annotatable.rb#L44 Suggestions or pointers? -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From bonnalraoul at ingm.it Wed Jul 28 09:35:10 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Wed, 28 Jul 2010 15:35:10 +0200 Subject: [BioRuby] R Message-ID: <6b40bc67-39c3-4a72-abc5-e0bf023e81c4@ingm.it> Hello, Take a look here, http://github.com/clbustos Very interesting projects about R. I like very much the TCP/IP implementation I'm thinking to distribute programming, detaching & marshalling. -- Raoul J.P.
Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it From cjfields at illinois.edu Wed Jul 28 11:11:13 2010 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 28 Jul 2010 10:11:13 -0500 Subject: [BioRuby] R In-Reply-To: <6b40bc67-39c3-4a72-abc5-e0bf023e81c4@ingm.it> References: <6b40bc67-39c3-4a72-abc5-e0bf023e81c4@ingm.it> Message-ID: <2074E611-C45C-457B-B2C5-42C5537E8167@illinois.edu> Know if the calls made via C bindings or callouts to the R command line? chris On Jul 28, 2010, at 8:35 AM, Raoul Bonnal wrote: > Hello, > Take a look here, http://github.com/clbustos > Very interesting projects about R. > I like very much the TCP/IP implementation I'm thinking to distribute programming, detaching & marshalling. > > -- > Raoul J.P. Bonnal > Life Science Informatics > Integrative Biology Program > Fondazione INGM > Via F. Sforza 28 > 20122 Milano, IT > phone: +39 02 006 623 26 > fax: +39 02 006 623 46 > http://www.ingm.it > > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From bonnalraoul at ingm.it Wed Jul 28 12:07:53 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Wed, 28 Jul 2010 18:07:53 +0200 Subject: [BioRuby] R: R References: <6b40bc67-39c3-4a72-abc5-e0bf023e81c4@ingm.it> <2074E611-C45C-457B-B2C5-42C5537E8167@illinois.edu> Message-ID: <9d733263-0ccb-47de-a0b1-dff1e99de22e@ingm.it> Hi Chris, > -----Messaggio originale----- > Da: Chris Fields [mailto:cjfields at illinois.edu] > Inviato: mercoled? 28 luglio 2010 17:11 > A: Raoul Bonnal > Cc: 'BioRuby ML' > Oggetto: Re: [BioRuby] R > > Know if the calls made via C bindings or callouts to the R command > line? Which project are you talking about? I did a quick check to the code, I suppose no C bindings but I'm not sure. 
RinRuby: my impression is that it opens a socket on the R command and pipes R commands to it, locally or remotely, but I think its features are limited. Rserve-Ruby-client: it's a pure Ruby implementation of the TCP/IP RServe protocol; it just sends R commands over the network, evaluates them on the server side and then converts the results into Ruby objects on the client side. The main implementation is in Java and works very well. The Windows version of the RServe "server" lacks important features (concurrency and others); on *nix you have authentication, detaching, sessions, etc. Then there is a statistical library which I'd like to try. -- Ra > > chris > > On Jul 28, 2010, at 8:35 AM, Raoul Bonnal wrote: > > > Hello, > > Take a look here, http://github.com/clbustos > > Very interesting projects about R. > > I like very much the TCP/IP implementation I'm thinking to distribute > programming, detaching & marshalling. > > > > -- > > Raoul J.P. Bonnal > > Life Science Informatics > > Integrative Biology Program > > Fondazione INGM > > Via F. Sforza 28 > > 20122 Milano, IT > > phone: +39 02 006 623 26 > > fax: +39 02 006 623 46 > > http://www.ingm.it > > > > > > > > > > _______________________________________________ > > BioRuby Project - http://www.bioruby.org/ > > BioRuby mailing list > > BioRuby at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioruby From anurag08priyam at gmail.com Thu Jul 29 12:45:11 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Thu, 29 Jul 2010 22:15:11 +0530 Subject: [BioRuby] [GSoC] ORM styled NeXML Message-ID: An object invocation of NeXML is very much ORM like.
An Otus and a Tree class could very well be represented as: class Otus include Mapper property :id, :label has_n :trees end class Trees include Mapper property :id, :label belongs_to :otus end And, one would be able to do following: otus1.trees << trees1 otus1.trees = [ trees2, trees3] otus1.trees # => trees1, trees2, trees3 trees1.otus # => otus1 NeXML::Mapper module defines the magic methods property, has_n, and belongs_to which will use reflection to define the needed methods. The above representation of Otus and Trees class feels very succinct to me. It could be due to my Rails/Merb background but, others can see it as a DSL. What is the take of others on this coding style( including acceptance in BioRuby ) ? -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From jan.aerts at gmail.com Thu Jul 29 13:38:24 2010 From: jan.aerts at gmail.com (Jan Aerts) Date: Thu, 29 Jul 2010 19:38:24 +0200 Subject: [BioRuby] [GSoC] ORM styled NeXML In-Reply-To: References: Message-ID: Hi Anurag, It seems to me that this way of interaction with the NeXML data is a very good option. The Ruby community is used to this type of representation so I would definitely favour something like it. jan. (on house-hunt in Belgium at the moment; apologies if it takes long to respond) On 29 July 2010 18:45, Anurag Priyam wrote: > An object invocation of NeXML is very much ORM like. An Otus and a Tree > class could very well be represented as: > > class Otus > include Mapper > > property :id, :label > has_n :trees > end > > class Trees > include Mapper > > property :id, :label > belongs_to :otus > end > > And, one would be able to do following: > > otus1.trees << trees1 > otus1.trees = [ trees2, trees3] > otus1.trees > # => trees1, trees2, trees3 > trees1.otus > # => otus1 > > NeXML::Mapper module defines the magic methods property, has_n, and > belongs_to which will use reflection to define the needed methods. 
> > The above representation of Otus and Trees class feels very succinct to me. > It could be due to my Rails/Merb background but, others can see it as a DSL. > > What is the take of others on this coding style( including acceptance in > BioRuby ) ? > > -- > Anurag Priyam, > 3rd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. > +91-9775550642 > From rutgeraldo at gmail.com Thu Jul 29 15:55:36 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Thu, 29 Jul 2010 20:55:36 +0100 Subject: [BioRuby] [GSoC] ORM styled NeXML In-Reply-To: References:

Message-ID: > > > It seems to me that this way of interaction with the NeXML data is a very > good option. The Ruby community is used to this type of representation so I > would definitely favour something like it. > If this is a pattern that the ruby community is happy with then I'm obviously for it. > jan. (on house-hunt in Belgium at the moment; apologies if it takes long to > respond) > And not just anywhere in Belgium, I hear, but in http://www.hoegaarden.com:-) -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From bonnalraoul at ingm.it Fri Jul 30 07:28:05 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Fri, 30 Jul 2010 13:28:05 +0200 Subject: [BioRuby] Indexing fasta file with Ruby 1.9.1 Message-ID: <30c617b1-dcc4-42b9-9ffe-498fc663708b@ingm.it> Hi all, I'd like to index a fasta file and retrieve a bunch of sequences: The dataset: >cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 UGAGGUAGUAGGUUGUAUAGUU >cel-let-7* MIMAT0015091 Caenorhabditis elegans let-7* UGAACUAUGCAAUUUUCUACCUUAC >cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 UCCCUGAGACCUCAAGUGUGA >cel-lin-4* MIMAT0015092 Caenorhabditis elegans lin-4* ACACCUGGGCUCUCCGGGUAC >cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 UGGAAUGUAAAGAAGUAUGUA ... If you want to try: curl -O ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz I tried br_bioflat.rb --makeindex mirna-mature-idx mature.fa but I get these messages: >bta-miR-3604 MIMAT0016939 Bos taurus miR-3604 UAACCAAUGUGCAGACUACUGU ===end=== This entry shall be incorrectly indexed. Caught error: # in "mature.fa" position 1178667 ===begin=== >bta-miR-3596 MIMAT0016940 Bos taurus miR-3596 AACCACACAACCUACUACCUCA ===end=== This entry shall be incorrectly indexed.
Caught error: # in "mature.fa" position 1178737 ===begin=== >ath-miR854e MIMAT0004283 Arabidopsis thaliana miR854e GAUGAGGAUAGGGAGGAGGAG ===end=== This entry shall be incorrectly indexed. Caught error: # in "mature.fa" position 1178814 ===begin=== >cre-miR1161b MIMAT0005413 Chlamydomonas reinhardtii miR1161b UACUGGAGUUCUCAACAGC ===end=== NOTE: It works with 1.8.7 -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it From ngoto at gen-info.osaka-u.ac.jp Fri Jul 30 10:10:23 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 30 Jul 2010 23:10:23 +0900 Subject: [BioRuby] [GSoC] ORM styled NeXML In-Reply-To: References: Message-ID: <20100730141023.CE1751CBC4AD@idnmail.gen-info.osaka-u.ac.jp> Hi Anurag, On Thu, 29 Jul 2010 22:15:11 +0530 Anurag Priyam wrote: > An object invocation of NeXML is very much ORM like. An Otus and a Tree > class could very well be represented as: > > class Otus > include Mapper > > property :id, :label > has_n :trees > end > > class Trees > include Mapper > > property :id, :label > belongs_to :otus > end > > And, one would be able to do following: > > otus1.trees << trees1 > otus1.trees = [ trees2, trees3] > otus1.trees > # => trees1, trees2, trees3 > trees1.otus > # => otus1 I think the "trees=" should wipe out existing values and then overwrite with the given values, as "=" operator normally does, but the above "trees=" seems to act like Array#concat. I expect: otus1.trees << trees1 otus1.trees # => [ trees1 ] otus1.trees = [ trees2, trees3 ] otus1.trees # => [ trees2, trees3 ] otus1.trees << trees4 otus1.trees # => [ trees2, trees3, trees4 ] trees1.otus # => [] trees2.otus # => [ otus1 ] I also think that the "trees" and "otus" methods should always return an Array, even when they hold zero or one value.
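A minimal sketch of how a Mapper-style DSL could give exactly the semantics listed above: a writer that replaces instead of concatenating, a live collection whose `<<` also updates the inverse side, and readers that always return an Array. This is hypothetical illustration only, not the Bio::NeXML code. `Mapper`, `Association` and the `:inverse` option of `has_n` are assumed names. It uses only `define_method` (no `method_missing`).

```ruby
# Hypothetical sketch of a Mapper-style DSL; not the Bio::NeXML code.
module Mapper
  def self.included(base)
    base.extend(ClassMethods)
  end

  # Live collection returned by a has_n reader. It is an Array, but
  # appending also records the owner on the inverse (belongs_to) side.
  class Association < Array
    def initialize(owner, inverse)
      super()
      @owner = owner
      @inverse = inverse
    end

    def <<(item)
      back_refs = item.send(@inverse)
      back_refs << @owner unless back_refs.include?(@owner)
      super
    end
  end

  module ClassMethods
    # Plain attributes.
    def property(*names)
      attr_accessor(*names)
    end

    # One-to-many side: the reader returns the live Association and the
    # writer *replaces* its contents, unlinking the old members first.
    def has_n(name, options = {})
      inverse = options.fetch(:inverse)
      ivar = "@#{name}"
      define_method(name) do
        instance_variable_get(ivar) ||
          instance_variable_set(ivar, Association.new(self, inverse))
      end
      define_method("#{name}=") do |items|
        new_items = items.to_a.dup # copy in case items is the current collection
        current = send(name)
        current.each { |old| old.send(inverse).delete(self) }
        current.clear
        new_items.each { |item| current << item }
      end
    end

    # Inverse side: the reader always returns an Array, possibly empty.
    def belongs_to(name)
      ivar = "@#{name}"
      define_method(name) do
        instance_variable_get(ivar) || instance_variable_set(ivar, [])
      end
    end
  end
end

class Otus
  include Mapper
  property :id, :label
  has_n :trees, :inverse => :otus
end

class Trees
  include Mapper
  property :id, :label
  belongs_to :otus
end

otus1 = Otus.new
trees1, trees2, trees3, trees4 = Array.new(4) { Trees.new }
otus1.trees << trees1
otus1.trees = [trees2, trees3]
otus1.trees << trees4
p otus1.trees.size
```

Returning an Array subclass keeps `otus1.trees << x` working as a plain Array would, while still giving the DSL a single place to keep both sides of the relation consistent.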
> NeXML::Mapper module defines the magic methods property, has_n, and > belongs_to which will use reflection to define the needed methods. > > The above representation of Otus and Trees class feels very succinct to me. > It could be due to my Rails/Merb background but, others can see it as a DSL. > > What is the take of others on this coding style( including acceptance in > BioRuby ) ? I think DSL-like codings are welcomed if their maintenance is easy. In addition, please don't forget to write reference manual for the methods by using RDoc. One way I know is to write dummy method definitions. For example, # Trees in the Otu. # --- # *Returns*:: Array containing Trees objects def trees if false #dummy for RDoc Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From ngoto at gen-info.osaka-u.ac.jp Fri Jul 30 12:28:12 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Sat, 31 Jul 2010 01:28:12 +0900 Subject: [BioRuby] [GSoC] ORM styled NeXML In-Reply-To: <20100730141023.CE1751CBC4AD@idnmail.gen-info.osaka-u.ac.jp> References: <20100730141023.CE1751CBC4AD@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100731012812.3A6E.EEF6E030@gen-info.osaka-u.ac.jp> > # Trees in the Otu. > # --- > # *Returns*:: Array containing Trees objects > def trees if false #dummy for RDoc def trees; end if false #dummy for RDoc Sorry for mistake. -- Naohisa Goto From ngoto at gen-info.osaka-u.ac.jp Fri Jul 2 04:49:08 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 2 Jul 2010 13:49:08 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> <20100628120005.61D751CBC32B@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100702044909.099EA1CBC4C1@idnmail.gen-info.osaka-u.ac.jp> Hi, I find good examples of alternatives for method_missing in the book "Refactoring: Ruby Edition" section 6.18 and 6.19. (http://www.amazon.com/Refactoring-Ruby-Jay-Fields/dp/0321603508 ) Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Mon, 28 Jun 2010 19:52:37 +0530 Anurag Priyam wrote: > > Please never use method_missing. It breaks error reporting and > > makes very hard to debug and maintain both library codes and > > user scripts. > > > > Hmm, I have experienced that. But the way I have used it affects only the > Bio::NeXML::Writer class, so is it not safe in this case?
Anyways I will > change it as it does not offer much improvement to the code readability in > my case. I just find it exciting :). > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. > +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 12:27:11 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 17:57:11 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100625065539.GD22887@thebird.nl> References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > > The idea here was to implement a type system and stick close to the class > > hierarchy followed in the schema. However, looking back, I myself do not > > find the code for the Matrix class very elegant. > > Over 3000 lines of code for an XML parser sends out alarm bells. If > you have the right testing files it should be easy to refactor. Make > it simpler. Also, when parsing this type of XML some Ruby reflection > may come in handy - I did some of that in my BioRuby GEO parser, which > lives in my GEO branch on github. You should look at each class and > see if you can refactor it down to a single solution. Just make sure > it is not at the expense of readability and understanding. > > Post us some ideas here, before you start hacking code. > > Perhaps it would be better to use Kernel.const_get and initialize the correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken for matrix cells. It would make the code a *lot* shorter. I should also do away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 14:15:12 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 19:45:12 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. 
In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > > Perhaps it would be better to use Kernel.const_get and initialize the > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > for matrix cells. It would make the code a *lot* shorter. I should also do > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > Is it advisable to use Ruby's Matrix class as the base class of Bio::NeXML::Matrix? I can define methods to make it mutable. What I do not like about the matrix class is its use of Vectors in many places. I would like to redefine those methods to work with Arrays. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 14:45:33 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Fri, 2 Jul 2010 20:15:33 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: > > Perhaps it would be better to use Kernel.const_get and initialize the > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > for matrix cells. It would make the code a *lot* shorter. I should also do > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > I meant Module#const_get :P. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From ngoto at gen-info.osaka-u.ac.jp Fri Jul 2 20:26:58 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Sat, 03 Jul 2010 05:26:58 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

Message-ID: <20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Hi Anurag, I don't understand what you want to do, and how the const_get shorten your code. Please show example code. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > > Perhaps it would be better to use Kernel.const_get and initialize the > > correct type. So, if I have a DnaMatrix i would use DnaSequence or DnaToken > > for matrix cells. It would make the code a *lot* shorter. I should also do > > away with the type hierarchy of Rows( DnaSeqRow, RnaSeqRow and others ). > > > > > I meant Module#const_get :P. > > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. > +91-9775550642 > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From anurag08priyam at gmail.com Fri Jul 2 21:07:45 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 02:37:45 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: > > I don't understand what you want to do, and how the const_get shorten > your code. Please show example code. > > Taking matrix for example, in each type of Matrix I have defined an add_row method which accepts a row which would be of the same kind( DnaSeqMatrix will take DnaSeqRow ) : def add_row( row ) raise InvalidRowException, "DnaSeqRow expected." unless row.instance_of? DnaSeqRow row_set[ row.id ] = row end If instead I define an add_row method in SeqMatrix like this: def add_row( row ) # a DnaSeqMatrix will take a DnaSeqRow; strip the Bio::NeXML:: prefix so const_get gets a bare constant name type = self.class.name.split( '::' ).last.sub( /Matrix/, 'Row' ) klass = NeXML.const_get( type ) raise InvalidRowException, "#{type} expected." unless row.instance_of? klass row_set[ row.id ] = row end This way I won't have to define add_row for each sub type of SeqRow. Similarly for others. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Fri Jul 2 21:16:28 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 02:46:28 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: Another possible solution I can come up with is to use "define_method" or "class_eval" to create methods along the lines of attr_accessor. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From ngoto at gen-info.osaka-u.ac.jp Sat Jul 3 10:40:22 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 3 Jul 2010 19:40:22 +0900 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> Message-ID: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Hi Anurag, On Fri, 2 Jul 2010 19:45:12 +0530 Anurag Priyam wrote: > Is it advisable to use Ruby's Matrix class as the base class of > Bio::NeXML::Matrix? I can define methods to make it mutable. What I do not > like about the matrix class is its use of Vectors in many places. I would > like to redefine those methods to work with Arrays. It seems not. Ruby's Matrix class is implemented as a mathematical matrix. It has many useful mathematical matrix operations, but all elements should be Numeric values. However, it seems that a NeXML matrix stores not only numeric values but also sequences or characters. In addition, the use of the name Matrix might be a source of confusion or conflicts with Ruby's standard Matrix, even in a separate namespace. It may be good to rename Bio::NeXML::Matrix to another name if it isn't hard to keep consistency with the specification of NeXML. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From anurag08priyam at gmail.com Sat Jul 3 10:58:30 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:28:30 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review.
In-Reply-To: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> References: <20100624135411.GA14658@thebird.nl> <20100625065539.GD22887@thebird.nl> <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Message-ID: > In addition, the use of the name Matrix might be a source of confusion > or conflicts with Ruby standard Matrix, even if in the separate name > space. It may be good to rename Bio::NeXML::Matrix to another name > if it isn't hard to keep consistency with the specification of NeXML. > > I think they are called character state matrices in the phylo terminology. But something like CharactersStateMatrices would be too long. What about CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. This thread is getting too long; I am starting a new one about some of my doubts related to the NeXML sequence implementation. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 3 11:13:43 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:43:43 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Sequences( doubts ) Message-ID: This is going to be a long mail. NeXML's characters tag serves as a storage block for sequences. Sequences can be described in NeXML in two ways: raw( with the seq tag ) and granular( with the cell tags ). NeXML offers six kinds of sequences:
1. Protein( AA )
2. DNA
3. RNA
4. Restriction
5. Standard
6. Continuous
As of now, the NeXML parser just returns the sequence as a string. It should return a Bio::Sequence. BioRuby already has classes to work with AA and NA sequences. I was thinking of adding classes to represent Restriction, Standard and Continuous sequences. Should I work on adding support for these as core BioRuby classes or just as part of the NeXML lib? I will have to adapt the Bio::Sequence class to recognize the new sequences.
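For comparison with adding a class per kind, here is a minimal sketch that keeps the six kinds listed above as data. Each parsed sequence is a small Struct tagged with a symbol. `TypedSeq` and `NEXML_SEQ_KINDS` are hypothetical names and are not part of BioRuby or the NeXML library. The sketch also assumes continuous data arrives as an Array of numbers rather than a String.

```ruby
# Hypothetical sketch: NeXML sequence kinds as symbols, with each parsed sequence
# stored as a small tagged record rather than an instance of a per-kind class.
NEXML_SEQ_KINDS = %i[protein dna rna restriction standard continuous].freeze

TypedSeq = Struct.new(:kind, :id, :residues) do
  def initialize(kind, id, residues)
    unless NEXML_SEQ_KINDS.include?(kind)
      raise ArgumentError, "unknown sequence kind: #{kind.inspect}"
    end
    super # Struct#initialize with the same three arguments
  end

  # Continuous characters are numbers, so their residues are assumed to be an Array.
  def continuous?
    kind == :continuous
  end
end

dna  = TypedSeq.new(:dna, "taxon1", "ACGTTGCA")
cont = TypedSeq.new(:continuous, "taxon2", [0.25, 1.5, 3.0])
puts "#{dna.id}: #{dna.kind} #{dna.residues}"
puts "#{cont.id}: continuous? #{cont.continuous?}"
```

A record like this stays a plain value. When richer behaviour is needed, its residues String can still be passed to Bio::Sequence.new.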
Why does the Bio::Sequence#guess method use a 90%-threshold heuristic to distinguish between AA and NA? Why not use a regexp instead? -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 3 11:15:40 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 3 Jul 2010 16:45:40 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: > > Other possible solution I can come up with is to use "define_method" or > "class_eval" to create methods on the lines of attr_accessor. > > I would like to know how you would have worked this out. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From pjotr.public14 at thebird.nl Sat Jul 3 14:25:41 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 3 Jul 2010 16:25:41 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20100703142541.GA3153@thebird.nl> On Sat, Jul 03, 2010 at 02:37:45AM +0530, Anurag Priyam wrote: > > > > I don't understand what you want to do, and how the const_get shorten > > your code. Please show example code. > > > > > Taking matrix for example, in each type of Matrix I have defined an add > add_row method which accepts a row which would be of the same kind( > DnaSeqMatrix will take DnaSeqRow ) : so, why do you need to test for the type? see below. > def add_row( row ) > raise InvalidRowException, "DnaSeqRow expected." unless > row.instance_of? DnaSeqRow > row_set[ row.id ] = row > end > > If instead I define a add_row method in SeqMatrix like this: > > def add_row( row ) > # a DnaSeqMatrix will take a DnaSeqRow > type = self.class.to_s.sub( /Matrix/, 'Row' ) > klass = NeXML.const_get( type ) > raise InvalidRowException, "#{type} expected." unless row.instance_of? > klass > end > > This way I won't have to define add_row for each sub type of SeqRow. > Similarly for others. This would be a factory, right? I think what you want to do is good in its objective - trying to shorten the implementation. But do you really need class inspection/reflection here? Asking for the class name is usually prevented by having proper attributes in the classes. That is if you use OOP. Question is whether you really require this. The problem with the code I had (and have) was the really wide and deep use of OOP classes. That led to duplication of code and little 'feel' for correctness of what is in there. Deep OOP hierarchies are evil. Duplication is ugly. Inspection/reflection is evil too - like Naohisa reacted, pretty much - it is only used in exceptional cases when there is no other elegant way of resolving issues. It can be powerful, but only use when really required, as other people often fail to understand what it does - and code should be self-documenting. 
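The point above, using attributes instead of inspecting class names, can be sketched without any reflection. Each matrix subclass records the row class it accepts, and a single shared add_row checks against that attribute. This is an illustrative sketch only. The class names follow the thread, `row_class` is a hypothetical attribute, and real code would live inside Bio::NeXML.

```ruby
# Hypothetical sketch: the accepted row class is an attribute of each matrix
# class, so the shared add_row never builds class names from strings.
class InvalidRowException < StandardError; end

SeqRow = Struct.new(:id, :seq)
class DnaSeqRow < SeqRow; end
class RnaSeqRow < SeqRow; end

# SeqMatrix itself is abstract: only its subclasses set row_class.
class SeqMatrix
  class << self
    # Class-level instance variables are not inherited, so each subclass sets its own.
    attr_accessor :row_class
  end

  def row_set
    @row_set ||= {}
  end

  # One add_row for every matrix type: compare against the declared attribute.
  def add_row(row)
    expected = self.class.row_class
    raise InvalidRowException, "#{expected} expected." unless row.instance_of?(expected)
    row_set[row.id] = row
  end
end

class DnaSeqMatrix < SeqMatrix
  self.row_class = DnaSeqRow
end

class RnaSeqMatrix < SeqMatrix
  self.row_class = RnaSeqRow
end

matrix = DnaSeqMatrix.new
matrix.add_row(DnaSeqRow.new("taxon1", "ACGT"))
```

This keeps the existing classes but removes the string munging. It does not address the depth of the hierarchy itself.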
I think you need to ask yourself more fundamental questions. Why not use BioRuby basic types for most data represented by NeXML? Only use special objects when there is real added value. So DnaSeqRow would simply be a Sequence (or even a list of chars) and DnaSeqMatrix would be a list of Sequences. If you have further attributes, create a new composite object (like SequenceFeatures, or if you think more functionally again, a tuple of sequence(s) and features?). This way you don't create a hierarchy that explodes into hundreds of specialized objects we won't use elsewhere. To differentiate between a DnaSequence and an RnaSequence you do not need different objects. Both are strings (in BioRuby). You could even settle for Ruby's primitive types and containers. Likewise, even if you need a Matrix, you don't need RnaMatrix and DnaMatrix. I am sure of that. They are only specializations in name; the code in there should be identical. If you go down the OOP route, make use of Ruby's mixins. Search Google for "ruby mixin deep oop hierarchy". My recommendation is to refactor the library to use as primitive a type as possible, at every point. When you run into functionality that requires a more complex type, because there is no other way - that is the moment to design and add it. I don't know the full depth of the NeXML format, but I can predict it consists of primitive types in ordered ways. This can be mirrored by the implementation. If you do it like this you won't have to use inspection (as in the question above). OOP classes are for harnessing special functionality that goes with a certain type. Do not create a type unless you need something special. You can propose changes to existing BioRuby types - in particular with the RDF implementation. I know some people will balk at this rewrite - but to be honest, if you want your library to be useful to others it needs rethinking. I would take a week out of your plan to experiment with different object models - just start with a small subset.
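A small subset of such an object model might look like the following sketch. It uses one matrix class whose kind is a symbol, keeps rows as plain Strings, and gets the shared row handling from a mixin instead of a DnaMatrix/RnaMatrix subclass tree. `CharacterMatrix`, `RowStore` and `accepts?` are hypothetical names. The alphabets are deliberately simplified and are not the full NeXML or IUPAC sets.

```ruby
# Hypothetical sketch: one matrix class, its kind held as data, rows kept as
# plain Strings, and shared row handling provided by a mixin.
module RowStore
  # Classes that include RowStore must define accepts?(seq).
  def rows
    @rows ||= {}
  end

  def add_row(id, seq)
    raise ArgumentError, "row #{id} rejected by this matrix" unless accepts?(seq)
    rows[id] = seq
  end

  def each_row(&block)
    rows.each(&block)
  end
end

class CharacterMatrix
  include RowStore

  # Simplified alphabets for illustration only (N, ? and gaps allowed).
  ALPHABETS = {
    dna: /\A[ACGTN?-]*\z/i,
    rna: /\A[ACGUN?-]*\z/i
  }.freeze

  attr_reader :kind

  def initialize(kind)
    @kind = kind
  end

  def accepts?(seq)
    ALPHABETS.fetch(kind).match?(seq)
  end
end

dna = CharacterMatrix.new(:dna)
dna.add_row("taxon1", "ACGT-N")
dna.each_row { |id, seq| puts "#{id}\t#{seq}" }
```

Adding another kind only means adding an alphabet entry. Anything else that stores id => sequence rows can reuse RowStore.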
When you think something works, roll it out all the way. That can be done quickly. Read, read, read on the Internet about object models. One thing you can consider is to use an intermediate object structure for parsing the XML into Ruby - and next fork it out into logical data structures. I do that regularly as the XML 'model' does not normally map to Ruby well. One example of mine is here http://github.com/pjotrp/swig2doc/blob/master/lib/input/doxyxmlparser.rb Doxy objects are stored in http://github.com/pjotrp/swig2doc/blob/master/lib/cobj/doxy/doxycobjs.rb Note swig2doc also contains a convenience class for using libxml2 in http://github.com/pjotrp/swig2doc/blob/master/lib/input/xmleasyreader.rb And while you are at refactoring, why not make sure the parser does not fill memory. Pj. PS. Are you using another NeXML OOP implementation as a model - Perl, Python, Java? I would like to know, so I can have a look. From pjotr.public14 at thebird.nl Sat Jul 3 14:43:32 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 3 Jul 2010 16:43:32 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] Sequences( doubts ) In-Reply-To: References: Message-ID: <20100703144332.GB3153@thebird.nl> On Sat, Jul 03, 2010 at 04:43:43PM +0530, Anurag Priyam wrote: > This is going to be a long mail. > > NeXML's characters tag serves as a storage block for sequences. Sequences > can be described in NeXML in two ways, raw( with the seq tag ) and granular( > with the cell tags ). NeXML offers six kind of sequences : > 1. Protein( AA ) > 2. DNA > 3. RNA > 4. Restriction > 5. Standard > 6. Continuous How do these sequences differ? In name only? Can you store them as tuples: (:dna,sequence) (:rna,sequence) (:re,sequence) etc. You could argue for a new SequenceType object. To store type + sequence. > As of now, the NeXML parser just returns the sequence as a string. It should > return Bio::Sequence. BioRuby already has classes to work with AA and NA > sequences. 
I was thinking of adding classes to represent Restriction, > Standard and Continuous sequences. Should I work on adding support for these > as a core BioRuby classes or just as a part of NeXML lib? I will have to > adapt Bio::Sequence class to recognize the new sequences. I think your library needs to return the simplest type possible, even standard Ruby containers (simpler than BioRuby's types). That makes for the most flexible implementation for others to use. BioRuby's types may change in the future too - I am working on that. Your library is not really in the business of creating new types - unless you create new functionality - like an alignment algorithm, or some transformation to a new type. Better keep it simple. If I have a NeXML file containing an alignment of sequences, I simply expect to pull out those sequences with their IDs. Right? You could return a BioRuby Alignment object, but that is overkill. I can make one myself, of my own type MyAlignment, which is what I want to use. What I really want is a list of (id, list[nucleotide]) or (id, String) in BioRuby's case, if that is what is stored in NeXML. In pseudo code:

seqlist = NeXML.read(fn).fetch_alignment
print seqlist.first
> "id","agtct"

or in the form of an iterator:

NeXML.read(fn).fetch_alignment.each_seq do | id, seq |
  do something
end

and likewise for other scenarios. For RDF the use cases are similar, I would guess:

NeXML.read(fn).fetch_alignment.to_rdf

Keep it simple, again. The thing is that most people overcomplicate things in OOP. All, and I mean all, Bio* projects overcomplicate things. > Why does the Bio::Sequence#guess method use the some 90% way of recognition > between AA and NA? Why not use regexp instead? I am not a great fan of guessing formats. It is always error prone. Both amino acid sequences and nucleotide sequences can consist of a combination of shared letters. Still, I guess regexes are slower. Feel free to come up with an alternative and measure how well it does.
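One way to make that measurement concrete is to run an all-or-nothing regexp test and a composition threshold on the same inputs. Both helpers below are hypothetical sketches that assume plain String input. The 0.9 default only mirrors the "90%" mentioned in the question and is not taken from BioRuby's source.

```ruby
# Hypothetical helpers for comparing two ways of guessing AA vs NA.

# All-or-nothing: every residue must be a nucleotide letter (or N).
NA_ONLY = /\A[ACGTUN]+\z/i

def guess_by_regex(seq)
  seq.match?(NA_ONLY) ? :na : :aa
end

# Threshold: call it NA when at least `threshold` of the residues are A/C/G/T/U.
def guess_by_fraction(seq, threshold = 0.9)
  residues = seq.upcase.delete("-") # ignore alignment gaps
  return nil if residues.empty?
  residues.count("ACGTU").fdiv(residues.length) >= threshold ? :na : :aa
end

samples = {
  "ACGTACGTACGTACGTACGR" => "DNA with one ambiguity code (R)",
  "MKVLAAGIVG"           => "protein fragment",
  "GATTACA"              => "short string valid in both alphabets"
}

samples.each do |seq, label|
  puts format("%-22s regex=%-3s fraction=%-3s %s",
              seq, guess_by_regex(seq), guess_by_fraction(seq), label)
end
```

On these inputs a single ambiguity code flips the regexp result to :aa, while the threshold tolerates it. Both helpers call a short string like GATTACA nucleotide even though it is also a valid peptide, which is the shared-letters problem described above.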
But I have trouble seeing why you need it. Pj. From chmille4 at gmail.com Sun Jul 4 13:33:51 2010 From: chmille4 at gmail.com (Chase Miller) Date: Sun, 4 Jul 2010 09:33:51 -0400 Subject: [BioRuby] Bio::Assembly Message-ID: Hi all I've worked with BioPerl in the past, but I'm considering using ruby/bioruby for a new project and have a few questions. - I can't seem to find a Bio::Assembly module for BioRuby. Has anyone done any work on this? - What's the current solution for working with assembly data in BioRuby? If nothing is out there, I may try to take a whack at this myself. Thanks, Chase From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 14:56:46 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 10:56:46 -0400 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: Message-ID: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> Hi Chase, As far as I know, no Ruby/BioRuby components like BioPerl's Bio::Assembly are available. Currently, sequences and qualities formatted in Fasta, FASTQ, ABI, SCF and other file formats can be treated with BioRuby. However, I don't know good ways to handle assembly output data. Thanks, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > Hi all > > > I've worked with BioPerl in the past, but I'm considering using ruby/bioruby > for a new project and have a few questions. > > > - I can't seem to find a Bio::Assembly module for BioRuby. Has anyone > done any work on this? > > > - What's the current solution for working with assembly data in BioRuby? > > > > If nothing is out there, I may try to take a whack at this myself. 
> > Thanks, > > Chase From pjotr.public14 at thebird.nl Wed Jul 7 16:17:40 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 18:17:40 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> References: <20100707105642.0BA7.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: <20100707161740.GA12452@thebird.nl> Hi Chase, On Wed, Jul 07, 2010 at 10:56:46AM -0400, Naohisa Goto wrote: > As far as I know, no Ruby/BioRuby components like BioPerl's > Bio::Assembly are available. > > Currently, sequences and qualities formatted in Fasta, FASTQ, ABI, SCF > and other file formats can be treated with BioRuby. > However, I don't know good ways to handle assembly output data. Before rewriting from scratch, see if there are useful C/C++ libraries we can map to with SWIG (BioLib project). I can help with that. Alternatively, check what is written in Java - JRuby makes accessing anything on the JVM rather trivial these days. Or even interface to Perl libraries and map those to Ruby. I would start with that, then see what is a useful feature set for BioRuby. Design it in such a way that external libraries can be replaced in time, when someone feels like writing the support. We are getting BioRuby plugin support, allowing for flexible approaches to adding functionality. Pj. From pjotr.public14 at thebird.nl Wed Jul 7 17:46:05 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 19:46:05 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> Message-ID: <20100707174605.GA14014@thebird.nl> On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: > C/C++?
Perl?? Really? > > Do you think it is a good idea to introduce so many dependencies? Not at the same time ;) > While these might not be a problem for expert users, I worry that > the to more complexities are introduced the less likely the avarage > biologist with little coding experience might be tempted to use > BioRuby. It won't affect BioRuby core. Pure Ruby is great - and we should always aim for that with 'core' BioRuby. Still, we don't have enough developers to support every nook and cranny of bioinformatics. We need to get functionality in fast, when we can. If functionality exists elsewhere: use that. It does not make sense to rewrite everything from scratch. As long as we provide clear interfaces, we can always start replacing stuff with pure Ruby, if someone feels like recoding. By forcing dependencies into a 'plugin' we still keep BioRuby pure. People are free to create plugins, which may have dependencies. If you want the functionality badly enough, and you don't want to write it yourself, find a way to use the plugin. This is one of the major reasons for providing a plugin infrastructure, which, btw, is the same plugin system that Rails uses (thanks to Raoul and Toshiaki). A plugin is not core BioRuby. BioRuby itself does not get dependencies - other than highly common libraries like libxml. We simply don't have the people to achieve everything. Not to mention that many libraries, like EMBOSS, outperform Ruby in terms of processing speed and memory consumption. When we call BLAST we don't write BLAST ourselves in Ruby. Those are also dependencies. Apart from dependencies, one thing we may want to think about is incompatible plugins. For example, if I were to use a plugin for the JVM, it may not work together with a plugin for standard Ruby. My take is that it does not really matter. You have to choose one or the other ;).
Truth is we have too small a community to provide the luxury edition of BioRuby which can handle everything (which is also true for the other Bio* projects). See mappings and dependencies as part of the development of the ultimate BioRuby. A process, transition, evolution. Plugins make it possible. Pj. From emanuele.orlando at gmail.com Wed Jul 7 18:00:58 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Wed, 7 Jul 2010 20:00:58 +0200 Subject: [BioRuby] contribute to Bioruby Message-ID: Dear Bioruby, This is my first mail to the group. My name is Emanuele Orlando; I graduated in Computational Chemistry and for the past three years I have been working as an IT consultant. Like others, I love Ruby and I would be happy to contribute to the development of BioRuby. How can I make some contributions? Thanks Emanuele -- Emanuele Orlando http://www.emanueleorlando.com http://it.linkedin.com/in/kooru From pjotr.public14 at thebird.nl Wed Jul 7 18:36:22 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Wed, 7 Jul 2010 20:36:22 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: Message-ID: <20100707183622.GA16079@thebird.nl> Welcome Emanuele, On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > Dear Bioruby, > > This is my first mail to the group. > My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from > three years i'm working as IT consultant. > As others i love ruby and i would be happy to contribute to the development > of bioruby. How can i make some contributions? Choose a topic to work on. You can simply get a free account on github.com and clone the repository. Start coding - when it works we can easily merge it into the main tree. If you want to discuss programming and design, just mail this list. Pj.
From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 18:41:25 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 14:41:25 -0400 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: Message-ID: <20100707144119.EC29.EEF6E030@gen-info.osaka-u.ac.jp> Hi Emanuele, Thanks. See a similar thread for a general introduction: http://lists.open-bio.org/pipermail/bioruby/2010-June/001319.html In addition, I'd like to suggest ChemRuby, a cheminformatics library written in Ruby and a project closely related to BioRuby. http://chemruby.org/ It was developed together with BioRuby, but it has not been maintained recently. It would also be great if you could contribute to ChemRuby. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > Dear Bioruby, > > This is my first mail to the group. > My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from > three years i'm working as IT consultant. > As others i love ruby and i would be happy to contribute to the development > of bioruby. How can i make some contributions? > > Thanks > > Emanuele > > -- > Emanuele Orlando > http://www.emanueleorlando.com > http://it.linkedin.com/in/kooru From ktym at hgc.jp Wed Jul 7 20:42:16 2010 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 8 Jul 2010 05:42:16 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100707183622.GA16079@thebird.nl> References: <20100707183622.GA16079@thebird.nl> Message-ID: Hi, This explanation by Jan may be useful when you try GitHub.
http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html Toshiaki On 2010/07/08, at 3:36, Pjotr Prins wrote: > Welcome Emanuele, > > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: >> Dear Bioruby, >> >> This is my first mail to the group. >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and from >> three years i'm working as IT consultant. >> As others i love ruby and i would be happy to contribute to the development >> of bioruby. How can i make some contributions? > > Choose a topic to work on. You can simply get a free account on > github.com and clone the repository. Start coding - when it works we > can easily merge it in to the main tree. > > If you want to discuss programming and design, just mail this list. > > Pj. From ngoto at gen-info.osaka-u.ac.jp Wed Jul 7 22:40:21 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Wed, 07 Jul 2010 18:40:21 -0400 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> Hi Anurag, > I think they are called character state matrices in the phylo terminology. > But something like CharactersStateMatrices would be two long. What about > CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. It is generally a bad thing to abbreviate a name only because it is too long. Modifying the original upstream names might be a source of confusion. In this case, using CharactersStateMatrices as is would be best. If the name is expected to be used frequently by library users, a short name could also be added. However, as Pjotr already mentioned, the class might not be needed, depending on the design of classes.
Sorry for the late response. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org From emanuele.orlando at gmail.com Thu Jul 8 09:46:19 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Thu, 8 Jul 2010 11:46:19 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: Thanks to all for the valuable links. I didn't know about ChemRuby and now I have some questions :) Its mailing list is empty. If I want to discuss programming/design of ChemRuby, where should I post? On the BioRuby list with a specific tag (for example [CHEMRUBY]), or on the ChemRuby list? In the beginning, who contributed to ChemRuby? What were the priorities? Thanks Emanuele -- Emanuele Orlando http://www.emanueleorlando.com http://it.linkedin.com/in/kooru On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > Hi, > > This explanation by Jan may be useful when you try GitHub. > > http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html > > Toshiaki > > On 2010/07/08, at 3:36, Pjotr Prins wrote: > > > Welcome Emanuele, > > > > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> Dear Bioruby, > >> > >> This is my first mail to the group. > >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and > from > >> three years i'm working as IT consultant. > >> As others i love ruby and i would be happy to contribute to the > development > >> of bioruby. How can i make some contributions? > > > > Choose a topic to work on. You can simply get a free account on > > github.com and clone the repository. Start coding - when it works we > > can easily merge it in to the main tree. > > > > If you want to discuss programming and design, just mail this list. > > > > Pj.
From chmille4 at gmail.com Thu Jul 8 11:29:41 2010 From: chmille4 at gmail.com (Chase Miller) Date: Thu, 8 Jul 2010 07:29:41 -0400 Subject: [BioRuby] Bio::Assembly In-Reply-To: <20100707174605.GA14014@thebird.nl> References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl> Message-ID: Pjotr Thanks for the great ideas! The BioLib project sounds interesting. I'll have to ask my old GSoC mentor Mark Jensen about it. However, for my purposes, it may be easier to just code it up in Ruby (also I'm dying to write Ruby code after moving over from Perl :) ). What I need is an ace parser and fairly simple scaffold and contig objects. Although slim in features, it could be a good starting point for a pure Ruby assembly module, which I wouldn't mind maintaining. How closely does BioRuby like to follow the BioPerl API? I've noticed that BioRuby seems to handle file formats differently than BioPerl, with most of them being in Bio::db/. Can anyone expand on this? Is there any place I can read about best practices for BioRuby? For example, I haven't seen any instances of using hashes to pass parameters in method calls, e.g.

a.parse( :file => file, :format => format )

Is this frowned upon in BioRuby? Thanks, Chase On Wed, Jul 7, 2010 at 1:46 PM, Pjotr Prins wrote: > On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: > > C/C++? Perl?? Really? > > > > Do you think it is a good idea to introduce so many dependencies?
> > Not at the same time ;) > > > While these might not be a problem for expert users, I worry that > > the to more complexities are introduced the less likely the avarage > > biologist with little coding experience might be tempted to use > > BioRuby. > > It won't affect BioRuby core. > > Pure Ruby is great - and we should always aim for that with 'core' > BioRuby. Still, we don't have enough developers to support every nook > and cranny of bioinformatics. > > We need to get functionality in fast, when we can. If functionality > exists elsewhere: use that. It does not make sense to rewrite > everything from scratch. As long as we provide clear interfaces, we > can always start replacing stuff with pure Ruby. If someone feels > like recoding. > > By forcing dependencies into a 'plugin' we still keep BioRuby pure. > People are free to create plugins, which may have dependencies. If you > want the functionality badly enough, and you don't want to write it > yourself, find the way of using the plugin. This is one of the major > reasons for providing a plugin infrastructure. Which, btw, is the > same plugin system that Rails uses (thanks to Raoul and Toshiaki). > > A plugin is not core BioRuby. BioRuby itself does not get dependencies > - other than highly common libraries like libxml. > > We simply don't have the people to achieve everything. Not to mention > that many libraries, like EMBOSS, outperform Ruby in terms of > processing speed and memory consumption. When we call BLAST we don't > write BLAST ourselves in Ruby. Those are also dependencies. > > Outside dealing with dependencies one thing we may want to think about > is incompatible plugins. For example, if I were to use a plugin > for the JVM, it may not work together with a plugin for standard Ruby. > > My take is that it does not really matter. You have to choose one or > the other ;). 
> > Truth is we have too small a community to provide the luxury edition > of BioRuby which can handle everything (which is also true for the > other Bio* projects). > > See mappings and dependencies as part of the development of the > ultimate BioRuby. A process, transition, evolution. Plugins make it > possible. > > Pj. > From daijiendoh at gmail.com Thu Jul 8 11:41:57 2010 From: daijiendoh at gmail.com (Daiji Endoh) Date: Thu, 8 Jul 2010 20:41:57 +0900 Subject: [BioRuby] KEGG API Message-ID: Dear All, I have a question about the KEGG API in BioRuby. When I use the "type" method, a "deprecated" message is returned, although according to the KEGG manual the "type" method is still valid.

## Script I used is below
relations = get_element_relations_by_pathway('path:bsu00010')
relations.each do |rel|
  puts rel.element_id1
  puts rel.element_id2
  puts rel.type
  rel.subtypes.each do |sub|
    puts sub.element_id
    puts sub.relation
    puts sub.type
  end
end

Result for sub.type => "sub.type was deprecated use class" How can I get the type property of the objects returned by get_element_relations_by_pathway? With best wishes, Daiji Endoh From email2ants at gmail.com Thu Jul 8 12:24:41 2010 From: email2ants at gmail.com (Anthony Underwood) Date: Thu, 8 Jul 2010 13:24:41 +0100 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl> Message-ID: Hi Chase I for one would love to have a Ruby implementation of a Bio::Assembly module. I wrote a Bio::Chromatogram module for BioRuby to replicate the BioPerl functionality since I cannot stand writing in Perl any longer!! I have thought about writing an ace parser also but have not yet had time. Count this as a +1 from me! Anthony On 8 Jul 2010, at 12:29, Chase Miller wrote: > Pjtor > > Thanks for the great ideas! The BioLib project sounds interesting. I'll > have to ask my old GSOC mentor Mark Jensen about it.
> > However, for my purposes, It may be easier to just code it up in ruby (also > I'm dying to write ruby code after moving over from perl :) ). What I need > is an ace parser and fairly simple scaffold and contig objects. Although > slim in features, it could be a good starting point for a pure ruby assembly > module, which I wouldn't mind maintaining. > > How closely does BioRuby like to follow the BioPerl API? > > I've noticed that BioRuby seems to handle file formats differently than > BioPerl, with most of them being in Bio::db/. Can anyone expand on this? > > Is there any place I can read about the best practices for BioRuby. For > example, I haven't seen any instances of using hashes to pass parameters in > method calls e.g. > > > a.parse( :file => file, :format => format ) > > Is this frowned upon in BioRuby? > > > Thanks, > Chase > > > On Wed, Jul 7, 2010 at 1:46 PM, Pjotr Prins wrote: > >> On Wed, Jul 07, 2010 at 09:27:17AM -0700, Christian Zmasek wrote: >>> C/C++? Perl?? Really? >>> >>> Do you think it is a good idea to introduce so many dependencies? >> >> Not at the same time ;) >> >>> While these might not be a problem for expert users, I worry that >>> the to more complexities are introduced the less likely the avarage >>> biologist with little coding experience might be tempted to use >>> BioRuby. >> >> It won't affect BioRuby core. >> >> Pure Ruby is great - and we should always aim for that with 'core' >> BioRuby. Still, we don't have enough developers to support every nook >> and cranny of bioinformatics. >> >> We need to get functionality in fast, when we can. If functionality >> exists elsewhere: use that. It does not make sense to rewrite >> everything from scratch. As long as we provide clear interfaces, we >> can always start replacing stuff with pure Ruby. If someone feels >> like recoding. >> >> By forcing dependencies into a 'plugin' we still keep BioRuby pure. >> People are free to create plugins, which may have dependencies. 
If you >> want the functionality badly enough, and you don't want to write it >> yourself, find the way of using the plugin. This is one of the major >> reasons for providing a plugin infrastructure. Which, btw, is the >> same plugin system that Rails uses (thanks to Raoul and Toshiaki). >> >> A plugin is not core BioRuby. BioRuby itself does not get dependencies >> - other than highly common libraries like libxml. >> >> We simply don't have the people to achieve everything. Not to mention >> that many libraries, like EMBOSS, outperform Ruby in terms of >> processing speed and memory consumption. When we call BLAST we don't >> write BLAST ourselves in Ruby. Those are also dependencies. >> >> Outside dealing with dependencies one thing we may want to think about >> is incompatible plugins. For example, if I were to use a plugin >> for the JVM, it may not work together with a plugin for standard Ruby. >> >> My take is that it does not really matter. You have to choose one or >> the other ;). >> >> Truth is we have too small a community to provide the luxury edition >> of BioRuby which can handle everything (which is also true for the >> other Bio* projects). >> >> See mappings and dependencies as part of the development of the >> ultimate BioRuby. A process, transition, evolution. Plugins make it >> possible. >> >> Pj. >> > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr.public14 at thebird.nl Thu Jul 8 13:13:03 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 8 Jul 2010 15:13:03 +0200 Subject: [BioRuby] Bio::Assembly In-Reply-To: References: <9EF0453B5FBDC34C92D85CBFABA4AE4D02DCD477DE@MAIL07.burnham.org> <20100707174605.GA14014@thebird.nl>

Message-ID: <20100708131303.GA22116@thebird.nl> Glad to see this excitement :). On Thu, Jul 08, 2010 at 01:24:41PM +0100, Anthony Underwood wrote: > > How closely does BioRuby like to follow the BioPerl API? It doesn't, though you wouldn't be the first to copy stuff. > > I've noticed that BioRuby seems to handle file formats differently than > > BioPerl, with most of them being in Bio::db/. Can anyone expand on this? > > > > Is there any place I can read about the best practices for BioRuby. For > > example, I haven't seen any instances of using hashes to pass parameters in > > method calls e.g. > > a.parse( :file => file, :format => format ) I am sure it is used, and it is fine when used for 'setting' options. The only risk is that you lose checking of the number of parameters passed, which can lead to bugs. Otherwise I like it for being explanatory in the calling code. Mind: it can lead to 'rich' interfaces - so common in R - where one method handles many circumstances. These rich methods tend to be really ugly and hard to test (=hard to prove correct). Don't use it to replace multiple methods; methods are also explanatory in the calling code. > > Is this frowned upon in BioRuby? I don't think so. But use it when it makes sense. Pj. From bonnalraoul at ingm.it Thu Jul 8 15:46:52 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 8 Jul 2010 17:46:52 +0200 Subject: [BioRuby] Integration ruby-ffi Message-ID: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> I think this library http://github.com/ffi/ffi is quite interesting. Now it is possible, for example, to use Nokogiri from JRuby. Pjotr, have you tried to use that library for your binding purposes? -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F.
Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it From ngoto at gen-info.osaka-u.ac.jp Thu Jul 8 19:34:19 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa Goto) Date: Thu, 08 Jul 2010 15:34:19 -0400 Subject: [BioRuby] KEGG API In-Reply-To: References: Message-ID: <20100708153417.38CC.EEF6E030@gen-info.osaka-u.ac.jp> Hi, The problem is reproduced with SOAP4R 1.5.8, the latest released version of SOAP4R. It cannot be reproduced with SOAP4R 1.5.5, which is bundled with Ruby 1.8.7-p299. Since revision 1683, SOAP4R does not add mappings for methods that are already defined (such as type, which Ruby 1.8 already defines as the deprecated Object#type). http://dev.ctor.org/soap4r/changeset/1683 In the Debian (and Ubuntu) package of Ruby 1.8.7, this patch is applied as a security fix to prevent a memory exhaustion problem. The workaround is to use SOAP::Mapping::Object#[]. For example, use rel["type"] instead of rel.type.

api = Bio::KEGG::API.new
relations = api.get_element_relations_by_pathway('path:bsu00010')
relations.each do |rel|
  puts rel.element_id1
  puts rel.element_id2
  puts rel["type"]
  rel.subtypes.each do |sub|
    puts sub.element_id
    puts sub.relation
    puts sub["type"]
  end
end

> Dear All > > I have a question about KEGG API on BioRuby. > When I use "type" method, "deprecated" message was returned. > While, the method "type" is still working, refer to KEGG manual. > > ## Script I used is below > > relations = get_element_relations_by_pathway('path:bsu00010') > relations.each do |rel| > puts rel.element_id1 > puts rel.element_id2 > puts rel.type > rel.subtypes.each do |sub| > puts sub.element_id > puts sub.relation > puts sub.type > end > end > > Result for sub.type => "sub.type was deprecated use class" > > > How can I receive type property in the class > "get_element_relations_by_pathway" ?
> > With best wishes, > > Daiji Endoh > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From pjotr.public14 at thebird.nl Thu Jul 8 20:22:47 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Thu, 8 Jul 2010 22:22:47 +0200 Subject: [BioRuby] Integration ruby-ffi In-Reply-To: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> References: <8a0c37b7-2860-4636-ac06-b17b3cb78324@ingm.it> Message-ID: <20100708202247.GA26254@thebird.nl> On Thu, Jul 08, 2010 at 05:46:52PM +0200, Raoul Bonnal wrote: > I think this library http://github.com/ffi/ffi is quite interesting. Now is possible for example use nokogiri from JRuby. > > Pjotr, have you tried to use that library for your binding purposes > ? Not yet. The purpose of BioLib is cross-language, really. Kill many birds with one stone ;). Still, FFI is cool, and initially easier than SWIG. Not sure how easy it is in different deployments. Pj. From rutgeraldo at gmail.com Sat Jul 10 14:58:00 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sat, 10 Jul 2010 15:58:00 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> References: <20100703104022.A55801CBC512@idnmail.gen-info.osaka-u.ac.jp> <20100707184019.38B3.EEF6E030@gen-info.osaka-u.ac.jp> Message-ID: Hi, sorry for the even later response on my end... The BioRuby Matrix class is not a suitable superclass for character state matrices, which are essentially the generalized form of multiple sequence alignments (but then also allowing for other types of homologized data).
I am tempted to suggest you make at least some (and maybe all) NeXML character state matrices either inherit from an alignment class or be easily convertible to one: if people parse a NeXML file with a DNA alignment, there's a good chance they'll want to be able to use it as an alignment object elsewhere in their code. As an aside, I have no problem with using CharacterStateMatrix as a class name. I don't see people having to type it that frequently, so it's not a big deal, right? Maybe I think this because my Java work is starting to get to me, though :) Rutger On Wed, Jul 7, 2010 at 11:40 PM, Naohisa Goto wrote: > Hi Anurag, > > > I think they are called character state matrices in the phylo > terminology. > > But something like CharactersStateMatrices would be too long. What about > > CharacterMatrices or StateMatrices? Perhaps Rutger can help me here. > > It is generally a bad thing to abbreviate a name only because it is long. > Modifying the upstream original names might be a source of confusion. > > In this case, using CharactersStateMatrices as is would be best. If the > name is expected to be used frequently by library users, a short name > could also be added. > > However, as Pjotr already mentioned, the class might not be needed, > depending on the design of classes. > > Sorry for the late response. > > -- > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ngoto at bioruby.org > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > -- Dr. Rutger A.
Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From anurag08priyam at gmail.com Sun Jul 11 06:51:03 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 11 Jul 2010 12:21:03 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: <20100703142541.GA3153@thebird.nl> References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> <20100703142541.GA3153@thebird.nl> Message-ID: > > This would be a factory, right? > > I think what you want to do is good in its objective - trying to > shorten the implementation. But do you really need > class inspection/reflection here? Asking for the class name is usually > prevented by having proper attributes in the classes. That is, if you > use OOP. The question is whether you really require this. > > The problem with the code I had (and have) was the really wide and > deep use of OOP classes. That led to duplication of code and little > 'feel' for correctness of what is in there. Deep OOP hierarchies are > evil. Duplication is ugly. > > Inspection/reflection is evil too - like Naohisa reacted, pretty much > - it is only used in exceptional cases when there is no other elegant > way of resolving issues. It can be powerful, but only use it when really > required, as other people often fail to understand what it does - and > code should be self-documenting. > > I think you need to ask yourself more fundamental questions. > > Why not use BioRuby basic types for most data represented by NeXML? > Only use special objects when there is real added value. So DnaSeqRow > would simply be a Sequence (or even a list of chars) and DnaSeqMatrix > would be a list of Sequences. If you have further attributes, create a > new composite object (like SequenceFeatures, or if you think more > functionally again, a tuple of sequence(s) and features?). This way > you don't create a hierarchy that balloons into hundreds of specialized > objects we won't use elsewhere. To differentiate between a DnaSequence > and RnaSequence you do not need different objects. Both are strings > (in BioRuby). You could even settle for Ruby's primitive types and > containers. > > Likewise, even if you need a Matrix, you don't need RnaMatrix and > DnaMatrix. I am sure of that. They are only specializations in name; > the code in there should be identical.
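Pjotr's point that DnaMatrix and RnaMatrix would contain identical code can be illustrated with one matrix class whose molecule type is an attribute, rows held as plain Strings, and shared behaviour in a mixin. This is only a sketch under those assumptions; MatrixRows, CharacterMatrix, ntax and nchar are illustrative names, not BioRuby or NeXML API.

```ruby
# Shared matrix behaviour written once as a mixin. The module and class
# names are illustrative only; they are not BioRuby or NeXML API.
module MatrixRows
  # Number of taxa (rows).
  def ntax
    rows.size
  end

  # Number of characters (columns): the length of the longest row.
  def nchar
    rows.values.map { |seq| seq.length }.max || 0
  end

  def row(label)
    rows[label]
  end
end

# A single matrix class: DNA vs. RNA vs. protein is an attribute,
# not a separate subclass, and each row is a plain String.
class CharacterMatrix
  include MatrixRows

  attr_reader :type, :rows

  def initialize(type, rows = {})
    @type = type   # e.g. :dna, :rna or :protein
    @rows = rows   # Hash of taxon label => sequence String
  end
end

dna = CharacterMatrix.new(:dna, { 'taxon1' => 'ACGT', 'taxon2' => 'ACG-' })
rna = CharacterMatrix.new(:rna, { 'taxon1' => 'ACGU' })
puts dna.nchar
puts rna.ntax
```

Adding a protein or restriction-site matrix then needs no new class, only a different type value; a new class is worth introducing only when one of them needs genuinely different behaviour.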
> > If you go down the OOP route, make use of Ruby's mixins. Search > Google for "ruby mixin deep oop hierarchy". > > My recommendation is to refactor the library to use as primitive a > type as possible, at every point. When you run into functionality that > requires a more complex type, because there is no other way - that is > the moment to design and add it. > Point noted :). > I don't know the full depth of the NeXML format, but I can predict > it consists of primitive types in ordered ways. This can be mirrored > by the implementation. If you do it like this you won't have to use > inspection (as in the question above). OOP classes are for harnessing > special functionality that goes with a certain type. Do not create a type > unless you need something special. > The fact that I do not know anything about bio* and phylo* also leads to some amount of confusion :P. Due to some rotten luck I was not able to confer with Rutger. I will discuss this with Rutger and refactor the code keeping your suggestions in mind. > You can propose changes to existing BioRuby types - in particular > with the RDF implementation. > > I know some people will balk at this rewrite - but to be honest, if > you want your library to be useful to others it needs rethinking. I > would take a week out of your plan to experiment with different object > models - just start with a small subset. When you think something > works, roll it out all the way. That can be done quickly. Read, read, > read on the Internet about object models. > > One thing you can consider is to use an intermediate object structure > for parsing the XML into Ruby - and next fork it out into logical > data structures. I do that regularly as the XML 'model' does not > normally map to Ruby well.
One example of mine is here > > http://github.com/pjotrp/swig2doc/blob/master/lib/input/doxyxmlparser.rb > > Doxy objects are stored in > > http://github.com/pjotrp/swig2doc/blob/master/lib/cobj/doxy/doxycobjs.rb > > Note swig2doc also contains a convenience class for using libxml2 in > > http://github.com/pjotrp/swig2doc/blob/master/lib/input/xmleasyreader.rb > > And while you are refactoring, why not make sure the parser does > not fill memory. > > Pj. > > PS. Are you using another NeXML OOP implementation as a model - Perl, > Python, Java? I would like to know, so I can have a look. > > I am not using one as a model, but I sometimes refer to the Python implementation: http://nexml.org/nexml/python/ -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From rutgeraldo at gmail.com Sun Jul 11 10:13:42 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sun, 11 Jul 2010 11:13:42 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] Code Review. In-Reply-To: References:

<20100703052657.15C0.EEF6E030@gen-info.osaka-u.ac.jp> <20100703142541.GA3153@thebird.nl> Message-ID: > > Not using as a model but I sometimes refer to the python > implementation :- http://nexml.org/nexml/python/ > There's also a Perl and a Java implementation on that same website to draw inspiration from. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From emanuele.orlando at gmail.com Thu Jul 15 12:29:57 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Thu, 15 Jul 2010 14:29:57 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: Hi, any news on my questions? :) Best regards, -- Emanuele Orlando http://www.emanueleorlando.com On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < emanuele.orlando at gmail.com> wrote: > Thanks to all for the valuable links. > I didn't know ChemRuby and now I have some questions :) > Its mailing list is empty. If I want to discuss the programming/design of > ChemRuby, where should I mail? To the BioRuby list with a specific tag (for example > [CHEMRUBY]), or to the ChemRuby list? > In the beginning, who contributed to ChemRuby? > Where were the priorities focused? > > Thanks > Emanuele > > -- > Emanuele Orlando > http://www.emanueleorlando.com > http://it.linkedin.com/in/kooru > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > >> Hi, >> >> This explanation by Jan may be useful when you try GitHub. >> >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html >> >> Toshiaki >> >> On 2010/07/08, at 3:36, Pjotr Prins wrote: >> >> > Welcome Emanuele, >> > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: >> >> Dear Bioruby, >> >> >> >> This is my first mail to the group.
>> >> My name is Emanuele Orlando. I graduated in Computational Chemistry and for >> >> three years I have been working as an IT consultant. >> >> Like others, I love Ruby and I would be happy to contribute to the development >> >> of BioRuby. How can I make some contributions? >> > >> > Choose a topic to work on. You can simply get a free account on >> > github.com and clone the repository. Start coding - when it works we >> > can easily merge it into the main tree. >> > >> > If you want to discuss programming and design, just mail this list. >> > >> > Pj. >> > _______________________________________________ >> > BioRuby Project - http://www.bioruby.org/ >> > BioRuby mailing list >> > BioRuby at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioruby >> >> >> _______________________________________________ >> BioRuby Project - http://www.bioruby.org/ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > > > From bonnalraoul at ingm.it Thu Jul 15 12:48:20 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Thu, 15 Jul 2010 14:48:20 +0200 Subject: [BioRuby] R: contribute to Bioruby References: <20100707183622.GA16079@thebird.nl> Message-ID: Dear Emanuele, We are now working on:
1) creating a plugin system, something more dynamic than gems, for experimenting with new code and including external bindings like BioLib (shared across the OpenBio* projects, for high-performance algorithms in bioinformatics)
2) using the ActiveRDF library to run SPARQL queries; I'd like to define a "smart" way to use the SPARQL language. I need some time to write down a prototype I have in mind. I think you have already heard something about the Semantic Web in your consultant experience
3) supporting the EMBOSS suite dynamically, by reading .acd configuration files
4) how to create charts or fancy graphics from data (we need a cross-platform way)?
5) SAMtools?
6) updating BioSQL support to the latest revision
Please add other ideas below.
I'm currently working on 1) and 2). -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F. Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it > -----Original Message----- > From: bioruby-bounces at lists.open-bio.org [mailto:bioruby- > bounces at lists.open-bio.org] On Behalf Of Emanuele Orlando > Sent: Thursday, 15 July 2010 14:30 > To: BioRuby ML > Subject: Re: [BioRuby] contribute to Bioruby > > Hi, > > any news to my questions? :) > > Best regards, > > -- > Emanuele Orlando > http://www.emanueleorlando.com > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > emanuele.orlando at gmail.com> wrote: > > > Thanks to all for the precious links. > > I didn't know chemruby and now i've some questions :) > > Its mailing list is empty. If i want discuss about programming/design > of > > chemruby, where do i mail? On bioruby list with a specific tag > (example > > [CHEMRUBY]) or i use chemruby list? > > At the beginning who has contributed for Chemruby? > > Priority where focused? > > > > Thanks > > Emanuele > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > http://it.linkedin.com/in/kooru > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama > wrote: > > > >> Hi, > >> > >> This explanation by Jan may be useful when you try GitHub. > >> > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html > >> > >> Toshiaki > >> > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > >> > >> > Welcome Emanuele, > >> > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> >> Dear Bioruby, > >> >> > >> >> This is my first mail to the group. > >> >> My name is Emanuele Orlando, i'm graduate in Computation > Chemistry and > >> from > >> >> three years i'm working as IT consultant. > >> >> As others i love ruby and i would be happy to contribute to the > >> development > >> >> of bioruby.
How can i make some contributions? > >> > >> > Choose a topic to work on. You can simply get a free account on > >> > github.com and clone the repository. Start coding - when it works > we > >> > can easily merge it in to the main tree. > >> > > >> > If you want to discuss programming and design, just mail this > list. > >> > > >> > Pj. > >> > _______________________________________________ > >> > BioRuby Project - http://www.bioruby.org/ > >> > BioRuby mailing list > >> > BioRuby at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioruby > >> > >> > >> _______________________________________________ > >> BioRuby Project - http://www.bioruby.org/ > >> BioRuby mailing list > >> BioRuby at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioruby > >> > > > > > > > > > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From anurag08priyam at gmail.com Thu Jul 15 14:57:45 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Thu, 15 Jul 2010 20:27:45 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API Message-ID: Hello all, I have worked out an initial set of specs for the RDF API. The code is in the 'rdf' branch - http://github.com/yeban/bioruby/tree/rdf. I am providing an overview here: To start with, I have put the specs in the bioruby/spec directory. I took the liberty of adding a rake task to execute all the specs. Most of the specs will fail as of now and some are pending. Running rake spec SPEC_OPTS="--format nested" should be good to get a rough overview of the specs. The lib itself (currently only bare class definitions) resides in the bioruby/lib/bio/rdf directory and uses the Bio::RDF namespace. At the core are Literal, Node, URI and classes, which form the subject, predicate, object and context of any RDF statement.
An RDF statement can be created as an instance of the Statement class. A collection of Statements forms a Graph. An RDF graph can be queried for statements with a given subject, predicate or object. We can define new vocabularies with the Vocabulary class, which I explain in more detail below. RDF vocabularies are defined on a namespace URI. Take, for example, the XSD vocabulary, which defines datatypes for literals: XSD is defined on the "http://www.w3.org/2001/XMLSchema#" namespace with the 'xsd' prefix, so the actual URI for the CURIE "xsd:double" is "http://www.w3.org/2001/XMLSchema#double". The rationale is to have such URIs and CURIEs generated automatically:

xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#"
xsd[:double]

I was thinking of having commonly used vocabularies defined in the lib so that someone could use them out of the box, like XSD[:double] or CDAO[:foo]. Any component of BioRuby can use the rdf lib by using its objects as the subject or object of an RDF statement. However, a cleaner solution would be to have an Annotatable module mixed into the classes that are likely to use the rdf lib. Annotatable would just provide a wrapper over the core rdf lib to work with RDF. To begin with, I have added two methods, 'annotate' and 'annotation', which respectively create and return an RDF graph for that object. The example for these methods is pending in the specs. However, I was thinking of something like:

seq = Bio::Sequence.new('atgc')
seq.annotate do |graph|
  graph << [self, CDAO[:foo], 'moo']
end

seq.annotation.query :predicate => CDAO[:foo]

I think with this design we can maintain loose coupling between the rdf lib and BioRuby components. I have just begun creating the classes to realize the specs, so the design can still be changed completely if I am heading in the wrong direction. In working out the rdf lib, I have mostly referred to the RDF primer and Wikipedia. I might have gone wrong on some RDF concepts too. Please correct me :).
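The Statement / Graph / Vocabulary design described above can be sketched in a few lines of plain Ruby. This is a hypothetical illustration, not the code in the 'rdf' branch: the module name SketchRDF and the example.org vocabulary are placeholders, and Struct-based value objects are one possible choice. The last lines also show a blank node used as the object of one statement and the subject of a nested one.

```ruby
# Minimal sketch of the Vocabulary / Statement / Graph design described in
# this mail. It is NOT the code in the 'rdf' branch; the module name SketchRDF
# and the example.org vocabulary are placeholders.
module SketchRDF
  URI       = Struct.new(:value)             # a full IRI
  Node      = Struct.new(:id)                # a blank node
  Literal   = Struct.new(:value, :datatype)  # datatype is a URI or nil
  Statement = Struct.new(:subject, :predicate, :object)

  # vocab[:term] appends the term name to the namespace URI, so
  # xsd[:double] corresponds to the CURIE xsd:double.
  class Vocabulary
    def initialize(namespace)
      @namespace = namespace
    end

    def [](term)
      URI.new(@namespace + term.to_s)
    end
  end

  # A Graph is a collection of Statements that can be filtered by
  # subject, predicate and/or object.
  class Graph
    include Enumerable

    def initialize
      @statements = []
    end

    # Accepts a Statement or a [subject, predicate, object] Array.
    def <<(triple)
      triple = Statement.new(*triple) if triple.is_a?(Array)
      @statements << triple
      self
    end

    def each(&block)
      @statements.each(&block)
    end

    # query(:predicate => p) returns the statements whose predicate == p.
    def query(pattern)
      select { |st| pattern.all? { |position, value| st.send(position) == value } }
    end
  end
end

xsd = SketchRDF::Vocabulary.new('http://www.w3.org/2001/XMLSchema#')
ex  = SketchRDF::Vocabulary.new('http://example.org/terms#')  # hypothetical
seq = SketchRDF::URI.new('http://example.org/seq1')
b   = SketchRDF::Node.new('b1')

g = SketchRDF::Graph.new
g << [seq, ex[:annotatedWith], b]   # nesting starts at the blank node...
g << [b, ex[:note], SketchRDF::Literal.new('moo', xsd[:string])]  # ...its subject
puts xsd[:double].value
puts g.query(:subject => b).size
```

Because Struct instances compare by value, xsd[:double] == SketchRDF::URI.new('http://www.w3.org/2001/XMLSchema#double') holds, which is what lets query match terms produced by separate Vocabulary lookups.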
-- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From rutgeraldo at gmail.com Thu Jul 15 15:58:44 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Thu, 15 Jul 2010 17:58:44 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > I was thinking of having commonly used vocabulary defined in the lib so > someone could use it out of box like: XSD[:double] or CDAO[:foo]. It will probably turn out that this is useful. Perhaps also things such as Dublin Core, SKOS, DCTerms, Prism, DarwinCore. > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. > > In thinking out the rdf lib, I have mostly referred to the RDF primer and > Wikipedia. I might have gone wrong on some RDF concepts too. Please correct > :). I like the design; I'm curious to hear how the experts like it. You haven't mentioned this explicitly here, but I know you've been thinking about recursively nested statements, right? Rutger -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From anurag08priyam at gmail.com Sat Jul 17 02:51:23 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 17 Jul 2010 08:21:23 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > > I like the design, I'm curious to hear how the experts like it. You > haven't mentioned this explicitly here but I know you've been thinking about > recursively nested statements, right?
> That would be an RDF graph with a blank node as the object of the statement at which nesting starts, and the same blank node as the subject of the nested statements, right? I have considered that. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sat Jul 17 06:54:15 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sat, 17 Jul 2010 12:24:15 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > To start with I have put the specs in bioruby/spec directory. I took the > liberty of adding a rake task to execute all the specs. Most of the specs > will fail as of now and some are pending. "rake spec SPEC_OPTS="--format > nested" " should be good to get a rough overview of the specs. > I added SPEC_OPTS="--format nested" as the default option in the spec rake task, so plain 'rake spec' should be enough now. However, the 'format' option can still be overridden on the command line, if anyone prefers 'specdoc'. -- Anurag Priyam, 2nd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From pjotr.public14 at thebird.nl Sat Jul 17 09:22:30 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 17 Jul 2010 11:22:30 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: <20100717092230.GA21914@thebird.nl> Hi Anurag, I'll go over the Specs shortly. The first thing that is interesting to note is that the elaborate class hierarchy does not show up in the unit tests, nor in the Specs. This indicates the classes are not really needed. E.g. searching with grep -r ProteinCellRow * and grep -r RestrictionSeqRow * does not turn up any tests, which is to be expected. I know you haven't gotten round to refactoring, but can you see what I mean? When designing a class hierarchy, it is best driven from the API.
That is one reason why behaviour-driven testing (or even unit testing) is done upfront. Also, there is a second good reason to introduce an OOP hierarchy: when it makes 'internal' code easier to understand. You should judge every new class you introduce on those grounds: Does it add to, or simplify, my API? Or does it make my internal code organisation a lot easier to understand? Tick one of those two boxes to create class hierarchies. Otherwise you are best off with the most *simple* data representation. Simple, in general, is easy to understand and allows for more flexible approaches. Pj. On Thu, Jul 15, 2010 at 08:27:45PM +0530, Anurag Priyam wrote: > Hello all, > > I have worked out an initial set of specs for the RDF API. The code is in > 'rdf' branch - http://github.com/yeban/bioruby/tree/rdf. > > I am providing an overview here: > To start with I have put the specs in bioruby/spec directory. I took the > liberty of adding a rake task to execute all the specs. Most of the specs > will fail as of now and some are pending. "rake spec SPEC_OPTS="--format > nested" " should be good to get a rough overview of the specs. > > The lib itself( currently only bare class definition ) resides in > bioruby/lib/bio/rdf directory and uses Bio::RDF namespace. > > At the core are Literal, Node, URI and classes, which form the subject, > predicate, object and context of any RDF statement. An RDF statement can be > created as an instance of Statement class. A collection of Statements form a > Graph. An RDF graph can be queried for statements with a given subject, > predicate or object. We can define new Vocabularies with the Vocabulary > class. I am explaining the vocabulary class in more detail below. > > RDF vocabularies are defined on a namespace uri. Say, the XSD vocabulary > that defines datatypes for literals. XSD is defined on " > http://www.w3.org/2001/XMLSchema#" namespace with the 'xsd' prefix.
So the > actual URI for the curie "xsd:double" goes like " > http://www.w3.org/2001/XMLSchema#double". The rational is to have such URI > and curie automatically generated : > > xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#" > xsd[:double] > > I was thinking of having commonly used vocabulary defined in the lib so > someone could use it out of box like: XSD[:double] or CDAO[:foo]. > > The rdf lib can be used by any component of BioRuby by using that object as > the subject or object of an rdf statement. However, a cleaner solution would > be to have an Annotatable module mixed into the classes that are likely to > use the rdf lib. Annotatable would just provide a wrapper over the core rdf > lib to work with rdf. To begin with I have added two functions 'annotate' > and 'annotation' which create and return a rdf graph for that object > respectively. The example for these functions is pending in the specs. > However, I was thinking of something like: > > seq = Bio::Sequece.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. > > In thinking out the rdf lib, I have mostly referred to the RDF primer and > Wikipedia. I might have gone wrong on some RDF concepts too. Please correct > :). > > -- > Anurag Priyam, > 2nd Year Undergraduate, > Department of Mechanical Engineering, > IIT Kharagpur. 
> +91-9775550642 > _______________________________________________ > BioRuby Project - http://www.bioruby.org/ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From pjotr.public14 at thebird.nl Sat Jul 17 10:04:17 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sat, 17 Jul 2010 12:04:17 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: <20100717100417.GA23002@thebird.nl> Hi Anurag, On Thu, Jul 15, 2010 at 08:27:45PM +0530, Anurag Priyam wrote: > However, I was thinking of something like: > > seq = Bio::Sequece.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > > I think with this design we can maintain loose coupling between the rdf lib > and bioruby components. I have just begun creating the classes to realize > the specs, so the design can still be modified completely if I am in a wrong > direction. I think this is the idea. The RDF generator should be generic and easy to use for extending existing objects. That is very good. In other words, the Sequence class should not *know* about RDF - we should not pollute existing classes (even further) with RDF knowledge, if we can avoid it. You can create a specialized RDF::Sequence, or RDF::Alignment, object, which would add certain features to a base Sequence class (without modifying the Sequence class itself, for sure). These classes should not care whether we are dealing with nucleotides or amino acids (in my opinion). So the first thing to do is to write the Specs for such a system. The current Specs are merely object invocations for RDF itself. So what I would like to see is Specs that do something real. Rather than the example.org URI, use something that is meaningful. Write Specs for using real BioRuby classes and/or NeXML classes. Others, Raoul for one, have ideas for RDF too. So the Specs will help out with ideas.
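Pjotr's suggestion that RDF features wrap a base Sequence "without modifying the Sequence class itself" can be sketched with Ruby's standard delegate library. In the sketch below, PlainSequence is a stand-in for a class like Bio::Sequence, Annotated is a hypothetical wrapper name, and statements are plain [subject, predicate, object] Arrays rather than a real RDF graph. One detail matters for the block form proposed in the thread: a Ruby block is a closure, so self inside seq.annotate do ... end is the caller's self, not seq; the sketch therefore yields the wrapped object explicitly.

```ruby
require 'delegate'

# Stand-in for a plain sequence class such as Bio::Sequence; it knows
# nothing about RDF.
class PlainSequence
  attr_reader :seq

  def initialize(seq)
    @seq = seq
  end

  def length
    @seq.length
  end
end

# Hypothetical wrapper that adds an annotation store by delegation, so the
# wrapped class itself is never modified. Statements are kept as plain
# [subject, predicate, object] Arrays to keep the sketch self-contained.
class Annotated < SimpleDelegator
  def annotation
    @annotation ||= []
  end

  # Yields the statement list *and* the wrapped object. A Ruby block is a
  # closure, so `self` inside it would be the caller's self, not the
  # sequence; the explicit second argument avoids that pitfall.
  def annotate
    yield annotation, __getobj__
    self
  end
end

seq = Annotated.new(PlainSequence.new('atgc'))
seq.annotate { |graph, subject| graph << [subject, 'cdao:foo', 'moo'] }
puts seq.length                                                  # delegated call
puts seq.annotation.select { |_s, p, _o| p == 'cdao:foo' }.size
```

An alternative is to run the block with instance_exec so that self refers to the annotated object, at the cost of surprising readers who expect ordinary closure semantics.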
Write more directed Specs and we will discuss them. Pj. From rutgeraldo at gmail.com Sat Jul 17 11:40:48 2010 From: rutgeraldo at gmail.com (Rutger Vos) Date: Sat, 17 Jul 2010 12:40:48 +0100 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: <20100717092230.GA21914@thebird.nl> References: <20100717092230.GA21914@thebird.nl> Message-ID: > > > You should judge every new class you introduce based on those > grounds: Does it add to, or simplify, my API? Or does it make my > internal code organisation a lot easier to understand. Tick one of > those two boxes to create class hierarchies. Otherwise you are best > off with the most *simple* data representation. Simple, in general, > is easy to understand and allows for more flexible approaches. > > In working in other languages it has turned out time and time again that the more closely the class hierarchy mirrors that of the NeXML schema types, the more easily instance documents can be represented, manipulated and round-tripped. This doesn't necessarily mean that the API for users needs to reflect all that, but internally it's useful. -- Dr. Rutger A. Vos School of Biological Sciences Philip Lyle Building, Level 4 University of Reading Reading RG6 6BX United Kingdom Tel: +44 (0) 118 378 7535 http://www.nexml.org http://rutgervos.blogspot.com From pjotr.public14 at thebird.nl Sun Jul 18 06:33:05 2010 From: pjotr.public14 at thebird.nl (Pjotr Prins) Date: Sun, 18 Jul 2010 08:33:05 +0200 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: <20100717092230.GA21914@thebird.nl> Message-ID: <20100718063305.GB29780@thebird.nl> On Sat, Jul 17, 2010 at 12:40:48PM +0100, Rutger Vos wrote: > > > > > > You should judge every new class you introduce based on those > > grounds: Does it add to, or simplify, my API? Or does it make my > > internal code organisation a lot easier to understand. Tick one of > > those two boxes to create class hierarchies. 
Otherwise you are best > > off with the most *simple* data representation. Simple, in general, > > is easy to understand and allows for more flexible approaches. > > > > > In working in other languages it has turned out time and time again that the > more closely the class hierarchy mirrors that of the NeXML schema types, the > more easily instance documents can be represented, manipulated and > round-tripped. This doesn't necessarily mean that the API for users needs to > reflect all that, but internally it's useful. Not disputing that. Please read elements.rb. Pj. From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 12:35:40 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 20 Jul 2010 21:35:40 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: <20100720123541.6F2ED1CBC445@idnmail.gen-info.osaka-u.ac.jp> Hi, Sorry responding too late. ChemRuby have been developed by Nobuya Tanaka. Contact address for ChemRuby is . Subversion repository is http://tools.textdriven.com/svn/chemruby % svn co http://tools.textdriven.com/svn/chemruby I don't know he is looking at bioruby mailing list. I tried to subscribe to the mailing list chemruby-list-jp in RubyForge, but currently no response from the server. It seems the list is stopped. So, currently, >> On bioruby list with a specific tag (example [CHEMRUBY]) with Cc: staff at chemruby.org seems good. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Thu, 15 Jul 2010 14:29:57 +0200 Emanuele Orlando wrote: > Hi, > > any news to my questions? :) > > Best regards, > > -- > Emanuele Orlando > http://www.emanueleorlando.com > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > emanuele.orlando at gmail.com> wrote: > > > Thanks to all for the precious links. > > I didn't know chemruby and now i've some questions :) > > Its mailing list is empty. If i want discuss about programming/design of > > chemruby, where do i mail? 
On bioruby list with a specific tag (example > > [CHEMRUBY]) or i use chemruby list? > > At the beginning who has contributed for Chemruby? > > Priority where focused? > > > > Thanks > > Emanuele > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > http://it.linkedin.com/in/kooru > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama wrote: > > > >> Hi, > >> > >> This explanation by Jan may be useful when you try GitHub. > >> > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would-that-work.html > >> > >> Toshiaki > >> > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > >> > >> > Welcome Emanuele, > >> > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando wrote: > >> >> Dear Bioruby, > >> >> > >> >> This is my first mail to the group. > >> >> My name is Emanuele Orlando, i'm graduate in Computation Chemistry and > >> from > >> >> three years i'm working as IT consultant. > >> >> As others i love ruby and i would be happy to contribute to the > >> development > >> >> of bioruby. How can i make some contributions? > >> > > >> > Choose a topic to work on. You can simply get a free account on > >> > github.com and clone the repository. Start coding - when it works we > >> > can easily merge it in to the main tree. > >> > > >> > If you want to discuss programming and design, just mail this list. > >> > > >> > Pj. 
From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 12:59:34 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 20 Jul 2010 21:59:34 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> Message-ID: <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 20 Jul 2010 21:35:40 +0900 Naohisa GOTO wrote: > I tried to subscribe to the mailing list chemruby-list-jp > in RubyForge, but currently no response from the server. > It seems the list is stopped. It still works. I've just received subscription confirmation mail, and I've subscribed to the list now. I think cross-posting is also good, because there may be few persons in the chemruby list. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From bonnalraoul at ingm.it Tue Jul 20 12:57:46 2010 From: bonnalraoul at ingm.it (Raoul Bonnal) Date: Tue, 20 Jul 2010 14:57:46 +0200 Subject: [BioRuby] R: contribute to Bioruby References: <20100707183622.GA16079@thebird.nl> <20100720123541.6F2ED1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <642659b2-c7b9-40cb-9e62-67cee2c49f4b@ingm.it> Dear Goto-san, Do you think there is some little task, just for starting, that could be assigned to Emanuele? Some time ago I posted a list of tasks. -- Raoul J.P. Bonnal Life Science Informatics Integrative Biology Program Fondazione INGM Via F.
Sforza 28 20122 Milano, IT phone: +39 02 006 623 26 fax: +39 02 006 623 46 http://www.ingm.it > -----Messaggio originale----- > Da: bioruby-bounces at lists.open-bio.org [mailto:bioruby- > bounces at lists.open-bio.org] Per conto di Naohisa GOTO > Inviato: marted? 20 luglio 2010 14:36 > A: Emanuele Orlando > Cc: BioRuby ML; staff at tools.textdriven.com > Oggetto: Re: [BioRuby] contribute to Bioruby > > Hi, > > Sorry responding too late. > > ChemRuby have been developed by Nobuya Tanaka. > Contact address for ChemRuby is . > Subversion repository is http://tools.textdriven.com/svn/chemruby > % svn co http://tools.textdriven.com/svn/chemruby > > I don't know he is looking at bioruby mailing list. > I tried to subscribe to the mailing list chemruby-list-jp > in RubyForge, but currently no response from the server. > It seems the list is stopped. > > So, currently, > >> On bioruby list with a specific tag (example [CHEMRUBY]) > with Cc: staff at chemruby.org seems good. > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Thu, 15 Jul 2010 14:29:57 +0200 > Emanuele Orlando wrote: > > > Hi, > > > > any news to my questions? :) > > > > Best regards, > > > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > > > On Thu, Jul 8, 2010 at 11:46 AM, Emanuele Orlando < > > emanuele.orlando at gmail.com> wrote: > > > > > Thanks to all for the precious links. > > > I didn't know chemruby and now i've some questions :) > > > Its mailing list is empty. If i want discuss about > programming/design of > > > chemruby, where do i mail? On bioruby list with a specific tag > (example > > > [CHEMRUBY]) or i use chemruby list? > > > At the beginning who has contributed for Chemruby? > > > Priority where focused? 
> > > > > > Thanks > > > Emanuele > > > > > > -- > > > Emanuele Orlando > > > http://www.emanueleorlando.com > > > http://it.linkedin.com/in/kooru > > > > > > > > > On Wed, Jul 7, 2010 at 10:42 PM, Toshiaki Katayama > wrote: > > > > > >> Hi, > > >> > > >> This explanation by Jan may be useful when you try GitHub. > > >> > > >> http://saaientist.blogspot.com/2008/06/bioruby-with-git-how-would- > that-work.html > > >> > > >> Toshiaki > > >> > > >> On 2010/07/08, at 3:36, Pjotr Prins wrote: > > >> > > >> > Welcome Emanuele, > > >> > > > >> > On Wed, Jul 07, 2010 at 08:00:58PM +0200, Emanuele Orlando > wrote: > > >> >> Dear Bioruby, > > >> >> > > >> >> This is my first mail to the group. > > >> >> My name is Emanuele Orlando, i'm graduate in Computation > Chemistry and > > >> from > > >> >> three years i'm working as IT consultant. > > >> >> As others i love ruby and i would be happy to contribute to the > > >> development > > >> >> of bioruby. How can i make some contributions? > > >> > > > >> > Choose a topic to work on. You can simply get a free account on > > >> > github.com and clone the repository. Start coding - when it > works we > > >> > can easily merge it in to the main tree. > > >> > > > >> > If you want to discuss programming and design, just mail this > list. > > >> > > > >> > Pj. 
From emanuele.orlando at gmail.com Tue Jul 20 14:07:43 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Tue, 20 Jul 2010 16:07:43 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks Naohisa, sure cross-posting is a good idea. In your opinion, there's some priority where focused? Raoul, about your list i could be interested for point 4 and 6. But i wait for your feedback :) Regards, -- Emanuele Orlando http://www.emanueleorlando.com On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO wrote: > Hi, > > On Tue, 20 Jul 2010 21:35:40 +0900 > Naohisa GOTO wrote: > > > I tried to subscribe to the mailing list chemruby-list-jp > > in RubyForge, but currently no response from the server. > > It seems the list is stopped. > > It still works. > I've just received subscription confirmation mail, > and I've subscribed to the list now. > > I think cross-posting is also good, because there may be > few persons in the chemruby list.
> > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > From ngoto at gen-info.osaka-u.ac.jp Tue Jul 20 15:23:59 2010 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 21 Jul 2010 00:23:59 +0900 Subject: [BioRuby] contribute to Bioruby In-Reply-To: References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> Hi Emanuele, No priority, and please choose what you like and what you can do now. For "4) charts or fancy graphics from data", many people are interested in, but still few codes (Bio::Graphics etc). There may be many different approches. To write codes useful for biologists, knowing visualization examples in the field of biology/bioinformatics may be needed. For the purpose, reading recent research papers and/or studying what other projects do (BioConductor, BioJava, Biopython, BioPerl, etc) would be good. If you like "6) update BioSQL support to latest revision", discuss with Raoul, maintener of BioRuby BioSQL support. About ChemRuby, first, please try to use it. It is great if you can write good example scripts using BioRuby with ChemRuby. If you find bug, please fix. If you feel something missing, please add new features. Thank you, Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Tue, 20 Jul 2010 16:07:43 +0200 Emanuele Orlando wrote: > Thanks Naohisa, sure cross-posting is a good idea. > In your opinion, there's some priority where focused? > Raoul, about your list i could be interested for point 4 and 6. But i wait > for your feedback :) > Regards, > -- > Emanuele Orlando > http://www.emanueleorlando.com > > > On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO > wrote: > > > Hi, > > > > On Tue, 20 Jul 2010 21:35:40 +0900 > > Naohisa GOTO wrote: > > > > > I tried to subscribe to the mailing list chemruby-list-jp > > > in RubyForge, but currently no response from the server. 
> > > It seems the list is stopped. > > > > It still works. > > I've just received subscription confirmation mail, > > and I've subscribed to the list now. > > > > I think cross-posting is also good, because there may be > > few persons in the chemruby list. > > > > Naohisa Goto > > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > From emanuele.orlando at gmail.com Tue Jul 20 21:16:24 2010 From: emanuele.orlando at gmail.com (Emanuele Orlando) Date: Tue, 20 Jul 2010 23:16:24 +0200 Subject: [BioRuby] contribute to Bioruby In-Reply-To: <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> References: <20100707183622.GA16079@thebird.nl> <20100720125935.08F1D1CBC445@idnmail.gen-info.osaka-u.ac.jp> <20100720152359.BDEBE1CBC3C2@idnmail.gen-info.osaka-u.ac.jp> Message-ID: Thanks for your answers. Sure i will try chemruby and i will work on it. Emanuele On Tue, Jul 20, 2010 at 5:23 PM, Naohisa GOTO wrote: > Hi Emanuele, > > No priority, and please choose what you like and what you > can do now. > > For "4) charts or fancy graphics from data", many people > are interested in, but still few codes (Bio::Graphics etc). > There may be many different approches. To write codes > useful for biologists, knowing visualization examples in > the field of biology/bioinformatics may be needed. > For the purpose, reading recent research papers and/or > studying what other projects do (BioConductor, BioJava, > Biopython, BioPerl, etc) would be good. > > If you like "6) update BioSQL support to latest revision", > discuss with Raoul, maintener of BioRuby BioSQL support. > > About ChemRuby, first, please try to use it. It is great if you > can write good example scripts using BioRuby with ChemRuby. > If you find bug, please fix. If you feel something missing, > please add new features. 
> > Thank you, > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Tue, 20 Jul 2010 16:07:43 +0200 > Emanuele Orlando wrote: > > > Thanks Naohisa, sure cross-posting is a good idea. > > In your opinion, there's some priority where focused? > > Raoul, about your list i could be interested for point 4 and 6. But i > wait > > for your feedback :) > > Regards, > > -- > > Emanuele Orlando > > http://www.emanueleorlando.com > > > > > > On Tue, Jul 20, 2010 at 2:59 PM, Naohisa GOTO > > wrote: > > > > > Hi, > > > > > > On Tue, 20 Jul 2010 21:35:40 +0900 > > > Naohisa GOTO wrote: > > > > > > > I tried to subscribe to the mailing list chemruby-list-jp > > > > in RubyForge, but currently no response from the server. > > > > It seems the list is stopped. > > > > > > It still works. > > > I've just received subscription confirmation mail, > > > and I've subscribed to the list now. > > > > > > I think cross-posting is also good, because there may be > > > few persons in the chemruby list. > > > > > > Naohisa Goto > > > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > > > > -- Emanuele Orlando http://www.emanueleorlando.com From yannick.wurm at unil.ch Fri Jul 23 10:39:10 2010 From: yannick.wurm at unil.ch (Yannick Wurm) Date: Fri, 23 Jul 2010 12:39:10 +0200 Subject: [BioRuby] bioruby vs bioX In-Reply-To: References: Message-ID: Dear List, Here's a thought for a rainy morning. Thanks to new technologies, many biologists end up with large amounts of data & need to figure out a way to script things. They can be caricatured into a few categories: - many attempt PERL because it's the only language they (or their boss) have heard of - others attempt to use R because that's what they learned in their undergraduate biostatistics course - others yet figure out that Python or Ruby are modern alternatives to Perl.... but I think most end up using Python, mostly because they find many examples of biopython code online. 
Thus most newcomers to bioruby are not newbie biologists but computer geeks that know that ruby is great & need to tackle something biological. I think we're really missing out on the newbie "I'm a biologist & I need to script" market. Yes, there are a few resources, Eg: - Jan's article: http://www.biomedcentral.com/1471-2105/10/221 (are you planning a followup where you show some of bioruby... say to parse blast results & retrieve the corresponding sequences from genbank)? - a few wonderful but still randomly scattered blog posts - http://bioruby.open-bio.org/wiki/SampleCodes - http://bioruby.open-bio.org/wiki/Tutorial - and an almost pathetically empty http://bioruby.open-bio.org/wiki/HOWTOs But only few of these are "biologist non-programmer newbie-proof". And there is no central place to point a complete newbie. Contrast that with the amount of information and ***code that works right away even if you don't understand the details*** found: - in the Biopython cookbook (yes it's ugly, but it does contain example code for most newcomer's questions) http://www.biopython.org/DIST/docs/tutorial/Tutorial.html - on the Scriptome Perl "illegible one-liners that people use": http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/ It is clear that we have a lot of potential. I wonder if proposals for contributions (such as Emanuele's) could not be geared towards improving our newbie-accessibility? I don't like having to point people towards Python/Biopython instead of ruby/Bioruby. Yannick -------------------------------------------- yannick . wurm @ unil . 
ch Ant Genomics, Ecology & Evolution @ Lausanne http://www.unil.ch/dee/page28685_fr.html From sararayburn at gmail.com Sun Jul 25 03:55:10 2010 From: sararayburn at gmail.com (Sara Rayburn) Date: Sat, 24 Jul 2010 22:55:10 -0500 Subject: [BioRuby] [GSoC] Progress Update Message-ID: <67536FCE-BAC9-4406-ADA9-877FA070253F@gmail.com> Hello, Here is an update on the status of my project implementing speciation/duplication inference algorithms for BioRuby. The SDI algorithm is implemented, tested, and working for binary gene and species trees. This week I've ironed out some performance bottlenecks so that the algorithm executes almost as quickly as the Java implementation for very large trees. I have also completed the implementation of an extension to the algorithm that finds the gene tree rooting that minimizes the number of duplications inferred in the tree. Upcoming work will include extending the algorithm to support trees with more than two children per node. For more details, a full update is on the github wiki[1] for my branch along with a tutorial describing how to use the algorithms. [1] http://wiki.github.com/srayburn/bioruby/ Thanks, Sara Rayburn From anurag08priyam at gmail.com Sun Jul 25 17:47:54 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 25 Jul 2010 23:17:54 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: <20100717100417.GA23002@thebird.nl> References: <20100717100417.GA23002@thebird.nl> Message-ID: > So the first thing to do is to write the Specs for such a system. The > current Specs are merely object invocations for RDF itself. > > So what I would like to see is Specs that do something real. Rather > than the example.org URI, use something that is meaningful. Write > Specs for using real BioRuby classes and/or NeXML classes. > > Pjotr, have a look at spec/rdf/graph.rb. It should look more meaningful now.
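For reference, the `Vocabulary` helper proposed earlier in this thread (`xsd = Vocabulary.new "http://www.w3.org/2001/XMLSchema#"`, then `xsd[:double]`) is one simple way to get meaningful URIs into such specs instead of example.org placeholders. A minimal sketch, assuming terms are returned as plain strings and that CURIEs are expanded through a prefix table; `Vocabulary`, `PREFIXES` and `expand_curie` are illustrative names, not the code in the branch. A `CDAO` constant would be defined the same way once its namespace URI is settled; it is left out here rather than guessed.

```ruby
# Hypothetical vocabulary helper: a namespace URI that mints term URIs.
class Vocabulary
  attr_reader :namespace

  def initialize(namespace)
    @namespace = namespace.to_s
  end

  # XSD[:double] #=> "http://www.w3.org/2001/XMLSchema#double"
  def [](term)
    @namespace + term.to_s
  end
end

# Commonly used vocabularies predefined so they can be used out of the box.
XSD = Vocabulary.new("http://www.w3.org/2001/XMLSchema#")

# Prefix table used to expand CURIEs such as "xsd:double".
PREFIXES = { "xsd" => XSD }.freeze

def expand_curie(curie)
  prefix, term = curie.split(":", 2)
  vocab = PREFIXES.fetch(prefix) { raise ArgumentError, "unknown prefix: #{prefix}" }
  vocab[term]
end

puts XSD[:double]
puts expand_curie("xsd:double")
```

Keeping the prefix table separate from the vocabularies means a spec can write either `XSD[:double]` or the CURIE `"xsd:double"` and get the same URI.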
However, the specs for graph do not answer how the rdf api is to be integrated with the rest of BioRuby; more on that in my next mail. > Others, Raoul for one, have ideas too for RDF too. So the Specs will > help out with ideas. > I am interested in working with Raoul on his SPARQL project. -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sun Jul 25 18:10:38 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Sun, 25 Jul 2010 23:40:38 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: Message-ID: > The rdf lib can be used by any component of BioRuby by using that object as > the subject or object of an rdf statement. However, a cleaner solution would > be to have an Annotatable module mixed into the classes that are likely to > use the rdf lib. Annotatable would just provide a wrapper over the core rdf > lib to work with rdf. To begin with I have added two functions 'annotate' > and 'annotation' which create and return an rdf graph for that object > respectively. The example for these functions is pending in the specs. > However, I was thinking of something like: > > seq = Bio::Sequence.new > seq.annotate do |graph| > graph << [self, CDAO[:foo], 'moo' ] > end > > seq.annotation.query :predicate => CDAO[:foo] > Here is what I have done. I would like you all to have a rough look at graph.rb, mixins/enumerable.rb and mixins/queryable.rb: http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/graph.rb http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/enumerable.rb http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/queryable.rb So a graph contains RDF::Statements that can be enumerated and queried in various ways. Then have a look at the Annotatable module: http://github.com/yeban/bioruby/blob/rdf/lib/bio/rdf/mixins/annotatable.rb To make a class annotatable, this module just needs to be included.
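A module of that kind, together with the `rdf_`-prefixed forwarding Anurag describes next, might look roughly like the sketch below. It is not the code in the yeban/bioruby `rdf` branch: `Graph` is a hypothetical minimal store of `[subject, predicate, object]` arrays (not RDF::Statement objects), its hash-pattern `query` is an assumption modelled on `query :predicate => ...` from the thread, `Otu` is a stand-in for `Bio::NeXML::Otu`, and the forwarding uses `method_missing`, which is one common Ruby idiom for this and not necessarily the one used in the branch.

```ruby
# Hypothetical minimal graph: a list of [subject, predicate, object] triples.
class Graph
  include Enumerable

  def initialize
    @statements = []
  end

  # Add one triple; returns self so calls can be chained.
  def <<(triple)
    @statements << triple
    self
  end

  def each(&block)
    @statements.each(&block)
    self
  end

  # Return the triples matching every position given in the pattern;
  # positions left out of the pattern match anything. Each match is also
  # yielded when a block is given.
  def query(pattern = {}, &block)
    matches = select do |s, p, o|
      (!pattern.key?(:subject) || pattern[:subject] == s) &&
        (!pattern.key?(:predicate) || pattern[:predicate] == p) &&
        (!pattern.key?(:object) || pattern[:object] == o)
    end
    matches.each(&block) if block
    matches
  end
end

# Hypothetical Annotatable mixin: gives the including class a lazily
# created graph and forwards rdf_foo(...) calls to graph.foo(...).
module Annotatable
  RDF_PREFIX = "rdf_".freeze

  def annotation
    @graph ||= Graph.new
  end

  def annotate
    yield annotation
    self
  end

  def method_missing(name, *args, &block)
    target = graph_method_for(name)
    target ? annotation.public_send(target, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    !graph_method_for(name).nil? || super
  end

  private

  # :rdf_query -> :query if the graph responds to it, otherwise nil.
  def graph_method_for(name)
    str = name.to_s
    return nil unless str.start_with?(RDF_PREFIX)
    target = str.delete_prefix(RDF_PREFIX).to_sym
    annotation.respond_to?(target) ? target : nil
  end
end

# Stand-in for Bio::NeXML::Otu.
class Otu
  include Annotatable
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

label = "urn:hypothetical:label" # placeholder predicate URI
otu = Otu.new("t1")
otu.annotate do |g|
  g << [otu, label, "XXX"]
  g << [otu, "urn:hypothetical:discoverer", "Moo"]
end
otu.rdf_query(predicate: label) { |s, _p, o| puts "#{s.id} #{o}" }
```

In this shape `otu.rdf_query(...)` and `otu.annotation.query(...)` return the same matches, and any other Enumerable method of the graph (for example `rdf_count`) is reachable the same way without the including class defining it.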
Say: class Bio::NeXML::Otu include Bio::RDF::Annotatable end The idea is to add an instance variable( @graph ) to Otu that stores an RDF::Graph object and delegate methods that begin with 'rdf_' to @graph. This way all the API defined for a Graph is available to an Annotatable object( by prefixing the method names with an 'rdf_' ). otu = Bio::NeXML::Otu.new( 't1') otu.annotate do |g| g << [ otu, CDAO[ :label ], "XXX" ] g << [ otu, CDAO[ :discoverer ], "Moo" ] end otu.rdf_query( :predicate => CDAO[ :label ] ) { |s| puts s.subject } is the same as otu.annotation.query(:predicate => CDAO[ :label ]) { |s| puts s.subject } -- Anurag Priyam, 3rd Year Undergraduate, Department of Mechanical Engineering, IIT Kharagpur. +91-9775550642 From anurag08priyam at gmail.com Sun Jul 25 20:27:12 2010 From: anurag08priyam at gmail.com (Anurag Priyam) Date: Mon, 26 Jul 2010 01:57:12 +0530 Subject: [BioRuby] [GSoC][NeXML and RDF API] RDF API In-Reply-To: References: