From rozziite at gmail.com Fri Jun 5 10:56:02 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Fri, 5 Jun 2009 10:56:02 -0400
Subject: [BioRuby] GSOC: phyloXML for BioRuby: should pluralize attribute names which hold arrays?
Message-ID: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com>

Hi all,

The Distribution element of phyloXML consists of <desc> [0..1], <point> [0..*] and <polygon> [0..*] tags. When mapping it to a class, I am tempted to give the class attributes for point and polygon plural names, since several such tags can be included in a Distribution. Like this:

Distribution class:
- desc (string)
- points [] (Array of Point objects)
- polygons [] (Array of Polygon objects)

If I were to follow this convention throughout all classes, some plural forms might sound a bit awkward. For example:

PhyloXMLNode class:
- confidences [] (Array of Confidence objects)
- taxonomies [] (Array of Taxonomy objects)
- sequences [] (Array of Sequence objects)
- events (Events object)
- distributions [] (Array of Distribution objects)
- references [] (Array of Reference objects)
- properties [] (Array of Property objects)

"Confidences" sounds a bit weird (but then again, I am not a native English speaker). "Events" is plural, but it is not an array of objects.

The reason I am bringing this up is that if the attributes which hold arrays were plural, it would be easier to remember that they are arrays and not forget to add the index in brackets (which I myself forget fairly often when writing unit tests), for example

node.sequences[0].name

instead of

node.sequence[0].name

What do you think?
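The naming convention described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the Point and Polygon structs and their fields (lat, long, points) are hypothetical stand-ins, and none of this is existing BioRuby API.

```ruby
# Hypothetical sketch only: these classes are not part of BioRuby.
# Point and Polygon are minimal stand-ins with assumed fields.
Point   = Struct.new(:lat, :long)
Polygon = Struct.new(:points)

class Distribution
  # desc occurs at most once, so the attribute name stays singular.
  attr_accessor :desc
  # point and polygon may occur many times, so the attributes are
  # plural and always hold Arrays (possibly empty).
  attr_reader :points, :polygons

  def initialize
    @points = []
    @polygons = []
  end
end

dist = Distribution.new
dist.desc = "Africa"
dist.points << Point.new(32.07, 34.78)

# The plural name is a reminder that an index is needed here.
puts dist.points[0].lat
```

Initializing the plural attributes to empty Arrays means callers can always index or iterate without checking for nil first.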
Diana

From ngoto at gen-info.osaka-u.ac.jp Sun Jun 7 02:28:53 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sun, 7 Jun 2009 15:28:53 +0900
Subject: [BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence
In-Reply-To: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com>
References: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com>
Message-ID: <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp>

Hi,

sorry for the delay.

On Sat, 30 May 2009 17:27:52 -0400
Diana Jaunzeikare wrote:

> Hi all,
>
> So I looked more carefully at the sequence element of phyloXML and it
> consists of information which cannot be mapped to Bio::Sequence object. I
> suggest to have a sequence class which closely resembles phyloXML structure
> and then have a method to extract relevant elements return Bio::Sequence
> object. What do you think?

In this case, a method to convert from Bio::Sequence to the phyloXML sequence class is also needed.

If some of the attributes are really essential and not specific to phyloXML, but will be needed by other data types, it is also possible to add new attributes to Bio::Sequence.

> Here on the left I listed phyloXML sequence tag elements and after the arrow
> -> the possible corresponding attribute of Bio::Sequence
> * type
> ** rna, dna -> Bio::Sequence::NA -> molecule type
> ** aa -> Bio::Sequence::AA
> * id_source (string ?) -> id_namespace
> * id_ref (string ) -> entry_id
> * symbol (string ?)
> * accession
> ** source (example: "UniProtKB") ->
> ** id (example: "P17304") -> primary_accession
> * name (string )
> * location (string ? )
> * mol_seq (string) -> seq / Bio::Sequence::NA/AA
> * uri
> ** desc (string)
> ** type (string )
> ** uri
>
> * annotation []
> ** ref
> ** source
> ** evidence
> ** type
> ** desc
> ** confidence
> ** property []
> ** uri
>
> * domain_architecture
> ** length
> ** domain []
> *** from
> *** to
> *** confidence
> *** id

The annotations and domain architecture could be mapped to the features in Bio::Sequence. But in some cases the mapping is difficult, depending on the vocabulary used in the annotations/domain_architecture.

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From georgkam at gmail.com Tue Jun 9 08:26:45 2009
From: georgkam at gmail.com (George Githinji)
Date: Tue, 9 Jun 2009 15:26:45 +0300
Subject: [BioRuby] Problem with Bio::GFF::GFF2
Message-ID: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com>

Hi all,

I am trying to parse a GFF file. The file looks like this:

##gff-version 2
##source-version bepipred-1.0b
##date 2009-06-09
##Type Protein seq1
# seqname source feature start end score N/A ?
# ---------------------------------------------------------------------------
seq1 bepipred-1.0b epitope 1 1 0.173 . . .
seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
seq1 bepipred-1.0b epitope 4 4 0.144 . . .
seq1 bepipred-1.0b epitope 5 5 0.250 . . .
seq1 bepipred-1.0b epitope 6 6 0.218 . . .

....truncated

I have written the following lines with the aim of extracting the start, end and score attributes. But before that, I wanted to know whether the full attributes are available, so I did the following:

require 'rubygems'
require 'bio'
bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))

bep_gff.records.each do |record|
  puts record.attributes_to_hash.inspect
end

However, I get empty hashes.
Any ideas?
Thank you

-- 
---------------
Sincerely
George

Skype: george_g2
Blog: http://biorelated.wordpress.com/

From ngoto at gen-info.osaka-u.ac.jp Tue Jun 9 09:44:19 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 9 Jun 2009 22:44:19 +0900
Subject: [BioRuby] Problem with Bio::GFF::GFF2
In-Reply-To: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com>
References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com>
Message-ID: <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>

Hi George,

On Tue, 9 Jun 2009 15:26:45 +0300
George Githinji wrote:

> Hi all,
> I am try to parse a GFF file. The file looks like this
>
> ##gff-version 2
> ##source-version bepipred-1.0b
> ##date 2009-06-09
> ##Type Protein seq1
> # seqname source feature start end score N/A ?
> #
> ---------------------------------------------------------------------------
> seq1 bepipred-1.0b epitope 1 1 0.173 . . .
> seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
> seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
> seq1 bepipred-1.0b epitope 4 4 0.144 . . .
> seq1 bepipred-1.0b epitope 5 5 0.250 . . .
> seq1 bepipred-1.0b epitope 6 6 0.218 . . .
>
> ....truncated

The above GFF records do not contain any "attributes". The field definition of each GFF line is:

<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]

When talking about GFF, the word "attributes" refers to the "attributes" field in each GFF line.

See the GFF2 specifications document for details.
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml

> and i have written the following lines with an aim of extracting the start,
> end and score attributes. but before that i wanted to know whether the full
> attributes are available. so i did the following.
>
> require 'rubygems'
> require 'bio'
> bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))
>
> bep_gff.records.each do |record|
> puts record.attributes_to_hash.inspect
> end
>
> However, i get empty hashes.
> Any ideas?

Because the Bio::GFF::GFF2::Record#attributes_to_hash method returns the "attributes" field as a hash, and the "attributes" field is empty in all of the above GFF2 records, showing empty hashes is logically correct.

If you really want a hash, adding each field to a hash would be the easiest way. For example,

bep_gff.records.each do |record|
  h = {}
  h['seqname'] = record.seqname
  h['source'] = record.source
  h['feature'] = record.feature
  h['start'] = record.start
  h['end'] = record.end
  h['score'] = record.score
  h['strand'] = record.strand
  h['frame'] = record.frame
  h['attributes'] = record.attributes_to_hash
  p h
end

Bio::GFF::GFF2::Record has seqname, source, feature, start, end, score, strand and frame attributes (attributes in the Ruby sense), which are inherited from the Bio::GFF::Record class. Normally, it is more natural to use these Ruby attributes directly, without creating a hash.

Note that attributes_to_hash may lose some data when there are two or more values with the same tag name in an "attributes" field.

When creating new data that uses "attributes" extensively, GFF3 is recommended, because the design of GFF2 attributes is somewhat broken.

> Thank you
>
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Blog: http://biorelated.wordpress.com/

Your blog is nice!

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From georgkam at gmail.com Tue Jun 9 10:24:38 2009
From: georgkam at gmail.com (George Githinji)
Date: Tue, 9 Jun 2009 17:24:38 +0300
Subject: [BioRuby] Problem with Bio::GFF::GFF2
In-Reply-To: <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>
References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com> <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com>

Thank you so much, Naohisa, for the excellent explanation!!
However,

bep_gff.records.each do |record|
  p record.seqname
end

returns
"seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
which is not what is intended, and
record.score, record.start etc. all return nil.

:(

-- 
---------------
Sincerely
George

Skype: george_g2
Blog: http://biorelated.wordpress.com/

From czmasek at burnham.org Tue Jun 9 14:10:40 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Tue, 9 Jun 2009 11:10:40 -0700
Subject: [BioRuby] GSOC: phyloXML for BioRuby: should pluralize attribute names which hold arrays?
In-Reply-To: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com>
References: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com>
Message-ID: <4A2EA5A0.2070502@burnham.org>

Hi, Diana:

Indeed, some of these plurals sound a little weird. Part of it is due to English issues, and also because for some elements it is unusual to have more than one.
For example, taxonomies: in the vast majority of cases, each node is associated with one taxonomy, yet phyloXML allows more than one taxonomy per node.
That being said, I agree with you and others, and conclude that using the plurals is more appropriate, even if a little unexpected.

Christian

From czmasek at burnham.org Tue Jun 9 15:18:20 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Tue, 9 Jun 2009 12:18:20 -0700
Subject: [BioRuby] [Wg-phyloinformatics] GSOC: phyloXML for BioRuby: Mapping sequence
In-Reply-To: <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp>
References: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com> <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4A2EB57C.6090100@burnham.org>

Hi:

Thank you for the detailed comments.
I think this is a very crucial point, since sequence and taxonomy are the two most important elements.

At this point, I would recommend creating a special class for phyloxml-sequence, and adding methods/constructors to it which make transferring to and from Bio::Sequence easy.
But I can definitely see the advantages of directly using Bio::Sequence, too.

Also, please don't forget that, should a consensus/strong opinion emerge, we could also add features to the phyloxml-sequence definition to make it match BioRuby and BioPython sequence better.

Christian

Naohisa GOTO wrote:
> Hi,
>
> sorry for delay.
>
> On Sat, 30 May 2009 17:27:52 -0400
> Diana Jaunzeikare wrote:
>
>
>> Hi all,
>>
>> So I looked more carefully at the sequence element of phyloXML and it
>> consists of information which cannot be mapped to Bio::Sequence object. I
>> suggest to have a sequence class which closely resembles phyloXML structure
>> and then have a method to extract relevant elements return Bio::Sequence
>> object. What do you think?
>>
>
> In this case, the method to convert from Bio::Sequence to the
> phyloXML sequence class is also needed.
>
> If some of the attributes are really essential and not specific
> to phyloXML but will be needed from other data types, it is
> also possible to add new attributes to Bio::Sequence.
> > >> Here on the left i listed phyloXML sequence tag elements and after the arrow >> -> the possible corresponding attribute of Bio::Sequence >> * type >> ** rna, dna -> Bio::Sequence::NA -> molecule type >> ** aa -> Bio::Sequence::AA >> * id_source (string ?) -> id_namespace >> * id_ref (string ) -> entry_id >> id_source and id_ref are actually used to describe relations between sequences, for example to describe orthology-relationships. >> * symbol (string ?) >> * accession >> ** source (example: "UniProtKB") -> >> ** id (example: "P17304") -> primary_accession >> ** source -> id_namespace ** id -> primary_accession (or entry_id) >> * name (string ) >> * location (string ? ) >> * mol_seq (string) -> seq / Bio::Sequence::NA/AA >> * uri >> ** desc (string) >> ** type (string ) >> ** uri >> >> * annotation [] >> ** ref >> ** source >> ** evidence >> ** type >> ** desc >> ** confidence >> ** property [] >> ** uri >> >> * domain_architecture >> ** length >> ** domain [] >> *** from >> *** to >> *** confidence >> *** id >> > > The annotations and domain architecture could be mapped to the features > in Bio::Sequence. But, in some cases, it is difficult to be mapped, > depending on the vocabulary used in the annotations/domain_architecture. > > From ngoto at gen-info.osaka-u.ac.jp Wed Jun 10 02:14:30 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 10 Jun 2009 15:14:30 +0900 Subject: [BioRuby] Problem with Bio::GFF::GFF2 In-Reply-To: <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com> References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com> <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp> <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com> Message-ID: <20090610061431.02FB21CBC56B@idnmail.gen-info.osaka-u.ac.jp> On Tue, 9 Jun 2009 17:24:38 +0300 George Githinji wrote: > Thank you so much Naohisa for the excellent explanation!! 
> however > > bep_gff.records.each do |record| > p record.seqname > end > > returns > "seq1 bepipred-1.0b epitope 1 1 0.173 . . ." > > > which is not what is intended and > record.score, record.start etc all return nil. It seems this is NOT a valid GFF2 format. In GFF formats, delimiter must be a TAB ("\t" in Ruby). However, in above data, it seems that characters between "seq1" and "bepipred-1.0b" entry may be white spaces (" " in Ruby), instead of a TAB. Copy-and-paste from terminal or web browser, or autocomlete function in a text editor or wordprocessor can often create such kind of degenerated data. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > :( > > > > > > On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO > wrote: > > > Hi George, > > > > On Tue, 9 Jun 2009 15:26:45 +0300 > > George Githinji wrote: > > > > > Hi all, > > > I am try to parse a GFF file. The file looks like this > > > > > > ##gff-version 2 > > > ##source-version bepipred-1.0b > > > ##date 2009-06-09 > > > ##Type Protein seq1 > > > # seqname source feature start end score N/A > > ? > > > # > > > > > --------------------------------------------------------------------------- > > > seq1 bepipred-1.0b epitope 1 1 0.173 . . . > > > seq1 bepipred-1.0b epitope 2 2 -0.043 . . . > > > seq1 bepipred-1.0b epitope 3 3 -0.014 . . . > > > seq1 bepipred-1.0b epitope 4 4 0.144 . . . > > > seq1 bepipred-1.0b epitope 5 5 0.250 . . . > > > seq1 bepipred-1.0b epitope 6 6 0.218 . . . > > > > > > ....truncated > > > > The above GFF records do not contain any "attributes". > > The field definition of each GFF line is: > > > > [attributes] [comments] > > > > When talking about GFF, the word "attributes" points the > > "attributes" field in each GFF line. > > > > See the GFF2 specifications document for details. > > http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml > > > > > and i have written the following lines with an aim of extracting the > > start, > > > end and score attributes. 
but before that i wanted to know whether the > > full > > > attributes are available. so i did the following. > > > > > > require 'rubygems' > > > require 'bio' > > > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff')) > > > > > > bep_gff.records.each do |record| > > > puts record.attributes_to_hash.inspect > > > end > > > > > > However, i get empty hashes. > > > Any ideas? > > > > Because the Bio::GFF2::Record#attributes_to_hash method returns > > "attributes" as a hash, and all "attributes" field in the above > > GFF2 records are empty, showing empty hashes is logically right. > > > > If you really want a hash, adding each field into a hash would > > be the easiest way. For example, > > > > bep_gff.records.each do |record| > > h = {} > > h['seqname'] = record.seqname > > h['source'] = record.source > > h['feature'] = record.feature > > h['start'] = record.start > > h['end'] = record.end > > h['score'] = record.score > > h['strand'] = record.strand > > h['frame'] = record.frame > > h['attributes'] = record.attributes_to_hash > > p h > > end > > > > Bio::GFF2::Record have seqname, source, feature, start, end, > > score, strand, frame attributes(so called in the Ruby language), > > which are inherited from Bio::GFF::Record class. > > Normally, it is natural using the above attributes(in Ruby) > > directly without creating a hash. > > > > Note that using attributes_to_hash may lost some data when > > there are two or more values with the same tag name in an > > "attributes" field. > > > > When creating new data, in case using "attributes" extensively, > > GFF3 is recommended, because the design of GFF2 attributes is > > somehow broken. > > > > > Thank you > > > > > > > > > -- > > > --------------- > > > Sincerely > > > George > > > > > > Skype: george_g2 > > > Blog: http://biorelated.wordpress.com/ > > > > Your blog is nice! 
> > > > -- > > Naohisa Goto > > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > > > > > -- > --------------- > Sincerely > George > > Skype: george_g2 > Blog: http://biorelated.wordpress.com/ > From jan.aerts at gmail.com Fri Jun 12 05:53:08 2009 From: jan.aerts at gmail.com (Jan Aerts) Date: Fri, 12 Jun 2009 10:53:08 +0100 Subject: [BioRuby] locus mixin Message-ID: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com> What do people think about adding a IsLocus mixin to bioruby? For a lot of my work I need to check if genes or polymorphisms or clones or ... overlap. I use the IsLocus mixin to get that done. Any object that has a chromosome, start and stop can have the module mixed in. Some of the methods as I have them defined locally: module IsLocus def range return Range.new(self.start, self.stop) end def overlaps?(other_locus) return false if self.chromosome != other_locus.chromosome if self.range.overlaps?(other_locus.range) return true end return false end def contained_by?(other_locus) return false if self.chromosome != other_locus.chromosome if self.range.contained_by?(other_locus.range) return true end return false end def contains?(other_locus) return false if self.chromosome != other_locus.chromosome if self.range.contains?(other_locus.range) return true end return false end def to_s return self.chromosome + ':' + self.range.to_s end def to_gff3 return [self.chromosome, self.class.name, self.start, self.stop, '.', '.', '.', 'ID=' + self.id.to_s].join("\t") end def to_bed if self.respond_to?(:name) return [self.chromosome, self.start, self.stop, self.name].join("\t") else return [self.chromosome, self.start, self.stop, self.class.name + '_' + self.id.to_s].join("\t") end end # The following makes it possible to call Gene#to_bed which would dump all Gene objects in BED format def self.included mod class << mod def to_bed output = Array.new output.push("track name='#{self.name}' description='#{self.name}'") self.all.each do |record| 
output.push record.to_bed end return output.join("\n") end end end end Let me know what you think, jan. From rozziite at gmail.com Fri Jun 12 11:25:40 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 12 Jun 2009 11:25:40 -0400 Subject: [BioRuby] Bioruby unit tests Message-ID: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> Hi all, I am working on implementing phyloxml support and I was running only my unit tests to test my code. Then yesterday I ran all of the unit tests and it gave me errors (when I first cloned it did not gave me any errors). I don't think i changed anything in any other file than lib/bio/db/phyloxml.rb and test/unit/bio/db/test_phyloxml.rb Here is the output of test/runner.rb. Looks all of the errors are of the same kind. diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb Loaded suite . Started .........................................................................................EE...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE.................................................
.......................................................................................................................................................................................EEEEEE...................................................................E........E.................................................................................................................................................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.................................................................................................................................................................................................................................................................................................................................................................................................................................................EEEEEE............................................................................................................................... Finished in 176.329241 seconds. 
  1) Error:
test_output_embl(Bio::FuncTestSequenceOutputEMBL):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:263:in `initialize'
    ./lib/bio/sequence.rb:263:in `new'
    ./lib/bio/sequence.rb:263:in `auto'
    ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup'

  2) Error:
test_output_fasta(Bio::FuncTestSequenceOutputEMBL):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:263:in `initialize'
    ./lib/bio/sequence.rb:263:in `new'
    ./lib/bio/sequence.rb:263:in `auto'
    ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup'

  3) Error:
test_alignment(Bio::TestAlignmentMultiFastaFormat):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method'
    ./lib/bio/appl/mafft/report.rb:89:in `each'
    ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method'
    ./lib/bio/appl/mafft/report.rb:61:in `alignment'
    ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment'

  4) Error:
test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method'
    ./lib/bio/appl/mafft/report.rb:89:in `each'
    ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method'
    ./lib/bio/appl/mafft/report.rb:61:in `alignment'
    ./test/unit/bio/appl/mafft/test_report.rb:57:in `test_determine_seq_method'

  5) Error:
test_const_version(Bio::TestGFF3):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/db/gff.rb:954:in `parse_fasta'
    ./lib/bio/db/gff.rb:949:in `each_line'
    ./lib/bio/db/gff.rb:949:in `parse_fasta'
    ./lib/bio/db/gff.rb:941:in `parse'
    ./lib/bio/db/gff.rb:881:in `initialize'
    ./test/unit/bio/db/test_gff.rb:644:in `new'
    ./test/unit/bio/db/test_gff.rb:644:in `setup'

  6) Error:
test_gff_version(Bio::TestGFF3):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/db/gff.rb:954:in `parse_fasta'
    ./lib/bio/db/gff.rb:949:in `each_line'
    ./lib/bio/db/gff.rb:949:in `parse_fasta'
    ./lib/bio/db/gff.rb:941:in `parse'
    ./lib/bio/db/gff.rb:881:in `initialize'
    ./test/unit/bio/db/test_gff.rb:644:in `new'
    ./test/unit/bio/db/test_gff.rb:644:in `setup'

  7) Error:
test_records(Bio::TestGFF3):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/db/gff.rb:954:in `parse_fasta'
    ./lib/bio/db/gff.rb:949:in `each_line'
    ./lib/bio/db/gff.rb:949:in `parse_fasta'
    ./lib/bio/db/gff.rb:941:in `parse'
    ./lib/bio/db/gff.rb:881:in `initialize'
    ./test/unit/bio/db/test_gff.rb:644:in `new'
    ./test/unit/bio/db/test_gff.rb:644:in `setup'

  8) Error:
test_sequence_regions(Bio::TestGFF3):
ArgumentError: wrong number of arguments (1 for 0)
    ./lib/bio/sequence.rb:443:in `initialize'
    ./lib/bio/sequence.rb:443:in `new'
    ./lib/bio/sequence.rb:443:in `adapter'
    ./lib/bio/db/fasta.rb:221:in `to_seq'
    ./lib/bio/db/gff.rb:954:in `parse_fasta'
    ./lib/bio/db/gff.rb:949:in `each_line'
    ./lib/bio/db/gff.rb:949:in `parse_fasta'
    ./lib/bio/db/gff.rb:941:in `parse'
    ./lib/bio/db/gff.rb:881:in `initialize'
    ./test/unit/bio/db/test_gff.rb:644:in `new'
    ./test/unit/bio/db/test_gff.rb:644:in `setup'

[....]

2175 tests, 5180 assertions, 0 failures, 54 errors

Any ideas?
Code is available here:
http://github.com/latvianlinuxgirl/bioruby/tree/dev

Diana

From czmasek at burnham.org Fri Jun 12 17:13:13 2009
From: czmasek at burnham.org (Christian M Zmasek)
Date: Fri, 12 Jun 2009 14:13:13 -0700
Subject: [BioRuby] Bioruby unit tests
In-Reply-To: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
Message-ID: <4A32C4E9.7050000@burnham.org>

Hi:

I usually have one test fail (I don't remember which one, though), but not that many.
Which version of Ruby are you using?

Christian

Diana Jaunzeikare wrote:
> Hi all,
>
> I am working on implementing phyloxml support and I was running only
> my unit tests to test my code. Then yesterday I ran all of the unit
> tests and it gave me errors (when I first cloned it did not gave me
> any errors). I don't think i changed anything in any other file than
> lib/bio/db/phyloxml.rb and test/unit/bio/db/test_phyloxml.rb
>
> Here is the output of test/runner.rb. Looks all of the errors are of
> the same kind.
>
> diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb
> Loaded suite .
> Started > (snip)
.....................................................EEEEEE............................................................................................................................... > Finished in 176.329241 seconds. > > 1) Error: > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 2) Error: > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 3) Error: > test_alignment(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > 4) Error: > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:57:in > `test_determine_seq_method' 
> > 5) Error: > test_const_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 6) Error: > test_gff_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 7) Error: > test_records(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 8) Error: > test_sequence_regions(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in 
`parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > [....] > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > Any ideas? Code is available here > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > Diana From rozziite at gmail.com Fri Jun 12 17:23:35 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 12 Jun 2009 17:23:35 -0400 Subject: [BioRuby] Bioruby unit tests In-Reply-To: <4A32C4E9.7050000@burnham.org> References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> <4A32C4E9.7050000@burnham.org> Message-ID: <4057d3bf0906121423l6277e57ao5e27ceaef0f88fec@mail.gmail.com> I am using ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux] Diana On Fri, Jun 12, 2009 at 5:13 PM, Christian M Zmasek wrote: > Hi: > > I usually have one test fail (I don't remember which one though) but not > that many. > Which version of ruby are you using? > > Christian > > > > Diana Jaunzeikare wrote: > >> Hi all, >> >> I am working on implementing phyloxml support and I was running only my >> unit tests to test my code. Then yesterday I ran all of the unit tests and >> it gave me errors (when I first cloned it did not gave me any errors). I >> don't think i changed anything in any other file than lib/bio/db/phyloxml.rb >> and test/unit/bio/db/test_phyloxml.rb >> >> Here is the output of test/runner.rb. Looks all of the errors are of the >> same kind. >> diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb >> Loaded suite . 
>> Started >> (snip)
..........................................................EEEEEE............................................................................................................................... >> Finished in 176.329241 seconds. >> >> 1) Error: >> test_output_embl(Bio::FuncTestSequenceOutputEMBL): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:263:in `initialize' >> ./lib/bio/sequence.rb:263:in `new' >> ./lib/bio/sequence.rb:263:in `auto' >> ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' >> >> 2) Error: >> test_output_fasta(Bio::FuncTestSequenceOutputEMBL): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:263:in `initialize' >> ./lib/bio/sequence.rb:263:in `new' >> ./lib/bio/sequence.rb:263:in `auto' >> ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' >> >> 3) Error: >> test_alignment(Bio::TestAlignmentMultiFastaFormat): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:89:in `each' >> ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:61:in `alignment' >> ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' >> >> 4) Error: >> test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:89:in `each' >> ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:61:in `alignment' >> 
./test/unit/bio/appl/mafft/test_report.rb:57:in >> `test_determine_seq_method' >> >> 5) Error: >> test_const_version(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 6) Error: >> test_gff_version(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 7) Error: >> test_records(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 8) Error: >> test_sequence_regions(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> 
./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> [....] >> 2175 tests, 5180 assertions, 0 failures, 54 errors >> >> >> Any ideas? Code is available here >> http://github.com/latvianlinuxgirl/bioruby/tree/dev >> Diana >> > > >

From ngoto at gen-info.osaka-u.ac.jp Sat Jun 13 00:47:02 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sat, 13 Jun 2009 13:47:02 +0900
Subject: [BioRuby] Bioruby unit tests
In-Reply-To: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
Message-ID: <20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp>

Hi Diana,

This is because your original Sequence class definition in line 269
in lib/bio/db/phyloxml.rb conflicts with BioRuby's Bio::Sequence.

The PhyloXML Sequence class (and Events, Date, Id, Uri, etc.)
should be defined inside the Bio::PhyloXML namespace.

For example,

  module Bio
    class PhyloXML
      class Sequence
        #...
      end
      class Events
        #...
      end
      class Date
        #...
      end
    end
  end

In this case, Bio::PhyloXML::Sequence is different from
Bio::Sequence.

Be careful that the name Date is already used by Ruby's
standard bundled library (require 'date'), although you can
distinguish the two by using ::Date and Bio::PhyloXML::Date.

I also recommend that PhyloXMLTree and PhyloXMLNode are
located inside the Bio::PhyloXML namespace (this means
Bio::PhyloXML::PhyloXMLTree and Bio::PhyloXML::PhyloXMLNode).
If possible, renaming them to Bio::PhyloXML::Tree and
Bio::PhyloXML::Node may be a good choice.
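
The kind of clash described above can be reproduced without BioRuby. The following is only a minimal sketch under stated assumptions: the modules Demo and Clash, the method clash_error and all class bodies are made up for illustration and are not taken from lib/bio/db/phyloxml.rb. It shows that a second "class Sequence" in the same namespace reopens the existing class and replaces its constructor, which would be consistent with the "wrong number of arguments (1 for 0)" errors, while a nested definition creates a separate constant.

```ruby
# Hypothetical stand-in for a library: Demo::Sequence takes one argument.
module Demo
  class Sequence
    attr_reader :seq

    def initialize(str)
      @seq = str
    end
  end

  # Nested definition: this creates Demo::PhyloXML::Sequence, a separate
  # class, and leaves Demo::Sequence untouched.
  class PhyloXML
    class Sequence
      attr_accessor :name

      def initialize
        @name = nil
      end
    end
  end
end

# Hypothetical stand-in for the problem case.
module Clash
  class Sequence
    def initialize(str)
      @seq = str
    end
  end
end

module Clash
  # Same name in the same namespace: this reopens Clash::Sequence and
  # replaces initialize for every existing caller.
  class Sequence
    def initialize
      @name = nil
    end
  end
end

# Returns the exception class raised by the old one-argument call, or nil.
def clash_error
  Clash::Sequence.new("atgc")
  nil
rescue ArgumentError => e
  e.class
end

puts Demo::Sequence.new("atgc").seq          # one-argument constructor still works
puts Demo::PhyloXML::Sequence.new.name.inspect
puts clash_error.inspect                     # the reopened class rejects one argument
```

In this sketch the nested class can use any constructor signature it likes, because callers of Demo::Sequence never see it.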
Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Fri, 12 Jun 2009 11:25:40 -0400 Diana Jaunzeikare wrote: > Hi all, > > I am working on implementing phyloxml support and I was running only my unit > tests to test my code. Then yesterday I ran all of the unit tests and it > gave me errors (when I first cloned it did not gave me any errors). I don't > think i changed anything in any other file than lib/bio/db/phyloxml.rb and > test/unit/bio/db/test_phyloxml.rb > > Here is the output of test/runner.rb. Looks all of the errors are of the > same kind. > > diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb > Loaded suite . > Started > .........................................................................................EE............................................... (snip) > Finished in 176.329241 seconds. > > 1) Error: > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 2) Error: > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 3) Error: > test_alignment(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > 
./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > 4) Error: > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:57:in > `test_determine_seq_method' > > 5) Error: > test_const_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 6) Error: > test_gff_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 7) Error: > test_records(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in 
`adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 8) Error: > test_sequence_regions(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > [....] > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > Any ideas? Code is available here > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > Diana > From ngoto at gen-info.osaka-u.ac.jp Sat Jun 13 02:05:01 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 13 Jun 2009 15:05:01 +0900 Subject: [BioRuby] locus mixin In-Reply-To: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com> References: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com> Message-ID: <20090613060502.F2CF51CBC3DA@idnmail.gen-info.osaka-u.ac.jp> Hi, On Fri, 12 Jun 2009 10:53:08 +0100 Jan Aerts wrote: > What do people think about adding a IsLocus mixin to bioruby? For a lot of > my work I need to check if genes or polymorphisms or clones or ... overlap. > I use the IsLocus mixin to get that done. Any object that has a chromosome, > start and stop can have the module mixed in. Some of the methods as I have > them defined locally: What classes are considered the module to be mixed in? 
In BioRuby, as far as I know, there are currently no classes which
have all of these methods (chromosome, start and stop) simultaneously.

I think adding only a mixin is not a good approach. It is better to
prepare some classes which can handle real data (which can probably
be downloaded from famous genome/expression data repositories) and
can perform typical tasks conveniently.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

> module IsLocus
>   def range
>     return Range.new(self.start, self.stop)
>   end
>
>   def overlaps?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.overlaps?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def contained_by?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.contained_by?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def contains?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.contains?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def to_s
>     return self.chromosome + ':' + self.range.to_s
>   end
>
>   def to_gff3
>     return [self.chromosome, self.class.name, self.start, self.stop, '.', '.', '.', 'ID=' + self.id.to_s].join("\t")
>   end
>
>   def to_bed
>     if self.respond_to?(:name)
>       return [self.chromosome, self.start, self.stop, self.name].join("\t")
>     else
>       return [self.chromosome, self.start, self.stop, self.class.name + '_' + self.id.to_s].join("\t")
>     end
>   end
>
>   # The following makes it possible to call Gene#to_bed which would dump all Gene objects in BED format
>   def self.included mod
>     class << mod
>       def to_bed
>         output = Array.new
>         output.push("track name='#{self.name}' description='#{self.name}'")
>         self.all.each do |record|
>           output.push record.to_bed
>         end
>         return output.join("\n")
>       end
>     end
>   end
> end
>
> Let me know what you think,
> jan.
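
The quoted module calls Range#overlaps?, Range#contained_by? and Range#contains?, which are not methods of Ruby's core Range class, so they presumably come from local extensions that are not shown. As a rough, self-contained sketch of the same idea, the coordinate tests can be written directly on start and stop. The Feature struct below is a hypothetical example class, not something that exists in BioRuby, and coordinates are assumed to be inclusive on both ends.

```ruby
# Sketch of a locus mixin that needs only chromosome, start and stop.
module IsLocus
  def range
    Range.new(start, stop)
  end

  # True when both loci are on the same chromosome and share a position.
  def overlaps?(other)
    return false unless chromosome == other.chromosome
    start <= other.stop && other.start <= stop
  end

  # True when this locus lies entirely within the other one.
  def contained_by?(other)
    return false unless chromosome == other.chromosome
    other.start <= start && stop <= other.stop
  end

  # True when the other locus lies entirely within this one.
  def contains?(other)
    other.contained_by?(self)
  end

  def to_s
    "#{chromosome}:#{range}"
  end
end

# Hypothetical class used only to exercise the mixin.
Feature = Struct.new(:chromosome, :start, :stop) do
  include IsLocus
end

a = Feature.new("1", 100, 200)
b = Feature.new("1", 150, 250)
c = Feature.new("2", 150, 250)

puts a.overlaps?(b)   # => true
puts a.overlaps?(c)   # => false (different chromosome)
puts a.to_s           # => 1:100..200
```

Any class that responds to chromosome, start and stop could include the module in the same way, which is the property the original proposal relies on.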
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From rozziite at gmail.com Sun Jun 14 11:22:29 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Sun, 14 Jun 2009 11:22:29 -0400
Subject: [BioRuby] Bioruby unit tests
In-Reply-To: <20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp>
References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
	<20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4057d3bf0906140822i588e798bvea0f4e13d8c34b67@mail.gmail.com>

Thanks! This worked like a charm. Now all the tests pass.

Diana

On Sat, Jun 13, 2009 at 12:47 AM, Naohisa GOTO wrote:

> Hi Diana,
>
> This is because your original Sequence class definition in line 269
> in lib/bio/db/phyloxml.rb conflicts with BioRuby's Bio::Sequence.
>
> The PhyloXML Sequence class (and Events, Date, Id, Uri, etc.)
> should be defined inside the Bio::PhyloXML namespace.
>
> For example,
>
>   module Bio
>     class PhyloXML
>       class Sequence
>         #...
>       end
>       class Events
>         #...
>       end
>       class Date
>         #...
>       end
>     end
>   end
>
> In this case, Bio::PhyloXML::Sequence is different from
> Bio::Sequence.
>
> Be careful that the name Date is already used by Ruby's
> standard bundled library (require 'date'), although you can
> distinguish the two by using ::Date and Bio::PhyloXML::Date.
>
> I also recommend that PhyloXMLTree and PhyloXMLNode are
> located inside the Bio::PhyloXML namespace (this means
> Bio::PhyloXML::PhyloXMLTree and Bio::PhyloXML::PhyloXMLNode).
> If possible, renaming them to Bio::PhyloXML::Tree and
> Bio::PhyloXML::Node may be a good choice.
>
> Naohisa Goto
> ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
>
> On Fri, 12 Jun 2009 11:25:40 -0400
> Diana Jaunzeikare wrote:
>
> > Hi all,
> >
> > I am working on implementing phyloxml support and I was running only my
> unit
> > tests to test my code.
Then yesterday I ran all of the unit tests and it > > gave me errors (when I first cloned it did not gave me any errors). I > don't > > think i changed anything in any other file than lib/bio/db/phyloxml.rb > and > > test/unit/bio/db/test_phyloxml.rb > > > > Here is the output of test/runner.rb. Looks all of the errors are of the > > same kind. > > > > diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb > > Loaded suite . > > Started > > > .........................................................................................EE............................................... > (snip) > > Finished in 176.329241 seconds. > > > > 1) Error: > > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:263:in `initialize' > > ./lib/bio/sequence.rb:263:in `new' > > ./lib/bio/sequence.rb:263:in `auto' > > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > > > 2) Error: > > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:263:in `initialize' > > ./lib/bio/sequence.rb:263:in `new' > > ./lib/bio/sequence.rb:263:in `auto' > > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > > > 3) Error: > > test_alignment(Bio::TestAlignmentMultiFastaFormat): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:89:in `each' > > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > > ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > > > 4) Error: > > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > > ArgumentError: wrong number of 
arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:89:in `each' > > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > > ./test/unit/bio/appl/mafft/test_report.rb:57:in > > `test_determine_seq_method' > > > > 5) Error: > > test_const_version(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 6) Error: > > test_gff_version(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 7) Error: > > test_records(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in 
`parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 8) Error: > > test_sequence_regions(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > [....] > > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > > > > Any ideas? Code is available here > > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > > > Diana > > > > > From rozziite at gmail.com Mon Jun 15 10:37:27 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Mon, 15 Jun 2009 10:37:27 -0400 Subject: [BioRuby] Bioruby PhyloXML: method for iterating to the next tree, without returning anything Message-ID: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> Hi all, Now I have a method next_tree which parses the phylogeny element and all its sub elements and returns a tree. I propose to have a method for iterating to the next tree (maybe call it skip_tree), without actually parsing it, but advancing the libxml reader to the next phylogeny element. The reason I think this might be useful is because, for example, in my unit tests I work with one tree at a time. To get to specific tree I call next_tree several times, but don't use any of the returned data. Maybe other people also will need such functionality. 
Having a method skip_tree would make the process faster since it would not actually parse the elements and would not create objects. What do you think? Diana From czmasek at burnham.org Mon Jun 15 12:57:01 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 15 Jun 2009 09:57:01 -0700 Subject: [BioRuby] Bioruby PhyloXML: method for iterating to the next tree, without returning anything In-Reply-To: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> References: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> Message-ID: <4A367D5D.3010102@burnham.org> Hi, Diana: I think this is a good idea. Although, it will only be useful if the order of the trees is known beforehand (i.e. you know you want the 5th tree). Something to think about (if you have enough time): Since trees in phyloxml can have names and/or ids -- what about having the parser return trees with a matching name/id? E.g. from a file with 100 trees return those named "erk gene tree". Great work! Christian Diana Jaunzeikare wrote: > Hi all, > > Now I have a method next_tree which parses the phylogeny element and > all its sub elements and returns a tree. I propose to have a method > for iterating to the next tree (maybe call it skip_tree), without > actually parsing it, but advancing the libxml reader to the next > phylogeny element. > > The reason I think this might be useful is because, for example, in my > unit tests I work with one tree at a time. To get to specific tree I > call next_tree several times, but don't use any of the returned data. > Maybe other people also will need such functionality. Having a method > skip_tree would make the process faster since it would not actually > parse the elements and would not create objects. > > What do you think? > > Diana From kpatil at science.uva.nl Tue Jun 16 08:34:03 2009 From: kpatil at science.uva.nl (K. 
Patil) Date: Tue, 16 Jun 2009 14:34:03 +0200 (CEST) Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org Message-ID: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Hi, In the bioruby at lists.open-bio.org method of Bio::Tree the root node is also removed if it has 2 edges, it will be useful to have an argument deciding if the root should be remove or not. cheers, Kaustubh From kpatil at science.uva.nl Tue Jun 16 09:47:45 2009 From: kpatil at science.uva.nl (K. Patil) Date: Tue, 16 Jun 2009 15:47:45 +0200 (CEST) Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org In-Reply-To: References: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Message-ID: <64690.139.19.75.1.1245160065.squirrel@webmail.science.uva.nl> Oops sorry the name of the method should be "remove_nonsense_nodes" - kaustubh > Hi, > > Is that the correct method name? > > ben > > 2009/6/16 K. Patil > >> Hi, >> >> In the bioruby at lists.open-bio.org method of Bio::Tree the root node is >> also removed if it has 2 edges, it will be useful to have an argument >> deciding if the root should be remove or not. >> >> cheers, >> Kaustubh >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same > place. > From ngoto at gen-info.osaka-u.ac.jp Tue Jun 16 10:55:27 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 16 Jun 2009 23:55:27 +0900 Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org In-Reply-To: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> References: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Message-ID: <20090616145529.5208E1CBC43A@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 16 Jun 2009 14:34:03 +0200 (CEST) "K. 
Patil" wrote: > Hi, > > In the bioruby at lists.open-bio.org method of Bio::Tree the root node is > also removed if it has 2 edges, it will be useful to have an argument > deciding if the root should be remove or not. > > cheers, > Kaustubh Sorry, I can't understand what you mean. Please show example data and script, and current and expected behavior. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From sgujja at broad.mit.edu Tue Jun 16 11:14:07 2009 From: sgujja at broad.mit.edu (Sharvari Gujja) Date: Tue, 16 Jun 2009 11:14:07 -0400 Subject: [BioRuby] Import Python modules in Ruby... Message-ID: <4A37B6BF.6030303@broad.mit.edu> Hi, I'd like to know if there is a way to import Python modules into Ruby. I downloaded the Ruby/Python library to embed the Python interpreter. However on running the script I get an error saying: *C:/Program Files/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- python (LoadError) from C:/Program Files/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' from Z:/ruby_progs/test2.rb:1* I extracted "python" library at C:/Program Files/ruby/lib/ruby/site_ruby/1.8/ . [My ruby version is 1.8.6] The script I am trying to run is : #!/usr/bin/env ruby require 'rubygems' require 'python' require 'python/naming' require 'python/xreadlines' distance = naming.DISTANCE_TOOL.distance("protein 1", "protein2") print distance Could someone please help. Thanks S From ngoto at gen-info.osaka-u.ac.jp Wed Jun 17 06:10:18 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 17 Jun 2009 19:10:18 +0900 Subject: [BioRuby] Import Python modules in Ruby... In-Reply-To: <4A37B6BF.6030303@broad.mit.edu> References: <4A37B6BF.6030303@broad.mit.edu> Message-ID: <20090617101020.3D3301CBC4FF@idnmail.gen-info.osaka-u.ac.jp> On Tue, 16 Jun 2009 11:14:07 -0400 Sharvari Gujja wrote: > Hi, > > I'd like to know if there is a way to import Python modules into Ruby. 
I > downloaded the Ruby/Python library to embed the Python interpreter. Where did you download the "Ruby/Python library" from? I found two similar libraries. http://www.goto.info.waseda.ac.jp/~fukusima/ruby/python-e.html but "Last modified: Mon Sep 11 02:30:10 JST 2000" indicates no support for current version of Ruby and Python. http://rubyforge.org/projects/rubypython/ It can be installed by using rubygems, but it seems there are no Windows binary for the gem and Visual C++ compiler may be needed. In addition, appropriate version of python must be installed. Because they are not specific to bioruby nor bioinformatics, if no response here, please ask questions or discuss about them in another mailing list, maybe in ruby-talk. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From chen_li3 at yahoo.com Wed Jun 17 10:32:01 2009 From: chen_li3 at yahoo.com (chen li) Date: Wed, 17 Jun 2009 07:32:01 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <551726.61194.qm@web36803.mail.mud.yahoo.com> Hi all, I read source codes in sirna.rb in Bioruby. It implements the codes based on the following 4 rules( I copy the ruels from the paper): These rules indicate that siRNAs which simultaneously satisfy all four of the following sequence conditions are capable of inducing highly effective gene silencing in mammalian cells: (i) A/U at the 5' end of the antisense strand; (ii) G/C at the 5' end of the sense strand; (iii) at least five A/U residues in the 5' terminal one-third of the antisense strand; and (iv) the absence of any GC stretch of more than 9 nt in length. And here are the codes: In sirna.rb # Ui-Tei's rule. def uitei?(target) return false unless /^.{2}[GC]/i =~ target #which rule is for this line ? 
return false unless /[AU].{2}$/i =~ target #which rule is for this line return false if /[GC]{9}/i =~ target # rule 4 #rule 3 one_third = target.size * 1 / 3 start_pos = @target_size - one_third - 1 remain_seq = target.subseq(start_pos, @target_size - 2) au_number = remain_seq.scan(/[AU]/i).size return false if au_number < 5 return true end from these codes I don't think I understand how rule 1 and rule 2 are implemented. I wonder if someone can explain them a little more. Thanks, Li From tomoakin at kenroku.kanazawa-u.ac.jp Wed Jun 17 19:55:12 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Thu, 18 Jun 2009 08:55:12 +0900 Subject: [BioRuby] help to understand the codes In-Reply-To: <551726.61194.qm@web36803.mail.mud.yahoo.com> References: <551726.61194.qm@web36803.mail.mud.yahoo.com> Message-ID: <967703C4-0B5C-4FA5-ADC8-A0BF427F152D@kenroku.kanazawa-u.ac.jp> Hi, Perhaps, an implicit assumption is used that the siRNA duplex has 2 nt overhang at the 3' ends and the "target" is written for one strand containing both: So, the sequence should be from the rule 1 and 2: SNNN...NNNNNNNNNNN NNNNNN...NNNNNNNNW (W: A or U, S: G or C) from the compliment rule this will be SNNN...NNNNNNNNWNN NNSNNN...NNNNNNNNW and if you write only the top strand (or the original mRNA sequence) NNSNNN...NNNNNNNNWNN thus > return false unless /^.{2}[GC]/i =~ target #which rule is > for this line ? is for rule 2 and > return false unless /[AU].{2}$/i =~ target #which rule is > for this line is for rule 1 -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2009/06/17, at 23:32, chen li wrote: > > Hi all, > > I read source codes in sirna.rb in Bioruby. 
It implements the codes > based on the following 4 rules( I copy the ruels from the paper): > These rules indicate that siRNAs which > simultaneously satisfy all four of the following > sequence conditions are capable of inducing highly > effective gene silencing in mammalian cells: > > (i) A/U at the 5' end of the antisense strand; > (ii) G/C at the 5' end of the sense strand; > (iii) at least five A/U residues in the 5' terminal one-third of > the antisense > strand; > and (iv) the absence of any GC stretch of more than 9 nt in length. > > > And here are the codes: > In sirna.rb > # Ui-Tei's rule. > def uitei?(target) > return false unless /^.{2}[GC]/i =~ target #which rule is > for this line ? > return false unless /[AU].{2}$/i =~ target #which rule is > for this line > > return false if /[GC]{9}/i =~ target # rule 4 > > #rule 3 > one_third = target.size * 1 / 3 > start_pos = @target_size - one_third - 1 > remain_seq = target.subseq(start_pos, @target_size - 2) > au_number = remain_seq.scan(/[AU]/i).size > return false if au_number < 5 > > return true > end > > > from these codes I don't think I understand how rule 1 and rule 2 > are implemented. I wonder if someone can explain them a little more. 
> > > Thanks, > > Li > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From hlapp at gmx.net Wed Jun 17 18:26:49 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 17 Jun 2009 17:26:49 -0500 Subject: [BioRuby] [Wg-phyloinformatics] Bioruby PhyloXML: method for iterating to the next tree, without returning anything In-Reply-To: <4A367D5D.3010102@burnham.org> References: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> <4A367D5D.3010102@burnham.org> Message-ID: <2FE711A4-CC95-4B9A-B9D5-CBA887902567@gmx.net> On Jun 15, 2009, at 11:57 AM, Christian M Zmasek wrote: > Although, it will only be useful if the order of the trees is known > beforehand (i.e. you know you want the 5th tree). Right - I recognize that this would be useful for your unit testing, but frankly I'm not sure what the "normal" use case for this function would be. > Something to think about (if you have enough time): Since trees in > phyloxml can have names and/or ids -- what about having the parser > return trees with a matching name/id? E.g. from a file with 100 trees > return those named "erk gene tree". I agree, being able to pass a filter function (e.g., one that accepts the unparsed XML and returns true or false?) would indeed be pretty useful. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From chen_li3 at yahoo.com Thu Jun 18 11:42:03 2009 From: chen_li3 at yahoo.com (chen li) Date: Thu, 18 Jun 2009 08:42:03 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <897032.22567.qm@web36806.mail.mud.yahoo.com> Hi Tomoaki, Thank you for the explanation. 
For rule 1: /[AU].{2}$/i =~ target Based on my understanding of regular expression, it will match the following nts: N---NA/UAA N---NA/UTT N---NA/UGG N---NA/UCC but will not match the following nts: N---NA/UAT N---NA/UTA which mean the last two nts are identical, is that right? The similar situation applies to rule 2: /^.{2}[GC]/i =~ target starting with two identical nts followed by G/C at the third position. If this is the case I wonder where the paper mentions that the last two nts are the same and the first two nts are identical. Do I miss something when I read the paper? Thanks, Li --- On Wed, 6/17/09, Tomoaki NISHIYAMA wrote: > From: Tomoaki NISHIYAMA > Subject: Re: [BioRuby] help to understand the codes > To: "chen li" > Cc: "Tomoaki NISHIYAMA" , bioruby at lists.open-bio.org > Date: Wednesday, June 17, 2009, 7:55 PM > Hi, > > Perhaps, an implicit assumption is used that the siRNA > duplex > has 2 nt overhang at the 3' ends and the "target" is > written for one strand containing both: > So, the sequence should be > from the rule 1 and 2: > SNNN...NNNNNNNNNNN > NNNNNN...NNNNNNNNW > > (W: A or U, S: G or C) > > from the compliment rule > this will be > SNNN...NNNNNNNNWNN > NNSNNN...NNNNNNNNW > > and if you write only the top strand (or the original mRNA > sequence) > NNSNNN...NNNNNNNNWNN > > thus > > 
return false unless > /[AU].{2}$/i =~ target #which rule is for > this line > is for rule 1 > --Tomoaki NISHIYAMA > > Advanced Science Research Center, > Kanazawa University, > 13-1 Takara-machi, > Kanazawa, 920-0934, Japan > From tomoakin at kenroku.kanazawa-u.ac.jp Thu Jun 18 19:47:10 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Fri, 19 Jun 2009 08:47:10 +0900 Subject: [BioRuby] help to understand the codes In-Reply-To: <897032.22567.qm@web36806.mail.mud.yahoo.com> References: <897032.22567.qm@web36806.mail.mud.yahoo.com> Message-ID: <8242DF74-07D3-4EF1-AE85-6E494DAE3CBB@kenroku.kanazawa-u.ac.jp> Hi, > but will not match the following nts: > N---NA/UAT > N---NA/UTA > > which mean the last two nts are identical, is that right? .{2} is equivalent to .. and should match any two characters, identical or different. -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2009/06/19, at 0:42, chen li wrote: > > Hi Tomoaki, > > Thank you for the explanation. > > For rule 1: /[AU].{2}$/i =~ target > Based on my understanding of regular expression, it will match the > following nts: > N---NA/UAA > N---NA/UTT > N---NA/UGG > N---NA/UCC > but will not match the following nts: > N---NA/UAT > N---NA/UTA > > which mean the last two nts are identical, is that right? > The similar situation applies to rule 2: /^.{2}[GC]/i =~ target > starting with two identical nts followed by G/C at the third > position. > > If this is the case I wonder where the paper mentions that the last > two nts are the same and the first two nts are identical. Do I miss > something when I read the paper? 
> > > > Thanks, > > Li > > > > > > > > > --- On Wed, 6/17/09, Tomoaki NISHIYAMA u.ac.jp> wrote: > >> From: Tomoaki NISHIYAMA >> Subject: Re: [BioRuby] help to understand the codes >> To: "chen li" >> Cc: "Tomoaki NISHIYAMA" , >> bioruby at lists.open-bio.org >> Date: Wednesday, June 17, 2009, 7:55 PM >> Hi, >> >> Perhaps, an implicit assumption is used that the siRNA >> duplex >> has 2 nt overhang at the 3' ends and the "target" is >> written for one strand containing both: >> So, the sequence should be >> from the rule 1 and 2: >> SNNN...NNNNNNNNNNN >> NNNNNN...NNNNNNNNW >> >> (W: A or U, S: G or C) >> >> from the compliment rule >> this will be >> SNNN...NNNNNNNNWNN >> NNSNNN...NNNNNNNNW >> >> and if you write only the top strand (or the original mRNA >> sequence) >> NNSNNN...NNNNNNNNWNN >> >> thus >>> return false unless >> /^.{2}[GC]/i =~ target #which rule is for this line ? >> is for rule 2 >> and >>> return false unless >> /[AU].{2}$/i =~ target #which rule is for >> this line >> is for rule 1 >> --Tomoaki NISHIYAMA >> >> Advanced Science Research Center, >> Kanazawa University, >> 13-1 Takara-machi, >> Kanazawa, 920-0934, Japan >> > > > > From chen_li3 at yahoo.com Fri Jun 19 11:21:27 2009 From: chen_li3 at yahoo.com (chen li) Date: Fri, 19 Jun 2009 08:21:27 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <318484.35858.qm@web36801.mail.mud.yahoo.com> Hi Tomoaki, Thank you for the info. Now I think I understand much better about the codes: What the script does is to search a stretch nts of 23 bp and check if it fits the rules: the first two nts and the last two nts are actually the overhanged nts and the middle part is the core of the sirna. One more question: When I read method # uitei?(target) I see an instant variable called @target_size but it is defined in another method # design(rule='uitei'). Since Ruby reads codes from top to bottom, isn't' it better to define #design(rule='uitei') first then followed by # uitei?(target)? 
Or it is just personal preference? Li # Ui-Tei's rule. def uitei?(target) ...line code.... start_pos = @target_size - one_third - 1 return true end # rule can be one of 'uitei' (default) and 'reynolds'. def design(rule = 'uitei') @target_size = @antisense_size + 2 ....line code.... end From rozziite at gmail.com Mon Jun 22 12:27:11 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Mon, 22 Jun 2009 12:27:11 -0400 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review Message-ID: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Hi all, In the Google Summer of Code project I have reached a stage where most of the code has been written for PhyloXML parser and I would like to ask for code review. I would like to know answers to these questions: * What parts should have more documentation? * Are there any places where code could be made more rubyish? * Are the structure of unit tests fine, or there are some conventions which my code doesn't follow? * Is code readable? * Are there any conventions that I don't follow? (like lines should strictly fit into 80 columns)? Any comments would be appreciated. Code is available on github http://github.com/latvianlinuxgirl/bioruby/tree/dev in * lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. Diana From czmasek at burnham.org Tue Jun 23 15:29:42 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 23 Jun 2009 12:29:42 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review In-Reply-To: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> References: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Message-ID: <4A412D26.30800@burnham.org> Hi, Diana: Diana Jaunzeikare wrote: > Hi all, > > In the Google Summer of Code project I have reached a stage where most > of the code has been written for PhyloXML parser and I would like to > ask for code review. 
> > I would like to know answers to these questions: > > * What parts should have more documentation? Node might benefit from a more detailed description of all its (sub-) elements. Also, you don't always use the "rdoc" format. Some of the documentation is hard to read (such as the one for Sequence), simply because of the way the text is formatted. In general, I would point out the dependency on libxml2 more prominently. > > * Are there any places where code could be made more rubyish? Maybe core-BioRuby developers can give an answer for this one. Looks like Ruby to me, but I started off programming with C++ and Java -- so, I might be biased ;) > > * Are the structure of unit tests fine, or there are some conventions > which my code doesn't follow? I think it would be best to add more tests for "marginal"/error cases (for parsing). Listed in increasing severity: Are empty elements handled properly (e.g. )? What about new-lines, tabs, non-printable ASCII characters in places where text is expected? Trailing and leading whitespace? Does it get trimmed off? Valid XML documents violating phyloXML specs? Invalid XML? All these should be handled gracefully. > > * Is code readable? Yes. > * Are there any conventions that I don't follow? (like lines should > strictly fit into 80 columns)? > > Any comments would be appreciated. > > Code is available on github > http://github.com/latvianlinuxgirl/bioruby/tree/dev in > *lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. > > > Diana > Furthermore, it might be a good time to start testing your parser/objects against really large files. This might help to uncover potential hidden problems. Obviously, you could not add such large files to BioRuby's test files. But it would still be nice to know how your parser and objects scale. Also, I am not sure if it's such a great idea to have all your classes in the same file/directory (i.e. both parser _and_ data objects). 
Right now, if the libxml2 gem is not install the test for the whole of bioruby exits. Christian From ngoto at gen-info.osaka-u.ac.jp Sat Jun 27 04:43:12 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 27 Jun 2009 17:43:12 +0900 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review In-Reply-To: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> References: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Message-ID: <20090627084313.4B51D1CBC4ED@idnmail.gen-info.osaka-u.ac.jp> Hi, On Mon, 22 Jun 2009 12:27:11 -0400 Diana Jaunzeikare wrote: > Hi all, > > In the Google Summer of Code project I have reached a stage where most of > the code has been written for PhyloXML parser and I would like to ask for > code review. > > I would like to know answers to these questions: > > * What parts should have more documentation? For each attribute and methods, not only the return value's class but also description for the attribute will be needed, although it will be nearly the same as the phyloxml's description. > * Are there any places where code could be made more rubyish? Currently, no problem. > > * Are the structure of unit tests fine, or there are some conventions which > my code doesn't follow? It is good that the module TestPhyloXMLData is defined inside the module Bio namespace. > > * Is code readable? Yes. > > * Are there any conventions that I don't follow? (like lines should strictly > fit into 80 columns)? There are no strict conventions, especially for tests which may depend on test data variety. > > Any comments would be appreciated. > > Code is available on github > http://github.com/latvianlinuxgirl/bioruby/tree/dev in * > lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. > > > Diana > In my environment, (Debian lenny i386, Ruby 1.8.7-p160, libxml-ruby 1.1.3) % ruby -r rubygems test/unit/bio/db/test_phyloxml.rb Loaded suite test/unit/bio/db/test_phyloxml Started ............................ 
Finished in 1.375441 seconds. 28 tests, 91 assertions, 0 failures, 0 errors -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From rozziite at gmail.com Sun Jun 28 15:55:44 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Sun, 28 Jun 2009 15:55:44 -0400 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Profiling Message-ID: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> Hi all, I did some profiling of the code. My system is Ubuntu 9.04, ruby 1.8.7 [i486-linux], Intel Core 2 Duo P8600 @2.4GHz I created test_phyloxml_big.rb test file. It has test_next_tree method which calls next_tree on the phyloxml file until end of file is reached. Here follow results on the ncbi_taxonomy_mollusca.xml file which is 1.5MB large with 5632 external nodes. It takes around 7.5min to finish test_phyloxml_big.rb test. (Finished in 443.231507 seconds. Finished in 457.255576 seconds. ) output of the top: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21222 diana 20 0 29020 25m 1928 R 95 0.9 5:32.92 ruby So it looks like memory footprint is small ~ 25MBs. CPU usage is 95% (i have two processors, so it is completely using one of them). I did the same thing for tol_life_xml but it took forever to finish. (more than 3 hours) For curiosity I created a method next_tree_dummy. All it does is to reader.read from file until it reaches element. tree of life xml (file size: 45.1MB) - Finished in 3.177743 seconds. mollusca xml (1.5MB) - Finished in 0.252993 seconds. metazoa xml (32.3MB) - Finished in 3.393467 seconds. I think this shows that libxml is really fast. I also did profiling with ruby-prof on ncbi mollusca taxonomy file. Here is partial output: diana at diana-ubuntu:~/bioruby$ ruby-prof -p graph test/unit/bio/db/test_phyloxml_big.rb Loaded suite /usr/bin/ruby-prof Started . Finished in 1345.4039 seconds. 1 tests, 0 assertions, 0 failures, 0 errors Thread ID: 3084157360 Total Time: 1257.6 [..] 
------------------------------------------------------------------------------- 1257.56 0.60 0.00 1256.96 2/2 Bio::TestPhyloXMLBig#test_next_tree 100.00% 0.05% 1257.56 0.60 0.00 1256.96 2 Bio::PhyloXML#next_tree 0.08 0.03 0.00 0.05 8107/16210 Bio::PhyloXML#parse_attributes 0.00 0.00 0.00 0.00 1/243188 String#== 0.00 0.00 0.00 0.00 8104/24322 LibXML::XML::Reader#[] 6.18 0.97 0.00 5.21 48616/48616 Bio::PhyloXML#parse_clade_elements 0.14 0.14 0.00 0.00 48623/97244 LibXML::XML::Reader#read 0.27 0.17 0.00 0.10 48644/875134 Bio::PhyloXML#is_element? 0.11 0.04 0.00 0.07 16206/32442 Class#new 0.04 0.04 0.00 0.00 16206/116034 Kernel#== 0.00 0.00 0.00 0.00 4/72929 Bio::PhyloXML#parse_simple_elements 0.07 0.01 0.00 0.06 8102/8102 Bio::Tree#add_node 0.44 0.31 0.00 0.13 97243/137758 Bio::PhyloXML#is_end_element? 1249.05 0.02 0.00 1249.03 8102/8102 Bio::Tree#parent 0.58 0.07 0.00 0.51 8102/8102 Bio::Tree#add_edge ----------------------------------------------------------------------------- 1249.05 0.02 0.00 1249.03 8102/8102 Bio::PhyloXML#next_tree 99.32% 0.00% 1249.05 0.02 0.00 1249.03 8102 Bio::Tree#parent 1249.03 0.13 0.00 1248.90 8102/8102 Bio::Tree#path 0.00 0.00 0.00 0.00 8102/72975 Array#[] -------------------------------------------------------------------------------- 1249.03 0.13 0.00 1248.90 8102/8102 Bio::Tree#parent 99.32% 0.01% 1249.03 0.13 0.00 1248.90 8102 Bio::Tree#path 0.04 0.01 0.00 0.03 16204/164638052 Hash#[] 1248.82 0.27 0.00 1248.55 8102/8102 Bio::Pathway#bfs_shortest_path 0.03 0.03 0.00 0.00 24306/116034 Kernel#== 0.01 0.01 0.00 0.00 16204/72975 Array#[] -------------------------------------------------------------------------------- 1248.82 0.27 0.00 1248.55 8102/8102 Bio::Tree#path 99.30% 0.02% 1248.82 0.27 0.00 1248.55 8102 Bio::Pathway#bfs_shortest_path 0.26 0.19 0.00 0.07 142736/164638052 Hash#[] 0.07 0.07 0.00 0.00 75419/116034 Kernel#== 1248.18 115.50 0.00 1132.68 8102/8102 Bio::Pathway#breadth_first_search 0.04 0.04 0.00 0.00 67317/67330 
Array#unshift -------------------------------------------------------------------------------- 1248.18 115.50 0.00 1132.68 8102/8102 Bio::Pathway#bfs_shortest_path 99.25% 9.18% 1248.18 115.50 0.00 1132.68 8102 Bio::Pathway#breadth_first_search 136.52 92.65 0.00 43.87 65785140/164638052 Hash#[] 22.53 22.53 0.00 0.00 32900672/32900681 Array#shift 973.59 324.56 0.00 649.03 32892570/32892570 Hash#each_key 0.04 0.03 0.00 0.01 24306/98702064 Hash#[]= [..] 99.32% of the total time is spent in the Bio::Tree#parent method and the methods it calls. Bio::Tree#parent calls Bio::Tree#path, which calls Bio::Pathway#bfs_shortest_path, which in turn calls Bio::Pathway#breadth_first_search (99.25% of total time is spent in this method and its sub calls). This was a huge surprise for me. Why would breadth first search be needed if I just want to know the parent node of the current node? The reason I am using Bio::Tree#parent is because I have to keep track of the current node I am parsing. When I have reached element I set the current_node to the parent of the node I just parsed. I see here two options. 1) Keep track of the current node myself (by putting references in an array and pushing and popping accordingly). Thus I won't have to call the Bio::Tree#parent method. 2) Update Bio::Tree/ Bio::Node class so that nodes contain references to their parents. (thus not needing to call breadth first search). What do you think? Diana From czmasek at burnham.org Mon Jun 29 17:26:23 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Jun 2009 14:26:23 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Profiling In-Reply-To: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> References: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> Message-ID: <4A49317F.3030708@burnham.org> Hi, Diana: Great analysis! > > The reason I am using Bio::Tree#parent is because I have to keep track > of the current node I am parsing. 
When I have reached element > i set the current_node to the parent of the node I just parsed. > > I see here two options. > > 1) Keep track of the current node myself (by putting references in an > array and pushing and poping accordingly). Thus I won't have to call > the Bio::Tree#parent method. As a temporary solution, you could try this. > > 2) Update Bio::Tree/ Bio::Node class so that nodes contain references > to their parents. (thus not needing to call breadth first search). This seems a better (long term) solution, but _might_ be out of scope for this summer project. Christian From donttrustben at gmail.com Mon Jun 29 20:12:50 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 30 Jun 2009 10:12:50 +1000 Subject: [BioRuby] Bio::NCBI:REST:EFetch Message-ID: Hi, I was just googling how to download a genbank sequence using bioruby, and somehow got pointed to this example code: # == Usage # # Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") But this doesn't seem to work in irb: $ gem list bio *** LOCAL GEMS *** bio (1.3.0) $ irb -rubygems irb(main):001:0> require 'bio' => true irb(main):002:0> Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") NameError: uninitialized constant Bio::NCBI::REST::EFetch from (irb):2 Then I noticed by looking at the code I could just do Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db => 'sequences'}) So it seems there is some redundancy. What is going on? Should there be a pointer to Bio::NCBI::REST::efetch from Bio::NCBI::REST::EFetch in the rdoc? That would have made me understand a lot quicker, and I wouldn't have had to look at the code to figure it out. Thanks, ben -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. 
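[Editor's sketch] Option 1 from the profiling thread (keeping track of the current node yourself, by pushing and popping references on an array) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: EVENTS and parents_from_events are hypothetical names invented for illustration, the event list merely stands in for what a pull parser might report for nested clade elements, and none of it is taken from the actual PhyloXML parser or from Bio::Tree.

```ruby
# Hypothetical event stream standing in for what an XML pull parser might
# report for nested clade elements: [:open, name] and [:close].
# (Illustrative data only; not taken from any real phyloXML file.)
EVENTS = [
  [:open, "root"],
  [:open, "A"], [:close],
  [:open, "B"],
  [:open, "B1"], [:close],
  [:open, "B2"], [:close],
  [:close],
  [:close]
]

# Returns a hash mapping each node name to its parent's name (nil for the
# root). The stack holds every node whose closing event has not been seen
# yet, so the parent of a newly opened node is always the top of the stack.
def parents_from_events(events)
  stack = []
  parent_of = {}
  events.each do |kind, name|
    case kind
    when :open
      parent_of[name] = stack.last  # constant-time lookup, no tree search
      stack.push(name)
    when :close
      stack.pop                     # the enclosing node is current again
    end
  end
  parent_of
end

PARENTS = parents_from_events(EVENTS)
```

The point of the design is that each open or close event costs a single push or pop, so finding the parent never requires searching the tree built so far, which is the cost the profile attributes to Bio::Tree#parent.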
From rozziite at gmail.com Mon Jun 29 21:26:28 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Mon, 29 Jun 2009 21:26:28 -0400 Subject: [BioRuby] Bioruby PhyloXML update 6 Message-ID: <4057d3bf0906291826r7242237dh5c249fd762b7b6be@mail.gmail.com> Hi all, here is the update for the last week: * Asked for a code review. I got very good suggestions on what and how to improve things. Some of them I did this week, some will come later. * Documented requirement of libxml-ruby. * Documented more PhyloXML::Node element. * Wrote code so that phyloxml test suite exits if libxml-ruby library is not present. (This took me quite a long time to figure out. Eventually I sent an email to the ruby-talk mailing list and got great help.) * Created a branch testbig. There I created the file test_phyloxml_big.rb and wrote the method parse_tree_dummy. * Did code profiling. Discovered that ~99% of the time is spent in Bio::Tree#parent. Changed the code to keep track myself of the current node in an array. Speed increase was tremendous. When parsing mollusca xml (1.5MB of data) it went down from 443 to 2 seconds. When parsing tree of life xml (45MB of data) it took 34 seconds instead of more than 3 hours. Plan for next week: * Continue working on documentation * write usage cases like phyloxml.each do |tree| end ; Calculate total branch lengths? (Any other uses? ) Look at Perl Phyloxml implementation and port those usage cases in Bioruby. * Adding tests for marginal cases. (decide what to do with invalid xml files). * Will do some more code profiling (it's fun :) ) But it looks like we are in pretty good shape. * Change organization of classes a bit. Split code in several files. Have a module PhyloXML. Have a class PhyloXMLParser (in phyloxml_parser.rb) in it. Have all the phyloxml element classes defined in phyloxml_elements.rb file (under PhyloXML module). And then later will have PhyloXMLWriter class. * Other tweaks to prepare for PhyloXML parser deliverable. 
Diana From ngoto at gen-info.osaka-u.ac.jp Tue Jun 30 08:19:25 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 30 Jun 2009 21:19:25 +0900 Subject: [BioRuby] Bio::NCBI:REST:EFetch In-Reply-To: References: Message-ID: <20090630121925.E048F1CBC3DA@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 30 Jun 2009 10:12:50 +1000 Ben Woodcroft wrote: > Hi, > > I was just googling how to download a genbank sequence using bioruby, and > somehow got pointed to this example code: > > # == Usage > # > # Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") > > But this doesn't seem to work in irb: > > $ gem list bio > > *** LOCAL GEMS *** > > bio (1.3.0) > $ irb -rubygems > irb(main):001:0> require 'bio' > => true > irb(main):002:0> > Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") > NameError: uninitialized constant Bio::NCBI::REST::EFetch > from (irb):2 In my machine, it works correctly. $ ruby --version ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-linux] lng[ngoto<3>]$ gem --version 1.3.4 $ gem install bio Successfully installed bio-1.3.0 1 gem installed Installing ri documentation for bio-1.3.0... Installing RDoc documentation for bio-1.3.0... $ irb -r rubygems irb(main):001:0> require 'bio' => true irb(main):002:0> Bio::BIORUBY_VERSION => [1, 3, 0] irb(main):003:0> Bio::BIORUBY_VERSION_ID => "1.3.0" irb(main):004:0> Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|") => "LOCUS X63139 854 bp DNA linear MAM 17-DEC-1991\nDEFINITION B.taurus beta-lactoglobulin gene 5'-region and partial exon 1.\ (snip) The NameError may be caused by old version of BioRuby which may exist somewhere in the $LOAD_PATH. Please check the following version identifiers of BioRuby. p Bio::BIORUBY_VERSION p Bio::BIORUBY_VERSION_ID p Bio::BIORUBY_EXTRA_VERSION > Then I noticed by looking at the code I could just do > > Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db => > 'sequences'}) > This also works. 
> So it seems there is some redundancy. What is going on? Should there be a > pointer to Bio::NCBI::REST::efetch from Bio::NCBI::REST::EFetch in the rdoc? > That would have made me understand a lot quicker, and I wouldn't have had to > look at the code to figure it out. Both should work, and I think redundancy is not severe problem. Methods about EFetch is defined in Bio::NCBI::REST::EFetch::Methods and documents for the methods are also available. http://bioruby.org/rdoc/classes/Bio/NCBI/REST/EFetch/Methods.html But, the hierarchy of the documentation may be difficult to know for most users. Contributions and suggestions for documentation are welcome. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > Thanks, > ben > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same > place. > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From donttrustben at gmail.com Tue Jun 30 08:40:57 2009 From: donttrustben at gmail.com (Ben Woodcroft) Date: Tue, 30 Jun 2009 22:40:57 +1000 Subject: [BioRuby] Fwd: Bio::NCBI:REST:EFetch In-Reply-To: References: <20090630121925.E048F1CBC3DA@idnmail.gen-info.osaka-u.ac.jp> Message-ID: oops - forgot to post back to the mailing list. ---------- Forwarded message ---------- From: Ben Woodcroft Date: 2009/6/30 Subject: Re: [BioRuby] Bio::NCBI:REST:EFetch To: Naohisa GOTO Hi, > The NameError may be caused by old version of BioRuby which > may exist somewhere in the $LOAD_PATH. You are smart and I am stupid. Both should work, and I think redundancy is not severe problem. > Methods about EFetch is defined in Bio::NCBI::REST::EFetch::Methods > and documents for the methods are also available. > > http://bioruby.org/rdoc/classes/Bio/NCBI/REST/EFetch/Methods.html > > But, the hierarchy of the documentation may be difficult to know > for most users. 
Contributions and suggestions for documentation > are welcome. It is a bit misleading that they are redundant, but if they both work, then I don't mind so much. One suggestion I do have is that the returned objects shouldn't just be strings, but should automatically be parsed. It seems redundant to call Bio::FastaFormat.new(Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db =>'sequences'})[0]) But it isn't too much of a big deal. In the end I've got my pipeline up and bioruby is automating the things I want it to, so thanks! ben -- FYI: My email addresses at unimelb, uq and gmail all redirect to the same place. From czmasek at burnham.org Mon Jun 1 03:46:14 2009 From: czmasek at burnham.org (Christian Zmasek) Date: Sun, 31 May 2009 20:46:14 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence In-Reply-To: References: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com> , Message-ID: Hi, Mike: In general only external nodes would have a sequence associated with them. In the case of ancestral sequence reconstruction attempts, internal nodes might have sequences, too, though. Please remember to none of the elements in phyloXML are mandatory. With 'original' sequence I meant the sequence prior to alignment. Cheers, Christian ________________________________________ From: Michael Barton [mail at michaelbarton.me.uk] Sent: Sunday, May 31, 2009 1:59 PM To: Christian Zmasek Cc: Phyloinformatics Group; bioruby at lists.open-bio.org; rozziite at gmail.com Subject: Re: [BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence Hi Christian Would this mean that there is a predicted single ancestor sequence object associated at each node in a phylogenetic tree? You could start at a specific node and traverse to ancestor or descendant nodes, and therefore sequences? When you write original sequence, you mean the multiple sequence alignment? Cheers Mike 2009/5/31 Christian Zmasek : > Hi, Michael: > > Good point. Actually, it is not specified. 
It is just a sequence associated with a node. > In my own work, I use it for the original sequence (before the introduction of gaps, and possible trimming of columns, during and after the alignment process). > Hence, I do not think reuse of the MSA class is appropriate. > > Christian > > > > > ________________________________________ > From: bioruby-bounces at lists.open-bio.org [bioruby-bounces at lists.open-bio.org] On Behalf Of Michael Barton [mail at michaelbarton.me.uk] > Sent: Sunday, May 31, 2009 2:45 AM > To: Phyloinformatics Group; bioruby at lists.open-bio.org > Subject: Re: [BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence > > I'm not very familiar with phyloXML, but when you write sequence, do > you mean a multiple sequence alignment from which the phylogeny was > estimated? If that's the case, there is a MSA class in bioruby which > this could be mapped to perhaps? > > 2009/5/30 Diana Jaunzeikare : >> Hi all, >> >> So I looked more carefully at the sequence element of phyloXML and it >> consists of information which cannot be mapped to Bio::Sequence object. I >> suggest to have a sequence class which closely resembles phyloXML structure >> and then have a method to extract relevant elements return Bio::Sequence >> object. What do you think? >> >> Here on the left i listed phyloXML sequence tag elements and after the arrow >> -> the possible corresponding attribute of Bio::Sequence >> * type >> ** rna, dna -> Bio::Sequence::NA -> molecule type >> ** aa -> Bio::Sequence::AA >> * id_source (string ?) -> id_namespace >> * id_ref (string ) -> entry_id >> * symbol (string ?) >> * accession >> ** source (example: "UniProtKB") -> >> ** id (example: "P17304") -> primary_accession >> * name (string ) >> * location (string ? 
)
>> * mol_seq (string) -> seq / Bio::Sequence::NA/AA
>> * uri
>> ** desc (string)
>> ** type (string )
>> ** uri
>>
>> * annotation []
>> ** ref
>> ** source
>> ** evidence
>> ** type
>> ** desc
>> ** confidence
>> ** property []
>> ** uri
>>
>> * domain_architecture
>> ** length
>> ** domain []
>> *** from
>> *** to
>> *** confidence
>> *** id
>>
>> Diana
>> _______________________________________________
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>

From rozziite at gmail.com Fri Jun 5 14:56:02 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Fri, 5 Jun 2009 10:56:02 -0400
Subject: [BioRuby] GSOC: phyloXML for BioRuby: should pluralize attribute names which hold arrays?
Message-ID: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com>

Hi all,

The Distribution element of phyloXML consists of <desc> [0..1], <point> [0..*] and <polygon> [0..*] tags. When mapping it to a class I am tempted to name the class attributes for point and polygon in the plural, since there can be several such tags in a Distribution. Like this:

Distribution class:
- desc (string)
- points [] (Array of Point objects)
- polygons [] (Array of Polygon objects)

If I were to follow this convention throughout all classes, some plural forms might sound a bit awkward. For example:

PhyloXMLNode class:
- confidences [] (Array of Confidence objects)
- taxonomies [] (Array of Taxonomy objects)
- sequences [] (Array of Sequence objects)
- events (Events object)
- distributions [] (Array of Distribution objects)
- references [] (Reference object)
- properties [] (Property object)

"Confidences" sounds a bit weird (but then again, I am not a native English speaker). "Events" is plural, but it's not an array of objects.
The reason I am bringing this up, is because if the attributes which hold arrays would be plural it would be easier to remember that they are arrays, and not forget to add index in brackets (which I myself forget fairly often when writing unit tests), for example node.sequences[0].name instead of node.sequence[0].name What do you think? Diana From ngoto at gen-info.osaka-u.ac.jp Sun Jun 7 06:28:53 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sun, 7 Jun 2009 15:28:53 +0900 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Mapping sequence In-Reply-To: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com> References: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com> Message-ID: <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp> Hi, sorry for delay. On Sat, 30 May 2009 17:27:52 -0400 Diana Jaunzeikare wrote: > Hi all, > > So I looked more carefully at the sequence element of phyloXML and it > consists of information which cannot be mapped to Bio::Sequence object. I > suggest to have a sequence class which closely resembles phyloXML structure > and then have a method to extract relevant elements return Bio::Sequence > object. What do you think? In this case, the method to convert from Bio::Sequence to the phyloXML sequence class is also needed. If some of the attributes are really essential and not specific to phyloXML but will be needed from other data types, it is also possible to add new attributes to Bio::Sequence. > Here on the left i listed phyloXML sequence tag elements and after the arrow > -> the possible corresponding attribute of Bio::Sequence > * type > ** rna, dna -> Bio::Sequence::NA -> molecule type > ** aa -> Bio::Sequence::AA > * id_source (string ?) -> id_namespace > * id_ref (string ) -> entry_id > * symbol (string ?) > * accession > ** source (example: "UniProtKB") -> > ** id (example: "P17304") -> primary_accession > * name (string ) > * location (string ? 
) > * mol_seq (string) -> seq / Bio::Sequence::NA/AA > * uri > ** desc (string) > ** type (string ) > ** uri > > * annotation [] > ** ref > ** source > ** evidence > ** type > ** desc > ** confidence > ** property [] > ** uri > > * domain_architecture > ** length > ** domain [] > *** from > *** to > *** confidence > *** id The annotations and domain architecture could be mapped to the features in Bio::Sequence. But, in some cases, it is difficult to be mapped, depending on the vocabulary used in the annotations/domain_architecture. -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From georgkam at gmail.com Tue Jun 9 12:26:45 2009 From: georgkam at gmail.com (George Githinji) Date: Tue, 9 Jun 2009 15:26:45 +0300 Subject: [BioRuby] Problem with Bio::GFF::GFF2 Message-ID: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com> Hi all, I am try to parse a GFF file. The file looks like this ##gff-version 2 ##source-version bepipred-1.0b ##date 2009-06-09 ##Type Protein seq1 # seqname source feature start end score N/A ? # --------------------------------------------------------------------------- seq1 bepipred-1.0b epitope 1 1 0.173 . . . seq1 bepipred-1.0b epitope 2 2 -0.043 . . . seq1 bepipred-1.0b epitope 3 3 -0.014 . . . seq1 bepipred-1.0b epitope 4 4 0.144 . . . seq1 bepipred-1.0b epitope 5 5 0.250 . . . seq1 bepipred-1.0b epitope 6 6 0.218 . . . ....truncated and i have written the following lines with an aim of extracting the start, end and score attributes. but before that i wanted to know whether the full attributes are available. so i did the following. require 'rubygems' require 'bio' bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff')) bep_gff.records.each do |record| puts record.attributes_to_hash.inspect end However, i get empty hashes. Any ideas? 
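For comparison, the start, end and score columns can also be pulled out by hand. This is a minimal sketch, assuming the data lines are TAB-separated as the GFF2 format requires; `parse_gff2_line` and `GFF2_FIELDS` are hypothetical names and not part of BioRuby.

```ruby
# Hypothetical helper, not part of BioRuby: split one GFF2 data line into
# its eight mandatory fields plus the optional attributes field.
GFF2_FIELDS = %w[seqname source feature start end score strand frame attributes]

def parse_gff2_line(line)
  return nil if line.start_with?("#")      # skip comment and meta lines
  columns = line.chomp.split("\t", 9)      # GFF2 columns are TAB-separated
  return nil if columns.size < 8           # not a well-formed data line
  record = Hash[GFF2_FIELDS.zip(columns)]
  record["start"] = Integer(record["start"])
  record["end"]   = Integer(record["end"])
  # A "." in the score column means "no score".
  record["score"] = record["score"] == "." ? nil : Float(record["score"])
  record
end

line = "seq1\tbepipred-1.0b\tepitope\t1\t1\t0.173\t.\t.\t."
rec = parse_gff2_line(line)
p rec
```

If the helper returns nil for a line that looks like data, the columns are probably separated by spaces rather than TABs.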
Thank you

--
---------------
Sincerely
George

Skype: george_g2
Blog: http://biorelated.wordpress.com/

From ngoto at gen-info.osaka-u.ac.jp Tue Jun 9 13:44:19 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 9 Jun 2009 22:44:19 +0900
Subject: [BioRuby] Problem with Bio::GFF::GFF2
In-Reply-To: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com>
References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com>
Message-ID: <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>

Hi George,

On Tue, 9 Jun 2009 15:26:45 +0300
George Githinji wrote:

> Hi all,
> I am try to parse a GFF file. The file looks like this
>
> ##gff-version 2
> ##source-version bepipred-1.0b
> ##date 2009-06-09
> ##Type Protein seq1
> # seqname source feature start end score N/A ?
> #
> ---------------------------------------------------------------------------
> seq1 bepipred-1.0b epitope 1 1 0.173 . . .
> seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
> seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
> seq1 bepipred-1.0b epitope 4 4 0.144 . . .
> seq1 bepipred-1.0b epitope 5 5 0.250 . . .
> seq1 bepipred-1.0b epitope 6 6 0.218 . . .
>
> ....truncated

The above GFF records do not contain any "attributes".
The field definition of each GFF line is:

  <seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]

When talking about GFF, the word "attributes" refers to the
"attributes" field in each GFF line.

See the GFF2 specification document for details.
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml

> and i have written the following lines with an aim of extracting the start,
> end and score attributes. but before that i wanted to know whether the full
> attributes are available. so i did the following.
>
> require 'rubygems'
> require 'bio'
> bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff'))
>
> bep_gff.records.each do |record|
> puts record.attributes_to_hash.inspect
> end
>
> However, i get empty hashes.
> Any ideas?

Because the Bio::GFF::GFF2::Record#attributes_to_hash method returns
the "attributes" field as a hash, and the "attributes" fields of all the
GFF2 records above are empty, showing empty hashes is logically right.

If you really want a hash, adding each field to a hash would
be the easiest way. For example,

  bep_gff.records.each do |record|
    h = {}
    h['seqname'] = record.seqname
    h['source'] = record.source
    h['feature'] = record.feature
    h['start'] = record.start
    h['end'] = record.end
    h['score'] = record.score
    h['strand'] = record.strand
    h['frame'] = record.frame
    h['attributes'] = record.attributes_to_hash
    p h
  end

Bio::GFF::GFF2::Record has seqname, source, feature, start, end,
score, strand and frame attributes (in the Ruby sense of the word),
which are inherited from the Bio::GFF::Record class.
Normally, it is more natural to use these (Ruby) attributes
directly without creating a hash.

Note that attributes_to_hash may lose some data when
there are two or more values with the same tag name in an
"attributes" field.

When creating new data that uses "attributes" extensively,
GFF3 is recommended, because the design of GFF2 attributes is
somewhat broken.

> Thank you
>
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Blog: http://biorelated.wordpress.com/

Your blog is nice!

--
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

From georgkam at gmail.com Tue Jun 9 14:24:38 2009
From: georgkam at gmail.com (George Githinji)
Date: Tue, 9 Jun 2009 17:24:38 +0300
Subject: [BioRuby] Problem with Bio::GFF::GFF2
In-Reply-To: <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>
References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com> <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com>

Thank you so much Naohisa for the excellent explanation!!
However,

  bep_gff.records.each do |record|
    p record.seqname
  end

returns
"seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
which is not what is intended and record.score, record.start etc all return nil. :( On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO wrote: > Hi George, > > On Tue, 9 Jun 2009 15:26:45 +0300 > George Githinji wrote: > > > Hi all, > > I am try to parse a GFF file. The file looks like this > > > > ##gff-version 2 > > ##source-version bepipred-1.0b > > ##date 2009-06-09 > > ##Type Protein seq1 > > # seqname source feature start end score N/A > ? > > # > > > --------------------------------------------------------------------------- > > seq1 bepipred-1.0b epitope 1 1 0.173 . . . > > seq1 bepipred-1.0b epitope 2 2 -0.043 . . . > > seq1 bepipred-1.0b epitope 3 3 -0.014 . . . > > seq1 bepipred-1.0b epitope 4 4 0.144 . . . > > seq1 bepipred-1.0b epitope 5 5 0.250 . . . > > seq1 bepipred-1.0b epitope 6 6 0.218 . . . > > > > ....truncated > > The above GFF records do not contain any "attributes". > The field definition of each GFF line is: > > [attributes] [comments] > > When talking about GFF, the word "attributes" points the > "attributes" field in each GFF line. > > See the GFF2 specifications document for details. > http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml > > > and i have written the following lines with an aim of extracting the > start, > > end and score attributes. but before that i wanted to know whether the > full > > attributes are available. so i did the following. > > > > require 'rubygems' > > require 'bio' > > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff')) > > > > bep_gff.records.each do |record| > > puts record.attributes_to_hash.inspect > > end > > > > However, i get empty hashes. > > Any ideas? > > Because the Bio::GFF2::Record#attributes_to_hash method returns > "attributes" as a hash, and all "attributes" field in the above > GFF2 records are empty, showing empty hashes is logically right. > > If you really want a hash, adding each field into a hash would > be the easiest way. 
For example, > > bep_gff.records.each do |record| > h = {} > h['seqname'] = record.seqname > h['source'] = record.source > h['feature'] = record.feature > h['start'] = record.start > h['end'] = record.end > h['score'] = record.score > h['strand'] = record.strand > h['frame'] = record.frame > h['attributes'] = record.attributes_to_hash > p h > end > > Bio::GFF2::Record have seqname, source, feature, start, end, > score, strand, frame attributes(so called in the Ruby language), > which are inherited from Bio::GFF::Record class. > Normally, it is natural using the above attributes(in Ruby) > directly without creating a hash. > > Note that using attributes_to_hash may lost some data when > there are two or more values with the same tag name in an > "attributes" field. > > When creating new data, in case using "attributes" extensively, > GFF3 is recommended, because the design of GFF2 attributes is > somehow broken. > > > Thank you > > > > > > -- > > --------------- > > Sincerely > > George > > > > Skype: george_g2 > > Blog: http://biorelated.wordpress.com/ > > Your blog is nice! > > -- > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > -- --------------- Sincerely George Skype: george_g2 Blog: http://biorelated.wordpress.com/ From czmasek at burnham.org Tue Jun 9 18:10:40 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 9 Jun 2009 11:10:40 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: should pluralize attribute names which hold arrays? In-Reply-To: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com> References: <4057d3bf0906050756t3b059281id3b84d532b3a2c15@mail.gmail.com> Message-ID: <4A2EA5A0.2070502@burnham.org> Hi, Diana: Indeed some of these plurals sound a little weird. Part of it is due to English issues, and also because for some elements, it is unusual to have more than one. 
For example, taxonomies: in the vast majority of cases, each node is associated with one taxonomy, yet phyloxml allows more than one taxonomy per node. That being said, I would agree with you and others, and conclude that using the plurals might be more appropriate, even though a little unexpected. Christian Diana Jaunzeikare wrote: > Hi all, > > Distribution element of phyloXML consists of [0..1], > [0..*] and [0..*] tags. When mapping it to a class I > have a temptation to call the class attributes in plural form for > point and polygon, since there can be several such tags included in > the Distribution. Like this: > > > Distribution class: > > * desc (string) > * points [] (Array of Point objects) > * polygons [] (Array of Polygon objects) > > > If I were to follow such convention throughout all classes, some > plural forms might sound a bit awkward. For example: > > PhyloXMLNode class: > > * confidences [] (Array of Confidence objects) > * taxonomies [] (array of Taxonomy objects) > * sequences [] (Array of Sequence objects) > * events (Events objectS) > * distributions [] (Array of Distribution objects) > * references [] (Reference object) > * properties [] (Property object) > > Confidences sound a bit weird (but then again, I am not a native > English speaker). Events are plural, but its not an array of objects. > > The reason I am bringing this up, is because if the attributes which > hold arrays would be plural it would be easier to remember that they > are arrays, and not forget to add index in brackets (which I myself > forget fairly often when writing unit tests), for example > > node.sequences[0].name > > instead of > > node.sequence[0].name > > What do you think? 
> > Diana From czmasek at burnham.org Tue Jun 9 19:18:20 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 9 Jun 2009 12:18:20 -0700 Subject: [BioRuby] [Wg-phyloinformatics] GSOC: phyloXML for BioRuby: Mapping sequence In-Reply-To: <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp> References: <4057d3bf0905301427u2e6cd6c8t759c29566b08f4db@mail.gmail.com> <20090607062854.BB86C1CBC552@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4A2EB57C.6090100@burnham.org> Hi: Thank you for the detailed comments. I think this is a very crucial point, since sequence and taxonomy are the two most important elements. At this point, I would recommend to create a special class for phyloxml-sequence, and add methods/constructors to it which make transferring to and from Bio::Sequence easy. But I can definitely see the advantages of directly using Bio::Sequence, too. Also, please don't forget that, should a consensus/strong opinion emerge, we could also add features to the phyloxml-sequence definition to make it match BioRuby and BioPython sequence better. Christian Naohisa GOTO wrote: > Hi, > > sorry for delay. > > On Sat, 30 May 2009 17:27:52 -0400 > Diana Jaunzeikare wrote: > > >> Hi all, >> >> So I looked more carefully at the sequence element of phyloXML and it >> consists of information which cannot be mapped to Bio::Sequence object. I >> suggest to have a sequence class which closely resembles phyloXML structure >> and then have a method to extract relevant elements return Bio::Sequence >> object. What do you think? >> > > In this case, the method to convert from Bio::Sequence to the > phyloXML sequence class is also needed. > > If some of the attributes are really essential and not specific > to phyloXML but will be needed from other data types, it is > also possible to add new attributes to Bio::Sequence. 
> > >> Here on the left i listed phyloXML sequence tag elements and after the arrow >> -> the possible corresponding attribute of Bio::Sequence >> * type >> ** rna, dna -> Bio::Sequence::NA -> molecule type >> ** aa -> Bio::Sequence::AA >> * id_source (string ?) -> id_namespace >> * id_ref (string ) -> entry_id >> id_source and id_ref are actually used to describe relations between sequences, for example to describe orthology-relationships. >> * symbol (string ?) >> * accession >> ** source (example: "UniProtKB") -> >> ** id (example: "P17304") -> primary_accession >> ** source -> id_namespace ** id -> primary_accession (or entry_id) >> * name (string ) >> * location (string ? ) >> * mol_seq (string) -> seq / Bio::Sequence::NA/AA >> * uri >> ** desc (string) >> ** type (string ) >> ** uri >> >> * annotation [] >> ** ref >> ** source >> ** evidence >> ** type >> ** desc >> ** confidence >> ** property [] >> ** uri >> >> * domain_architecture >> ** length >> ** domain [] >> *** from >> *** to >> *** confidence >> *** id >> > > The annotations and domain architecture could be mapped to the features > in Bio::Sequence. But, in some cases, it is difficult to be mapped, > depending on the vocabulary used in the annotations/domain_architecture. > > From ngoto at gen-info.osaka-u.ac.jp Wed Jun 10 06:14:30 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 10 Jun 2009 15:14:30 +0900 Subject: [BioRuby] Problem with Bio::GFF::GFF2 In-Reply-To: <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com> References: <55915f820906090526s16271c5am319cdb94d69defb9@mail.gmail.com> <20090609134420.8C2121CBC562@idnmail.gen-info.osaka-u.ac.jp> <55915f820906090724i37419d65hea6f6e36260c1f42@mail.gmail.com> Message-ID: <20090610061431.02FB21CBC56B@idnmail.gen-info.osaka-u.ac.jp> On Tue, 9 Jun 2009 17:24:38 +0300 George Githinji wrote: > Thank you so much Naohisa for the excellent explanation!! 
> however
>
> bep_gff.records.each do |record|
> p record.seqname
> end
>
> returns
> "seq1 bepipred-1.0b epitope 1 1 0.173 . . ."
>
>
> which is not what is intended and
> record.score, record.start etc all return nil.

It seems this is NOT valid GFF2 format.
In GFF formats, the delimiter must be a TAB ("\t" in Ruby).
However, in the above data, it seems that the characters between
the "seq1" and "bepipred-1.0b" entries may be white spaces (" " in Ruby)
instead of a TAB.

Copy-and-paste from a terminal or web browser, or the autocomplete
function of a text editor or word processor, can often create
this kind of corrupted data.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

>
> :(
>
>
> On Tue, Jun 9, 2009 at 4:44 PM, Naohisa GOTO
> wrote:
>
> > Hi George,
> >
> > On Tue, 9 Jun 2009 15:26:45 +0300
> > George Githinji wrote:
> >
> > > Hi all,
> > > I am try to parse a GFF file. The file looks like this
> > >
> > > ##gff-version 2
> > > ##source-version bepipred-1.0b
> > > ##date 2009-06-09
> > > ##Type Protein seq1
> > > # seqname source feature start end score N/A ?
> > > #
> > > ---------------------------------------------------------------------------
> > > seq1 bepipred-1.0b epitope 1 1 0.173 . . .
> > > seq1 bepipred-1.0b epitope 2 2 -0.043 . . .
> > > seq1 bepipred-1.0b epitope 3 3 -0.014 . . .
> > > seq1 bepipred-1.0b epitope 4 4 0.144 . . .
> > > seq1 bepipred-1.0b epitope 5 5 0.250 . . .
> > > seq1 bepipred-1.0b epitope 6 6 0.218 . . .
> > >
> > > ....truncated
> >
> > The above GFF records do not contain any "attributes".
> > The field definition of each GFF line is:
> >
> > <seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]
> >
> > When talking about GFF, the word "attributes" points the
> > "attributes" field in each GFF line.
> >
> > See the GFF2 specifications document for details.
> > http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml
> >
> > > and i have written the following lines with an aim of extracting the start,
> > > end and score attributes.
but before that i wanted to know whether the > > full > > > attributes are available. so i did the following. > > > > > > require 'rubygems' > > > require 'bio' > > > bep_gff = Bio::GFF::GFF2.new(File.open('/home/george/bpred.gff')) > > > > > > bep_gff.records.each do |record| > > > puts record.attributes_to_hash.inspect > > > end > > > > > > However, i get empty hashes. > > > Any ideas? > > > > Because the Bio::GFF2::Record#attributes_to_hash method returns > > "attributes" as a hash, and all "attributes" field in the above > > GFF2 records are empty, showing empty hashes is logically right. > > > > If you really want a hash, adding each field into a hash would > > be the easiest way. For example, > > > > bep_gff.records.each do |record| > > h = {} > > h['seqname'] = record.seqname > > h['source'] = record.source > > h['feature'] = record.feature > > h['start'] = record.start > > h['end'] = record.end > > h['score'] = record.score > > h['strand'] = record.strand > > h['frame'] = record.frame > > h['attributes'] = record.attributes_to_hash > > p h > > end > > > > Bio::GFF2::Record have seqname, source, feature, start, end, > > score, strand, frame attributes(so called in the Ruby language), > > which are inherited from Bio::GFF::Record class. > > Normally, it is natural using the above attributes(in Ruby) > > directly without creating a hash. > > > > Note that using attributes_to_hash may lost some data when > > there are two or more values with the same tag name in an > > "attributes" field. > > > > When creating new data, in case using "attributes" extensively, > > GFF3 is recommended, because the design of GFF2 attributes is > > somehow broken. > > > > > Thank you > > > > > > > > > -- > > > --------------- > > > Sincerely > > > George > > > > > > Skype: george_g2 > > > Blog: http://biorelated.wordpress.com/ > > > > Your blog is nice! 
> >
> > --
> > Naohisa Goto
> > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
> >
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Blog: http://biorelated.wordpress.com/
>

From jan.aerts at gmail.com Fri Jun 12 09:53:08 2009
From: jan.aerts at gmail.com (Jan Aerts)
Date: Fri, 12 Jun 2009 10:53:08 +0100
Subject: [BioRuby] locus mixin
Message-ID: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com>

What do people think about adding an IsLocus mixin to bioruby? For a lot of
my work I need to check whether genes or polymorphisms or clones or ... overlap.
I use the IsLocus mixin to get that done. Any object that has a chromosome,
start and stop can have the module mixed in.

Some of the methods as I have them defined locally:

module IsLocus
  def range
    return Range.new(self.start, self.stop)
  end

  def overlaps?(other_locus)
    return false if self.chromosome != other_locus.chromosome
    if self.range.overlaps?(other_locus.range)
      return true
    end
    return false
  end

  def contained_by?(other_locus)
    return false if self.chromosome != other_locus.chromosome
    if self.range.contained_by?(other_locus.range)
      return true
    end
    return false
  end

  def contains?(other_locus)
    return false if self.chromosome != other_locus.chromosome
    if self.range.contains?(other_locus.range)
      return true
    end
    return false
  end

  def to_s
    return self.chromosome + ':' + self.range.to_s
  end

  def to_gff3
    return [self.chromosome, self.class.name, self.start, self.stop, '.', '.', '.', 'ID=' + self.id.to_s].join("\t")
  end

  def to_bed
    if self.respond_to?(:name)
      return [self.chromosome, self.start, self.stop, self.name].join("\t")
    else
      return [self.chromosome, self.start, self.stop, self.class.name + '_' + self.id.to_s].join("\t")
    end
  end

  # The following makes it possible to call Gene#to_bed which would dump all Gene objects in BED format
  def self.included mod
    class << mod
      def to_bed
        output = Array.new
        output.push("track name='#{self.name}' description='#{self.name}'")
        self.all.each do |record|
output.push record.to_bed end return output.join("\n") end end end end Let me know what you think, jan. From rozziite at gmail.com Fri Jun 12 15:25:40 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 12 Jun 2009 11:25:40 -0400 Subject: [BioRuby] Bioruby unit tests Message-ID: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> Hi all, I am working on implementing phyloxml support and I was running only my unit tests to test my code. Then yesterday I ran all of the unit tests and it gave me errors (when I first cloned it did not gave me any errors). I don't think i changed anything in any other file than lib/bio/db/phyloxml.rb and test/unit/bio/db/test_phyloxml.rb Here is the output of test/runner.rb. Looks all of the errors are of the same kind. diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb Loaded suite . Started .........................................................................................EE...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE.................................................
.......................................................................................................................................................................................EEEEEE...................................................................E........E.................................................................................................................................................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.................................................................................................................................................................................................................................................................................................................................................................................................................................................EEEEEE............................................................................................................................... Finished in 176.329241 seconds. 
1) Error: test_output_embl(Bio::FuncTestSequenceOutputEMBL): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:263:in `initialize' ./lib/bio/sequence.rb:263:in `new' ./lib/bio/sequence.rb:263:in `auto' ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' 2) Error: test_output_fasta(Bio::FuncTestSequenceOutputEMBL): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:263:in `initialize' ./lib/bio/sequence.rb:263:in `new' ./lib/bio/sequence.rb:263:in `auto' ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' 3) Error: test_alignment(Bio::TestAlignmentMultiFastaFormat): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' ./lib/bio/appl/mafft/report.rb:89:in `each' ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' ./lib/bio/appl/mafft/report.rb:61:in `alignment' ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' 4) Error: test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' ./lib/bio/appl/mafft/report.rb:89:in `each' ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' ./lib/bio/appl/mafft/report.rb:61:in `alignment' ./test/unit/bio/appl/mafft/test_report.rb:57:in `test_determine_seq_method' 5) Error: test_const_version(Bio::TestGFF3): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/db/gff.rb:954:in `parse_fasta' 
./lib/bio/db/gff.rb:949:in `each_line' ./lib/bio/db/gff.rb:949:in `parse_fasta' ./lib/bio/db/gff.rb:941:in `parse' ./lib/bio/db/gff.rb:881:in `initialize' ./test/unit/bio/db/test_gff.rb:644:in `new' ./test/unit/bio/db/test_gff.rb:644:in `setup' 6) Error: test_gff_version(Bio::TestGFF3): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/db/gff.rb:954:in `parse_fasta' ./lib/bio/db/gff.rb:949:in `each_line' ./lib/bio/db/gff.rb:949:in `parse_fasta' ./lib/bio/db/gff.rb:941:in `parse' ./lib/bio/db/gff.rb:881:in `initialize' ./test/unit/bio/db/test_gff.rb:644:in `new' ./test/unit/bio/db/test_gff.rb:644:in `setup' 7) Error: test_records(Bio::TestGFF3): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/db/gff.rb:954:in `parse_fasta' ./lib/bio/db/gff.rb:949:in `each_line' ./lib/bio/db/gff.rb:949:in `parse_fasta' ./lib/bio/db/gff.rb:941:in `parse' ./lib/bio/db/gff.rb:881:in `initialize' ./test/unit/bio/db/test_gff.rb:644:in `new' ./test/unit/bio/db/test_gff.rb:644:in `setup' 8) Error: test_sequence_regions(Bio::TestGFF3): ArgumentError: wrong number of arguments (1 for 0) ./lib/bio/sequence.rb:443:in `initialize' ./lib/bio/sequence.rb:443:in `new' ./lib/bio/sequence.rb:443:in `adapter' ./lib/bio/db/fasta.rb:221:in `to_seq' ./lib/bio/db/gff.rb:954:in `parse_fasta' ./lib/bio/db/gff.rb:949:in `each_line' ./lib/bio/db/gff.rb:949:in `parse_fasta' ./lib/bio/db/gff.rb:941:in `parse' ./lib/bio/db/gff.rb:881:in `initialize' ./test/unit/bio/db/test_gff.rb:644:in `new' ./test/unit/bio/db/test_gff.rb:644:in `setup' [....] 2175 tests, 5180 assertions, 0 failures, 54 errors Any ideas? 
Code is available here http://github.com/latvianlinuxgirl/bioruby/tree/dev Diana From czmasek at burnham.org Fri Jun 12 21:13:13 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Fri, 12 Jun 2009 14:13:13 -0700 Subject: [BioRuby] Bioruby unit tests In-Reply-To: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> Message-ID: <4A32C4E9.7050000@burnham.org> Hi: I usually have one test fail (I don't remember which one though) but not that many. Which version of ruby are you using? Christian Diana Jaunzeikare wrote: > Hi all, > > I am working on implementing phyloxml support and I was running only > my unit tests to test my code. Then yesterday I ran all of the unit > tests and it gave me errors (when I first cloned it did not gave me > any errors). I don't think i changed anything in any other file than > lib/bio/db/phyloxml.rb and test/unit/bio/db/test_phyloxml.rb > > Here is the output of test/runner.rb. Looks all of the errors are of > the same kind. > > diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb > Loaded suite . 
> Started > .........................................................................................EE...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE........................................................................................................................................................................................................................................EEEEEE...................................................................E........E.................................................................................................................................................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE............................................................................................................................................................................................................................................................................................................................................................................................
.....................................................EEEEEE............................................................................................................................... > Finished in 176.329241 seconds. > > 1) Error: > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 2) Error: > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 3) Error: > test_alignment(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > 4) Error: > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:57:in > `test_determine_seq_method' 
> > 5) Error: > test_const_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 6) Error: > test_gff_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 7) Error: > test_records(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 8) Error: > test_sequence_regions(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in 
`parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > [....] > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > Any ideas? Code is available here > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > Diana From rozziite at gmail.com Fri Jun 12 21:23:35 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Fri, 12 Jun 2009 17:23:35 -0400 Subject: [BioRuby] Bioruby unit tests In-Reply-To: <4A32C4E9.7050000@burnham.org> References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> <4A32C4E9.7050000@burnham.org> Message-ID: <4057d3bf0906121423l6277e57ao5e27ceaef0f88fec@mail.gmail.com> I am using ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux] Diana On Fri, Jun 12, 2009 at 5:13 PM, Christian M Zmasek wrote: > Hi: > > I usually have one test fail (I don't remember which one though) but not > that many. > Which version of ruby are you using? > > Christian > > > > Diana Jaunzeikare wrote: > >> Hi all, >> >> I am working on implementing phyloxml support and I was running only my >> unit tests to test my code. Then yesterday I ran all of the unit tests and >> it gave me errors (when I first cloned it did not gave me any errors). I >> don't think i changed anything in any other file than lib/bio/db/phyloxml.rb >> and test/unit/bio/db/test_phyloxml.rb >> >> Here is the output of test/runner.rb. Looks all of the errors are of the >> same kind. >> diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb >> Loaded suite . 
>> Started >> >> .........................................................................................EE...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE........................................................................................................................................................................................................................................EEEEEE...................................................................E........E.................................................................................................................................................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.......................................................................................................................................................................................................................................................................................................................................................................................
..........................................................EEEEEE............................................................................................................................... >> Finished in 176.329241 seconds. >> >> 1) Error: >> test_output_embl(Bio::FuncTestSequenceOutputEMBL): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:263:in `initialize' >> ./lib/bio/sequence.rb:263:in `new' >> ./lib/bio/sequence.rb:263:in `auto' >> ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' >> >> 2) Error: >> test_output_fasta(Bio::FuncTestSequenceOutputEMBL): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:263:in `initialize' >> ./lib/bio/sequence.rb:263:in `new' >> ./lib/bio/sequence.rb:263:in `auto' >> ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' >> >> 3) Error: >> test_alignment(Bio::TestAlignmentMultiFastaFormat): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:89:in `each' >> ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:61:in `alignment' >> ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' >> >> 4) Error: >> test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:89:in `each' >> ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' >> ./lib/bio/appl/mafft/report.rb:61:in `alignment' >> 
./test/unit/bio/appl/mafft/test_report.rb:57:in >> `test_determine_seq_method' >> >> 5) Error: >> test_const_version(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 6) Error: >> test_gff_version(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 7) Error: >> test_records(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> ./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> >> 8) Error: >> test_sequence_regions(Bio::TestGFF3): >> ArgumentError: wrong number of arguments (1 for 0) >> ./lib/bio/sequence.rb:443:in `initialize' >> 
./lib/bio/sequence.rb:443:in `new' >> ./lib/bio/sequence.rb:443:in `adapter' >> ./lib/bio/db/fasta.rb:221:in `to_seq' >> ./lib/bio/db/gff.rb:954:in `parse_fasta' >> ./lib/bio/db/gff.rb:949:in `each_line' >> ./lib/bio/db/gff.rb:949:in `parse_fasta' >> ./lib/bio/db/gff.rb:941:in `parse' >> ./lib/bio/db/gff.rb:881:in `initialize' >> ./test/unit/bio/db/test_gff.rb:644:in `new' >> ./test/unit/bio/db/test_gff.rb:644:in `setup' >> [....] >> 2175 tests, 5180 assertions, 0 failures, 54 errors >> >> >> Any ideas? Code is available here >> http://github.com/latvianlinuxgirl/bioruby/tree/dev >> Diana >> > > >

From ngoto at gen-info.osaka-u.ac.jp Sat Jun 13 04:47:02 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sat, 13 Jun 2009 13:47:02 +0900
Subject: [BioRuby] Bioruby unit tests
In-Reply-To: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com>
Message-ID: <20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp>

Hi Diana,

This is because the Sequence class defined at line 269 of
lib/bio/db/phyloxml.rb conflicts with BioRuby's Bio::Sequence.

The PhyloXML Sequence class (and Events, Date, Id, Uri, etc.)
should be defined inside the Bio::PhyloXML namespace.

For example,

  module Bio
    class PhyloXML
      class Sequence
        #...
      end
      class Events
        #...
      end
      class Date
        #...
      end
    end
  end

In this case, Bio::PhyloXML::Sequence is different from
Bio::Sequence.

Be careful that the name Date is already used by Ruby's
standard bundled library (require 'date'), although you can
distinguish the two by using ::Date and Bio::PhyloXML::Date.

I also recommend that PhyloXMLTree and PhyloXMLNode be
located inside the Bio::PhyloXML namespace (that is,
Bio::PhyloXML::PhyloXMLTree and Bio::PhyloXML::PhyloXMLNode;
if possible, renaming them to Bio::PhyloXML::Tree and
Bio::PhyloXML::Node may be a good choice).
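A minimal sketch of the namespacing advice above. The Bio::Sequence here is a simplified stand-in, not BioRuby's real class, and the attribute names and the date_classes helper are hypothetical. The code at line 269 of phyloxml.rb is not shown in this thread, so the last part only assumes that the PhyloXML class was declared directly under Bio with a zero-argument initialize, which would explain the "wrong number of arguments" errors:

```ruby
require 'date'

# Simplified stand-in for BioRuby's Bio::Sequence (assumption, not the real class).
module Bio
  class Sequence
    attr_reader :seq

    def initialize(str)
      @seq = str
    end
  end
end

# PhyloXML classes kept inside their own namespace, as suggested above.
module Bio
  class PhyloXML
    class Sequence
      attr_accessor :name, :symbol   # hypothetical attributes

      def initialize
        @name = nil
        @symbol = nil
      end
    end

    class Date
      attr_accessor :value, :unit    # hypothetical attributes
    end

    # Inside this namespace a bare Date resolves to Bio::PhyloXML::Date;
    # ::Date reaches the standard library class.
    def self.date_classes
      [Date, ::Date]
    end
  end
end

bio_seq   = Bio::Sequence.new('atgc')    # still takes one argument
phylo_seq = Bio::PhyloXML::Sequence.new  # takes none
phylo_seq.name = 'example'

puts bio_seq.seq
puts Bio::PhyloXML::Sequence.equal?(Bio::Sequence)
puts Bio::PhyloXML.date_classes.inspect

# For contrast (assumed shape of the original problem): a second
# `class Sequence` directly under Bio does not create a new class. It
# reopens Bio::Sequence and replaces its constructor, so existing callers
# that pass one argument start raising ArgumentError.
module Bio
  class Sequence
    def initialize; end
  end
end

begin
  Bio::Sequence.new('atgc')
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

The two Sequence classes coexist only while they live under different namespaces; the final section shows why an un-namespaced definition would affect unrelated tests such as the GFF3 and MAFFT ones.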
Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org On Fri, 12 Jun 2009 11:25:40 -0400 Diana Jaunzeikare wrote: > Hi all, > > I am working on implementing phyloxml support and I was running only my unit > tests to test my code. Then yesterday I ran all of the unit tests and it > gave me errors (when I first cloned it did not gave me any errors). I don't > think i changed anything in any other file than lib/bio/db/phyloxml.rb and > test/unit/bio/db/test_phyloxml.rb > > Here is the output of test/runner.rb. Looks all of the errors are of the > same kind. > > diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb > Loaded suite . > Started > .........................................................................................EE............................................... (snip) > Finished in 176.329241 seconds. > > 1) Error: > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 2) Error: > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:263:in `initialize' > ./lib/bio/sequence.rb:263:in `new' > ./lib/bio/sequence.rb:263:in `auto' > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > 3) Error: > test_alignment(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > 
./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > 4) Error: > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:89:in `each' > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > ./test/unit/bio/appl/mafft/test_report.rb:57:in > `test_determine_seq_method' > > 5) Error: > test_const_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 6) Error: > test_gff_version(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 7) Error: > test_records(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in 
`adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > 8) Error: > test_sequence_regions(Bio::TestGFF3): > ArgumentError: wrong number of arguments (1 for 0) > ./lib/bio/sequence.rb:443:in `initialize' > ./lib/bio/sequence.rb:443:in `new' > ./lib/bio/sequence.rb:443:in `adapter' > ./lib/bio/db/fasta.rb:221:in `to_seq' > ./lib/bio/db/gff.rb:954:in `parse_fasta' > ./lib/bio/db/gff.rb:949:in `each_line' > ./lib/bio/db/gff.rb:949:in `parse_fasta' > ./lib/bio/db/gff.rb:941:in `parse' > ./lib/bio/db/gff.rb:881:in `initialize' > ./test/unit/bio/db/test_gff.rb:644:in `new' > ./test/unit/bio/db/test_gff.rb:644:in `setup' > [....] > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > Any ideas? Code is available here > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > Diana > From ngoto at gen-info.osaka-u.ac.jp Sat Jun 13 06:05:01 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 13 Jun 2009 15:05:01 +0900 Subject: [BioRuby] locus mixin In-Reply-To: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com> References: <4c7507a70906120253l166e052m42ff7df7c8864df2@mail.gmail.com> Message-ID: <20090613060502.F2CF51CBC3DA@idnmail.gen-info.osaka-u.ac.jp> Hi, On Fri, 12 Jun 2009 10:53:08 +0100 Jan Aerts wrote: > What do people think about adding a IsLocus mixin to bioruby? For a lot of > my work I need to check if genes or polymorphisms or clones or ... overlap. > I use the IsLocus mixin to get that done. Any object that has a chromosome, > start and stop can have the module mixed in. Some of the methods as I have > them defined locally: What classes are considered the module to be mixed in? 
In BioRuby, as far as I know, there are currently no classes that
provide all of these methods at once.

I don't think adding only a mixin is a good approach. It would be
better to prepare some classes that can handle real data (which can
probably be downloaded from well-known genome/expression data
repositories) and can perform typical tasks conveniently.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

> module IsLocus
>   def range
>     return Range.new(self.start, self.stop)
>   end
>
>   def overlaps?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.overlaps?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def contained_by?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.contained_by?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def contains?(other_locus)
>     return false if self.chromosome != other_locus.chromosome
>
>     if self.range.contains?(other_locus.range)
>       return true
>     end
>
>     return false
>   end
>
>   def to_s
>     return self.chromosome + ':' + self.range.to_s
>   end
>
>   def to_gff3
>     return [self.chromosome, self.class.name, self.start, self.stop, '.', '.', '.', 'ID=' + self.id.to_s].join("\t")
>   end
>
>   def to_bed
>     if self.respond_to?(:name)
>       return [self.chromosome, self.start, self.stop, self.name].join("\t")
>     else
>       return [self.chromosome, self.start, self.stop, self.class.name + '_' + self.id.to_s].join("\t")
>     end
>   end
>
>   # The following makes it possible to call Gene#to_bed which would dump all Gene objects in BED format
>   def self.included mod
>     class << mod
>       def to_bed
>         output = Array.new
>         output.push("track name='#{self.name}' description='#{self.name}'")
>         self.all.each do |record|
>           output.push record.to_bed
>         end
>         return output.join("\n")
>       end
>     end
>   end
> end
>
> Let me know what you think,
> jan.
> _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From rozziite at gmail.com Sun Jun 14 15:22:29 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Sun, 14 Jun 2009 11:22:29 -0400 Subject: [BioRuby] Bioruby unit tests In-Reply-To: <20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp> References: <4057d3bf0906120825h87a9ab6y211a1771e845f67@mail.gmail.com> <20090613044703.7200F1CBC4B9@idnmail.gen-info.osaka-u.ac.jp> Message-ID: <4057d3bf0906140822i588e798bvea0f4e13d8c34b67@mail.gmail.com> Thanks! This worked like a charm. Now all the tests pass. Diana On Sat, Jun 13, 2009 at 12:47 AM, Naohisa GOTO wrote: > Hi Diana, > > This is because your original Sequence class definition in line 269 > in lib/bio/db/phyloxml.rb violates BioRuby's Bio::Sequence. > > The PhyloXML Sequence class (and Events, Date, Id, Uri, etc) > should be defined inside the Bio::PhyloXML namespace. > > For example, > > module Bio > class PhyloXML > class Sequence > #... > end > class Events > #... > end > class Date > #... > end > end > end > > In this case, Bio::PhyloXML::Sequence is different from > Bio::Sequence. > > Be careful that the name Date is already used by Ruby's > standard bundled library (require 'date'), althogh you can > distinguish it by using ::Date and Bio::PhyloXML::Date. > > I also recommend that PhyloXMLTree and PhyloXMLNode are > located inside the Bio::PhyloXML namespace (this means > Bio::PhyloXML::PhyloXMLTree and Bio::PhyloXML::PhyloXMLNode. > If possible, to rename to Bio::PhyloXML::Tree and > Bio::PhyloXML::Node may be a good choice.) > > Naohisa Goto > ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org > > On Fri, 12 Jun 2009 11:25:40 -0400 > Diana Jaunzeikare wrote: > > > Hi all, > > > > I am working on implementing phyloxml support and I was running only my > unit > > tests to test my code. 
Then yesterday I ran all of the unit tests and it > > gave me errors (when I first cloned it did not gave me any errors). I > don't > > think i changed anything in any other file than lib/bio/db/phyloxml.rb > and > > test/unit/bio/db/test_phyloxml.rb > > > > Here is the output of test/runner.rb. Looks all of the errors are of the > > same kind. > > > > diana at diana-ubuntu:~/bioruby$ ruby test/runner.rb > > Loaded suite . > > Started > > > .........................................................................................EE............................................... > (snip) > > Finished in 176.329241 seconds. > > > > 1) Error: > > test_output_embl(Bio::FuncTestSequenceOutputEMBL): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:263:in `initialize' > > ./lib/bio/sequence.rb:263:in `new' > > ./lib/bio/sequence.rb:263:in `auto' > > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > > > 2) Error: > > test_output_fasta(Bio::FuncTestSequenceOutputEMBL): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:263:in `initialize' > > ./lib/bio/sequence.rb:263:in `new' > > ./lib/bio/sequence.rb:263:in `auto' > > ./test/functional/bio/sequence/test_output_embl.rb:21:in `setup' > > > > 3) Error: > > test_alignment(Bio::TestAlignmentMultiFastaFormat): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:89:in `each' > > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > > ./test/unit/bio/appl/mafft/test_report.rb:47:in `test_alignment' > > > > 4) Error: > > test_determine_seq_method(Bio::TestAlignmentMultiFastaFormat): > > ArgumentError: wrong number of 
arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/appl/mafft/report.rb:90:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:89:in `each' > > ./lib/bio/appl/mafft/report.rb:89:in `determine_seq_method' > > ./lib/bio/appl/mafft/report.rb:61:in `alignment' > > ./test/unit/bio/appl/mafft/test_report.rb:57:in > > `test_determine_seq_method' > > > > 5) Error: > > test_const_version(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 6) Error: > > test_gff_version(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 7) Error: > > test_records(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in 
`parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > > > 8) Error: > > test_sequence_regions(Bio::TestGFF3): > > ArgumentError: wrong number of arguments (1 for 0) > > ./lib/bio/sequence.rb:443:in `initialize' > > ./lib/bio/sequence.rb:443:in `new' > > ./lib/bio/sequence.rb:443:in `adapter' > > ./lib/bio/db/fasta.rb:221:in `to_seq' > > ./lib/bio/db/gff.rb:954:in `parse_fasta' > > ./lib/bio/db/gff.rb:949:in `each_line' > > ./lib/bio/db/gff.rb:949:in `parse_fasta' > > ./lib/bio/db/gff.rb:941:in `parse' > > ./lib/bio/db/gff.rb:881:in `initialize' > > ./test/unit/bio/db/test_gff.rb:644:in `new' > > ./test/unit/bio/db/test_gff.rb:644:in `setup' > > [....] > > 2175 tests, 5180 assertions, 0 failures, 54 errors > > > > > > Any ideas? Code is available here > > http://github.com/latvianlinuxgirl/bioruby/tree/dev > > > > Diana > > > > > From rozziite at gmail.com Mon Jun 15 14:37:27 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Mon, 15 Jun 2009 10:37:27 -0400 Subject: [BioRuby] Bioruby PhyloXML: method for iterating to the next tree, without returning anything Message-ID: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> Hi all, Now I have a method next_tree which parses the phylogeny element and all its sub elements and returns a tree. I propose to have a method for iterating to the next tree (maybe call it skip_tree), without actually parsing it, but advancing the libxml reader to the next phylogeny element. The reason I think this might be useful is because, for example, in my unit tests I work with one tree at a time. To get to specific tree I call next_tree several times, but don't use any of the returned data. Maybe other people also will need such functionality. 
Having a method skip_tree would make the process faster since it would not actually parse the elements and would not create objects. What do you think? Diana From czmasek at burnham.org Mon Jun 15 16:57:01 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 15 Jun 2009 09:57:01 -0700 Subject: [BioRuby] Bioruby PhyloXML: method for iterating to the next tree, without returning anything In-Reply-To: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> References: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> Message-ID: <4A367D5D.3010102@burnham.org> Hi, Diana: I think this is a good idea. Although, it will only be useful if the order of the trees is known beforehand (i.e. you know you want the 5th tree). Something to think about (if you have enough time): Since trees in phyloxml can have names and/or ids -- what about having the parser return trees with a matching name/id? E.g. from a file with 100 trees return those named "erk gene tree". Great work! Christian Diana Jaunzeikare wrote: > Hi all, > > Now I have a method next_tree which parses the phylogeny element and > all its sub elements and returns a tree. I propose to have a method > for iterating to the next tree (maybe call it skip_tree), without > actually parsing it, but advancing the libxml reader to the next > phylogeny element. > > The reason I think this might be useful is because, for example, in my > unit tests I work with one tree at a time. To get to specific tree I > call next_tree several times, but don't use any of the returned data. > Maybe other people also will need such functionality. Having a method > skip_tree would make the process faster since it would not actually > parse the elements and would not create objects. > > What do you think? > > Diana From kpatil at science.uva.nl Tue Jun 16 12:34:03 2009 From: kpatil at science.uva.nl (K. 
Patil) Date: Tue, 16 Jun 2009 14:34:03 +0200 (CEST) Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org Message-ID: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Hi, In the bioruby at lists.open-bio.org method of Bio::Tree the root node is also removed if it has 2 edges, it will be useful to have an argument deciding if the root should be remove or not. cheers, Kaustubh From kpatil at science.uva.nl Tue Jun 16 13:47:45 2009 From: kpatil at science.uva.nl (K. Patil) Date: Tue, 16 Jun 2009 15:47:45 +0200 (CEST) Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org In-Reply-To: References: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Message-ID: <64690.139.19.75.1.1245160065.squirrel@webmail.science.uva.nl> Oops sorry the name of the method should be "remove_nonsense_nodes" - kaustubh > Hi, > > Is that the correct method name? > > ben > > 2009/6/16 K. Patil > >> Hi, >> >> In the bioruby at lists.open-bio.org method of Bio::Tree the root node is >> also removed if it has 2 edges, it will be useful to have an argument >> deciding if the root should be remove or not. >> >> cheers, >> Kaustubh >> >> >> _______________________________________________ >> BioRuby mailing list >> BioRuby at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioruby >> > > > > -- > FYI: My email addresses at unimelb, uq and gmail all redirect to the same > place. > From ngoto at gen-info.osaka-u.ac.jp Tue Jun 16 14:55:27 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Tue, 16 Jun 2009 23:55:27 +0900 Subject: [BioRuby] CHange in Bio::Tree bioruby@lists.open-bio.org In-Reply-To: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> References: <53520.139.19.75.1.1245155643.squirrel@webmail.science.uva.nl> Message-ID: <20090616145529.5208E1CBC43A@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 16 Jun 2009 14:34:03 +0200 (CEST) "K. 
Patil" wrote: > Hi, > > In the bioruby at lists.open-bio.org method of Bio::Tree the root node is > also removed if it has 2 edges, it will be useful to have an argument > deciding if the root should be remove or not. > > cheers, > Kaustubh Sorry, I can't understand what you mean. Please show example data and script, and current and expected behavior. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From sgujja at broad.mit.edu Tue Jun 16 15:14:07 2009 From: sgujja at broad.mit.edu (Sharvari Gujja) Date: Tue, 16 Jun 2009 11:14:07 -0400 Subject: [BioRuby] Import Python modules in Ruby... Message-ID: <4A37B6BF.6030303@broad.mit.edu> Hi, I'd like to know if there is a way to import Python modules into Ruby. I downloaded the Ruby/Python library to embed the Python interpreter. However on running the script I get an error saying: *C:/Program Files/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require': no such file to load -- python (LoadError) from C:/Program Files/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require' from Z:/ruby_progs/test2.rb:1* I extracted "python" library at C:/Program Files/ruby/lib/ruby/site_ruby/1.8/ . [My ruby version is 1.8.6] The script I am trying to run is : #!/usr/bin/env ruby require 'rubygems' require 'python' require 'python/naming' require 'python/xreadlines' distance = naming.DISTANCE_TOOL.distance("protein 1", "protein2") print distance Could someone please help. Thanks S From ngoto at gen-info.osaka-u.ac.jp Wed Jun 17 10:10:18 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Wed, 17 Jun 2009 19:10:18 +0900 Subject: [BioRuby] Import Python modules in Ruby... In-Reply-To: <4A37B6BF.6030303@broad.mit.edu> References: <4A37B6BF.6030303@broad.mit.edu> Message-ID: <20090617101020.3D3301CBC4FF@idnmail.gen-info.osaka-u.ac.jp> On Tue, 16 Jun 2009 11:14:07 -0400 Sharvari Gujja wrote: > Hi, > > I'd like to know if there is a way to import Python modules into Ruby. 
I > downloaded the Ruby/Python library to embed the Python interpreter. Where did you download the "Ruby/Python library" from? I found two similar libraries. http://www.goto.info.waseda.ac.jp/~fukusima/ruby/python-e.html but "Last modified: Mon Sep 11 02:30:10 JST 2000" indicates no support for current version of Ruby and Python. http://rubyforge.org/projects/rubypython/ It can be installed by using rubygems, but it seems there are no Windows binary for the gem and Visual C++ compiler may be needed. In addition, appropriate version of python must be installed. Because they are not specific to bioruby nor bioinformatics, if no response here, please ask questions or discuss about them in another mailing list, maybe in ruby-talk. Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From chen_li3 at yahoo.com Wed Jun 17 14:32:01 2009 From: chen_li3 at yahoo.com (chen li) Date: Wed, 17 Jun 2009 07:32:01 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <551726.61194.qm@web36803.mail.mud.yahoo.com> Hi all, I read source codes in sirna.rb in Bioruby. It implements the codes based on the following 4 rules( I copy the ruels from the paper): These rules indicate that siRNAs which simultaneously satisfy all four of the following sequence conditions are capable of inducing highly effective gene silencing in mammalian cells: (i) A/U at the 5' end of the antisense strand; (ii) G/C at the 5' end of the sense strand; (iii) at least five A/U residues in the 5' terminal one-third of the antisense strand; and (iv) the absence of any GC stretch of more than 9 nt in length. And here are the codes: In sirna.rb # Ui-Tei's rule. def uitei?(target) return false unless /^.{2}[GC]/i =~ target #which rule is for this line ? 
return false unless /[AU].{2}$/i =~ target #which rule is for this line return false if /[GC]{9}/i =~ target # rule 4 #rule 3 one_third = target.size * 1 / 3 start_pos = @target_size - one_third - 1 remain_seq = target.subseq(start_pos, @target_size - 2) au_number = remain_seq.scan(/[AU]/i).size return false if au_number < 5 return true end from these codes I don't think I understand how rule 1 and rule 2 are implemented. I wonder if someone can explain them a little more. Thanks, Li From tomoakin at kenroku.kanazawa-u.ac.jp Wed Jun 17 23:55:12 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Thu, 18 Jun 2009 08:55:12 +0900 Subject: [BioRuby] help to understand the codes In-Reply-To: <551726.61194.qm@web36803.mail.mud.yahoo.com> References: <551726.61194.qm@web36803.mail.mud.yahoo.com> Message-ID: <967703C4-0B5C-4FA5-ADC8-A0BF427F152D@kenroku.kanazawa-u.ac.jp> Hi, Perhaps, an implicit assumption is used that the siRNA duplex has 2 nt overhang at the 3' ends and the "target" is written for one strand containing both: So, the sequence should be from the rule 1 and 2: SNNN...NNNNNNNNNNN NNNNNN...NNNNNNNNW (W: A or U, S: G or C) from the compliment rule this will be SNNN...NNNNNNNNWNN NNSNNN...NNNNNNNNW and if you write only the top strand (or the original mRNA sequence) NNSNNN...NNNNNNNNWNN thus > return false unless /^.{2}[GC]/i =~ target #which rule is > for this line ? is for rule 2 and > return false unless /[AU].{2}$/i =~ target #which rule is > for this line is for rule 1 -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2009/06/17, at 23:32, chen li wrote: > > Hi all, > > I read source codes in sirna.rb in Bioruby. 
It implements the codes > based on the following 4 rules( I copy the ruels from the paper): > These rules indicate that siRNAs which > simultaneously satisfy all four of the following > sequence conditions are capable of inducing highly > effective gene silencing in mammalian cells: > > (i) A/U at the 5' end of the antisense strand; > (ii) G/C at the 5' end of the sense strand; > (iii) at least five A/U residues in the 5' terminal one-third of > the antisense > strand; > and (iv) the absence of any GC stretch of more than 9 nt in length. > > > And here are the codes: > In sirna.rb > # Ui-Tei's rule. > def uitei?(target) > return false unless /^.{2}[GC]/i =~ target #which rule is > for this line ? > return false unless /[AU].{2}$/i =~ target #which rule is > for this line > > return false if /[GC]{9}/i =~ target # rule 4 > > #rule 3 > one_third = target.size * 1 / 3 > start_pos = @target_size - one_third - 1 > remain_seq = target.subseq(start_pos, @target_size - 2) > au_number = remain_seq.scan(/[AU]/i).size > return false if au_number < 5 > > return true > end > > > from these codes I don't think I understand how rule 1 and rule 2 > are implemented. I wonder if someone can explain them a little more. 
> > > Thanks, > > Li > > > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby > From hlapp at gmx.net Wed Jun 17 22:26:49 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 17 Jun 2009 17:26:49 -0500 Subject: [BioRuby] [Wg-phyloinformatics] Bioruby PhyloXML: method for iterating to the next tree, without returning anything In-Reply-To: <4A367D5D.3010102@burnham.org> References: <4057d3bf0906150737s733f7f67hd62242a689328ecc@mail.gmail.com> <4A367D5D.3010102@burnham.org> Message-ID: <2FE711A4-CC95-4B9A-B9D5-CBA887902567@gmx.net> On Jun 15, 2009, at 11:57 AM, Christian M Zmasek wrote: > Although, it will only be useful if the order of the trees is known > beforehand (i.e. you know you want the 5th tree). Right - I recognize that this would be useful for your unit testing, but frankly I'm not sure what the "normal" use case for this function would be. > Something to think about (if you have enough time): Since trees in > phyloxml can have names and/or ids -- what about having the parser > return trees with a matching name/id? E.g. from a file with 100 trees > return those named "erk gene tree". I agree, being able to pass a filter function (e.g., one that accepts the unparsed XML and returns true or false?) would indeed be pretty useful. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From chen_li3 at yahoo.com Thu Jun 18 15:42:03 2009 From: chen_li3 at yahoo.com (chen li) Date: Thu, 18 Jun 2009 08:42:03 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <897032.22567.qm@web36806.mail.mud.yahoo.com> Hi Tomoaki, Thank you for the explanation. 
For rule 1: /[AU].{2}$/i =~ target Based on my understanding of regular expression, it will match the following nts: N---NA/UAA N---NA/UTT N---NA/UGG N---NA/UCC but will not match the following nts: N---NA/UAT N---NA/UTA which mean the last two nts are identical, is that right? The similar situation applies to rule 2: /^.{2}[GC]/i =~ target starting with two identical nts followed by G/C at the third position. If this is the case I wonder where the paper mentions that the last two nts are the same and the first two nts are identical. Do I miss something when I read the paper? Thanks, Li --- On Wed, 6/17/09, Tomoaki NISHIYAMA wrote: > From: Tomoaki NISHIYAMA > Subject: Re: [BioRuby] help to understand the codes > To: "chen li" > Cc: "Tomoaki NISHIYAMA" , bioruby at lists.open-bio.org > Date: Wednesday, June 17, 2009, 7:55 PM > Hi, > > Perhaps, an implicit assumption is used that the siRNA > duplex > has 2 nt overhang at the 3' ends and the "target" is > written for one strand containing both: > So, the sequence should be > from the rule 1 and 2: > SNNN...NNNNNNNNNNN > NNNNNN...NNNNNNNNW > > (W: A or U, S: G or C) > > from the compliment rule > this will be > SNNN...NNNNNNNNWNN > NNSNNN...NNNNNNNNW > > and if you write only the top strand (or the original mRNA > sequence) > NNSNNN...NNNNNNNNWNN > > thus > > return false unless > /^.{2}[GC]/i =~ target #which rule is for this line ? > is for rule 2 > and > >
return false unless > /[AU].{2}$/i =~ target #which rule is for > this line > is for rule 1 > --Tomoaki NISHIYAMA > > Advanced Science Research Center, > Kanazawa University, > 13-1 Takara-machi, > Kanazawa, 920-0934, Japan > From tomoakin at kenroku.kanazawa-u.ac.jp Thu Jun 18 23:47:10 2009 From: tomoakin at kenroku.kanazawa-u.ac.jp (Tomoaki NISHIYAMA) Date: Fri, 19 Jun 2009 08:47:10 +0900 Subject: [BioRuby] help to understand the codes In-Reply-To: <897032.22567.qm@web36806.mail.mud.yahoo.com> References: <897032.22567.qm@web36806.mail.mud.yahoo.com> Message-ID: <8242DF74-07D3-4EF1-AE85-6E494DAE3CBB@kenroku.kanazawa-u.ac.jp> Hi, > but will not match the following nts: > N---NA/UAT > N---NA/UTA > > which mean the last two nts are identical, is that right? .{2} is equivalent to .. and should match any two characters, identical or different. -- Tomoaki NISHIYAMA Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa, 920-0934, Japan On 2009/06/19, at 0:42, chen li wrote: > > Hi Tomoaki, > > Thank you for the explanation. > > For rule 1: /[AU].{2}$/i =~ target > Based on my understanding of regular expression, it will match the > following nts: > N---NA/UAA > N---NA/UTT > N---NA/UGG > N---NA/UCC > but will not match the following nts: > N---NA/UAT > N---NA/UTA > > which mean the last two nts are identical, is that right? > The similar situation applies to rule 2: /^.{2}[GC]/i =~ target > starting with two identical nts followed by G/C at the third > position. > > If this is the case I wonder where the paper mentions that the last > two nts are the same and the first two nts are identical. Do I miss > something when I read the paper?
> > > > Thanks, > > Li > > > > > > > > > --- On Wed, 6/17/09, Tomoaki NISHIYAMA u.ac.jp> wrote: > >> From: Tomoaki NISHIYAMA >> Subject: Re: [BioRuby] help to understand the codes >> To: "chen li" >> Cc: "Tomoaki NISHIYAMA" , >> bioruby at lists.open-bio.org >> Date: Wednesday, June 17, 2009, 7:55 PM >> Hi, >> >> Perhaps, an implicit assumption is used that the siRNA >> duplex >> has 2 nt overhang at the 3' ends and the "target" is >> written for one strand containing both: >> So, the sequence should be >> from the rule 1 and 2: >> SNNN...NNNNNNNNNNN >> NNNNNN...NNNNNNNNW >> >> (W: A or U, S: G or C) >> >> from the compliment rule >> this will be >> SNNN...NNNNNNNNWNN >> NNSNNN...NNNNNNNNW >> >> and if you write only the top strand (or the original mRNA >> sequence) >> NNSNNN...NNNNNNNNWNN >> >> thus >>> return false unless >> /^.{2}[GC]/i =~ target #which rule is for this line ? >> is for rule 2 >> and >>> return false unless >> /[AU].{2}$/i =~ target #which rule is for >> this line >> is for rule 1 >> --Tomoaki NISHIYAMA >> >> Advanced Science Research Center, >> Kanazawa University, >> 13-1 Takara-machi, >> Kanazawa, 920-0934, Japan >> > > > > From chen_li3 at yahoo.com Fri Jun 19 15:21:27 2009 From: chen_li3 at yahoo.com (chen li) Date: Fri, 19 Jun 2009 08:21:27 -0700 (PDT) Subject: [BioRuby] help to understand the codes Message-ID: <318484.35858.qm@web36801.mail.mud.yahoo.com> Hi Tomoaki, Thank you for the info. Now I think I understand much better about the codes: What the script does is to search a stretch nts of 23 bp and check if it fits the rules: the first two nts and the last two nts are actually the overhanged nts and the middle part is the core of the sirna. One more question: When I read method # uitei?(target) I see an instant variable called @target_size but it is defined in another method # design(rule='uitei'). Since Ruby reads codes from top to bottom, isn't' it better to define #design(rule='uitei') first then followed by # uitei?(target)? 
Or it is just personal preference? Li # Ui-Tei's rule. def uitei?(target) ...line code.... start_pos = @target_size - one_third - 1 return true end # rule can be one of 'uitei' (default) and 'reynolds'. def design(rule = 'uitei') @target_size = @antisense_size + 2 ....line code.... end From rozziite at gmail.com Mon Jun 22 16:27:11 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Mon, 22 Jun 2009 12:27:11 -0400 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review Message-ID: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Hi all, In the Google Summer of Code project I have reached a stage where most of the code has been written for PhyloXML parser and I would like to ask for code review. I would like to know answers to these questions: * What parts should have more documentation? * Are there any places where code could be made more rubyish? * Are the structure of unit tests fine, or there are some conventions which my code doesn't follow? * Is code readable? * Are there any conventions that I don't follow? (like lines should strictly fit into 80 columns)? Any comments would be appreciated. Code is available on github http://github.com/latvianlinuxgirl/bioruby/tree/dev in * lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. Diana From czmasek at burnham.org Tue Jun 23 19:29:42 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Tue, 23 Jun 2009 12:29:42 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review In-Reply-To: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> References: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Message-ID: <4A412D26.30800@burnham.org> Hi, Diana: Diana Jaunzeikare wrote: > Hi all, > > In the Google Summer of Code project I have reached a stage where most > of the code has been written for PhyloXML parser and I would like to > ask for code review. 
> > I would like to know answers to these questions: > > * What parts should have more documentation? Node might benefit from a more detailed description of all its (sub-) elements. Also, you don't always use the "rdoc" format. Some documentation sare hard to read (such as the one for Sequence), simply because of the way the text is formatted. In general, I would point out the dependency on libxml2 more prominently. > > * Are there any places where code could be made more rubyish? Maybe core-BioRuby developers can give an answer for this one. Looks like Ruby to me, but I started off programing with C++ and Java -- so, I might be biased ;) > > * Are the structure of unit tests fine, or there are some conventions > which my code doesn't follow? I think it would be best to add more tests for "marginal"/error cases (for parsing). Listed in increasing severity: Are empty elements handled properly (e.g. )? What about new-lines, tabs, non-printable ascii characters in place where text is expected? Trailing and leading whitespaces? Does this get trimmed of? Valid XML documents violating phyloXML specs? Invalid XML? All these should be handled gracefully. > > * Is code readable? Yes. > * Are there any conventions that I don't follow? (like lines should > strictly fit into 80 columns)? > > Any comments would be appreciated. > > Code is available on github > http://github.com/latvianlinuxgirl/bioruby/tree/dev in > *lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. > > > Diana > Furthermore, it might be a good time to start testing your parser/objects against really large files. This might help to uncover potential hidden problems. Obviously, you could not add such large files to BioRuby's test files. But it would still be nice to know how your parser and objects scale.... Also, I am not sure if it's such a great idea to have all your classes in the same file/directory (i.e. both parser _and_ data objects). 
Right now, if the libxml2 gem is not install the test for the whole of bioruby exits. Christian From ngoto at gen-info.osaka-u.ac.jp Sat Jun 27 08:43:12 2009 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 27 Jun 2009 17:43:12 +0900 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Code Review In-Reply-To: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> References: <4057d3bf0906220927y7a4e8ee4y49ee40e63c2e007d@mail.gmail.com> Message-ID: <20090627084313.4B51D1CBC4ED@idnmail.gen-info.osaka-u.ac.jp> Hi, On Mon, 22 Jun 2009 12:27:11 -0400 Diana Jaunzeikare wrote: > Hi all, > > In the Google Summer of Code project I have reached a stage where most of > the code has been written for PhyloXML parser and I would like to ask for > code review. > > I would like to know answers to these questions: > > * What parts should have more documentation? For each attribute and methods, not only the return value's class but also description for the attribute will be needed, although it will be nearly the same as the phyloxml's description. > * Are there any places where code could be made more rubyish? Currently, no problem. > > * Are the structure of unit tests fine, or there are some conventions which > my code doesn't follow? It is good that the module TestPhyloXMLData is defined inside the module Bio namespace. > > * Is code readable? Yes. > > * Are there any conventions that I don't follow? (like lines should strictly > fit into 80 columns)? There are no strict conventions, especially for tests which may depend on test data variety. > > Any comments would be appreciated. > > Code is available on github > http://github.com/latvianlinuxgirl/bioruby/tree/dev in * > lib/bio/db/phyloxml.rb* and *test/unit/bio/db/test_phyloxml.rb* files. > > > Diana > In my environment, (Debian lenny i386, Ruby 1.8.7-p160, libxml-ruby 1.1.3) % ruby -r rubygems test/unit/bio/db/test_phyloxml.rb Loaded suite test/unit/bio/db/test_phyloxml Started ............................ 
Finished in 1.375441 seconds. 28 tests, 91 assertions, 0 failures, 0 errors -- Naohisa Goto ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org From rozziite at gmail.com Sun Jun 28 19:55:44 2009 From: rozziite at gmail.com (Diana Jaunzeikare) Date: Sun, 28 Jun 2009 15:55:44 -0400 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Profiling Message-ID: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> Hi all, I did some profiling of the code. My system is Ubuntu 9.04, ruby 1.8.7 [i486-linux], Intel Core 2 Duo P8600 @2.4GHz I created test_phyloxml_big.rb test file. It has test_next_tree method which calls next_tree on the phyloxml file until end of file is reached. Here follow results on the ncbi_taxonomy_mollusca.xml file which is 1.5MB large with 5632 external nodes. It takes around 7.5min to finish test_phyloxml_big.rb test. (Finished in 443.231507 seconds. Finished in 457.255576 seconds. ) output of the top: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 21222 diana 20 0 29020 25m 1928 R 95 0.9 5:32.92 ruby So it looks like memory footprint is small ~ 25MBs. CPU usage is 95% (i have two processors, so it is completely using one of them). I did the same thing for tol_life_xml but it took forever to finish. (more than 3 hours) For curiosity I created a method next_tree_dummy. All it does is to reader.read from file until it reaches element. tree of life xml (file size: 45.1MB) - Finished in 3.177743 seconds. mollusca xml (1.5MB) - Finished in 0.252993 seconds. metazoa xml (32.3MB) - Finished in 3.393467 seconds. I think this shows that libxml is really fast. I also did profiling with ruby-prof on ncbi mollusca taxonomy file. Here is partial output: diana at diana-ubuntu:~/bioruby$ ruby-prof -p graph test/unit/bio/db/test_phyloxml_big.rb Loaded suite /usr/bin/ruby-prof Started . Finished in 1345.4039 seconds. 1 tests, 0 assertions, 0 failures, 0 errors Thread ID: 3084157360 Total Time: 1257.6 [..] 
------------------------------------------------------------------------------- 1257.56 0.60 0.00 1256.96 2/2 Bio::TestPhyloXMLBig#test_next_tree 100.00% 0.05% 1257.56 0.60 0.00 1256.96 2 Bio::PhyloXML#next_tree 0.08 0.03 0.00 0.05 8107/16210 Bio::PhyloXML#parse_attributes 0.00 0.00 0.00 0.00 1/243188 String#== 0.00 0.00 0.00 0.00 8104/24322 LibXML::XML::Reader#[] 6.18 0.97 0.00 5.21 48616/48616 Bio::PhyloXML#parse_clade_elements 0.14 0.14 0.00 0.00 48623/97244 LibXML::XML::Reader#read 0.27 0.17 0.00 0.10 48644/875134 Bio::PhyloXML#is_element? 0.11 0.04 0.00 0.07 16206/32442 Class#new 0.04 0.04 0.00 0.00 16206/116034 Kernel#== 0.00 0.00 0.00 0.00 4/72929 Bio::PhyloXML#parse_simple_elements 0.07 0.01 0.00 0.06 8102/8102 Bio::Tree#add_node 0.44 0.31 0.00 0.13 97243/137758 Bio::PhyloXML#is_end_element? 1249.05 0.02 0.00 1249.03 8102/8102 Bio::Tree#parent 0.58 0.07 0.00 0.51 8102/8102 Bio::Tree#add_edge ----------------------------------------------------------------------------- 1249.05 0.02 0.00 1249.03 8102/8102 Bio::PhyloXML#next_tree 99.32% 0.00% 1249.05 0.02 0.00 1249.03 8102 Bio::Tree#parent 1249.03 0.13 0.00 1248.90 8102/8102 Bio::Tree#path 0.00 0.00 0.00 0.00 8102/72975 Array#[] -------------------------------------------------------------------------------- 1249.03 0.13 0.00 1248.90 8102/8102 Bio::Tree#parent 99.32% 0.01% 1249.03 0.13 0.00 1248.90 8102 Bio::Tree#path 0.04 0.01 0.00 0.03 16204/164638052 Hash#[] 1248.82 0.27 0.00 1248.55 8102/8102 Bio::Pathway#bfs_shortest_path 0.03 0.03 0.00 0.00 24306/116034 Kernel#== 0.01 0.01 0.00 0.00 16204/72975 Array#[] -------------------------------------------------------------------------------- 1248.82 0.27 0.00 1248.55 8102/8102 Bio::Tree#path 99.30% 0.02% 1248.82 0.27 0.00 1248.55 8102 Bio::Pathway#bfs_shortest_path 0.26 0.19 0.00 0.07 142736/164638052 Hash#[] 0.07 0.07 0.00 0.00 75419/116034 Kernel#== 1248.18 115.50 0.00 1132.68 8102/8102 Bio::Pathway#breadth_first_search 0.04 0.04 0.00 0.00 67317/67330 
Array#unshift -------------------------------------------------------------------------------- 1248.18 115.50 0.00 1132.68 8102/8102 Bio::Pathway#bfs_shortest_path 99.25% 9.18% 1248.18 115.50 0.00 1132.68 8102 Bio::Pathway#breadth_first_search 136.52 92.65 0.00 43.8765785140/164638052 Hash#[] 22.53 22.53 0.00 0.0032900672/32900681 Array#shift 973.59 324.56 0.00 649.0332892570/32892570 Hash#each_key 0.04 0.03 0.00 0.01 24306/98702064 Hash#[]= [..] 99.32% of the total time is spent in Bio::Tree#parent method and the methods it calls. Bio::Tree#parent calls Bio::Tree#path which calls Bio::Pathways#bfs_shortest_path which in turn calls Bio::Pathway#breadth_first_search (99.25% of total time is spent in this method and its sub calls). This was a huge surprise for me. Why would breadth first search be needed if I just want to know the parent node of the current node. The reason I am using Bio::Tree#parent is because I have to keep track of the current node I am parsing. When I have reached element i set the current_node to the parent of the node I just parsed. I see here two options. 1) Keep track of the current node myself (by putting references in an array and pushing and poping accordingly). Thus I won't have to call the Bio::Tree#parent method. 2) Update Bio::Tree/ Bio::Node class so that nodes contain references to their parents. (thus not needing to call breadth first search). What do you think? Diana From czmasek at burnham.org Mon Jun 29 21:26:23 2009 From: czmasek at burnham.org (Christian M Zmasek) Date: Mon, 29 Jun 2009 14:26:23 -0700 Subject: [BioRuby] GSOC: phyloXML for BioRuby: Profiling In-Reply-To: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> References: <4057d3bf0906281255t86fe2a7m4eaa9e047efc2e10@mail.gmail.com> Message-ID: <4A49317F.3030708@burnham.org> Hi, Diana: Great analysis! > > The reason I am using Bio::Tree#parent is because I have to keep track > of the current node I am parsing. 
> When I have reached the end of an element, I set the current_node to
> the parent of the node I just parsed.
>
> I see here two options.
>
> 1) Keep track of the current node myself (by putting references in an
> array and pushing and popping accordingly). Thus I won't have to call
> the Bio::Tree#parent method.

As a temporary solution, you could try this.

>
> 2) Update Bio::Tree/ Bio::Node class so that nodes contain references
> to their parents. (thus not needing to call breadth first search).

This seems a better (long term) solution, but _might_ be out of scope
for this summer project.

Christian

From donttrustben at gmail.com Tue Jun 30 00:12:50 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 30 Jun 2009 10:12:50 +1000
Subject: [BioRuby] Bio::NCBI:REST:EFetch
Message-ID:

Hi,

I was just googling how to download a genbank sequence using bioruby, and
somehow got pointed to this example code:

# == Usage
#
# Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|")

But this doesn't seem to work in irb:

$ gem list bio

*** LOCAL GEMS ***

bio (1.3.0)
$ irb -rubygems
irb(main):001:0> require 'bio'
=> true
irb(main):002:0> Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|")
NameError: uninitialized constant Bio::NCBI::REST::EFetch
from (irb):2

Then I noticed by looking at the code I could just do

Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db => 'sequences'})

So it seems there is some redundancy. What is going on? Should there be a
pointer to Bio::NCBI::REST::efetch from Bio::NCBI::REST::EFetch in the rdoc?
That would have made me understand a lot quicker, and I wouldn't have had to
look at the code to figure it out.

Thanks,
ben
--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.
From rozziite at gmail.com Tue Jun 30 01:26:28 2009
From: rozziite at gmail.com (Diana Jaunzeikare)
Date: Mon, 29 Jun 2009 21:26:28 -0400
Subject: [BioRuby] Bioruby PhyloXML update 6
Message-ID: <4057d3bf0906291826r7242237dh5c249fd762b7b6be@mail.gmail.com>

Hi all,

here is the update for the last week:

- Asked for a code review. I got very good suggestions on what and how to
  improve things. Some of them I did this week, some will come later.
- Documented the requirement of libxml-ruby.
- Documented more of the PhyloXML::Node element.
- Wrote code so that the phyloxml test suite exits if the libxml-ruby
  library is not present. (This took me quite a long time to figure out.
  Eventually I sent an email to the ruby-talk mailing list and got great
  help.)
- Created a branch testbig. There I created the file test_phyloxml_big.rb
  and wrote the method parse_tree_dummy.
- Did code profiling. Discovered that ~99% of the time is spent in
  Bio::Tree#parent. Changed the code to keep track of the current node
  myself in an array. The speed increase was tremendous. When parsing
  mollusca xml (1.5MB of data) it went down from 443 to 2 seconds. When
  parsing tree of life xml (45MB of data) it took 34 seconds instead of
  more than 3 hours.

Plan for next week:

- Continue working on documentation
- Write usage cases like phyloxml.each do |tree| end ; Calculate total
  branch lengths? (Any other uses? ) Look at the Perl PhyloXML
  implementation and port those usage cases to BioRuby.
- Adding tests for marginal cases. (decide what to do with invalid xml
  files).
- Will do some more code profiling (it's fun :) ) But it looks like we are
  in pretty good shape.
- Change the organization of classes a bit. Split the code into several
  files. Have a module PhyloXML. Have a class PhyloXMLParser (in
  phyloxml_parser.rb) in it. Have all the phyloxml element classes defined
  in the phyloxml_elements.rb file (under the PhyloXML module). And then
  later there will be a PhyloXMLWriter class.
- Other tweaks to prepare for the PhyloXML parser deliverable.
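The change described in this update (keeping the current node in an array
instead of calling Bio::Tree#parent) can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual
parser code: the event list stands in for the start/end element events a
streaming XML reader would deliver, and SketchNode and build_tree are
hypothetical names. The point is that the parent of the node being closed
is always the top of the stack, so no search over the tree is needed.

```ruby
# Hypothetical stand-in for a tree node; this is not a BioRuby class.
SketchNode = Struct.new(:name, :children)

# events: array of [:start, name] / [:end, name] pairs, standing in for
# the start-element and end-element events of a streaming XML reader.
def build_tree(events)
  root = nil
  stack = []                          # path from the root to the current node
  events.each do |type, name|
    case type
    when :start
      node = SketchNode.new(name, [])
      if stack.empty?
        root = node                   # first element opened is the root
      else
        stack.last.children << node   # the parent is the top of the stack
      end
      stack.push(node)                # the new node becomes the current node
    when :end
      stack.pop                       # current node becomes its parent again
    end
  end
  root
end

events = [
  [:start, "root"],
  [:start, "A"], [:end, "A"],
  [:start, "B"],
  [:start, "C"], [:end, "C"],
  [:end, "B"],
  [:end, "root"]
]
tree = build_tree(events)
puts tree.children.map(&:name).inspect
```

Each push and pop is a constant-time operation, whereas the profile above
shows one breadth-first search per parent lookup, which is what made the
original approach so slow on large trees.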
Diana

From ngoto at gen-info.osaka-u.ac.jp Tue Jun 30 12:19:25 2009
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 30 Jun 2009 21:19:25 +0900
Subject: [BioRuby] Bio::NCBI:REST:EFetch
In-Reply-To:
References:
Message-ID: <20090630121925.E048F1CBC3DA@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Tue, 30 Jun 2009 10:12:50 +1000
Ben Woodcroft wrote:

> Hi,
>
> I was just googling how to download a genbank sequence using bioruby, and
> somehow got pointed to this example code:
>
> # == Usage
> #
> # Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|")
>
> But this doesn't seem to work in irb:
>
> $ gem list bio
>
> *** LOCAL GEMS ***
>
> bio (1.3.0)
> $ irb -rubygems
> irb(main):001:0> require 'bio'
> => true
> irb(main):002:0> Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|")
> NameError: uninitialized constant Bio::NCBI::REST::EFetch
> from (irb):2

On my machine, it works correctly.

$ ruby --version
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-linux]
lng[ngoto<3>]$ gem --version
1.3.4
$ gem install bio
Successfully installed bio-1.3.0
1 gem installed
Installing ri documentation for bio-1.3.0...
Installing RDoc documentation for bio-1.3.0...
$ irb -r rubygems
irb(main):001:0> require 'bio'
=> true
irb(main):002:0> Bio::BIORUBY_VERSION
=> [1, 3, 0]
irb(main):003:0> Bio::BIORUBY_VERSION_ID
=> "1.3.0"
irb(main):004:0> Bio::NCBI::REST::EFetch.sequence("123,U12345,U12345.1,gb|U12345|")
=> "LOCUS X63139 854 bp DNA linear MAM 17-DEC-1991\nDEFINITION B.taurus beta-lactoglobulin gene 5'-region and partial exon 1.\
(snip)

The NameError may be caused by an old version of BioRuby which
may exist somewhere in the $LOAD_PATH.
Please check the following version identifiers of BioRuby.

p Bio::BIORUBY_VERSION
p Bio::BIORUBY_VERSION_ID
p Bio::BIORUBY_EXTRA_VERSION

> Then I noticed by looking at the code I could just do
>
> Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db => 'sequences'})
>

This also works.

> So it seems there is some redundancy. What is going on? Should there be a
> pointer to Bio::NCBI::REST::efetch from Bio::NCBI::REST::EFetch in the rdoc?
> That would have made me understand a lot quicker, and I wouldn't have had to
> look at the code to figure it out.

Both should work, and I think the redundancy is not a severe problem.
The EFetch methods are defined in Bio::NCBI::REST::EFetch::Methods
and documentation for the methods is also available.

http://bioruby.org/rdoc/classes/Bio/NCBI/REST/EFetch/Methods.html

But the hierarchy of the documentation may be difficult for most
users to navigate. Contributions and suggestions for documentation
are welcome.

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

>
> Thanks,
> ben
> --
> FYI: My email addresses at unimelb, uq and gmail all redirect to the same
> place.
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From donttrustben at gmail.com Tue Jun 30 12:40:57 2009
From: donttrustben at gmail.com (Ben Woodcroft)
Date: Tue, 30 Jun 2009 22:40:57 +1000
Subject: [BioRuby] Fwd: Bio::NCBI:REST:EFetch
In-Reply-To:
References: <20090630121925.E048F1CBC3DA@idnmail.gen-info.osaka-u.ac.jp>
Message-ID:

oops - forgot to post back to the mailing list.

---------- Forwarded message ----------
From: Ben Woodcroft
Date: 2009/6/30
Subject: Re: [BioRuby] Bio::NCBI:REST:EFetch
To: Naohisa GOTO

Hi,

> The NameError may be caused by an old version of BioRuby which
> may exist somewhere in the $LOAD_PATH.

You are smart and I am stupid.

> Both should work, and I think the redundancy is not a severe problem.
> The EFetch methods are defined in Bio::NCBI::REST::EFetch::Methods
> and documentation for the methods is also available.
>
> http://bioruby.org/rdoc/classes/Bio/NCBI/REST/EFetch/Methods.html
>
> But the hierarchy of the documentation may be difficult for most
> users to navigate.
> Contributions and suggestions for documentation
> are welcome.

It is a bit misleading that they are redundant, but if they both work, then I
don't mind so much. One suggestion I do have is that the returned objects
shouldn't just be strings, but should automatically be parsed. It seems
redundant to call

Bio::FastaFormat.new(Bio::NCBI::REST::efetch("EF489424", {:rettype => 'fasta', :db =>'sequences'})[0])

But it isn't too much of a big deal. In the end I've got my pipeline up and
bioruby is automating the things I want it to, so thanks!

ben
--
FYI: My email addresses at unimelb, uq and gmail all redirect to the same
place.
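The diagnosis in this thread (an older BioRuby copy somewhere in
$LOAD_PATH shadowing the installed gem) can be checked with a few lines of
plain Ruby. The sketch below lists every load-path directory that contains
a bio.rb file. find_library_copies is a hypothetical helper name, not part
of BioRuby, and it only looks for the top-level file, so treat it as a
starting point rather than a full diagnosis.

```ruby
# Hypothetical helper: list the directories in a load path that contain
# "<library>.rb". More than one hit suggests that an older copy may be
# shadowing the one you expect, because require uses the first match.
def find_library_copies(library, load_path = $LOAD_PATH)
  load_path.map(&:to_s).uniq.select do |dir|
    File.file?(File.join(dir, "#{library}.rb"))
  end
end

# Print every directory on the current load path that holds a bio.rb.
find_library_copies("bio").each { |dir| puts dir }
```

Running it after require 'bio' is the more informative case, since RubyGems
adds a gem's lib directory to $LOAD_PATH when the gem is activated. Comparing
the result with Bio::BIORUBY_VERSION_ID, as suggested above, then shows
which copy was actually loaded.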