From email2ants at gmail.com Wed May 2 12:38:31 2007
From: email2ants at gmail.com (Anthony Underwood)
Date: Wed, 2 May 2007 17:38:31 +0100
Subject: [BioRuby] EMBL parsing
Message-ID: <9EF29B4E-EDFC-4106-A736-52195314F374@gmail.com>

Dear all,

I am having a problem when parsing EMBL genome files.

embl = open(filename)
entry = embl.gets(Bio::EMBL::DELIMITER)
seq_obj = Bio::EMBL.new(entry)
puts seq_obj.sequence_length

This takes a long time (minutes) and reports a sequence_length of 0

When taking an equivalent genbank file and changing the code
appropriately it parses the file in seconds and reports the correct
length.

I am new to bioruby having used bioperl until now. Please can anybody
let me know if they have had similar problems and any possible
solutions.

Many thanks

Anthony

From n at bioruby.org Wed May 2 12:47:27 2007
From: n at bioruby.org (Mitsuteru Nakao)
Date: Thu, 3 May 2007 01:47:27 +0900
Subject: [BioRuby] EMBL parsing
In-Reply-To: <9EF29B4E-EDFC-4106-A736-52195314F374@gmail.com>
References: <9EF29B4E-EDFC-4106-A736-52195314F374@gmail.com>
Message-ID: <90ca35f70705020947g66d6edf9pc552d79935973e75@mail.gmail.com>

Hi Anthony,

Thank you for your bug report.
Please let me know the accession numbers of EMBL genome files you parsed.

Thanks in advance.

Mitsuteru
the Bio::EMBL maintainer.

> I am having a problem when parsing EMBL genome files.
>
> embl = open(filename)
> entry = embl.gets(Bio::EMBL::DELIMITER)
> seq_obj = Bio::EMBL.new(entry)
> puts seq_obj.sequence_length
>
>
> This takes a long time (minutes) and reports a sequence_length of 0

From email2ants at gmail.com Thu May 3 07:48:03 2007
From: email2ants at gmail.com (Anthony Underwood)
Date: Thu, 3 May 2007 12:48:03 +0100
Subject: [BioRuby] EMBL parsing
Message-ID: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>

Hi Mitsiteru,

Any of the embl files downloaded from the ebi site have this problem.

for example
http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&style=raw&id=CP000360

Ruby takes all of the cpu power :(

But with the equivalent file from NCBI in genbank format there is no
problem.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=94549081&dopt=GenBank

Many thanks for having a look at this

Anthony

From jdudley at stanford.edu Thu May 3 09:51:52 2007
From: jdudley at stanford.edu (Joel Dudley)
Date: Thu, 3 May 2007 06:51:52 -0700
Subject: [BioRuby] Ruby in Science
Message-ID:

I'm giving a talk tonight at the SDForum Ruby SIG at Google
Headquarters on the topic of Ruby and Ruby on Rails in scientific
computing. For part of the talk I am going to highlight companies,
research groups, products, and other resources that are deployed or
currently under development using Ruby and/or Ruby on Rails. If you've
got a scientific application or resource that fits this and would like
me to give it mention to please send me the details off-list. I can't
guarantee that I'll mention it (time for the talk is limited), but
I'll do my best.

Thanks,
Joel Dudley
Stanford Medical Informatics

From s-merchant at northwestern.edu Fri May 4 17:52:11 2007
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Fri, 4 May 2007 16:52:11 -0500
Subject: [BioRuby] BOF at Ruby on Rails conference
Message-ID: <000601c78e96$77f5bb20$c2987ca5@pc13>

Hello Everyone,

I am organizing a Birds of a Feather (BoF) session at the upcoming
Rails conference in Portland, Oregon. Here are the details:

Venue: Rails conf 2007, Oregon Convention Center (OCC)
Date: Saturday, May 19
Time: 7:30-8:30pm
Room: c125.

I hope to meet some of you guys there.

Cheers,
Sohel Merchant.

From ngoto at gen-info.osaka-u.ac.jp Sat May 5 02:57:28 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Sat, 5 May 2007 15:57:28 +0900
Subject: [BioRuby] EMBL parsing
In-Reply-To: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
Message-ID: <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Thu, 3 May 2007 12:48:03 +0100
Anthony Underwood wrote:

> Hi Mitsiteru,
>
> Any of the embl files downloaded from the ebi site have this problem.
>
> for example http://www.ebi.ac.uk/cgi-bin/dbfetch?
> db=embl&style=raw&id=CP000360
>
> Ruby takes all of the cpu power :(

It seems it is caused by thousands of iterations of str1 += str2
because it creates a new string object every time.
A patch is attached. (Ruby 1.8.0 or newer version required)

--- lib/bio/db.rb 5 Apr 2007 23:35:39 -0000 0.37
+++ lib/bio/db.rb 5 May 2007 06:08:39 -0000
@@ -313,12 +313,12 @@

   # Returns the contents of the entry as a Hash.
   def entry2hash(entry)
-    hash = Hash.new('')
+    hash = Hash.new { |h, k| h[k] = '' }
     entry.each_line do |line|
       tag = tag_get(line)
       next if tag == 'XX'
       tag = 'R' if tag =~ /^R./ # Reference lines
-      hash[tag] += line
+      hash[tag].concat line
     end
     return hash
   end

Naohisa Goto
ng at bioruby.org

From email2ants at gmail.com Tue May 8 07:53:37 2007
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 8 May 2007 12:53:37 +0100
Subject: [BioRuby] EMBL parsing
In-Reply-To: <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
 <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>

Hi Naohisa,

Thanks for the patch. This certainly appears to solve the problem of
slow embl entry reading. However the sequence length is still reported
as 0.
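
The difference between the two versions in the patch can be tried
outside BioRuby. What follows is only a minimal sketch under stated
assumptions, not the BioRuby code: tag_of is a hypothetical stand-in
for tag_get (it assumes the tag is the first two characters of a
line), SAMPLE_ENTRY is made-up data, and String.new replaces the ''
literal of the patch so the sketch also runs where string literals are
frozen.

```ruby
# Hypothetical stand-in for BioRuby's tag_get: assumes the tag is the
# first two characters of the line (an assumption of this sketch).
def tag_of(line)
  line[0, 2].strip
end

# Pre-patch style: a shared '' default plus +=.
# Each += allocates a new String holding everything collected so far.
def entry2hash_plus(entry)
  hash = Hash.new('')
  entry.each_line do |line|
    tag = tag_of(line)
    next if tag == 'XX'
    tag = 'R' if tag =~ /^R./ # Reference lines
    hash[tag] += line
  end
  hash
end

# Patched style: the block stores a fresh string per key, and concat
# appends to that string in place.
def entry2hash_concat(entry)
  hash = Hash.new { |h, k| h[k] = String.new }
  entry.each_line do |line|
    tag = tag_of(line)
    next if tag == 'XX'
    tag = 'R' if tag =~ /^R./ # Reference lines
    hash[tag].concat(line)
  end
  hash
end

# Made-up sample data for illustration; not a real EMBL entry.
SAMPLE_ENTRY = "ID   example\nXX\nDE   first line\nDE   second line\n" \
               "XX\nRN   [1]\nRA   Doe J.;\n"

p entry2hash_concat(SAMPLE_ENTRY).keys
p entry2hash_plus(SAMPLE_ENTRY) == entry2hash_concat(SAMPLE_ENTRY)
```

Both methods build the same hash; only the amount of copying differs.
Calling concat on the value returned by Hash.new('') would modify the
single shared default string without storing any key, which is
presumably why the patch switches to the block form at the same time.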
I found this was due to the idline not being interpreted correctly

on line 97

tmp['SEQUENCE_LENGTH'] = idline[3].strip.split(' ').first.to_i

was changed to

tmp['SEQUENCE_LENGTH'] = idline.last.strip.split(' ').first.to_i

This was OK for my purposes, but I think the whole idline
interpretation needs to be looked at see
(http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_1).
I could have a look at this if appropriate.

Thanks

Anthony

On 5 May 2007, at 07:57, Naohisa GOTO wrote:

> Hi,
>
> On Thu, 3 May 2007 12:48:03 +0100
> Anthony Underwood wrote:
>
>> Hi Mitsiteru,
>>
>> Any of the embl files downloaded from the ebi site have this problem.
>>
>> for example http://www.ebi.ac.uk/cgi-bin/dbfetch?
>> db=embl&style=raw&id=CP000360
>>
>> Ruby takes all of the cpu power :(
>
> It seems it is caused by thousands of iterations of str1 += str2
> because it creates a new string object every time.
> A patch is attached. (Ruby 1.8.0 or newer version required)
>
> --- lib/bio/db.rb 5 Apr 2007 23:35:39 -0000 0.37
> +++ lib/bio/db.rb 5 May 2007 06:08:39 -0000
> @@ -313,12 +313,12 @@
>
>    # Returns the contents of the entry as a Hash.
>    def entry2hash(entry)
> -    hash = Hash.new('')
> +    hash = Hash.new { |h, k| h[k] = '' }
>      entry.each_line do |line|
>        tag = tag_get(line)
>        next if tag == 'XX'
>        tag = 'R' if tag =~ /^R./ # Reference lines
> -      hash[tag] += line
> +      hash[tag].concat line
>      end
>      return hash
>    end
>
>
> Naohisa Goto
> ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From n at bioruby.org Tue May 8 13:09:47 2007
From: n at bioruby.org (Mitsuteru Nakao)
Date: Wed, 9 May 2007 02:09:47 +0900
Subject: [BioRuby] EMBL parsing
In-Reply-To: <4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>
References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
 <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
 <4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>
Message-ID: <90ca35f70705081009r7719c562u36a902a84f88566c@mail.gmail.com>

Hi Anthony,

Thank you for your suggestions.
You can use the bioruby CVS HEAD which contains new idline parser
for EMBL rel89.

> This was OK for my purposes, but I think the whole idline
> interpretation needs to be looked at see (http://www.ebi.ac.uk/embl/
> Documentation/User_manual/usrman.html#3_4_1). I could have a look at
> this if appropriate.

Thanks
Mitsuteru

-
Mitsuteur Nakao
n at bioruby.org

From jan.aerts at bbsrc.ac.uk Wed May 9 09:14:52 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 9 May 2007 14:14:52 +0100
Subject: [BioRuby] Ensembl API
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F3@rie2ksrv1.ri.bbsrc.ac.uk>

Has anyone worked on an Ensembl API? There is a perl API and the
database schema is well-documented. On first impression, it seems
straightforward to make one using ActiveRecord, but I wouldn't want to
waste efforts on that if someone else is already working on it.

See http://www.ensembl.org/info/software/core/index.html

Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198
skype: aerts_ri

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in
Scotland (registered number SC157100) and a Scottish Charity
(registered number SC023592). Our registered office is at Roslin,
Midlothian, EH25 9PS. VAT registration number 847380013.

The information contained in this e-mail (including any attachments)
is confidential and is intended for the use of the addressee only. The
opinions expressed within this e-mail (including any attachments) are
the opinions of the sender and do not necessarily constitute those of
Roslin Institute (Edinburgh) ("the Institute") unless specifically
stated by a sender who is duly authorised to do so on behalf of the
Institute.

From jan.aerts at bbsrc.ac.uk Wed May 9 09:49:50 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 9 May 2007 14:49:50 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To: <1E75E5B2-D515-4EE4-9B2D-2E2D034E1EA1@sanger.ac.uk>
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk>

A post (in Japanese) about this shows a good primer:
http://itoshi.tv/d/?date=20060829

I'm testing this out on some tables of the Ensembl core and variation
databases, and things look promising... As you say it might well be
that we can't cover everything, but at least we can get quite far.

jan.

-----Original Message-----
From: Michael Han [mailto:mh6 at sanger.ac.uk]
Sent: 09 May 2007 14:38
To: jan aerts (RI)
Subject: Re: [BioRuby] Ensembl API

On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
> Has anyone worked on an Ensembl API? There is a perl API and the
> database schema is well-documented. On first impression, it seems
> straightforward to make one using ActiveRecord, but I wouldn't want to
> waste efforts on that if someone else is already working on it.
>
> See http://www.ensembl.org/info/software/core/index.html
>
> Dr Jan Aerts
> Bioinformatics Group

Hi Jan,

I am not sure if you can cover everything with a the default
active-record behavior.
But I would be a happy user of a ruby EnsEMBL API.
If you need/want help with it, I would also volunteer.

Michael

From jan.aerts at bbsrc.ac.uk Wed May 9 10:10:24 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 9 May 2007 15:10:24 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To: <58E34B0F-64FF-4CEB-9E36-7AF0B7159D08@sanger.ac.uk>
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F9@rie2ksrv1.ri.bbsrc.ac.uk>

I've tried a connection similar to the ones used by the perl API and
am able to connect directly to the core and other databases. Am
currently playing around with the cow features, actually :-)

jan.

-----Original Message-----
From: Michael Han [mailto:mh6 at sanger.ac.uk]
Sent: 09 May 2007 14:57
To: jan aerts (RI)
Cc: bioruby at lists.open-bio.org
Subject: Re: [BioRuby] Ensembl API

On 9 May 2007, at 14:49, jan aerts ((RI)) wrote:
> A post (in Japanese) about this shows a good primer:
> http://itoshi.tv/d/?date=20060829

my Japanese is not so good, but isn't that run on a BioMart and not an
ensembl-core schema?

> I'm testing this out on some tables of the Ensembl core and variation
> databases, and things look promising... As you say it might well be
> that we can't cover everything, but at least we can get quite far.
>
> jan.
>
> -----Original Message-----
> From: Michael Han [mailto:mh6 at sanger.ac.uk]
> Sent: 09 May 2007 14:38
> To: jan aerts (RI)
> Subject: Re: [BioRuby] Ensembl API
>
> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>> Has anyone worked on an Ensembl API? There is a perl API and the
>> database schema is well-documented. On first impression, it seems
>> straightforward to make one using ActiveRecord, but I wouldn't want
>> to
>
>> waste efforts on that if someone else is already working on it.
>>
>> See http://www.ensembl.org/info/software/core/index.html
>>
>> Dr Jan Aerts
>> Bioinformatics Group
>
> Hi Jan,
>
> I am not sure if you can cover everything with a the default active-
> record behavior.
> But I would be a happy user of a ruby EnsEMBL API.
> If you need/want help with it, I would also volunteer.
>
> Michael

From mh6 at sanger.ac.uk Wed May 9 09:57:18 2007
From: mh6 at sanger.ac.uk (Michael Han)
Date: Wed, 9 May 2007 14:57:18 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk>
References: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk>
Message-ID: <58E34B0F-64FF-4CEB-9E36-7AF0B7159D08@sanger.ac.uk>

On 9 May 2007, at 14:49, jan aerts ((RI)) wrote:
> A post (in Japanese) about this shows a good primer:
> http://itoshi.tv/d/?date=20060829

my Japanese is not so good, but isn't that run on a BioMart and not an
ensembl-core schema?

> I'm testing this out on some tables of the Ensembl core and variation
> databases, and things look promising... As you say it might well be
> that
> we can't cover everything, but at least we can get quite far.
>
> jan.
>
> -----Original Message-----
> From: Michael Han [mailto:mh6 at sanger.ac.uk]
> Sent: 09 May 2007 14:38
> To: jan aerts (RI)
> Subject: Re: [BioRuby] Ensembl API
>
> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>> Has anyone worked on an Ensembl API? There is a perl API and the
>> database schema is well-documented. On first impression, it seems
>> straightforward to make one using ActiveRecord, but I wouldn't
>> want to
>
>> waste efforts on that if someone else is already working on it.
>>
>> See http://www.ensembl.org/info/software/core/index.html
>>
>> Dr Jan Aerts
>> Bioinformatics Group
>
> Hi Jan,
>
> I am not sure if you can cover everything with a the default active-
> record behavior.
> But I would be a happy user of a ruby EnsEMBL API.
> If you need/want help with it, I would also volunteer.
>
> Michael

From jan.aerts at bbsrc.ac.uk Thu May 10 03:58:02 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Thu, 10 May 2007 08:58:02 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To: <1E75E5B2-D515-4EE4-9B2D-2E2D034E1EA1@sanger.ac.uk>
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16FE@rie2ksrv1.ri.bbsrc.ac.uk>

Michael,

As you mention that we maybe won't be able to cover everything with
the default activerecord behaviour: what problems are you thinking of?

Note: I'd like to use the perl API as a guide. And indeed working out
the Slice object was quite simple...

jan.

-----Original Message-----
From: Michael Han [mailto:mh6 at sanger.ac.uk]
Sent: 09 May 2007 14:38
To: jan aerts (RI)
Subject: Re: [BioRuby] Ensembl API

On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
> Has anyone worked on an Ensembl API? There is a perl API and the
> database schema is well-documented. On first impression, it seems
> straightforward to make one using ActiveRecord, but I wouldn't want to
> waste efforts on that if someone else is already working on it.
>
> See http://www.ensembl.org/info/software/core/index.html
>
> Dr Jan Aerts
> Bioinformatics Group

Hi Jan,

I am not sure if you can cover everything with a the default
active-record behavior.
But I would be a happy user of a ruby EnsEMBL API.
If you need/want help with it, I would also volunteer.

Michael

From jan.aerts at bbsrc.ac.uk Thu May 10 05:09:50 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Thu, 10 May 2007 10:09:50 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To: <834748BE-64D8-4642-AF70-5098589F91EB@sanger.ac.uk>
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk>

I actually just started working on this API last night (in between
some deadlines I got to catch), so haven't gotten so far as to think
about caching. I'm basically working through the perl API tutorial
(http://www.ensembl.org/info/software/core/core_tutorial.html) and try
to implement all those examples.
(At the moment, I'm at the bit that says "Break chromosomal slices
into smaller 100k component slices"...)

Some hurdles that I see coming are the caching and projecting features
from one coord system to another. We'll see what happens when we get
there.

As for a public place: I would _very_ much appreciate help with the
API, although

QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS:
* Do you think it would be best to create a sourceforge project for
  this, or should I add it directly into bioruby (e.g.
  Bio::Api::Ensembl)? I suppose the second option would be best, but
  the stuff I have is probably not polished enough yet... and *far*
  from complete.
* Secondly: if a new release is coming: would it be best to wait
  untill _after_ that release?

jan.

-----Original Message-----
From: Michael Han [mailto:mh6 at sanger.ac.uk]
Sent: 10 May 2007 09:51
To: jan aerts (RI)
Subject: Re: [BioRuby] Ensembl API

On 10 May 2007, at 08:58, jan aerts ((RI)) wrote:
> Michael,
>
> As you mention that we maybe won't be able to cover everything with
> the default activerecord behaviour: what problems are you thinking of?
>
> Note: I'd like to use the perl API as a guide. And indeed working out
> the Slice object was quite simple...

Yes, I was thinking of the slices in combination with the different
assembly levels.
A change to the mapping part of that broke the last EnsEMBL release.
There were some cases where a seq_region maps one-to-many to other
seq_regions (also with gaps).
Did you put all the caching stuff from the Perl API into it? And if
not, is the performance ok?

It would be also great if you could put the code into some public
place (RubyForge as example), then it would be easier to see what is
already working/being worked at and what not.

> jan.
>
> -----Original Message-----
> From: Michael Han [mailto:mh6 at sanger.ac.uk]
> Sent: 09 May 2007 14:38
> To: jan aerts (RI)
> Subject: Re: [BioRuby] Ensembl API
>
> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>> Has anyone worked on an Ensembl API? There is a perl API and the
>> database schema is well-documented. On first impression, it seems
>> straightforward to make one using ActiveRecord, but I wouldn't want
>> to
>
>> waste efforts on that if someone else is already working on it.
>>
>> See http://www.ensembl.org/info/software/core/index.html
>>
>> Dr Jan Aerts
>> Bioinformatics Group
>
> Hi Jan,
>
> I am not sure if you can cover everything with a the default active-
> record behavior.
> But I would be a happy user of a ruby EnsEMBL API.
> If you need/want help with it, I would also volunteer.
>
> Michael

From jan.aerts at bbsrc.ac.uk Thu May 10 06:18:22 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Thu, 10 May 2007 11:18:22 +0100
Subject: [BioRuby] Ensembl API
In-Reply-To:
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1703@rie2ksrv1.ri.bbsrc.ac.uk>

OK.

-----Original Message-----
From: Toshiaki Katayama [mailto:ktym at hgc.jp]
Sent: 10 May 2007 11:14
To: jan aerts (RI)
Cc: Michael Han; bioruby at lists.open-bio.org
Subject: Re: [BioRuby] Ensembl API

Jan,

In that case, I would like you to consider to use the rubyforge
repository 'bioruby-annex' which Nakao-san had set up.

http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html

When your modules matured and we, core developers, have decided how to
integrate Rails dependent modules in BioRuby, you can put them in the
BioRuby distribution.

Toshiaki

On 2007/05/10, at 18:09, jan aerts (RI) wrote:

> I actually just started working on this API last night (in between
> some deadlines I got to catch), so haven't gotten so far as to think
> about caching.
> I'm basically working through the perl API tutorial
> (http://www.ensembl.org/info/software/core/core_tutorial.html) and try
> to implement all those examples. (At the moment, I'm at the bit that
> says "Break chromosomal slices into smaller 100k component slices"...)
>
> Some hurdles that I see coming are the caching and projecting features
> from one coord system to another. We'll see what happens when we get
> there.
>
> As for a public place: I would _very_ much appreciate help with the
> API, although
>
> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS:
> * Do you think it would be best to create a sourceforge project for
> this, or should I add it directly into bioruby (e.g. Bio::Api::Ensembl)?
> I suppose the second option would be best, but the stuff I have is
> probably not polished enough yet... and *far* from complete.
> * Secondly: if a new release is coming: would it be best to wait
> untill _after_ that release?
>
> jan.
>
>
> -----Original Message-----
> From: Michael Han [mailto:mh6 at sanger.ac.uk]
> Sent: 10 May 2007 09:51
> To: jan aerts (RI)
> Subject: Re: [BioRuby] Ensembl API
>
>
> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote:
>> Michael,
>>
>> As you mention that we maybe won't be able to cover everything with
>> the default activerecord behaviour: what problems are you thinking of?
>>
>> Note: I'd like to use the perl API as a guide. And indeed working out
>> the Slice object was quite simple...
>
> Yes, I was thinking of the slices in combination with the different
> assembly levels.
> A change to the mapping part of that broke the last EnsEMBL release.
> There were some cases where a seq_region maps one-to-many to other
> seq_regions (also with gaps).
> Did you put all the caching stuff from the Perl API into it? And if
> not, is the performance ok?
>
> It would be also great if you could put the code into some public
> place (RubyForge as example), then it would be easier to see what is
> already working/being worked at and what not.
>
>> jan.
>>
>> -----Original Message-----
>> From: Michael Han [mailto:mh6 at sanger.ac.uk]
>> Sent: 09 May 2007 14:38
>> To: jan aerts (RI)
>> Subject: Re: [BioRuby] Ensembl API
>>
>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>>> Has anyone worked on an Ensembl API? There is a perl API and the
>>> database schema is well-documented. On first impression, it seems
>>> straightforward to make one using ActiveRecord, but I wouldn't want
>>> to
>>
>>> waste efforts on that if someone else is already working on it.
>>>
>>> See http://www.ensembl.org/info/software/core/index.html
>>>
>>> Dr Jan Aerts
>>> Bioinformatics Group
>>
>> Hi Jan,
>>
>> I am not sure if you can cover everything with a the default active-
>> record behavior.
>> But I would be a happy user of a ruby EnsEMBL API.
>> If you need/want help with it, I would also volunteer.
>>
>> Michael
>

From ktym at hgc.jp Thu May 10 06:13:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 10 May 2007 19:13:34 +0900
Subject: [BioRuby] Ensembl API
In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk>
References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk>
Message-ID:

Jan,

In that case, I would like you to consider to use the rubyforge
repository 'bioruby-annex' which Nakao-san had set up.

http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html

When your modules matured and we, core developers, have decided how to
integrate Rails dependent modules in BioRuby, you can put them in the
BioRuby distribution.

Toshiaki

On 2007/05/10, at 18:09, jan aerts (RI) wrote:

> I actually just started working on this API last night (in between some
> deadlines I got to catch), so haven't gotten so far as to think about
> caching. I'm basically working through the perl API tutorial
> (http://www.ensembl.org/info/software/core/core_tutorial.html) and try
> to implement all those examples.
> (At the moment, I'm at the bit that
> says "Break chromosomal slices into smaller 100k component slices"...)
>
> Some hurdles that I see coming are the caching and projecting features
> from one coord system to another. We'll see what happens when we get
> there.
>
> As for a public place: I would _very_ much appreciate help with the API,
> although
>
> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS:
> * Do you think it would be best to create a sourceforge project for
> this, or should I add it directly into bioruby (e.g. Bio::Api::Ensembl)?
> I suppose the second option would be best, but the stuff I have is
> probably not polished enough yet... and *far* from complete.
> * Secondly: if a new release is coming: would it be best to wait untill
> _after_ that release?
>
> jan.
>
>
> -----Original Message-----
> From: Michael Han [mailto:mh6 at sanger.ac.uk]
> Sent: 10 May 2007 09:51
> To: jan aerts (RI)
> Subject: Re: [BioRuby] Ensembl API
>
>
> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote:
>> Michael,
>>
>> As you mention that we maybe won't be able to cover everything with
>> the default activerecord behaviour: what problems are you thinking of?
>>
>> Note: I'd like to use the perl API as a guide. And indeed working out
>> the Slice object was quite simple...
>
> Yes, I was thinking of the slices in combination with the different
> assembly levels.
> A change to the mapping part of that broke the last EnsEMBL release.
> There were some cases where a seq_region maps one-to-many to other
> seq_regions (also with gaps).
> Did you put all the caching stuff from the Perl API into it? And if not,
> is the performance ok?
>
> It would be also great if you could put the code into some public place
> (RubyForge as example), then it would be easier to see what is already
> working/being worked at and what not.
>
>> jan.
>>
>> -----Original Message-----
>> From: Michael Han [mailto:mh6 at sanger.ac.uk]
>> Sent: 09 May 2007 14:38
>> To: jan aerts (RI)
>> Subject: Re: [BioRuby] Ensembl API
>>
>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>>> Has anyone worked on an Ensembl API? There is a perl API and the
>>> database schema is well-documented. On first impression, it seems
>>> straightforward to make one using ActiveRecord, but I wouldn't want
>>> to
>>
>>> waste efforts on that if someone else is already working on it.
>>>
>>> See http://www.ensembl.org/info/software/core/index.html
>>>
>>> Dr Jan Aerts
>>> Bioinformatics Group
>>
>> Hi Jan,
>>
>> I am not sure if you can cover everything with a the default active-
>> record behavior.
>> But I would be a happy user of a ruby EnsEMBL API.
>> If you need/want help with it, I would also volunteer.
>>
>> Michael
>

From ktym at hgc.jp Thu May 10 07:42:53 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 10 May 2007 20:42:53 +0900
Subject: [BioRuby] Ensembl API
In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk>
References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk>
Message-ID:

Jan,

You are added as a annex developer.

> I noticed that the bioruby-annex repository is for bioruby *rails*
> plugins. However, the Ensembl API would have nothing to do with rails

When I discussed with Nakao-san previously, we have agreed to
include any Rails (or ActiveRecord etc.) dependent bioinformatics
projects in the bioruby-annex repository (maybe we need to
refine

The purpose is

1. The number of codes which depend on Rails related libraries
   is increasing, but we have not yet decided how to integrate
   such modules in BioRuby.

2. We thought correcting various add-ons to BioRuby in one place
   might be useful (so that users can find your bioinformatics
   modules easily, and/or some of them might be integrated into
   core BioRuby library in the future).

> (apart from the fact that it uses ActiveRecord in the background).

So, I thought your project is appropriate to be included.

> This might get confusing for possible users.

Yes, we may need to re-write our project description to clarify.
Volunteers?

As the bioruby-annex seems to be functioning now, we need to discuss
some rules for how to develop, release, and use sub products in it.

My idea is still vague, but how about to have codes in SVN as

  /bioruby-annex/rails/plugins   <- for Rails' script/plugin source
  /bioruby-annex/project1/
  /bioruby-annex/project2/
      :

and each sub projects will release the package as a gem (or tar.gz)
file prefixed with 'bioruby-' or 'bioruby-annex-' so that,
for instance, you could release

  bioruby-ensembl-api-1.0.gem

for downlaod from

  http://rubyforge.org/projects/bioruby-annex/

Toshiaki

On 2007/05/10, at 19:34, jan aerts (RI) wrote:

> Toshiaki,
>
> I noticed that the bioruby-annex repository is for bioruby *rails*
> plugins. However, the Ensembl API would have nothing to do with rails
> (apart from the fact that it uses ActiveRecord in the background).
> This might get confusing for possible users.
>
> jan.
>
> -----Original Message-----
> From: Toshiaki Katayama [mailto:ktym at hgc.jp]
> Sent: 10 May 2007 11:14
> To: jan aerts (RI)
> Cc: Michael Han; bioruby at lists.open-bio.org
> Subject: Re: [BioRuby] Ensembl API
>
> Jan,
>
> In that case, I would like you to consider to use the rubyforge
> repository 'bioruby-annex' which Nakao-san had set up.
>
> http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html
>
> When your modules matured and we, core developers, have decided how to
> integrate Rails dependent modules in BioRuby, you can put them in the
> BioRuby distribution.
>
> Toshiaki
>
> On 2007/05/10, at 18:09, jan aerts (RI) wrote:
>
>> I actually just started working on this API last night (in between
>> some deadlines I got to catch), so haven't gotten so far as to think
>> about caching.
>> I'm basically working through the perl API tutorial
>> (http://www.ensembl.org/info/software/core/core_tutorial.html) and try
>
>> to implement all those examples. (At the moment, I'm at the bit that
>> says "Break chromosomal slices into smaller 100k component slices"...)
>>
>> Some hurdles that I see coming are the caching and projecting features
>
>> from one coord system to another. We'll see what happens when we get
>> there.
>>
>> As for a public place: I would _very_ much appreciate help with the
>> API, although
>>
>> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS:
>> * Do you think it would be best to create a sourceforge project for
>> this, or should I add it directly into bioruby (e.g.
> Bio::Api::Ensembl)?
>> I suppose the second option would be best, but the stuff I have is
>> probably not polished enough yet... and *far* from complete.
>> * Secondly: if a new release is coming: would it be best to wait
>> untill _after_ that release?
>>
>> jan.
>>
>>
>> -----Original Message-----
>> From: Michael Han [mailto:mh6 at sanger.ac.uk]
>> Sent: 10 May 2007 09:51
>> To: jan aerts (RI)
>> Subject: Re: [BioRuby] Ensembl API
>>
>>
>> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote:
>>> Michael,
>>>
>>> As you mention that we maybe won't be able to cover everything with
>>> the default activerecord behaviour: what problems are you thinking
> of?
>>>
>>> Note: I'd like to use the perl API as a guide. And indeed working out
>
>>> the Slice object was quite simple...
>>
>> Yes, I was thinking of the slices in combination with the different
>> assembly levels.
>> A change to the mapping part of that broke the last EnsEMBL release.
>> There were some cases where a seq_region maps one-to-many to other
>> seq_regions (also with gaps).
>> Did you put all the caching stuff from the Perl API into it? And if
>> not, is the performance ok?
>>
>> It would be also great if you could put the code into some public
>> place (RubyForge as example), then it would be easier to see what is
>> already working/being worked at and what not.
>>
>>> jan.
>>>
>>> -----Original Message-----
>>> From: Michael Han [mailto:mh6 at sanger.ac.uk]
>>> Sent: 09 May 2007 14:38
>>> To: jan aerts (RI)
>>> Subject: Re: [BioRuby] Ensembl API
>>>
>>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote:
>>>> Has anyone worked on an Ensembl API? There is a perl API and the
>>>> database schema is well-documented. On first impression, it seems
>>>> straightforward to make one using ActiveRecord, but I wouldn't want
>>>> to
>>>
>>>> waste efforts on that if someone else is already working on it.
>>>>
>>>> See http://www.ensembl.org/info/software/core/index.html
>>>>
>>>> Dr Jan Aerts
>>>> Bioinformatics Group
>>>
>>> Hi Jan,
>>>
>>> I am not sure if you can cover everything with a the default active-
>>> record behavior.
>>> But I would be a happy user of a ruby EnsEMBL API.
>>> If you need/want help with it, I would also volunteer.
>>>
>>> Michael
>>
>

From ktym at hgc.jp Thu May 10 07:45:40 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 10 May 2007 20:45:40 +0900
Subject: [BioRuby] Ensembl API
In-Reply-To:
References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk>
Message-ID: <057D48B5-57EC-464E-ABB7-05B059569253@hgc.jp>

On 2007/05/10, at 20:42, Toshiaki Katayama wrote:

> When I discussed with Nakao-san previously, we have agreed to
> include any Rails (or ActiveRecord etc.) dependent bioinformatics
> projects in the bioruby-annex repository (maybe we need to
> refine

refine the project description).
Toshiaki From ktym at hgc.jp Thu May 10 07:49:15 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 10 May 2007 20:49:15 +0900 Subject: [BioRuby] Ensembl API In-Reply-To: References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: <38E0CF6A-14CD-4905-947E-410D389F9CD9@hgc.jp> On 2007/05/10, at 20:42, Toshiaki Katayama wrote: > 2. We thought correcting various add-ons to BioRuby in one place s/correct/collect/ ... there might be many other mistakes as always :) Toshiaki From jan.aerts at bbsrc.ac.uk Thu May 10 09:29:01 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 14:29:01 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB170C@rie2ksrv1.ri.bbsrc.ac.uk> Removing the 'rails' from the project description would be a good idea. It would be great if this repository could also be the home for modules like Bio::Graphics (which is completely unrelated to rails or activerecord) jan. -----Original Message----- From: Toshiaki Katayama [mailto:ktym at hgc.jp] Sent: 10 May 2007 12:43 To: jan aerts (RI) Cc: BioRubyML Subject: Re: [BioRuby] Ensembl API Jan, You are added as a annex developer. > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails When I discussed with Nakao-san previously, we have agreed to include any Rails (or ActiveRecord etc.) dependent bioinformatics projects in the bioruby-annex repository (maybe we need to refine The purpose is 1. The number of codes which depend on Rails related libraries is increasing, but we have not yet decided how to integrate such modules in BioRuby. 2. We thought correcting various add-ons to BioRuby in one place might be useful (so that users can find your bioinformatics modules easily, and/or some of them might be integrated into core BioRuby library in the future). > (apart from the fact that it uses ActiveRecord in the background). 
So, I thought your project is appropriate to be included. > This might get confusing for possible users. Yes, we may need to re-write our project description to clarify. Volunteers? As the bioruby-annex seems to be functioning now, we need to discuss some rules for how to develop, release, and use sub products in it. My idea is still vague, but how about to have codes in SVN as /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : and each sub projects will release the package as a gem (or tar.gz) file prefixed with 'bioruby-' or 'bioruby-annex-' so that, for instance, you could release bioruby-ensembl-api-1.0.gem for downlaod from http://rubyforge.org/projects/bioruby-annex/ Toshiaki On 2007/05/10, at 19:34, jan aerts (RI) wrote: > Toshiaki, > > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails > (apart from the fact that it uses ActiveRecord in the background). > This might get confusing for possible users. > > jan. > > -----Original Message----- > From: Toshiaki Katayama [mailto:ktym at hgc.jp] > Sent: 10 May 2007 11:14 > To: jan aerts (RI) > Cc: Michael Han; bioruby at lists.open-bio.org > Subject: Re: [BioRuby] Ensembl API > > Jan, > > In that case, I would like you to consider to use the rubyforge > repository 'bioruby-annex' which Nakao-san had set up. > > http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html > > When your modules matured and we, core developers, have decided how to > integrate Rails dependent modules in BioRuby, you can put them in the > BioRuby distribution. > > Toshiaki > > On 2007/05/10, at 18:09, jan aerts (RI) wrote: > >> I actually just started working on this API last night (in between >> some deadlines I got to catch), so haven't gotten so far as to think >> about caching. 
I'm basically working through the perl API tutorial >> (http://www.ensembl.org/info/software/core/core_tutorial.html) and >> try > >> to implement all those examples. (At the moment, I'm at the bit that >> says "Break chromosomal slices into smaller 100k component >> slices"...) >> >> Some hurdles that I see coming are the caching and projecting >> features > >> from one coord system to another. We'll see what happens when we get >> there. >> >> As for a public place: I would _very_ much appreciate help with the >> API, although >> >> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: >> * Do you think it would be best to create a sourceforge project for >> this, or should I add it directly into bioruby (e.g. > Bio::Api::Ensembl)? >> I suppose the second option would be best, but the stuff I have is >> probably not polished enough yet... and *far* from complete. >> * Secondly: if a new release is coming: would it be best to wait >> untill _after_ that release? >> >> jan. >> >> >> -----Original Message----- >> From: Michael Han [mailto:mh6 at sanger.ac.uk] >> Sent: 10 May 2007 09:51 >> To: jan aerts (RI) >> Subject: Re: [BioRuby] Ensembl API >> >> >> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: >>> Michael, >>> >>> As you mention that we maybe won't be able to cover everything with >>> the default activerecord behaviour: what problems are you thinking > of? >>> >>> Note: I'd like to use the perl API as a guide. And indeed working >>> out > >>> the Slice object was quite simple... >> >> Yes, I was thinking of the slices in combination with the different >> assembly levels. >> A change to the mapping part of that broke the last EnsEMBL release. >> There were some cases where a seq_region maps one-to-many to other >> seq_regions (also with gaps). >> Did you put all the caching stuff from the Perl API into it? And if >> not, is the performance ok? 
>> >> It would be also great if you could put the code into some public >> place (RubyForge as example), then it would be easier to see what is >> already working/being worked at and what not. >> >>> jan. >>> >>> -----Original Message----- >>> From: Michael Han [mailto:mh6 at sanger.ac.uk] >>> Sent: 09 May 2007 14:38 >>> To: jan aerts (RI) >>> Subject: Re: [BioRuby] Ensembl API >>> >>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >>>> Has anyone worked on an Ensembl API? There is a perl API and the >>>> database schema is well-documented. On first impression, it seems >>>> straightforward to make one using ActiveRecord, but I wouldn't want >>>> to >>> >>>> waste efforts on that if someone else is already working on it. >>>> >>>> See http://www.ensembl.org/info/software/core/index.html >>>> >>>> Dr Jan Aerts >>>> Bioinformatics Group >>> >>> Hi Jan, >>> >>> I am not sure if you can cover everything with a the default active- >>> record behavior. >>> But I would be a happy user of a ruby EnsEMBL API. >>> If you need/want help with it, I would also volunteer. >>> >>> Michael >> > From jan.aerts at bbsrc.ac.uk Thu May 10 09:32:22 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 14:32:22 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB170D@rie2ksrv1.ri.bbsrc.ac.uk> Toshiaki, Your idea of a directory structure as this seems a good choice. /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : OK if I add a subdirectory for 'ensembl_api'? (so that would be: /bioruby-annex/ensembl_api/) jan. -----Original Message----- From: Toshiaki Katayama [mailto:ktym at hgc.jp] Sent: 10 May 2007 12:43 To: jan aerts (RI) Cc: BioRubyML Subject: Re: [BioRuby] Ensembl API Jan, You are added as a annex developer. > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. 
However, the Ensembl API would have nothing to do with rails When I discussed with Nakao-san previously, we have agreed to include any Rails (or ActiveRecord etc.) dependent bioinformatics projects in the bioruby-annex repository (maybe we need to refine The purpose is 1. The number of codes which depend on Rails related libraries is increasing, but we have not yet decided how to integrate such modules in BioRuby. 2. We thought correcting various add-ons to BioRuby in one place might be useful (so that users can find your bioinformatics modules easily, and/or some of them might be integrated into core BioRuby library in the future). > (apart from the fact that it uses ActiveRecord in the background). So, I thought your project is appropriate to be included. > This might get confusing for possible users. Yes, we may need to re-write our project description to clarify. Volunteers? As the bioruby-annex seems to be functioning now, we need to discuss some rules for how to develop, release, and use sub products in it. My idea is still vague, but how about to have codes in SVN as /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : and each sub projects will release the package as a gem (or tar.gz) file prefixed with 'bioruby-' or 'bioruby-annex-' so that, for instance, you could release bioruby-ensembl-api-1.0.gem for downlaod from http://rubyforge.org/projects/bioruby-annex/ Toshiaki On 2007/05/10, at 19:34, jan aerts (RI) wrote: > Toshiaki, > > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails > (apart from the fact that it uses ActiveRecord in the background). > This might get confusing for possible users. > > jan. 
> > -----Original Message----- > From: Toshiaki Katayama [mailto:ktym at hgc.jp] > Sent: 10 May 2007 11:14 > To: jan aerts (RI) > Cc: Michael Han; bioruby at lists.open-bio.org > Subject: Re: [BioRuby] Ensembl API > > Jan, > > In that case, I would like you to consider to use the rubyforge > repository 'bioruby-annex' which Nakao-san had set up. > > http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html > > When your modules matured and we, core developers, have decided how to > integrate Rails dependent modules in BioRuby, you can put them in the > BioRuby distribution. > > Toshiaki > > On 2007/05/10, at 18:09, jan aerts (RI) wrote: > >> I actually just started working on this API last night (in between >> some deadlines I got to catch), so haven't gotten so far as to think >> about caching. I'm basically working through the perl API tutorial >> (http://www.ensembl.org/info/software/core/core_tutorial.html) and >> try > >> to implement all those examples. (At the moment, I'm at the bit that >> says "Break chromosomal slices into smaller 100k component >> slices"...) >> >> Some hurdles that I see coming are the caching and projecting >> features > >> from one coord system to another. We'll see what happens when we get >> there. >> >> As for a public place: I would _very_ much appreciate help with the >> API, although >> >> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: >> * Do you think it would be best to create a sourceforge project for >> this, or should I add it directly into bioruby (e.g. > Bio::Api::Ensembl)? >> I suppose the second option would be best, but the stuff I have is >> probably not polished enough yet... and *far* from complete. >> * Secondly: if a new release is coming: would it be best to wait >> untill _after_ that release? >> >> jan. 
>> >> >> -----Original Message----- >> From: Michael Han [mailto:mh6 at sanger.ac.uk] >> Sent: 10 May 2007 09:51 >> To: jan aerts (RI) >> Subject: Re: [BioRuby] Ensembl API >> >> >> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: >>> Michael, >>> >>> As you mention that we maybe won't be able to cover everything with >>> the default activerecord behaviour: what problems are you thinking > of? >>> >>> Note: I'd like to use the perl API as a guide. And indeed working >>> out > >>> the Slice object was quite simple... >> >> Yes, I was thinking of the slices in combination with the different >> assembly levels. >> A change to the mapping part of that broke the last EnsEMBL release. >> There were some cases where a seq_region maps one-to-many to other >> seq_regions (also with gaps). >> Did you put all the caching stuff from the Perl API into it? And if >> not, is the performance ok? >> >> It would be also great if you could put the code into some public >> place (RubyForge as example), then it would be easier to see what is >> already working/being worked at and what not. >> >>> jan. >>> >>> -----Original Message----- >>> From: Michael Han [mailto:mh6 at sanger.ac.uk] >>> Sent: 09 May 2007 14:38 >>> To: jan aerts (RI) >>> Subject: Re: [BioRuby] Ensembl API >>> >>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >>>> Has anyone worked on an Ensembl API? There is a perl API and the >>>> database schema is well-documented. On first impression, it seems >>>> straightforward to make one using ActiveRecord, but I wouldn't want >>>> to >>> >>>> waste efforts on that if someone else is already working on it. >>>> >>>> See http://www.ensembl.org/info/software/core/index.html >>>> >>>> Dr Jan Aerts >>>> Bioinformatics Group >>> >>> Hi Jan, >>> >>> I am not sure if you can cover everything with a the default active- >>> record behavior. >>> But I would be a happy user of a ruby EnsEMBL API. >>> If you need/want help with it, I would also volunteer. 
>>>
>>> Michael
>>
>

From fredjoha at bioreg.kyushu-u.ac.jp Tue May 15 01:54:46 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Tue, 15 May 2007 14:54:46 +0900
Subject: [BioRuby] Bio::Blast not fully functional?
Message-ID: <1179208486.7335.39.camel@fred-kyudai>

Hello all,

I am reading the tutorial at
http://dev.bioruby.org/wiki/en/?Tutorial.rd
about BLAST and am trying to use it as the tutorial describes (see the code
below). However, many of the entries in the 'hit' variable in the code below
seem to be nil. For example, hit.identity and hit.target_seq are two
methods that just return nil when I call them. Am I missing something?

#!/usr/bin/env ruby
require 'bio'

factory = Bio::Blast.remote('blastp', 'nr-aa')
Bio::FlatFile.open(Bio::FastaFormat, '/home/fred/pdb/test.fasta.txt') do |ff|
  ff.each do |entry|
    report = factory.query(entry)
    report.each do |hit|
      if hit.evalue < 0.001
        puts hit.target_id
        puts hit.target_seq
      end
    end
  end
end

Thanks for any help!

Best regards,
Fredrik Johansson

From hienle at club-internet.fr Tue May 15 11:30:27 2007
From: hienle at club-internet.fr (hienle at club-internet.fr)
Date: Tue, 15 May 2007 17:30:27 +0200
Subject: [BioRuby] Parsing GFF3 attributes
Message-ID:

Hello all,

I am working with a GFF3-formatted file and have noticed that the
attributes field is not parsed properly.

In bio/db/gff.rb,

75       def parse_attributes(attributes)
76         hash = Hash.new
77         attributes.split(/[^\\];/).each do |atr|
78           key, value = atr.split(' ', 2)
79           hash[key] = value
80         end
81         return hash
82       end
83     end

I changed:
78           key, value = atr.split(' ', 2)
to:
78           key, value = atr.split('=', 2)

and it now appears to behave properly. However, I am not certain whether
this is appropriate for backward compatibility with GFF and GFF2.

Is anyone working on parsing GFF3 files?
Thank you in advance for your help,
-Hien

From mh6 at sanger.ac.uk Tue May 15 12:10:20 2007
From: mh6 at sanger.ac.uk (Michael Han)
Date: Tue, 15 May 2007 17:10:20 +0100
Subject: [BioRuby] Parsing GFF3 attributes
In-Reply-To:
References:
Message-ID:

On 15 May 2007, at 16:30, hienle at club-internet.fr wrote:
> Hello all,
>
> I am working with a GFF3-formatted file and have noticed that the
> attributes field is not parsed properly.
>
> In bio/db/gff.rb,
>
> 75       def parse_attributes(attributes)
> 76         hash = Hash.new
> 77         attributes.split(/[^\\];/).each do |atr|
> 78           key, value = atr.split(' ', 2)
> 79           hash[key] = value
> 80         end
> 81         return hash
> 82       end
> 83     end
>
> I changed :
> 78           key, value = atr.split(' ', 2)
> to:
> 78           key, value = atr.split('=', 2)
>
> and it now appears to behave properly. However, I am not certain if
> this is appropriate for backward compatibility with GFF and GFF2.

I normally use spaces between the key and the value of the attributes
for GFF2, like: Gene "1234" ; Transcript "1234"
as described in <"http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml">

so it would break GFF2 / GFF parsing.
Maybe you could create a separate GFF3 parser inheriting from the
gff.rb .

some GFF3 reference (note: last version from a few weeks ago)
<"http://www.sequenceontology.org/gff3.shtml">

> Is anyone working on parsing GFF3 files?
>
> Thank you in advance for your help,
> -Hien

Michael

From hien.le at mail.mcgill.ca Wed May 16 03:01:54 2007
From: hien.le at mail.mcgill.ca (Hien Le)
Date: Wed, 16 May 2007 03:01:54 -0400
Subject: [BioRuby] Parsing GFF3 attributes
In-Reply-To:
References:
Message-ID: <20070516030154.esps58l1ws8c8o00@webmail.mcgill.ca>

Quoting Michael Han :

> so it would break GFF2 / GFF parsing.
> Maybe you could create a separate GFF3 parser inheriting from the
> gff.rb .

OK, thanks Michael for the advice!
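
A minimal sketch of the point made in this thread, in plain Ruby: GFF2 separates an attribute's key from its value with a space (Gene "1234"), while GFF3 uses '=' (ID=gene1), so one split character cannot serve both. The two method names below are hypothetical helpers written for illustration. They are not the Bio::GFF code, and they ignore escaping rules apart from stripping double quotes in the GFF2 case.

```ruby
# Hypothetical helpers for illustration only; not part of BioRuby.

# GFF2 style, e.g. 'Gene "1234" ; Transcript "5678"':
# key and value are separated by whitespace.
def parse_gff2_attributes(field)
  field.split(';').each_with_object({}) do |pair, hash|
    key, value = pair.strip.split(' ', 2)
    next if key.nil?                      # skip empty fragments
    hash[key] = value.to_s.delete('"')    # drop the surrounding quotes
  end
end

# GFF3 style, e.g. 'ID=gene1;Name=abc':
# key and value are separated by '='.
def parse_gff3_attributes(field)
  field.split(';').each_with_object({}) do |pair, hash|
    key, value = pair.strip.split('=', 2)
    next if key.nil? || key.empty?        # skip empty fragments
    hash[key] = value
  end
end

p parse_gff2_attributes('Gene "1234" ; Transcript "5678"')
p parse_gff3_attributes('ID=gene1;Name=abc')

# Feeding GFF3 text to the GFF2 splitter leaves the whole pair in the key,
# which is the kind of misparse reported above.
p parse_gff2_attributes('ID=gene1')
```

Keeping the two splitters in separate methods (or separate classes, as suggested in the reply) means a fix for GFF3 cannot break existing GFF2 input.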
-Hien

From cihan at cihan.us Wed May 16 10:44:14 2007
From: cihan at cihan.us (cihan inan)
Date: Wed, 16 May 2007 17:44:14 +0300
Subject: [BioRuby] Translate docs
Message-ID: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com>

Hi, I am cihan inan and I am from Turkey. I am a biology student and I am new
to the Ruby language. I want to translate some docs to Turkish, but I think I
have to get permission, and I don't know who gives that permission. Please
help me with this (tell me which docs to start with, or point me in the
right direction). Thanks a lot.

From fredjoha at bioreg.kyushu-u.ac.jp Thu May 17 21:17:12 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Fri, 18 May 2007 10:17:12 +0900
Subject: [BioRuby] FASTA problem
Message-ID: <1179451032.29172.33.camel@fred-kyudai>

Hello,
I had a problem running a fasta search locally on my computer, and it
turned out that Kernel.exec(*cmd) is not very happy to get the array cmd
with nil as one of its elements. This is however what happens in
bio/app/fasta.rb if ktup is not set. I changed this to make it work:

--- fasta.rb.old        2007-05-18 09:55:01.000000000 +0900
+++ fasta.rb    2007-05-18 09:55:37.000000000 +0900
@@ -114,7 +114,8 @@

     def exec_local(query)
       cmd = [ @program, *@options ]
-      cmd.concat([ '@', @db, @ktup ])
+      cmd.concat([ '@', @db])
+      cmd.push(@ktup) if @ktup

       report = nil

Something to submit to the repository?

Best regards,
Fredrik

From ktym at hgc.jp Fri May 18 11:23:41 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 19 May 2007 00:23:41 +0900
Subject: [BioRuby] FASTA problem
In-Reply-To: <1179451032.29172.33.camel@fred-kyudai>
References: <1179451032.29172.33.camel@fred-kyudai>
Message-ID:

Hello Fredrik,

Thank you for your fix! I have committed your patch to the CVS.
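
A minimal sketch of the idea behind this patch, in plain Ruby with made-up values: the program name, option and database path below are assumptions for illustration, and build_command is a hypothetical stand-in for the array building done in exec_local. When ktup is unset, appending it unconditionally leaves a nil in the argument list, which Kernel.exec cannot turn into a command-line string. Appending it only when set avoids that.

```ruby
# Hypothetical stand-in for the command building in exec_local.
# program, options, db and ktup play the roles of @program, @options,
# @db and @ktup; the values used below are invented.
def build_command(program, options, db, ktup = nil)
  cmd = [program, *options]
  cmd.concat(['@', db])
  # Only add ktup when it was given; to_s also covers an Integer ktup,
  # since exec expects String arguments.
  cmd.push(ktup.to_s) if ktup
  cmd
end

with_ktup    = build_command('fasta34', ['-Q'], 'db.fasta', 2)
without_ktup = build_command('fasta34', ['-Q'], 'db.fasta')

p with_ktup
p without_ktup
```

Calling cmd.compact before exec would also drop a stray nil, but the conditional push makes the intent explicit at the point where the optional argument is added.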
Regards, Toshiaki Katayama On 2007/05/18, at 10:17, Fredrik Johansson wrote: > Hello, > I had problem running a fasta search locally on my computer, and it > turned out that Kernel.exec(*cmd) is not very happy to get the array cmd > with nil as one of its element. This is however what happens in > bio/app/fasta.rb, if ktup is not set. I changed this to make it work : > > --- fasta.rb.old 2007-05-18 09:55:01.000000000 +0900 > +++ fasta.rb 2007-05-18 09:55:37.000000000 +0900 > @@ -114,7 +114,8 @@ > > def exec_local(query) > cmd = [ @program, *@options ] > - cmd.concat([ '@', @db, @ktup ]) > + cmd.concat([ '@', @db]) > + cmd.push(@ktup) if @ktup > > report = nil > > Something to submit to the repository? > > Best regards, > Fredrik > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Fri May 18 11:24:24 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sat, 19 May 2007 00:24:24 +0900 Subject: [BioRuby] Translate docs In-Reply-To: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com> References: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com> Message-ID: <852FFA12-3475-46B7-82C6-250ACD7A6BD0@hgc.jp> Hi, You are free to translate the documents in the bioruby-x.x.x/doc/ directory or on web (wiki) pages. I apologize the main document, bioruby tutorial, is not updated (I've lost updated version with disk crash in this Feb and not restarted yet). Regards, Toshiaki Katayama On 2007/05/16, at 23:44, cihan inan wrote: > hi I am cihan inan and I am from Turkey. I am a student at Biology. I am new > in Ruby language. I want to translate some docs to Turkish. But I think I > have to get permission. But I dont know who gives that permission? please > help me about this topic. ( tell me the docs to start give me a way sth. ) > thanks a lot. 
> _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Fri May 18 11:23:51 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sat, 19 May 2007 00:23:51 +0900 Subject: [BioRuby] Parsing GFF3 attributes In-Reply-To: References: Message-ID: <10A4A7B4-063F-459F-B9D6-56222F8C26E9@hgc.jp> Hien, Thank you for your report. In bio/db/gff.rb, we have Bio::GFF::GFF2 for version 2 spec and Bio::GFF::GFF3 for version 3 and I added your modification to the Bio::GFF::GFF3 class. Personally, I have not yet use GFF3 intensively, so if you think the class should have more functionality to support new features in GFF3, please propose. Toshiaki On 2007/05/16, at 1:10, Michael Han wrote: > > On 15 May 2007, at 16:30, hienle at club-internet.fr wrote: >> Hello all, >> >> I am working with a GFF3-formatted file and have noticed that the >> attributes field is not parsed properly. >> >> In bio/db/gff.rb, >> >> 75 def parse_attributes(attributes) >> 76 hash = Hash.new >> 77 attributes.split(/[^\\];/).each do |atr| >> 78 key, value = atr.split(' ', 2) >> 79 hash[key] = value >> 80 end >> 81 return hash >> 82 end >> 83 end >> >> I changed : >> 78 key, value = atr.split(' ', 2) >> to: >> 78 key, value = atr.split('=', 2) >> >> and it now appears to behave properly. However, I am not certain if >> this is appropriate for backward compatibility with GFF and GFF2. > > I use normally spaces between the key and the value of the attributes > for GFF2 like: Gene "1234" ; Transcript "1234" > as described in <"http://www.sanger.ac.uk/Software/formats/GFF/ > GFF_Spec.shtml"> > > so it would break GFF2 / GFF parsing. > Maybe you could create a separate GFF3 parser inheriting from the > gff.rb . > > some GFF3 reference (note: last version from a few weeks ago) > <"http://www.sequenceontology.org/gff3.shtml"> > >> Is anyone working on parsing GFF3 files? 
>> >> Thank you in advance for your help, >> -Hien > > MIchael > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Fri May 18 12:00:33 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sat, 19 May 2007 01:00:33 +0900 Subject: [BioRuby] Bio::Blast not fully functional? In-Reply-To: <1179208486.7335.39.camel@fred-kyudai> References: <1179208486.7335.39.camel@fred-kyudai> Message-ID: <8B741702-DB3E-49D5-9920-9420336F4E61@hgc.jp> Fredrik, This is because Bio::Blast.remote uses '-m 8' option which returns a tabular output format without target sequences. You can use XML output for your purpose by changing the following line > factory = Bio::Blast.remote('blastp', 'nr-aa') to factory = Bio::Blast.remote('blastp', 'nr-aa', '-m 7') Regards, Toshiaki On 2007/05/15, at 14:54, Fredrik Johansson wrote: > Hello all, > I am reading in the tutorial at > http://dev.bioruby.org/wiki/en/?Tutorial.rd > about BLAST and I try to use it according to this tutorial (see the code > below). > However, many of the entries in the 'hit' variable in the code below > seems to be nil. hit.identity and hit.target_seq are for example two > methods that just answer nil when I call them. Am I missing something? > > #!/usr/bin/env ruby > require 'bio' > factory = Bio::Blast.remote('blastp', 'nr-aa') > Bio::FlatFile.open(Bio::FastaFormat, '/home/fred/pdb/test.fasta.txt') do > |ff| > ff.each do |entry| > report = factory.query(entry) > report.each do |hit| > if hit.evalue < 0.001 > puts hit.target_id > puts hit.target_seq > end > end > end > end > > Thanks for any help! 
> Best regards,
> Fredrik Johansson
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From fredjoha at bioreg.kyushu-u.ac.jp Mon May 21 04:16:31 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 21 May 2007 17:16:31 +0900
Subject: [BioRuby] Parsing FASTA result
Message-ID: <1179735391.29172.91.camel@fred-kyudai>

I have encountered a problem again when running FASTA. I got a huge
amount of homologs from fasta (25 MB data) for one sequence, and then
the Bio::Fasta::Report class gets this error when initializing:

format10.rb:21:in `sub!': failed to allocate memory (NoMemoryError)

so I made the following changes to my code. It is just a quick fix, and
I am not sure about that 'else' case that I took away. It does not seem
to be covered by the line that I added. Also I did not bother about the
@list variable since it does not seem to be used anywhere.

/Fredrik

The patch:

--- fasta/format10.rb   2007-05-21 16:50:38.000000000 +0900
+++ fasta/format10.new.rb       2007-05-21 16:52:55.000000000 +0900
@@ -17,13 +17,7 @@

   def initialize(data)
     # header lines - brief list of the hits
-    if data.sub!(/.*\nThe best scores are/m, '')
-      data.sub!(/(.*)\n\n>>>/m, '')
-      @list = "The best scores are" + $1
-    else
-      data.sub!(/.*\n!!\s+/m, '')
-      data.sub!(/.*/) { |x| @list = x; '' }
-    end
+    data = data[data.index("\n\n>>>")+5..data.size]

     # body lines - fasta execution result
     program, *hits = data.split(/\n>>/)

From jan.aerts at bbsrc.ac.uk Mon May 21 10:05:17 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Mon, 21 May 2007 15:05:17 +0100
Subject: [BioRuby] Ensembl API for ruby
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1784@rie2ksrv1.ri.bbsrc.ac.uk>

All,

I committed the first version of a ruby API to the bioruby-annex SVN on
rubyforge. Although it is far from completed, it does already have its uses.
Thought it would be good to commit as soon as possible so you guys get an
idea of what the API would look like. I would like to follow the excellent
perl API as much as possible
(http://www.ensembl.org/info/software/core/core_tutorial.html).

What it does at the moment:
* All tables of the core database are covered by ActiveRecord classes.
* A Slice object represents a continuous region of a genome. Slices can
  be used to obtain sequence, features or other information from a
  particular region of interest.
* Coordinates can be transferred from one coordinate system to another. I
  hope I tested this thoroughly enough (but still am a bit squeamish about
  it).

What is not implemented yet:
* A whole bunch of methods that are available to the perl objects that I
  would like to 'copy' to the ruby API.
* The Variation and Compara databases.
* The 'project' and 'transform' methods for features and slices (as
  explained on the perl tutorial mentioned above).

To get this code, please go to
http://rubyforge.org/projects/bioruby-annex/
You can export the code using SVN with the following command (without
quotes): "svn checkout svn://rubyforge.org/var/svn/bioruby-annex". There
should be a subdirectory 'ensembl-api', containing the API itself, the
sample script, tests and of course all documentation.

I've created a gem (available in the top-directory of the SVN export),
but can't test if it actually works. Can someone please test it for me?

Toshiaki: if the code works and the gem can be installed on other
systems, could you please do a file release for the gem-file?

Others: if people are interested to help me to develop this API, please
let me know.

Thanks,
jan.

Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198
skype: aerts_ri

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592).
Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute. From hien.le at mail.mcgill.ca Tue May 22 03:33:26 2007 From: hien.le at mail.mcgill.ca (Hien Le) Date: Tue, 22 May 2007 03:33:26 -0400 Subject: [BioRuby] Parsing GFF3 attributes Message-ID: <20070522033326.1zpegb2scgg0skws@webmail.mcgill.ca> On May 18, 2007, at 17:23, Toshiaki Katayama wrote: > and I added your modification to the Bio::GFF::GFF3 class. OK Thanks! > Personally, I have not yet use GFF3 intensively, so if you think the > class should have more functionality to support new features in GFF3, > please propose. Same for myself, I have just recently started using GFF3 formats. I'll let you know if I think the class needs added functionality. -Hien From christophercyll at gmail.com Thu May 24 18:22:56 2007 From: christophercyll at gmail.com (Topher Cyll) Date: Thu, 24 May 2007 18:22:56 -0400 Subject: [BioRuby] looking for a cool project to do using bioruby Message-ID: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com> Hi BioRubyers, I'm a long time reader of this list, but I think this might be my first post. I'm in the process of writing a Ruby project book. Each chapter guides the reader through a new and unusual project they can code in Ruby. For comparison, in other chapters we do things like compose music, build a game, run simulations, implement Lisp, etc. I'd really like to include a project using BioRuby (since I think a lot of Rubyists would find it exciting). 
But with only one undergraduate bio-informatics class under my belt, I'm a little stuck on ideas for fun and interesting projects. So I thought I'd ask the experts! Can anyone think of a fun, interesting use for BioRuby that I could walk readers through? I like to have each project produce a final product, instead of just doing a tutorial, so I'm looking for a idea that would have some implementation work, but wouldn't be too, too difficult. Any ideas? Toph From s-merchant at northwestern.edu Fri May 25 11:39:55 2007 From: s-merchant at northwestern.edu (Sohel Merchant) Date: Fri, 25 May 2007 10:39:55 -0500 Subject: [BioRuby] looking for a cool project to do using bioruby In-Reply-To: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com> References: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com> Message-ID: <000001c79ee2$f1eda7b0$c2987ca5@pc13> Hi Toph, I would say one of fun things to implement in Ruby would be Ontology visualization. The tool could be used to visualize any kind of ontology such as GO. It would be cool if this could be integrated in to a Rails app. Look at QuickGo which visualizes the Gene Ontology http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0042351 Hope this helps. Let me know if you any questions. Cheers, Sohel. -----Original Message----- From: bioruby-bounces at lists.open-bio.org [mailto:bioruby-bounces at lists.open-bio.org] On Behalf Of Topher Cyll Sent: Thursday, May 24, 2007 5:23 PM To: bioruby at lists.open-bio.org Subject: [BioRuby] looking for a cool project to do using bioruby Hi BioRubyers, I'm a long time reader of this list, but I think this might be my first post. I'm in the process of writing a Ruby project book. Each chapter guides the reader through a new and unusual project they can code in Ruby. For comparison, in other chapters we do things like compose music, build a game, run simulations, implement Lisp, etc. 
I'd really like to include a project using BioRuby (since I think a lot of Rubyists would find it exciting). But with only one undergraduate bio-informatics class under my belt, I'm a little stuck on ideas for fun and interesting projects. So I thought I'd ask the experts! Can anyone think of a fun, interesting use for BioRuby that I could walk readers through? I like to have each project produce a final product, instead of just doing a tutorial, so I'm looking for a idea that would have some implementation work, but wouldn't be too, too difficult. Any ideas? Toph _______________________________________________ BioRuby mailing list BioRuby at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioruby
From ngoto at gen-info.osaka-u.ac.jp Sat May 5 06:57:28 2007 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Sat, 5 May 2007 15:57:28 +0900 Subject: [BioRuby] EMBL parsing In-Reply-To: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com> References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com> Message-ID: <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp> Hi, On Thu, 3 May 2007 12:48:03 +0100 Anthony Underwood wrote: > Hi Mitsiteru, > > Any of the embl files downloaded from the ebi site have this problem. > > for example http://www.ebi.ac.uk/cgi-bin/dbfetch? > db=embl&style=raw&id=CP000360 > > Ruby takes all of the cpu power :( It seems it is caused by thousands of iterations of str1 += str2 because it creates a new string object every time. A patch is attached. (Ruby 1.8.0 or newer version required) --- lib/bio/db.rb 5 Apr 2007 23:35:39 -0000 0.37 +++ lib/bio/db.rb 5 May 2007 06:08:39 -0000 @@ -313,12 +313,12 @@ # Returns the contents of the entry as a Hash.
   def entry2hash(entry)
-    hash = Hash.new('')
+    hash = Hash.new { |h, k| h[k] = '' }
     entry.each_line do |line|
       tag = tag_get(line)
       next if tag == 'XX'
       tag = 'R' if tag =~ /^R./  # Reference lines
-      hash[tag] += line
+      hash[tag].concat line
     end
     return hash
   end

Naohisa Goto
ng at bioruby.org

From email2ants at gmail.com Tue May 8 11:53:37 2007
From: email2ants at gmail.com (Anthony Underwood)
Date: Tue, 8 May 2007 12:53:37 +0100
Subject: [BioRuby] EMBL parsing
In-Reply-To: <20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
	<20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
Message-ID: <4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>

Hi Naohisa,

Thanks for the patch. This certainly appears to solve the problem of
slow embl entry reading. However the sequence length is still reported
as 0.

I found this was due to the idline not being interpreted correctly on
line 97

  tmp['SEQUENCE_LENGTH'] = idline[3].strip.split(' ').first.to_i

was changed to

  tmp['SEQUENCE_LENGTH'] = idline.last.strip.split(' ').first.to_i

This was OK for my purposes, but I think the whole idline
interpretation needs to be looked at (see
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_1).
I could have a look at this if appropriate.

Thanks

Anthony

On 5 May 2007, at 07:57, Naohisa GOTO wrote:

> Hi,
>
> On Thu, 3 May 2007 12:48:03 +0100
> Anthony Underwood wrote:
>
>> Hi Mitsiteru,
>>
>> Any of the embl files downloaded from the ebi site have this problem.
>>
>> for example http://www.ebi.ac.uk/cgi-bin/dbfetch?
>> db=embl&style=raw&id=CP000360
>>
>> Ruby takes all of the cpu power :(
>
> It seems it is caused by thousands of iterations of str1 += str2
> because it creates a new string object every time.
> A patch is attached.
> (Ruby 1.8.0 or newer version required)
>
> --- lib/bio/db.rb 5 Apr 2007 23:35:39 -0000 0.37
> +++ lib/bio/db.rb 5 May 2007 06:08:39 -0000
> @@ -313,12 +313,12 @@
>
>    # Returns the contents of the entry as a Hash.
>    def entry2hash(entry)
> -    hash = Hash.new('')
> +    hash = Hash.new { |h, k| h[k] = '' }
>      entry.each_line do |line|
>        tag = tag_get(line)
>        next if tag == 'XX'
>        tag = 'R' if tag =~ /^R./  # Reference lines
> -      hash[tag] += line
> +      hash[tag].concat line
>      end
>      return hash
>    end
>
>
> Naohisa Goto
> ng at bioruby.org
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From n at bioruby.org Tue May 8 17:09:47 2007
From: n at bioruby.org (Mitsuteru Nakao)
Date: Wed, 9 May 2007 02:09:47 +0900
Subject: [BioRuby] EMBL parsing
In-Reply-To: <4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>
References: <0AEBC5E8-5128-4448-AE2E-8FA7E8691332@gmail.com>
	<20070505065729.4ECD41CBC510@idnmail.gen-info.osaka-u.ac.jp>
	<4045DF0D-6F4C-47DF-A4D7-1B31D61C3A7D@gmail.com>
Message-ID: <90ca35f70705081009r7719c562u36a902a84f88566c@mail.gmail.com>

Hi Anthony,

Thank you for your suggestions.
You can use the bioruby CVS HEAD, which contains a new idline parser for
EMBL rel89.

> This was OK for my purposes, but I think the whole idline
> interpretation needs to be looked at see (http://www.ebi.ac.uk/embl/
> Documentation/User_manual/usrman.html#3_4_1). I could have a look at
> this if appropriate.

Thanks

Mitsuteru
-
Mitsuteru Nakao
n at bioruby.org

From jan.aerts at bbsrc.ac.uk Wed May 9 13:14:52 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 9 May 2007 14:14:52 +0100
Subject: [BioRuby] Ensembl API
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F3@rie2ksrv1.ri.bbsrc.ac.uk>

Has anyone worked on an Ensembl API? There is a perl API and the
database schema is well-documented.
On first impression, it seems straightforward to make one using ActiveRecord, but I wouldn't want to waste efforts on that if someone else is already working on it. See http://www.ensembl.org/info/software/core/index.html Dr Jan Aerts Bioinformatics Group Roslin Institute Roslin EH25 9PS Scotland, UK tel: +44 131 527 4198 skype: aerts_ri ----...and the obligatory disclaimer---- Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592). Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute. From jan.aerts at bbsrc.ac.uk Wed May 9 13:49:50 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Wed, 9 May 2007 14:49:50 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: <1E75E5B2-D515-4EE4-9B2D-2E2D034E1EA1@sanger.ac.uk> Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk> A post (in Japanese) about this shows a good primer: http://itoshi.tv/d/?date=20060829 I'm testing this out on some tables of the Ensembl core and variation databases, and things look promising... As you say it might well be that we can't cover everything, but at least we can get quite far. jan. -----Original Message----- From: Michael Han [mailto:mh6 at sanger.ac.uk] Sent: 09 May 2007 14:38 To: jan aerts (RI) Subject: Re: [BioRuby] Ensembl API On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: > Has anyone worked on an Ensembl API? There is a perl API and the > database schema is well-documented. 
On first impression, it seems > straightforward to make one using ActiveRecord, but I wouldn't want to > waste efforts on that if someone else is already working on it. > > See http://www.ensembl.org/info/software/core/index.html > > Dr Jan Aerts > Bioinformatics Group Hi Jan, I am not sure if you can cover everything with a the default active- record behavior. But I would be a happy user of a ruby EnsEMBL API. If you need/want help with it, I would also volunteer. Michael From jan.aerts at bbsrc.ac.uk Wed May 9 14:10:24 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Wed, 9 May 2007 15:10:24 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: <58E34B0F-64FF-4CEB-9E36-7AF0B7159D08@sanger.ac.uk> Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F9@rie2ksrv1.ri.bbsrc.ac.uk> I've tried a connection similar to the ones used by the perl API and am able to connect directly to the core and other databases. Am currently playing around with the cow features, actually :-) jan. -----Original Message----- From: Michael Han [mailto:mh6 at sanger.ac.uk] Sent: 09 May 2007 14:57 To: jan aerts (RI) Cc: bioruby at lists.open-bio.org Subject: Re: [BioRuby] Ensembl API On 9 May 2007, at 14:49, jan aerts ((RI)) wrote: > A post (in Japanese) about this shows a good primer: > http://itoshi.tv/d/?date=20060829 my Japanese is not so good, but isn't that run on a BioMart and not an ensembl-core schema? > I'm testing this out on some tables of the Ensembl core and variation > databases, and things look promising... As you say it might well be > that we can't cover everything, but at least we can get quite far. > > jan. > > -----Original Message----- > From: Michael Han [mailto:mh6 at sanger.ac.uk] > Sent: 09 May 2007 14:38 > To: jan aerts (RI) > Subject: Re: [BioRuby] Ensembl API > > On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >> Has anyone worked on an Ensembl API? There is a perl API and the >> database schema is well-documented. 
On first impression, it seems >> straightforward to make one using ActiveRecord, but I wouldn't want >> to > >> waste efforts on that if someone else is already working on it. >> >> See http://www.ensembl.org/info/software/core/index.html >> >> Dr Jan Aerts >> Bioinformatics Group > > Hi Jan, > > I am not sure if you can cover everything with a the default active- > record behavior. > But I would be a happy user of a ruby EnsEMBL API. > If you need/want help with it, I would also volunteer. > > Michael From mh6 at sanger.ac.uk Wed May 9 13:57:18 2007 From: mh6 at sanger.ac.uk (Michael Han) Date: Wed, 9 May 2007 14:57:18 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk> References: <84DA9D8AC9B05F4B889E7C70238CB45104FB16F6@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: <58E34B0F-64FF-4CEB-9E36-7AF0B7159D08@sanger.ac.uk> On 9 May 2007, at 14:49, jan aerts ((RI)) wrote: > A post (in Japanese) about this shows a good primer: > http://itoshi.tv/d/?date=20060829 my Japanese is not so good, but isn't that run on a BioMart and not an ensembl-core schema? > I'm testing this out on some tables of the Ensembl core and variation > databases, and things look promising... As you say it might well be > that > we can't cover everything, but at least we can get quite far. > > jan. > > -----Original Message----- > From: Michael Han [mailto:mh6 at sanger.ac.uk] > Sent: 09 May 2007 14:38 > To: jan aerts (RI) > Subject: Re: [BioRuby] Ensembl API > > On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >> Has anyone worked on an Ensembl API? There is a perl API and the >> database schema is well-documented. On first impression, it seems >> straightforward to make one using ActiveRecord, but I wouldn't >> want to > >> waste efforts on that if someone else is already working on it. 
>> >> See http://www.ensembl.org/info/software/core/index.html >> >> Dr Jan Aerts >> Bioinformatics Group > > Hi Jan, > > I am not sure if you can cover everything with a the default active- > record behavior. > But I would be a happy user of a ruby EnsEMBL API. > If you need/want help with it, I would also volunteer. > > Michael From jan.aerts at bbsrc.ac.uk Thu May 10 07:58:02 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 08:58:02 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: <1E75E5B2-D515-4EE4-9B2D-2E2D034E1EA1@sanger.ac.uk> Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB16FE@rie2ksrv1.ri.bbsrc.ac.uk> Michael, As you mention that we maybe won't be able to cover everything with the default activerecord behaviour: what problems are you thinking of? Note: I'd like to use the perl API as a guide. And indeed working out the Slice object was quite simple... jan. -----Original Message----- From: Michael Han [mailto:mh6 at sanger.ac.uk] Sent: 09 May 2007 14:38 To: jan aerts (RI) Subject: Re: [BioRuby] Ensembl API On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: > Has anyone worked on an Ensembl API? There is a perl API and the > database schema is well-documented. On first impression, it seems > straightforward to make one using ActiveRecord, but I wouldn't want to > waste efforts on that if someone else is already working on it. > > See http://www.ensembl.org/info/software/core/index.html > > Dr Jan Aerts > Bioinformatics Group Hi Jan, I am not sure if you can cover everything with a the default active- record behavior. But I would be a happy user of a ruby EnsEMBL API. If you need/want help with it, I would also volunteer. 
Michael From jan.aerts at bbsrc.ac.uk Thu May 10 09:09:50 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 10:09:50 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: <834748BE-64D8-4642-AF70-5098589F91EB@sanger.ac.uk> Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk> I actually just started working on this API last night (in between some deadlines I got to catch), so haven't gotten so far as to think about caching. I'm basically working through the perl API tutorial (http://www.ensembl.org/info/software/core/core_tutorial.html) and try to implement all those examples. (At the moment, I'm at the bit that says "Break chromosomal slices into smaller 100k component slices"...) Some hurdles that I see coming are the caching and projecting features from one coord system to another. We'll see what happens when we get there. As for a public place: I would _very_ much appreciate help with the API, although QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: * Do you think it would be best to create a sourceforge project for this, or should I add it directly into bioruby (e.g. Bio::Api::Ensembl)? I suppose the second option would be best, but the stuff I have is probably not polished enough yet... and *far* from complete. * Secondly: if a new release is coming: would it be best to wait untill _after_ that release? jan. -----Original Message----- From: Michael Han [mailto:mh6 at sanger.ac.uk] Sent: 10 May 2007 09:51 To: jan aerts (RI) Subject: Re: [BioRuby] Ensembl API On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: > Michael, > > As you mention that we maybe won't be able to cover everything with > the default activerecord behaviour: what problems are you thinking of? > > Note: I'd like to use the perl API as a guide. And indeed working out > the Slice object was quite simple... Yes, I was thinking of the slices in combination with the different assembly levels. 
A change to the mapping part of that broke the last EnsEMBL release. There were some cases where a seq_region maps one-to-many to other seq_regions (also with gaps). Did you put all the caching stuff from the Perl API into it? And if not, is the performance ok? It would be also great if you could put the code into some public place (RubyForge as example), then it would be easier to see what is already working/being worked at and what not. > jan. > > -----Original Message----- > From: Michael Han [mailto:mh6 at sanger.ac.uk] > Sent: 09 May 2007 14:38 > To: jan aerts (RI) > Subject: Re: [BioRuby] Ensembl API > > On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >> Has anyone worked on an Ensembl API? There is a perl API and the >> database schema is well-documented. On first impression, it seems >> straightforward to make one using ActiveRecord, but I wouldn't want >> to > >> waste efforts on that if someone else is already working on it. >> >> See http://www.ensembl.org/info/software/core/index.html >> >> Dr Jan Aerts >> Bioinformatics Group > > Hi Jan, > > I am not sure if you can cover everything with a the default active- > record behavior. > But I would be a happy user of a ruby EnsEMBL API. > If you need/want help with it, I would also volunteer. > > Michael From jan.aerts at bbsrc.ac.uk Thu May 10 10:18:22 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 11:18:22 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1703@rie2ksrv1.ri.bbsrc.ac.uk> OK. -----Original Message----- From: Toshiaki Katayama [mailto:ktym at hgc.jp] Sent: 10 May 2007 11:14 To: jan aerts (RI) Cc: Michael Han; bioruby at lists.open-bio.org Subject: Re: [BioRuby] Ensembl API Jan, In that case, I would like you to consider to use the rubyforge repository 'bioruby-annex' which Nakao-san had set up. 
http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html When your modules matured and we, core developers, have decided how to integrate Rails dependent modules in BioRuby, you can put them in the BioRuby distribution. Toshiaki On 2007/05/10, at 18:09, jan aerts (RI) wrote: > I actually just started working on this API last night (in between > some deadlines I got to catch), so haven't gotten so far as to think > about caching. I'm basically working through the perl API tutorial > (http://www.ensembl.org/info/software/core/core_tutorial.html) and try > to implement all those examples. (At the moment, I'm at the bit that > says "Break chromosomal slices into smaller 100k component slices"...) > > Some hurdles that I see coming are the caching and projecting features > from one coord system to another. We'll see what happens when we get > there. > > As for a public place: I would _very_ much appreciate help with the > API, although > > QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: > * Do you think it would be best to create a sourceforge project for > this, or should I add it directly into bioruby (e.g. Bio::Api::Ensembl)? > I suppose the second option would be best, but the stuff I have is > probably not polished enough yet... and *far* from complete. > * Secondly: if a new release is coming: would it be best to wait > untill _after_ that release? > > jan. > > > -----Original Message----- > From: Michael Han [mailto:mh6 at sanger.ac.uk] > Sent: 10 May 2007 09:51 > To: jan aerts (RI) > Subject: Re: [BioRuby] Ensembl API > > > On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: >> Michael, >> >> As you mention that we maybe won't be able to cover everything with >> the default activerecord behaviour: what problems are you thinking of? >> >> Note: I'd like to use the perl API as a guide. And indeed working out >> the Slice object was quite simple... > > Yes, I was thinking of the slices in combination with the different > assembly levels. 
> A change to the mapping part of that broke the last EnsEMBL release. > There were some cases where a seq_region maps one-to-many to other > seq_regions (also with gaps). > Did you put all the caching stuff from the Perl API into it? And if > not, is the performance ok? > > It would be also great if you could put the code into some public > place (RubyForge as example), then it would be easier to see what is > already working/being worked at and what not. > >> jan. >> >> -----Original Message----- >> From: Michael Han [mailto:mh6 at sanger.ac.uk] >> Sent: 09 May 2007 14:38 >> To: jan aerts (RI) >> Subject: Re: [BioRuby] Ensembl API >> >> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >>> Has anyone worked on an Ensembl API? There is a perl API and the >>> database schema is well-documented. On first impression, it seems >>> straightforward to make one using ActiveRecord, but I wouldn't want >>> to >> >>> waste efforts on that if someone else is already working on it. >>> >>> See http://www.ensembl.org/info/software/core/index.html >>> >>> Dr Jan Aerts >>> Bioinformatics Group >> >> Hi Jan, >> >> I am not sure if you can cover everything with a the default active- >> record behavior. >> But I would be a happy user of a ruby EnsEMBL API. >> If you need/want help with it, I would also volunteer. >> >> Michael > From ktym at hgc.jp Thu May 10 10:13:34 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 10 May 2007 19:13:34 +0900 Subject: [BioRuby] Ensembl API In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk> References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1701@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: Jan, In that case, I would like you to consider to use the rubyforge repository 'bioruby-annex' which Nakao-san had set up. 
http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html When your modules matured and we, core developers, have decided how to integrate Rails dependent modules in BioRuby, you can put them in the BioRuby distribution. Toshiaki On 2007/05/10, at 18:09, jan aerts (RI) wrote: > I actually just started working on this API last night (in between some > deadlines I got to catch), so haven't gotten so far as to think about > caching. I'm basically working through the perl API tutorial > (http://www.ensembl.org/info/software/core/core_tutorial.html) and try > to implement all those examples. (At the moment, I'm at the bit that > says "Break chromosomal slices into smaller 100k component slices"...) > > Some hurdles that I see coming are the caching and projecting features > from one coord system to another. We'll see what happens when we get > there. > > As for a public place: I would _very_ much appreciate help with the API, > although > > QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: > * Do you think it would be best to create a sourceforge project for > this, or should I add it directly into bioruby (e.g. Bio::Api::Ensembl)? > I suppose the second option would be best, but the stuff I have is > probably not polished enough yet... and *far* from complete. > * Secondly: if a new release is coming: would it be best to wait untill > _after_ that release? > > jan. > > > -----Original Message----- > From: Michael Han [mailto:mh6 at sanger.ac.uk] > Sent: 10 May 2007 09:51 > To: jan aerts (RI) > Subject: Re: [BioRuby] Ensembl API > > > On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: >> Michael, >> >> As you mention that we maybe won't be able to cover everything with >> the default activerecord behaviour: what problems are you thinking of? >> >> Note: I'd like to use the perl API as a guide. And indeed working out >> the Slice object was quite simple... > > Yes, I was thinking of the slices in combination with the different > assembly levels. 
> A change to the mapping part of that broke the last EnsEMBL release. > There were some cases where a seq_region maps one-to-many to other > seq_regions (also with gaps). > Did you put all the caching stuff from the Perl API into it? And if not, > is the performance ok? > > It would be also great if you could put the code into some public place > (RubyForge as example), then it would be easier to see what is already > working/being worked at and what not. > >> jan. >> >> -----Original Message----- >> From: Michael Han [mailto:mh6 at sanger.ac.uk] >> Sent: 09 May 2007 14:38 >> To: jan aerts (RI) >> Subject: Re: [BioRuby] Ensembl API >> >> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >>> Has anyone worked on an Ensembl API? There is a perl API and the >>> database schema is well-documented. On first impression, it seems >>> straightforward to make one using ActiveRecord, but I wouldn't want >>> to >> >>> waste efforts on that if someone else is already working on it. >>> >>> See http://www.ensembl.org/info/software/core/index.html >>> >>> Dr Jan Aerts >>> Bioinformatics Group >> >> Hi Jan, >> >> I am not sure if you can cover everything with a the default active- >> record behavior. >> But I would be a happy user of a ruby EnsEMBL API. >> If you need/want help with it, I would also volunteer. >> >> Michael > From ktym at hgc.jp Thu May 10 11:42:53 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 10 May 2007 20:42:53 +0900 Subject: [BioRuby] Ensembl API In-Reply-To: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk> References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: Jan, You are added as a annex developer. > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails When I discussed with Nakao-san previously, we have agreed to include any Rails (or ActiveRecord etc.) 
dependent bioinformatics projects in the bioruby-annex repository
(maybe we need to refine the project description).

The purpose is

1. The amount of code that depends on Rails-related libraries is
   increasing, but we have not yet decided how to integrate such
   modules in BioRuby.
2. We thought collecting various add-ons to BioRuby in one place might
   be useful (so that users can find your bioinformatics modules easily,
   and/or some of them might be integrated into the core BioRuby library
   in the future).

> (apart from the fact that it uses ActiveRecord in the background).

So, I thought your project is appropriate to be included.

> This might get confusing for possible users.

Yes, we may need to rewrite our project description to clarify.
Volunteers?

As the bioruby-annex seems to be functioning now, we need to discuss
some rules for how to develop, release, and use sub-projects in it.

My idea is still vague, but how about having the code in SVN as

  /bioruby-annex/rails/plugins  <- for Rails' script/plugin source
  /bioruby-annex/project1/
  /bioruby-annex/project2/
    :

with each sub-project releasing its package as a gem (or tar.gz) file
prefixed with 'bioruby-' or 'bioruby-annex-' so that, for instance, you
could release

  bioruby-ensembl-api-1.0.gem

for download from

  http://rubyforge.org/projects/bioruby-annex/

Toshiaki

On 2007/05/10, at 19:34, jan aerts (RI) wrote:

> Toshiaki,
>
> I noticed that the bioruby-annex repository is for bioruby *rails*
> plugins. However, the Ensembl API would have nothing to do with rails
> (apart from the fact that it uses ActiveRecord in the background).
> This might get confusing for possible users.
>
> jan.
>
> -----Original Message-----
> From: Toshiaki Katayama [mailto:ktym at hgc.jp]
> Sent: 10 May 2007 11:14
> To: jan aerts (RI)
> Cc: Michael Han; bioruby at lists.open-bio.org
> Subject: Re: [BioRuby] Ensembl API
>
> Jan,
>
> In that case, I would like you to consider to use the rubyforge
> repository 'bioruby-annex' which Nakao-san had set up.
> > http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html > > When your modules matured and we, core developers, have decided how to > integrate Rails dependent modules in BioRuby, you can put them in the > BioRuby distribution. > > Toshiaki > > On 2007/05/10, at 18:09, jan aerts (RI) wrote: > >> I actually just started working on this API last night (in between >> some deadlines I got to catch), so haven't gotten so far as to think >> about caching. I'm basically working through the perl API tutorial >> (http://www.ensembl.org/info/software/core/core_tutorial.html) and try > >> to implement all those examples. (At the moment, I'm at the bit that >> says "Break chromosomal slices into smaller 100k component slices"...) >> >> Some hurdles that I see coming are the caching and projecting features > >> from one coord system to another. We'll see what happens when we get >> there. >> >> As for a public place: I would _very_ much appreciate help with the >> API, although >> >> QUESTION FOR BIORUBY COMMUNITY AND DEVELOPERS: >> * Do you think it would be best to create a sourceforge project for >> this, or should I add it directly into bioruby (e.g. > Bio::Api::Ensembl)? >> I suppose the second option would be best, but the stuff I have is >> probably not polished enough yet... and *far* from complete. >> * Secondly: if a new release is coming: would it be best to wait >> untill _after_ that release? >> >> jan. >> >> >> -----Original Message----- >> From: Michael Han [mailto:mh6 at sanger.ac.uk] >> Sent: 10 May 2007 09:51 >> To: jan aerts (RI) >> Subject: Re: [BioRuby] Ensembl API >> >> >> On 10 May 2007, at 08:58, jan aerts ((RI)) wrote: >>> Michael, >>> >>> As you mention that we maybe won't be able to cover everything with >>> the default activerecord behaviour: what problems are you thinking > of? >>> >>> Note: I'd like to use the perl API as a guide. And indeed working out > >>> the Slice object was quite simple... 
>> >> Yes, I was thinking of the slices in combination with the different >> assembly levels. >> A change to the mapping part of that broke the last EnsEMBL release. >> There were some cases where a seq_region maps one-to-many to other >> seq_regions (also with gaps). >> Did you put all the caching stuff from the Perl API into it? And if >> not, is the performance ok? >> >> It would be also great if you could put the code into some public >> place (RubyForge as example), then it would be easier to see what is >> already working/being worked at and what not. >> >>> jan. >>> >>> -----Original Message----- >>> From: Michael Han [mailto:mh6 at sanger.ac.uk] >>> Sent: 09 May 2007 14:38 >>> To: jan aerts (RI) >>> Subject: Re: [BioRuby] Ensembl API >>> >>> On 9 May 2007, at 14:14, jan aerts ((RI)) wrote: >>>> Has anyone worked on an Ensembl API? There is a perl API and the >>>> database schema is well-documented. On first impression, it seems >>>> straightforward to make one using ActiveRecord, but I wouldn't want >>>> to >>> >>>> waste efforts on that if someone else is already working on it. >>>> >>>> See http://www.ensembl.org/info/software/core/index.html >>>> >>>> Dr Jan Aerts >>>> Bioinformatics Group >>> >>> Hi Jan, >>> >>> I am not sure if you can cover everything with a the default active- >>> record behavior. >>> But I would be a happy user of a ruby EnsEMBL API. >>> If you need/want help with it, I would also volunteer. >>> >>> Michael >> > From ktym at hgc.jp Thu May 10 11:45:40 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 10 May 2007 20:45:40 +0900 Subject: [BioRuby] Ensembl API In-Reply-To: References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: <057D48B5-57EC-464E-ABB7-05B059569253@hgc.jp> On 2007/05/10, at 20:42, Toshiaki Katayama wrote: > When I discussed with Nakao-san previously, we have agreed to > include any Rails (or ActiveRecord etc.) 
dependent bioinformatics > projects in the bioruby-annex repository (maybe we need to > refine refine the project description). Toshiaki From ktym at hgc.jp Thu May 10 11:49:15 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 10 May 2007 20:49:15 +0900 Subject: [BioRuby] Ensembl API In-Reply-To: References: <84DA9D8AC9B05F4B889E7C70238CB45104FB1706@rie2ksrv1.ri.bbsrc.ac.uk> Message-ID: <38E0CF6A-14CD-4905-947E-410D389F9CD9@hgc.jp> On 2007/05/10, at 20:42, Toshiaki Katayama wrote: > 2. We thought correcting various add-ons to BioRuby in one place s/correct/collect/ ... there might be many other mistakes as always :) Toshiaki From jan.aerts at bbsrc.ac.uk Thu May 10 13:29:01 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 14:29:01 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB170C@rie2ksrv1.ri.bbsrc.ac.uk> Removing the 'rails' from the project description would be a good idea. It would be great if this repository could also be the home for modules like Bio::Graphics (which is completely unrelated to rails or activerecord) jan. -----Original Message----- From: Toshiaki Katayama [mailto:ktym at hgc.jp] Sent: 10 May 2007 12:43 To: jan aerts (RI) Cc: BioRubyML Subject: Re: [BioRuby] Ensembl API Jan, You are added as a annex developer. > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails When I discussed with Nakao-san previously, we have agreed to include any Rails (or ActiveRecord etc.) dependent bioinformatics projects in the bioruby-annex repository (maybe we need to refine The purpose is 1. The number of codes which depend on Rails related libraries is increasing, but we have not yet decided how to integrate such modules in BioRuby. 2. 
We thought correcting various add-ons to BioRuby in one place might be useful (so that users can find your bioinformatics modules easily, and/or some of them might be integrated into core BioRuby library in the future). > (apart from the fact that it uses ActiveRecord in the background). So, I thought your project is appropriate to be included. > This might get confusing for possible users. Yes, we may need to re-write our project description to clarify. Volunteers? As the bioruby-annex seems to be functioning now, we need to discuss some rules for how to develop, release, and use sub products in it. My idea is still vague, but how about to have codes in SVN as /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : and each sub projects will release the package as a gem (or tar.gz) file prefixed with 'bioruby-' or 'bioruby-annex-' so that, for instance, you could release bioruby-ensembl-api-1.0.gem for downlaod from http://rubyforge.org/projects/bioruby-annex/ Toshiaki On 2007/05/10, at 19:34, jan aerts (RI) wrote: > Toshiaki, > > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails > (apart from the fact that it uses ActiveRecord in the background). > This might get confusing for possible users. > > jan. > > -----Original Message----- > From: Toshiaki Katayama [mailto:ktym at hgc.jp] > Sent: 10 May 2007 11:14 > To: jan aerts (RI) > Cc: Michael Han; bioruby at lists.open-bio.org > Subject: Re: [BioRuby] Ensembl API > > Jan, > > In that case, I would like you to consider to use the rubyforge > repository 'bioruby-annex' which Nakao-san had set up. > > http://lists.open-bio.org/pipermail/bioruby/2007-April/000355.html > > When your modules matured and we, core developers, have decided how to > integrate Rails dependent modules in BioRuby, you can put them in the > BioRuby distribution. 
> > Toshiaki From jan.aerts at bbsrc.ac.uk Thu May 10 13:32:22 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Thu, 10 May 2007 14:32:22 +0100 Subject: [BioRuby] Ensembl API In-Reply-To: Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB170D@rie2ksrv1.ri.bbsrc.ac.uk> Toshiaki, Your idea of a directory structure as this seems a good choice. /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : OK if I add a subdirectory for 'ensembl_api'? (so that would be: /bioruby-annex/ensembl_api/) jan. -----Original Message----- From: Toshiaki Katayama [mailto:ktym at hgc.jp] Sent: 10 May 2007 12:43 To: jan aerts (RI) Cc: BioRubyML Subject: Re: [BioRuby] Ensembl API Jan, You are added as a annex developer. 
> I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails When I discussed with Nakao-san previously, we have agreed to include any Rails (or ActiveRecord etc.) dependent bioinformatics projects in the bioruby-annex repository (maybe we need to refine The purpose is 1. The number of codes which depend on Rails related libraries is increasing, but we have not yet decided how to integrate such modules in BioRuby. 2. We thought correcting various add-ons to BioRuby in one place might be useful (so that users can find your bioinformatics modules easily, and/or some of them might be integrated into core BioRuby library in the future). > (apart from the fact that it uses ActiveRecord in the background). So, I thought your project is appropriate to be included. > This might get confusing for possible users. Yes, we may need to re-write our project description to clarify. Volunteers? As the bioruby-annex seems to be functioning now, we need to discuss some rules for how to develop, release, and use sub products in it. My idea is still vague, but how about to have codes in SVN as /bioruby-annex/rails/plugins <- for Rails' script/plugin source /bioruby-annex/project1/ /bioruby-annex/project2/ : and each sub projects will release the package as a gem (or tar.gz) file prefixed with 'bioruby-' or 'bioruby-annex-' so that, for instance, you could release bioruby-ensembl-api-1.0.gem for downlaod from http://rubyforge.org/projects/bioruby-annex/ Toshiaki On 2007/05/10, at 19:34, jan aerts (RI) wrote: > Toshiaki, > > I noticed that the bioruby-annex repository is for bioruby *rails* > plugins. However, the Ensembl API would have nothing to do with rails > (apart from the fact that it uses ActiveRecord in the background). > This might get confusing for possible users. > > jan. 
From fredjoha at bioreg.kyushu-u.ac.jp Tue May 15 05:54:46 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Tue, 15 May 2007 14:54:46 +0900
Subject: [BioRuby] Bio::Blast not fully functional?
Message-ID: <1179208486.7335.39.camel@fred-kyudai>

Hello all,

I am reading the tutorial at http://dev.bioruby.org/wiki/en/?Tutorial.rd about BLAST and I am trying to use it according to this tutorial (see the code below). However, many of the entries in the 'hit' variable in the code below seem to be nil. hit.identity and hit.target_seq, for example, are two methods that just return nil when I call them. Am I missing something?

#!/usr/bin/env ruby
require 'bio'
factory = Bio::Blast.remote('blastp', 'nr-aa')
Bio::FlatFile.open(Bio::FastaFormat, '/home/fred/pdb/test.fasta.txt') do |ff|
  ff.each do |entry|
    report = factory.query(entry)
    report.each do |hit|
      if hit.evalue < 0.001
        puts hit.target_id
        puts hit.target_seq
      end
    end
  end
end

Thanks for any help!
Best regards,
Fredrik Johansson

From hienle at club-internet.fr Tue May 15 15:30:27 2007
From: hienle at club-internet.fr (hienle at club-internet.fr)
Date: Tue, 15 May 2007 17:30:27 +0200
Subject: [BioRuby] Parsing GFF3 attributes
Message-ID:

Hello all,

I am working with a GFF3-formatted file and have noticed that the attributes field is not parsed properly.

In bio/db/gff.rb,

75     def parse_attributes(attributes)
76       hash = Hash.new
77       attributes.split(/[^\\];/).each do |atr|
78         key, value = atr.split(' ', 2)
79         hash[key] = value
80       end
81       return hash
82     end
83   end

I changed:
78         key, value = atr.split(' ', 2)
to:
78         key, value = atr.split('=', 2)

and it now appears to behave properly. However, I am not certain if this is appropriate for backward compatibility with GFF and GFF2.

Is anyone working on parsing GFF3 files?
Thank you in advance for your help,
-Hien

From mh6 at sanger.ac.uk Tue May 15 16:10:20 2007
From: mh6 at sanger.ac.uk (Michael Han)
Date: Tue, 15 May 2007 17:10:20 +0100
Subject: [BioRuby] Parsing GFF3 attributes
In-Reply-To:
References:
Message-ID:

On 15 May 2007, at 16:30, hienle at club-internet.fr wrote:
> Hello all,
>
> I am working with a GFF3-formatted file and have noticed that the
> attributes field is not parsed properly.
>
> In bio/db/gff.rb,
>
> 75     def parse_attributes(attributes)
> 76       hash = Hash.new
> 77       attributes.split(/[^\\];/).each do |atr|
> 78         key, value = atr.split(' ', 2)
> 79         hash[key] = value
> 80       end
> 81       return hash
> 82     end
> 83   end
>
> I changed:
> 78         key, value = atr.split(' ', 2)
> to:
> 78         key, value = atr.split('=', 2)
>
> and it now appears to behave properly. However, I am not certain if
> this is appropriate for backward compatibility with GFF and GFF2.

I normally use spaces between the key and the value of the attributes for GFF2, like:
Gene "1234" ; Transcript "1234"
as described in <"http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml">

so it would break GFF2 / GFF parsing.
Maybe you could create a separate GFF3 parser inheriting from the gff.rb .

some GFF3 reference (note: last version from a few weeks ago)
<"http://www.sequenceontology.org/gff3.shtml">

> Is anyone working on parsing GFF3 files?
>
> Thank you in advance for your help,
> -Hien

Michael

From hien.le at mail.mcgill.ca Wed May 16 07:01:54 2007
From: hien.le at mail.mcgill.ca (Hien Le)
Date: Wed, 16 May 2007 03:01:54 -0400
Subject: [BioRuby] Parsing GFF3 attributes
In-Reply-To:
References:
Message-ID: <20070516030154.esps58l1ws8c8o00@webmail.mcgill.ca>

Quoting Michael Han :

> so it would break GFF2 / GFF parsing.
> Maybe you could create a separate GFF3 parser inheriting from the
> gff.rb .

OK, thanks Michael for the advice!
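The suggestion quoted above (keep the space-delimited GFF2 behaviour and put the '=' delimiter in a separate GFF3 parser that inherits from it) could look roughly like the sketch below. The class names Gff2Attributes and Gff3Attributes are hypothetical stand-ins, not the classes in bio/db/gff.rb, and the sketch deliberately ignores escaping, semicolons inside quoted values and multi-value attributes.

```ruby
# Hypothetical stand-ins for illustration only; not the Bio::GFF classes.
class Gff2Attributes
  # GFF2 separates key and value with whitespace: Gene "1234" ; Transcript "1234"
  def delimiter
    ' '
  end

  # Split the attributes column on ';', then each pair on the delimiter.
  # Simplification: semicolons inside quoted values are not handled.
  def parse(attributes)
    attributes.split(';').each_with_object({}) do |pair, hash|
      key, value = pair.strip.split(delimiter, 2)
      next if key.nil? || key.empty?
      hash[key] = value
    end
  end
end

class Gff3Attributes < Gff2Attributes
  # GFF3 uses tag=value pairs: ID=gene1;Name=abc
  def delimiter
    '='
  end
end

gff2 = Gff2Attributes.new.parse('Gene "1234" ; Transcript "1234"')
gff3 = Gff3Attributes.new.parse('ID=gene1;Name=abc')
p gff2
p gff3
```

Only the delimiter differs between the two classes, so existing GFF2 input keeps parsing exactly as before while GFF3 input gets its own rule.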
-Hien

From cihan at cihan.us Wed May 16 14:44:14 2007
From: cihan at cihan.us (cihan inan)
Date: Wed, 16 May 2007 17:44:14 +0300
Subject: [BioRuby] Translate docs
Message-ID: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com>

Hi, I am cihan inan and I am from Turkey. I am a student of biology and I am new to the Ruby language. I want to translate some docs into Turkish, but I think I have to get permission and I don't know who gives that permission. Please help me with this (tell me which docs to start with, or point me in the right direction). Thanks a lot.

From fredjoha at bioreg.kyushu-u.ac.jp Fri May 18 01:17:12 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Fri, 18 May 2007 10:17:12 +0900
Subject: [BioRuby] FASTA problem
Message-ID: <1179451032.29172.33.camel@fred-kyudai>

Hello,
I had a problem running a fasta search locally on my computer, and it turned out that Kernel.exec(*cmd) is not very happy to get the array cmd with nil as one of its elements. This is however what happens in bio/app/fasta.rb, if ktup is not set. I changed this to make it work:

--- fasta.rb.old 2007-05-18 09:55:01.000000000 +0900
+++ fasta.rb 2007-05-18 09:55:37.000000000 +0900
@@ -114,7 +114,8 @@
 
     def exec_local(query)
       cmd = [ @program, *@options ]
-      cmd.concat([ '@', @db, @ktup ])
+      cmd.concat([ '@', @db])
+      cmd.push(@ktup) if @ktup
 
       report = nil
 
Something to submit to the repository?

Best regards,
Fredrik

From ktym at hgc.jp Fri May 18 15:23:41 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 19 May 2007 00:23:41 +0900
Subject: [BioRuby] FASTA problem
In-Reply-To: <1179451032.29172.33.camel@fred-kyudai>
References: <1179451032.29172.33.camel@fred-kyudai>
Message-ID:

Hello Fredrik,

Thank you for your fix! I have committed your patch to the CVS.
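The idea behind the patch can be shown in isolation: build the argument array so that an unset optional value is never added, because a nil element cannot be passed on to Kernel.exec. The helper name build_command, the program name 'fasta34', the option list and the database name are assumptions made up for this sketch; it is not the code in bio/app/fasta.rb.

```ruby
# Hypothetical helper for illustration; it mirrors the shape of the patch
# but is not the code in bio/app/fasta.rb.
def build_command(program, options, db, ktup = nil)
  cmd = [program, *options]
  cmd.concat(['@', db])        # '@' and the database are always present
  cmd.push(ktup.to_s) if ktup  # append ktup only when it was set, as a String
  cmd
end

# Program name, options and database below are made-up example values.
without_ktup = build_command('fasta34', ['-Q', '-H', '-m', '10'], 'nr.fasta')
with_ktup    = build_command('fasta34', ['-Q', '-H', '-m', '10'], 'nr.fasta', 2)

p without_ktup
p with_ktup
```

Converting ktup with to_s is an extra precaution added here, not part of the original patch; it keeps every element a String regardless of how the caller supplied the value.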
Regards,
Toshiaki Katayama

On 2007/05/18, at 10:17, Fredrik Johansson wrote:

> Hello,
> I had a problem running a fasta search locally on my computer, and it
> turned out that Kernel.exec(*cmd) is not very happy to get the array cmd
> with nil as one of its elements. This is however what happens in
> bio/app/fasta.rb, if ktup is not set. I changed this to make it work:
>
> --- fasta.rb.old 2007-05-18 09:55:01.000000000 +0900
> +++ fasta.rb 2007-05-18 09:55:37.000000000 +0900
> @@ -114,7 +114,8 @@
>
>      def exec_local(query)
>        cmd = [ @program, *@options ]
> -      cmd.concat([ '@', @db, @ktup ])
> +      cmd.concat([ '@', @db])
> +      cmd.push(@ktup) if @ktup
>
>        report = nil
>
> Something to submit to the repository?
>
> Best regards,
> Fredrik
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From ktym at hgc.jp Fri May 18 15:24:24 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 19 May 2007 00:24:24 +0900
Subject: [BioRuby] Translate docs
In-Reply-To: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com>
References: <5584352e0705160744w429281b6v3ccf3e8f105cac17@mail.gmail.com>
Message-ID: <852FFA12-3475-46B7-82C6-250ACD7A6BD0@hgc.jp>

Hi,

You are free to translate the documents in the bioruby-x.x.x/doc/ directory or on web (wiki) pages.

I apologize that the main document, the bioruby tutorial, is not up to date (I lost the updated version in a disk crash this February and have not restarted yet).

Regards,
Toshiaki Katayama

On 2007/05/16, at 23:44, cihan inan wrote:

> hi I am cihan inan and I am from Turkey. I am a student at Biology. I am new
> in Ruby language. I want to translate some docs to Turkish. But I think I
> have to get permission. But I dont know who gives that permission? please
> help me about this topic. ( tell me the docs to start give me a way sth. )
> thanks a lot.

From ktym at hgc.jp Fri May 18 15:23:51 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 19 May 2007 00:23:51 +0900
Subject: [BioRuby] Parsing GFF3 attributes
In-Reply-To:
References:
Message-ID: <10A4A7B4-063F-459F-B9D6-56222F8C26E9@hgc.jp>

Hien,

Thank you for your report.

In bio/db/gff.rb, we have Bio::GFF::GFF2 for version 2 spec and Bio::GFF::GFF3 for version 3 and I added your modification to the Bio::GFF::GFF3 class.

Personally, I have not yet used GFF3 intensively, so if you think the class should have more functionality to support new features in GFF3, please propose.

Toshiaki

On 2007/05/16, at 1:10, Michael Han wrote:

>
> On 15 May 2007, at 16:30, hienle at club-internet.fr wrote:
>> Hello all,
>>
>> I am working with a GFF3-formatted file and have noticed that the
>> attributes field is not parsed properly.
>>
>> In bio/db/gff.rb,
>>
>> 75     def parse_attributes(attributes)
>> 76       hash = Hash.new
>> 77       attributes.split(/[^\\];/).each do |atr|
>> 78         key, value = atr.split(' ', 2)
>> 79         hash[key] = value
>> 80       end
>> 81       return hash
>> 82     end
>> 83   end
>>
>> I changed:
>> 78         key, value = atr.split(' ', 2)
>> to:
>> 78         key, value = atr.split('=', 2)
>>
>> and it now appears to behave properly. However, I am not certain if
>> this is appropriate for backward compatibility with GFF and GFF2.
>
> I normally use spaces between the key and the value of the attributes
> for GFF2, like: Gene "1234" ; Transcript "1234"
> as described in <"http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml">
>
> so it would break GFF2 / GFF parsing.
> Maybe you could create a separate GFF3 parser inheriting from the
> gff.rb .
>
> some GFF3 reference (note: last version from a few weeks ago)
> <"http://www.sequenceontology.org/gff3.shtml">
>
>> Is anyone working on parsing GFF3 files?
>>
>> Thank you in advance for your help,
>> -Hien
>
> Michael

From ktym at hgc.jp Fri May 18 16:00:33 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 19 May 2007 01:00:33 +0900
Subject: [BioRuby] Bio::Blast not fully functional?
In-Reply-To: <1179208486.7335.39.camel@fred-kyudai>
References: <1179208486.7335.39.camel@fred-kyudai>
Message-ID: <8B741702-DB3E-49D5-9920-9420336F4E61@hgc.jp>

Fredrik,

This is because Bio::Blast.remote uses the '-m 8' option, which returns a tabular output format without target sequences.

You can use XML output for your purpose by changing the following line

> factory = Bio::Blast.remote('blastp', 'nr-aa')

to

factory = Bio::Blast.remote('blastp', 'nr-aa', '-m 7')

Regards,
Toshiaki

On 2007/05/15, at 14:54, Fredrik Johansson wrote:

> Hello all,
> I am reading the tutorial at
> http://dev.bioruby.org/wiki/en/?Tutorial.rd
> about BLAST and I am trying to use it according to this tutorial (see the
> code below).
> However, many of the entries in the 'hit' variable in the code below
> seem to be nil. hit.identity and hit.target_seq, for example, are two
> methods that just return nil when I call them. Am I missing something?
>
> #!/usr/bin/env ruby
> require 'bio'
> factory = Bio::Blast.remote('blastp', 'nr-aa')
> Bio::FlatFile.open(Bio::FastaFormat, '/home/fred/pdb/test.fasta.txt') do |ff|
>   ff.each do |entry|
>     report = factory.query(entry)
>     report.each do |hit|
>       if hit.evalue < 0.001
>         puts hit.target_id
>         puts hit.target_seq
>       end
>     end
>   end
> end
>
> Thanks for any help!
> Best regards,
> Fredrik Johansson

From fredjoha at bioreg.kyushu-u.ac.jp Mon May 21 08:16:31 2007
From: fredjoha at bioreg.kyushu-u.ac.jp (Fredrik Johansson)
Date: Mon, 21 May 2007 17:16:31 +0900
Subject: [BioRuby] Parsing FASTA result
Message-ID: <1179735391.29172.91.camel@fred-kyudai>

I have encountered a problem again when running FASTA. I got a huge amount of homologs from fasta (25 MB data) for one sequence, and then the Bio::Fasta::Report class gets this error when initializing:

format10.rb:21:in `sub!': failed to allocate memory (NoMemoryError)

so I made the following changes to my code. It is just a quick fix, and I am not sure about that 'else' case that I took away. It does not seem to be covered by the line that I added. Also I did not bother about the @list variable since it does not seem to be used anywhere.

/Fredrik

The patch:

--- fasta/format10.rb 2007-05-21 16:50:38.000000000 +0900
+++ fasta/format10.new.rb 2007-05-21 16:52:55.000000000 +0900
@@ -17,13 +17,7 @@
 
       def initialize(data)
         # header lines - brief list of the hits
-        if data.sub!(/.*\nThe best scores are/m, '')
-          data.sub!(/(.*)\n\n>>>/m, '')
-          @list = "The best scores are" + $1
-        else
-          data.sub!(/.*\n!!\s+/m, '')
-          data.sub!(/.*/) { |x| @list = x; '' }
-        end
+        data = data[data.index("\n\n>>>")+5..data.size]
 
         # body lines - fasta execution result
         program, *hits = data.split(/\n>>/)

From jan.aerts at bbsrc.ac.uk Mon May 21 14:05:17 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Mon, 21 May 2007 15:05:17 +0100
Subject: [BioRuby] Ensembl API for ruby
Message-ID: <84DA9D8AC9B05F4B889E7C70238CB45104FB1784@rie2ksrv1.ri.bbsrc.ac.uk>

All,

I committed the first version of a ruby API to the bioruby-annex SVN on rubyforge. Although it is far from completed, it does already have its uses.
Thought it would be good to commit as soon as possible so you guys get an idea of what the API would look like. I would like to follow the excellent perl API as much as possible (http://www.ensembl.org/info/software/core/core_tutorial.html).

What it does at the moment:
* All tables of the core database are covered by ActiveRecord classes.
* A Slice object represents a continuous region of a genome. Slices can be used to obtain sequence, features or other information from a particular region of interest.
* Coordinates can be transferred from one coordinate system to another. I hope I tested this thoroughly enough (but still am a bit squeamish about it).

What is not implemented yet:
* A whole bunch of methods that are available to the perl objects that I would like to 'copy' to the ruby API.
* The Variation and Compara databases.
* The 'project' and 'transform' methods for features and slices (as explained on the perl tutorial mentioned above).

To get this code, please go to http://rubyforge.org/projects/bioruby-annex/
You can export the code using SVN with the following command (without quotes): "svn checkout svn://rubyforge.org/var/svn/bioruby-annex". There should be a subdirectory 'ensembl-api', containing the API itself, the sample script, tests and of course all documentation.

I've created a gem (available in the top-directory of the SVN export), but can't test if it actually works. Can someone please test it for me?

Toshiaki: if the code works and the gem can be installed on other systems, could you please do a file release for the gem-file?

Others: if people are interested to help me to develop this API, please let me know.

Thanks,
jan.

Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198
skype: aerts_ri

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592).
Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013.

The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute.

From hien.le at mail.mcgill.ca Tue May 22 07:33:26 2007
From: hien.le at mail.mcgill.ca (Hien Le)
Date: Tue, 22 May 2007 03:33:26 -0400
Subject: [BioRuby] Parsing GFF3 attributes
Message-ID: <20070522033326.1zpegb2scgg0skws@webmail.mcgill.ca>

On May 18, 2007, at 17:23, Toshiaki Katayama wrote:

> and I added your modification to the Bio::GFF::GFF3 class.

OK Thanks!

> Personally, I have not yet used GFF3 intensively, so if you think the
> class should have more functionality to support new features in GFF3,
> please propose.

Same for myself, I have just recently started using GFF3 formats. I'll let you know if I think the class needs added functionality.

-Hien

From christophercyll at gmail.com Thu May 24 22:22:56 2007
From: christophercyll at gmail.com (Topher Cyll)
Date: Thu, 24 May 2007 18:22:56 -0400
Subject: [BioRuby] looking for a cool project to do using bioruby
Message-ID: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com>

Hi BioRubyers,

I'm a long time reader of this list, but I think this might be my first post.

I'm in the process of writing a Ruby project book. Each chapter guides the reader through a new and unusual project they can code in Ruby. For comparison, in other chapters we do things like compose music, build a game, run simulations, implement Lisp, etc.

I'd really like to include a project using BioRuby (since I think a lot of Rubyists would find it exciting).
But with only one undergraduate bio-informatics class under my belt, I'm a little stuck on ideas for fun and interesting projects.

So I thought I'd ask the experts! Can anyone think of a fun, interesting use for BioRuby that I could walk readers through? I like to have each project produce a final product, instead of just doing a tutorial, so I'm looking for an idea that would have some implementation work, but wouldn't be too, too difficult.

Any ideas?

Toph

From s-merchant at northwestern.edu Fri May 25 15:39:55 2007
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Fri, 25 May 2007 10:39:55 -0500
Subject: [BioRuby] looking for a cool project to do using bioruby
In-Reply-To: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com>
References: <2599499e0705241522k4e674a26q28ddc2bd021e17a8@mail.gmail.com>
Message-ID: <000001c79ee2$f1eda7b0$c2987ca5@pc13>

Hi Toph,

I would say one of the fun things to implement in Ruby would be Ontology visualization. The tool could be used to visualize any kind of ontology such as GO. It would be cool if this could be integrated into a Rails app.

Look at QuickGo, which visualizes the Gene Ontology:
http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0042351

Hope this helps. Let me know if you have any questions.

Cheers,
Sohel.

-----Original Message-----
From: bioruby-bounces at lists.open-bio.org [mailto:bioruby-bounces at lists.open-bio.org] On Behalf Of Topher Cyll
Sent: Thursday, May 24, 2007 5:23 PM
To: bioruby at lists.open-bio.org
Subject: [BioRuby] looking for a cool project to do using bioruby

Hi BioRubyers,

I'm a long time reader of this list, but I think this might be my first post.

I'm in the process of writing a Ruby project book. Each chapter guides the reader through a new and unusual project they can code in Ruby. For comparison, in other chapters we do things like compose music, build a game, run simulations, implement Lisp, etc.

I'd really like to include a project using BioRuby (since I think a lot of Rubyists would find it exciting). But with only one undergraduate bio-informatics class under my belt, I'm a little stuck on ideas for fun and interesting projects.

So I thought I'd ask the experts! Can anyone think of a fun, interesting use for BioRuby that I could walk readers through? I like to have each project produce a final product, instead of just doing a tutorial, so I'm looking for an idea that would have some implementation work, but wouldn't be too, too difficult.

Any ideas?

Toph