From mkikkawa at gmail.com Sun Nov 4 01:01:14 2007 From: mkikkawa at gmail.com (Masahide Kikkawa) Date: Sun, 4 Nov 2007 15:01:14 +0900 Subject: [BioRuby] pubmed bug? Message-ID: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com> Hi, I'm new to bioruby mailing list. While ago, I reported a bug of bioruby to rubyforge. Seems like the bug was not fixed in the new bioruby release (1.1). Could someone take a look the following report? http://rubyforge.org/tracker/index.php?func=detail&aid=11736&group_id=769&atid=3037 Thanks in advance. --------------------------------------------------------------- Masahide Kikkawa, M.D., Ph. D. Professor Structural Biology Graduate School of Science Kyoto University Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502 JAPAN http://structure.biophys.kyoto-u.ac.jp/ Tel: +81-75-753-9421 FAX: +81-75-753-4218 --------------------------------------------------------------- From jan.aerts at bbsrc.ac.uk Sun Nov 4 06:51:47 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Sun, 4 Nov 2007 11:51:47 -0000 Subject: [BioRuby] pubmed bug? References: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com> Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk> Hi Masahide. Sorry about not spotting this earlier. The bug fix has been committed to CVS now. @Toshiaki: could you set the status of the bug report on rubyforge to "Closed"? jan. -----Original Message----- From: bioruby-bounces at lists.open-bio.org on behalf of Masahide Kikkawa Sent: Sun 04/11/2007 06:01 To: bioruby at lists.open-bio.org Subject: [BioRuby] pubmed bug? Hi, I'm new to bioruby mailing list. While ago, I reported a bug of bioruby to rubyforge. Seems like the bug was not fixed in the new bioruby release (1.1). Could someone take a look the following report? http://rubyforge.org/tracker/index.php?func=detail&aid=11736&group_id=769&atid=3037 Thanks in advance. --------------------------------------------------------------- Masahide Kikkawa, M.D., Ph. D. 
Professor Structural Biology Graduate School of Science Kyoto University Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502 JAPAN http://structure.biophys.kyoto-u.ac.jp/ Tel: +81-75-753-9421 FAX: +81-75-753-4218 --------------------------------------------------------------- _______________________________________________ BioRuby mailing list BioRuby at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioruby From baj2107 at columbia.edu Sun Nov 4 15:32:39 2007 From: baj2107 at columbia.edu (Bernd Jagla) Date: Sun, 4 Nov 2007 15:32:39 -0500 Subject: [BioRuby] transcription factor binding site identification Message-ID: <01bb01c81f21$d871d620$0500a8c0@berndhome> Hi there, Is it possible with bioruby/ruby to scan a nucleotide sequence and search for binding sites of TFs? How would I do this? (I looked in the documentation but couldn't find it.) Thanks, Bernd From ktym at hgc.jp Sun Nov 4 21:03:34 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Mon, 5 Nov 2007 11:03:34 +0900 Subject: [BioRuby] pubmed bug? In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk> References: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com> <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk> Message-ID: <97798D54-E2FC-43B1-A8A0-17905E39AEB1@hgc.jp> Kikkawa-san, I'm sorry I have never used the tracker on rubyforge as I just used the site to provide our BioRuby gem package. Jan, thanks for the fix. I changed the status to closed. Regards, Toshiaki Katayama On 2007/11/04, at 20:51, jan aerts (RI) wrote: > Hi Masahide. > > Sorry about not spotting this earlier. The bug fix has been committed to CVS now. > > @Toshiaki: could you set the status of the bug report on rubyforge to "Closed"? > > jan. > > > -----Original Message----- > From: bioruby-bounces at lists.open-bio.org on behalf of Masahide Kikkawa > Sent: Sun 04/11/2007 06:01 > To: bioruby at lists.open-bio.org > Subject: [BioRuby] pubmed bug? 
>
> Hi,
> I'm new to bioruby mailing list. While ago, I reported a bug of
> bioruby to rubyforge. Seems like the bug was not fixed in the new
> bioruby release (1.1).
> Could someone take a look the following report?
> http://rubyforge.org/tracker/index.php?func=detail&aid=11736&group_id=769&atid=3037
>
> Thanks in advance.
> ---------------------------------------------------------------
> Masahide Kikkawa, M.D., Ph. D.
> Professor
> Structural Biology
> Graduate School of Science
> Kyoto University
> Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
> JAPAN
> http://structure.biophys.kyoto-u.ac.jp/
> Tel: +81-75-753-9421
> FAX: +81-75-753-4218
> ---------------------------------------------------------------
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From kpatil at science.uva.nl Tue Nov 6 05:00:46 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Tue, 06 Nov 2007 11:00:46 +0100
Subject: [BioRuby] count parameter in Bio::PubMed.esearch
Message-ID: <47303B4E.8020103@staff.science.uva.nl>

Hi,

Here is a suggestion/feature for Bio::PubMed.esearch.

Currently it is not possible to use rettype=count (through the options hash) in Bio::PubMed.esearch.

To get this feature, replace the following line in pubmed.rb (approx. line 97)

  result = result.scan(/<Id>(.*?)<\/Id>/m).flatten

with

  if(hash['rettype']=="count")
    result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
    result = result[0]
  else
    result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
  end

and it will return the count as a string, which can easily be converted to an integer with "result.to_i".

I hope it is useful.
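
A minimal, self-contained sketch of the parsing step suggested above, for anyone who wants to try it outside pubmed.rb. The helper name parse_esearch and the sample XML are assumptions made for illustration; only the Id and Count element names are taken from the E-Utils esearch output, and a real response carries more fields than shown here.

```ruby
# Hypothetical helper, not part of BioRuby: extract either the hit count
# or the list of PubMed IDs from an esearch XML response.
def parse_esearch(xml, rettype = nil)
  if rettype == "count"
    # First <Count> element, converted to an Integer (nil if absent).
    count = xml[/<Count>(.*?)<\/Count>/m, 1]
    count && count.to_i
  else
    # Every <Id> element, as an array of strings.
    xml.scan(/<Id>(.*?)<\/Id>/m).flatten
  end
end

# Hand-written stand-in for a server response (assumption, not real data).
sample = <<~XML
  <eSearchResult>
    <Count>2</Count>
    <IdList>
      <Id>17989981</Id>
      <Id>17989975</Id>
    </IdList>
  </eSearchResult>
XML

p parse_esearch(sample)
p parse_esearch(sample, "count")
```

Returning an Integer directly is a design choice of this sketch; the suggestion above returns the count as a string and leaves the conversion to the caller.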
Cheers, Kaustubh Patil PS: for more details on Entrez esearch parameters, please refer to; http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html From kpatil at science.uva.nl Tue Nov 6 04:54:00 2007 From: kpatil at science.uva.nl (Kaustubh Patil) Date: Tue, 06 Nov 2007 10:54:00 +0100 Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch Message-ID: <473039B8.1060600@staff.science.uva.nl> Hi, I would like to thank you for the BioRuby library, it is a very useful tool. I am doing some literature mining using Ruby and I use PubMed as my source. Here is some background for my question; It is not possible to search PubMed with logical operators, e.g. HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it returns empty result). It is due to the url encoding (i.e. CGI.escape) of the search term (approx. line 89 in pubmed.rb). If we remove this url encoding it is possible to make such queries. Now my question is, is it safe to remove this CGI.escape ? Thank you and regards, Kaustubh Patil From georgkam at gmail.com Thu Nov 8 02:09:47 2007 From: georgkam at gmail.com (George) Date: Thu, 08 Nov 2007 10:09:47 +0300 Subject: [BioRuby] English translation Message-ID: <4732B63B.8030702@gmail.com> Hi Nakao, Please how can i translate your blog to English? Thanks George From jan.aerts at bbsrc.ac.uk Thu Nov 8 04:06:02 2007 From: jan.aerts at bbsrc.ac.uk (Jan Aerts) Date: Thu, 08 Nov 2007 09:06:02 +0000 Subject: [BioRuby] biographics In-Reply-To: <4732B309.2050008@gmail.com> References: <4732B309.2050008@gmail.com> Message-ID: <1194512762.6300.19.camel@rilxvm05> Hey George. Thanks again for your interest in using Bio::Graphics. Concerning your first question: I'm trying to implement the notion of subfeatures in Bio::Graphics at the moment. I think that would serve your purpose. Unfortunately, this requires some refactoring of one of the core-classes in bioruby itself: Bio::Feature. 
I'm waiting for the big guys at bioruby for their ideas on implementing that. So at the moment, the best way of displaying this is to either display the domains separately, or to use the spliced glyph: even though they're not exons, this would at least link them up later. Do you want to display that protein in its genomic environment as well? Or do you just want to have the protein on its own with the domains? Could you send us a mockup of how you'd like to have this type of information (i.e. proteins and their domains) represented? Just a simple drawing will do. I haven't had to do this type of visualization yet myself, so would be interested in how you experts would like to do that. Concerning your second question: it looks like you're referencing a version of the library that I sent out a while ago on the mailing list. All code development is now run via rubyforge. The moment I put it on rubyforge the namespace was changed from BioExt (*bio*ruby *ext*ensions) to just Bio. Did you install a version via rubyforge (i.e. following the instructions on bio-graphics.rubyforge.org)? If so: change all references to BioExt::Graphics to Bio::Graphics. So the line my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600) would become my_panel = Bio::Graphics::Panel.new(1000, 1200, false, 1, 600) jan. PS: I've CC'd this reply to the bioruby mailing list if that's OK... On Thu, 2007-11-08 at 09:56 +0300, George wrote: > Hi Dr Jan. > > I have a chado based database system running on ruby on rails for > storing sequence and annotation data. > The Feature table contains the biological sequences represented as > features and the Feature location table contains the locations or bio > coordinates for each feature. > Let me explain with an example, a protein sequence is a feature. call it > prot_A. Our Prot_A can have domains A1, A2, etc. Now these domains are > actually features by themselves but they happen to be located within Prot_A. 
> > So in the feature table i have Prot_A, Domain A1, A2. > > In the Feature locations table call it Featureloc, (chado style) > > --------------------------------------------- > featureloc_id| featuresrc_id |fmin |fmax| > --------------------------------------------- > 1 null 1 200 > 2 1 1 20 > 3 1 30 60 > ---------------------------------------------- > > My aim is to represent these features graphically such that a user can > view a feature with its domains. > I would like to generate simple graphics for these features from a gff > formatted file which can be created on the fly from the database tables. > Any idea on how i can do that in rails and using the bio-graphics module? > > Secondly am getting the error > "F:/Netbeans_folder/vargene/lib/biographics.rb:6: uninitialized constant > BioExt (NameError) when i try to access the Bioext::Graphics::Panel.new > method while running the following code. > > require 'stringio' > require 'base64' > gem 'bio-graphics' > require 'bio-graphics' > > my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600) > > #Create and configure tracks > track_SNP = my_panel.add_track('SNP') > track_gene = my_panel.add_track('gene') > track_transcript = my_panel.add_track('transcript') > > track_SNP.feature_colour = [1,0,0] > track_SNP.feature_glyph = 'triangle' > track_gene.feature_glyph = 'directed_spliced' > track_transcript.feature_glyph = 'spliced' > track_transcript.feature_colour = [0,0.5,0] > > # Add data to tracks > DATA.each do |line| > line.chomp! 
> ref, type, name, location, link = line.split(/\s+/) > if link == '' > link = nil > end > if type == 'SNP' > track_SNP.add_feature(name, location, link) > elsif type == 'gene' > track_gene.add_feature(name, location, link) > elsif type == 'transcript' > track_transcript.add_feature(name, location, link) > end > end > > # And draw > my_panel.draw('c:/my_panel.png') > > __END__ > chr1 gene CYP2D6 complement(80..120) > chr1 gene ALDH 100..449 > chr1 SNP rs1234 107 > chr1 gene bla complement(400..430) > chr1 SNP rs9876 44 > chr1 gene some_gene > complement(join(170..231,264..299,350..360,409..445)) > chr1 transcript transcript1 join(250..300,390..425) > chr1 transcript transcript2 253..330 > chr1 transcript transcript3 266..344 > chr1 transcript transcript4 > complement(join(410..430,239..286,129..151)) > > Is the Bioext module really available within the current implementation > of the biographics gem? > > Thanks > > George > -- Dr Jan Aerts Bioinformatics Group Roslin Institute Roslin EH25 9PS Scotland, UK tel: +44 131 527 4198 skype: aerts_ri website: http://saaientist.blogspot.com ----...and the obligatory disclaimer---- Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592). Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute. 
From ngoto at gen-info.osaka-u.ac.jp Fri Nov 9 07:30:10 2007 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 9 Nov 2007 21:30:10 +0900 Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch In-Reply-To: <473039B8.1060600@staff.science.uva.nl> References: <473039B8.1060600@staff.science.uva.nl> Message-ID: <20071109123012.8128D1CBC408@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 06 Nov 2007 10:54:00 +0100 Kaustubh Patil wrote: > Hi, > > I would like to thank you for the BioRuby library, it is a very useful > tool. I am doing some literature mining using Ruby and I use PubMed as > my source. Here is some background for my question; > > It is not possible to search PubMed with logical operators, e.g. > HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it > returns empty result). Probably you mean Bio::PubMed.esearch("HIV AND drug") Bio::PubMed.esearch("geneA OR geneB") More complicated example: Bio::PubMed.esearch("((p53 AND apoptosis) 2007/11[dp]) OR bioperl") You can use the same search terms as of NCBI PubMed seaech without any care about URL encoding. > It is due to the url encoding (i.e. CGI.escape) of the search term > (approx. line 89 in pubmed.rb). If we remove this url encoding it is > possible to make such queries. > > Now my question is, is it safe to remove this CGI.escape ? I think it is unsafe and should not be removed. -- Naohisa Goto ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp From n at bioruby.org Fri Nov 9 09:30:24 2007 From: n at bioruby.org (Mitsuteru Nakao) Date: Fri, 9 Nov 2007 23:30:24 +0900 Subject: [BioRuby] English translation In-Reply-To: <4732B63B.8030702@gmail.com> References: <4732B63B.8030702@gmail.com> Message-ID: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> Hi George, Of course OK. Please let me know the URL of my blog you mention. :-) On 11/8/07, George wrote: > Hi Nakao, > Please how can i translate your blog to English? 
Thanks Mitsuteru - Mitsuteru Nakao mn at kazusa.or.jp / n at bioruby.org From georgkam at gmail.com Sat Nov 10 03:39:24 2007 From: georgkam at gmail.com (George Githinji) Date: Sat, 10 Nov 2007 11:39:24 +0300 Subject: [BioRuby] English translation In-Reply-To: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> References: <4732B63B.8030702@gmail.com> <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> Message-ID: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com> Hi Nakao The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/ On Nov 9, 2007 5:30 PM, Mitsuteru Nakao wrote: > Hi George, > > Of course OK. > Please let me know the URL of my blog you mention. :-) > > On 11/8/07, George wrote: > > Hi Nakao, > > Please how can i translate your blog to English? > > Thanks > Mitsuteru > - > Mitsuteru Nakao > mn at kazusa.or.jp / n at bioruby.org > -- --------------- Sincerely George Skype: george_g2 Website: http://biorelated.wordpress.com/ From ktym at hgc.jp Sat Nov 10 03:40:09 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sat, 10 Nov 2007 17:40:09 +0900 Subject: [BioRuby] count parameter in Bio::PubMed.esearch In-Reply-To: <47303B4E.8020103@staff.science.uva.nl> References: <47303B4E.8020103@staff.science.uva.nl> Message-ID: Hi Kaustubh, Thank you for your suggestion. I applied your changes to the CVS. During this process, I found that the previous fix applied by Jan was wrong. Developers, please do the test before you commit your changes. :) The change should be made to the Bio::PubMed.query method, however, the search method is also needed to be rewritten because the HTML structure returned by NCBI was reformatted. Anyway, in Bio::PubMed module, use of the esearch/efetch methods pair is strongly recommended compared to the search/query methods pair. 
bioruby> Bio::PubMed.search("(genome AND analysis) OR bioinformatics)") ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368"] bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)") ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368", "17988176", "17988086", "17987666", "17987374", "17987257", "17987048", "17986781", "17986522", "17986471", "17986460", "17986440", "17986356", "17986355", "17986329", "17986320", "17986282", "17986185", "17986079", "17985162", "17984568", "17984549", "17984548", "17984520", "17984228", "17984226", "17984208", "17984205", "17984085", "17984084", "17984080", "17983847", "17983807", "17983802", "17983573", "17983493", "17983269", "17983268", "17983157", "17982457", "17982456", "17982442", "17982427", "17982176", "17982123", "17981990", "17981981", "17981974", "17981891", "17981844", "17981816", "17981801", "17981746", "17981579", "17981546", "17981477", "17981060", "17981052", "17980519", "17980517", "17980477", "17980146", "17980047", "17980028", "17980019", "17979886", "17979725", "17979297", "17979181", "17978887", "17978880", "17978572", "17978498", "17978310", "17978184", "17978179", "17977886", "17977881", "17977850", "17977831", "17977670"] bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)", {'rettype' => 'count'}) ==> 286139 Regards, Toshiaki Katayama On 2007/11/06, at 19:00, Kaustubh Patil wrote: > Hi, > > Here is a suggestion/feature for Bio::PubMed.esearch. > > Currently it is not possible to use rettype=count (through options hash) in Bio::PubMed.esearch. 
>
> To get this feature replace the following line in pubmed.rb (approx. line 97)
>
> result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
>
> by
>
> if(hash['rettype']=="count")
> result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
> result = result[0]
> else
> result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
> end
>
>
> and it will return the count as a string, which can be easily converted to an integer by "result.to_i"
>
> I hope it is useful.
>
> Cheers,
> Kaustubh Patil
>
> PS: for more details on Entrez esearch parameters, please refer to;
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From ktym at hgc.jp Sun Nov 11 09:10:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sun, 11 Nov 2007 23:10:34 +0900
Subject: [BioRuby] English translation
In-Reply-To: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
References: <4732B63B.8030702@gmail.com> <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
Message-ID:

Hi George,

Which did you mean?

1. you just want to read his blog in English
2. you want to translate his blog and make it publicly available

In the case of 1, you can use free web translators

http://www.google.com/language_tools
http://babelfish.altavista.com/
http://www.worldlingo.com/en/products_services/worldlingo_translator.html
http://www.freetranslation.com/
http://www.excite.co.jp/world/url/

The quality of these machine translations is not good, though.

In the case of 2, you can do it freely as Mitsuteru wrote.

Toshiaki

On 2007/11/10, at 17:39, George Githinji wrote:

> Hi Nakao
> The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/
>
> On Nov 9, 2007 5:30 PM, Mitsuteru Nakao wrote:
>
>> Hi George,
>>
>> Of course OK.
>> Please let me know the URL of my blog you mention.
:-)
>>
>> On 11/8/07, George wrote:
>>> Hi Nakao,
>>> Please how can i translate your blog to English?
>>
>> Thanks
>> Mitsuteru
>> -
>> Mitsuteru Nakao
>> mn at kazusa.or.jp / n at bioruby.org
>>
>
>
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Website: http://biorelated.wordpress.com/
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From ktym at hgc.jp Sun Nov 11 10:12:44 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 12 Nov 2007 00:12:44 +0900
Subject: [BioRuby] transcription factor binding site identification
In-Reply-To: <01bb01c81f21$d871d620$0500a8c0@berndhome>
References: <01bb01c81f21$d871d620$0500a8c0@berndhome>
Message-ID: <82F3F8EA-81DC-4B60-9715-8E968F123975@hgc.jp>

Hi,

If you want to search with TRANSFAC motifs, you can use the tfscan command in the EMBOSS package.

Otherwise, you may need to define your own algorithm to search for your motif. If your motif is in profile format, you need to develop a profile search method. If your motif is simple and can be converted to a regexp, the task would be relatively easy.

  # to find all occurrences
  results = seq.scan(regexp)

  # to find positions of match
  # (start at -1 so that a match at index 0 is also reported)
  pos = -1
  while pos = seq.index(regexp, pos + 1)
    puts pos
  end

You may also be interested in the Bio::Sequence#window_search method.

Thanks,
Toshiaki

On 2007/11/05, at 5:32, Bernd Jagla wrote:

> Hi there,
>
>
>
> Is it possible with bioruby/ruby to scan a nucleotide sequence and search
> for binding sites of TFs?
>
>
>
> How would I do this? (I looked in the documentation but couldn't find it.)
>
>
>
> Thanks,
>
>
>
> Bernd
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From kpatil at science.uva.nl Mon Nov 12 06:09:33 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Mon, 12 Nov 2007 12:09:33 +0100
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
Message-ID: <4738346D.4060206@staff.science.uva.nl>

Hi,

XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method;

  # Kaustubh Patil: 6 Nov. 2007
  # options hash here is different than options hash in esearch
  def self.efetch(ids, hash = {} )
    return [] if ids.empty?

    # default options
    hash['retmode'] = 'xml' unless hash['retmode']
    hash['rettype'] = 'medline' unless hash['rettype']

    # create options array in required format
    opts = []
    hash.each do |k, v|
      opts << "#{k}=#{v}"
    end

    host = "eutils.ncbi.nlm.nih.gov"
    path = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&#{opts.join('&')}&id="

    ids = ids.join(",")

    http = Net::HTTP.new(host)
    response, = http.get(path + ids)
    result = response.body
    if(hash['retmode']=='text')
      result = result.split(/\n\n+/)
    end
    return result
  end

I hope it is useful.

Cheers,
Kaustubh

PS: for details of entrez efetch parameters

http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html

From jan.aerts at bbsrc.ac.uk Wed Nov 14 15:05:50 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 14 Nov 2007 20:05:50 -0000
Subject: [BioRuby] named arguments
Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>

Hi staff,

We think we're getting to a good workable version of Bio::Graphics. However, we also bumped into a distinctive feature of a library like Bio::Graphics: its methods have to be highly configurable.
What should be the colour of the genes, what glyph should be used (spliced, a line, ...), what's the label that should be displayed, ... As a result, the argument lists for many of the methods become unwieldingly long and cumbersome to use. This is most apparent when the user of the library wants to use all default values for a method, except one which happens to be the last one in the argument list. The user has to write code like this (a little bit exaggerated, but still...): picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green) A good workaround is to use named parameter lists, which would make the previous code look like: picture.add_gene(:feature => my_gene, :colour => :green) which is much more readable. However, I'm a bit squeemish of doing it this way, because it would be a different paradigm than the one that bioruby uses. What do you guys think about integration of bioruby and Bio::Graphics somewhere in the future? Would the fact that we'd implement named argument lists in Bio::Graphics make integration into the bioruby toolkit difficult/impossible/not a good idea? Really looking forward to your comments. jan. From ktym at hgc.jp Thu Nov 15 02:44:03 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 15 Nov 2007 16:44:03 +0900 Subject: [BioRuby] Bio::PubMed efetch xml support and other options In-Reply-To: <4738346D.4060206@staff.science.uva.nl> References: <4738346D.4060206@staff.science.uva.nl> Message-ID: <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> Hi Patil, On 2007/11/12, at 20:09, Kaustubh Patil wrote: > XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method; Enhancement to accept retmode=xml sounds good idea, so I just committed efetch2 and esearch2 methods which can be better replacements for the efetch and esearch methods. 
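
As a rough illustration of the option handling discussed in this thread, the sketch below merges caller options over defaults and URL-encodes the result. The helper name eutils_query and the constant EUTILS_DEFAULTS are hypothetical and not part of BioRuby; the tool=bioruby and db=pubmed defaults are copied from the efetch method posted earlier in the thread. It also shows why a term such as "HIV AND drug" can be passed as plain text: the encoding step is what turns the spaces into "+".

```ruby
require "uri"

# Hypothetical defaults, taken from the query string used in the
# efetch method posted earlier in this thread.
EUTILS_DEFAULTS = { "tool" => "bioruby", "db" => "pubmed" }.freeze

# Hypothetical helper, not part of BioRuby: merge caller options over
# the defaults and return a URL-encoded query string.
def eutils_query(options = {})
  # Stringify keys so that :retmode and "retmode" are treated alike.
  params = EUTILS_DEFAULTS.merge(options.transform_keys(&:to_s))
  URI.encode_www_form(params)
end

puts eutils_query("term" => "HIV AND drug", "rettype" => "count")
puts eutils_query(retmode: "xml", id: "10592173,14693808")
```

Encoding every value in one place keeps the caller free to write search terms exactly as they would in the PubMed web form.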
Both methods are able to accept any E-Utils options as a hash.

I will remove the suffix "2" from these methods if the following incompatibility can be accepted.

* changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility

  currently all of

  1. efetch("123")
  2. efetch("123", "456")
  3. efetch(["123", "456"])

  are accepted, but 2. will be unavailable.

Other notes:

* default value for the retmode option remains "text" for backward compatibility
* both methods are rewritten to use Bio::Command.post_form to make the code clear
* Bio::FlatFile is updated to accept the recent MEDLINE entry format (UI -> PMID)

  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)")
  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"retmax" => "500"})
  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"rettype" => "count"})
  puts Bio::PubMed.efetch2("10592173")
  puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})

Thanks,
Toshiaki Katayama

From ktym at hgc.jp Thu Nov 15 03:51:29 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 17:51:29 +0900
Subject: [BioRuby] named arguments
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <38F1FC51-FE7D-4F85-B7D1-DC4B5777E1E6@hgc.jp>

Jan,

There are several methods which accept a hash as the last argument, so you are OK to proceed with it.

Toshiaki

On 2007/11/15, at 5:05, jan aerts ((RI)) wrote:

> Hi staff,
>
> We think we're getting to a good workable version of Bio::Graphics. However, we also bumped into a distinctive feature of a library like Bio::Graphics: its methods have to be highly configurable. What should be the colour of the genes, what glyph should be used (spliced, a line, ...), what's the label that should be displayed, ... As a result, the argument lists for many of the methods become unwieldingly long and cumbersome to use.
This is most apparent when the user of the library wants to use all default values for a method, except one which happens to be the last one in the argument list. The user has to write code like this (a little bit exaggerated, but still...): > > picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green) > > A good workaround is to use named parameter lists, which would make the previous code look like: > > picture.add_gene(:feature => my_gene, :colour => :green) > > which is much more readable. > > However, I'm a bit squeemish of doing it this way, because it would be a different paradigm than the one that bioruby uses. What do you guys think about integration of bioruby and Bio::Graphics somewhere in the future? Would the fact that we'd implement named argument lists in Bio::Graphics make integration into the bioruby toolkit difficult/impossible/not a good idea? > > Really looking forward to your comments. > jan. From jan.aerts at bbsrc.ac.uk Mon Nov 19 05:35:59 2007 From: jan.aerts at bbsrc.ac.uk (Jan Aerts) Date: Mon, 19 Nov 2007 10:35:59 +0000 Subject: [BioRuby] [Fwd: Using BioRuby for parsing a .ptt file] Message-ID: <1195468559.25265.11.camel@rilxvm05> A post from Abhik Khanra. Could anyone help him out? Thanks, jan. -------- Forwarded Message -------- > From: Abhik Khanra > To: jan.aerts at bbsrc.ac.uk > Subject: Using BioRuby for parsing a .ptt file > Date: Sat, 10 Nov 2007 07:18:05 +0530 > > Hi. > > I came across your blog recently. It is a really good source of information. > > I have a query and have posted the same in the BioRuby mailing-list too. > It's just that i'm in a time-crunch. Hence i'm sending it to you as well. > Hope that would not be a problem for you. > > I'm working on a sample visualization application and leveraging > BioRuby for extracting target sequence origin and endpoints from BLAST > results. I obtained an example of this from the BioRuby tutorial. 
> Could you please let me know if there is any similar example using BioRuby for > extracting useful information from parsing a .ptt file? > > Thanks > Abhik -- Dr Jan Aerts Bioinformatics Group Roslin Institute Roslin EH25 9PS Scotland, UK tel: +44 131 527 4198 ----...and the obligatory disclaimer---- Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592). Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute. From ktym at hgc.jp Tue Nov 20 10:38:32 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Wed, 21 Nov 2007 00:38:32 +0900 Subject: [BioRuby] Bio::PubMed efetch xml support and other options In-Reply-To: <4742EEE5.90400@staff.science.uva.nl> References: <4738346D.4060206@staff.science.uva.nl> <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> <473C1708.9020306@staff.science.uva.nl> <4742EEE5.90400@staff.science.uva.nl> Message-ID: <577009FA-1E77-493A-A036-B7939230345A@hgc.jp> Hi Kaustubh, I've just committed the change that Bio::PubMed.efetch and esearch to wait for 3 seconds between consequent queries. I also renamed efetch2 and esearch2 (newer version, which accepts E-Util options as a hash) to efetch and esearch (old version). New version of efetch method breaks backward compatibility which could accept a list of ids as variable length arguments. >>>> 1. efetch("123") --> OK >>>> 2. efetch("123", "456") --> NG >>>> 3. efetch(["123", "456"]) --> OK Here, the pubmed IDs can be (array of) string or numeric. 
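
The accepted argument forms listed above (a single ID, or an array of IDs, each either a string or a number) could be normalised with something like the sketch below. The helper name join_pubmed_ids is hypothetical and is not the code that was committed; it only illustrates one way to reduce the accepted forms to the comma-separated list that E-Utils expects.

```ruby
# Hypothetical helper, not part of BioRuby: accept "123", 123 or
# ["123", 456] and return the comma-separated string "123,456".
def join_pubmed_ids(ids)
  # Array() wraps a single value and leaves an array untouched;
  # to_s lets numeric IDs through alongside strings.
  Array(ids).flatten.map(&:to_s).join(",")
end

puts join_pubmed_ids("10592173")
puts join_pubmed_ids(10592173)
puts join_pubmed_ids(["10592173", 14693808])
```
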
By the way, currently the efetch method returns the following error.
% ruby lib/bio/io/pubmed.rb
:
--- Retrieve PubMed entry by E-Utils ---
Wed Nov 21 00:23:20 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable
Wed Nov 21 00:23:23 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable
Is this a temporary problem? I believe efetch2 was working when I implemented it. Regards, Toshiaki Katayama On 2007/11/20, at 23:27, Kaustubh Patil wrote: > Hi Toshiaki, > > Thanks for your email. Please find my answers embedded below; > > Thanks, > kaustubh > > Toshiaki Katayama wrote: > >> Hi Kaustubh, >> >> On 2007/11/15, at 18:53, Kaustubh Patil wrote: >> >> >>> Hi Toshiaki, >>> >>> Thank you very much for the improvements. There are some other desirable improvements; >>> >>> 1. PubMed has some timing restrictions on two consecutive queries. So it will be very nice if it can be implemented inside a function, like, esearch/efetch. >>> >> >> How about having the following method and calling it within the efetch and esearch methods before the Bio::Command.post_form? >> >> --------------------------------------------------
>> # Make no more than one request every 3 seconds.
>> @@ncbi_interval = 3
>> @@last_accessed = nil
>>
>> def wait_access
>>   if @@last_accessed
>>     duration = Time.now - @@last_accessed
>>     if duration < @@ncbi_interval
>>       sleep @@ncbi_interval - duration
>>     end
>>   end
>>   @@last_accessed = Time.now
>> end
>> -------------------------------------------------- >> > This could be a very good and quick implementation. In fact I use something similar for my usage now. > >> By the way, NCBI also has another restriction: >> >> http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html >> >>> Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests. >>> >>> Do you think this should also be taken care of automatically? >>> >>> > I am aware of those restrictions.
It will be very nice if this can be taken care of automatically. There is a very good library for accessing/using Medline through R, called MedlineR (btw, currently it is not downloadable as their server is down). MedlineR handles this automatically. > > There is another improvement I am thinking about. It is not possible to fetch a large number of documents in one go. I suppose this is mainly because of the practical restrictions on URL length, e.g. IE supports max 2,048 characters (although I am not aware whether PubMed imposes any limits). It will be useful (under some conditions) to cut the fetches into a number of parts and then return the combined result. What do you think? > >>> 2. Mapping terms to MeSH (I couldn't find this!). >>> >> >> >> I'm not sure how to accomplish this. >> > I will do a bit more research on this and then get back to you. > >> >> >>> I will post other comments as I recollect them. I have another question (though this is not a very appropriate place for it); >>> >>> Is there any Ruby library which can do some basic text mining tasks, like, tokenization, sentence boundary discrimination etc. ? >>> >> >> I think yes, but I'm not doing text mining for now, sorry ;-) >> > I haven't yet found a Ruby library for that. I will keep on searching. > > Cheers, > Kaustubh > >> Thanks, >> Toshiaki >> >> >> >>> Cheers, >>> Kaustubh >>> >>> Toshiaki Katayama wrote: >>> >>> >>>> Hi Patil, >>>> >>>> On 2007/11/12, at 20:09, Kaustubh Patil wrote: >>>> >>>>> XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method; >>>>> >>>> Enhancement to accept retmode=xml sounds good idea, so I just committed efetch2 and esearch2 methods which can be better replacements for the efetch and esearch methods. >>>> >>>> Both methods are able to accept any E-Utils options as a hash.
>>>> >>>> I will remove the suffix "2" from these method if the following incompatibility can be accepted. >>>> >>>> * changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility >>>> currently all of >>>> 1. efetch("123") >>>> 2. efetch("123", "456") >>>> 3. efetch(["123", "456"]) >>>> are accepted but 2. will be unavailable. >>>> >>>> Other notes: >>>> >>>> * default value for the retmode option remains "text" for the backward compatibility >>>> * both methods are rewritten to use Bio::Command.post_form to make the code clear >>>> * Bio::FlatFile is updated to accept recent MEDLINE entry format (UI -> PMID) >>>> >>>> >>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)") >>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"retmax" => "500"}) >>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"rettype" => "count"}) >>>> >>>> puts Bio::PubMed.efetch2("10592173") >>>> puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"}) >>>> >>>> >>>> Thanks, >>>> Toshiaki Katayama >>>> >>>> _______________________________________________ >>>> BioRuby mailing list >>>> BioRuby at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioruby >>>> >>>> >> >> >> From ktym at hgc.jp Tue Nov 20 11:20:23 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Wed, 21 Nov 2007 01:20:23 +0900 Subject: [BioRuby] Bio::PubMed efetch xml support and other options In-Reply-To: <47430436.3070005@staff.science.uva.nl> References: <4738346D.4060206@staff.science.uva.nl> <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> <473C1708.9020306@staff.science.uva.nl> <4742EEE5.90400@staff.science.uva.nl> <577009FA-1E77-493A-A036-B7939230345A@hgc.jp> <47430436.3070005@staff.science.uva.nl> Message-ID: <7787216D-B5E7-4EC3-B467-C62489CFDD4C@hgc.jp> Hi, On 2007/11/21, at 0:58, Kaustubh Patil wrote: > The problem was temporary (solved by now). I guess it was part of maintainance. Thank you. 
I've confirmed the tests are now working. Another issue: Most of the BioRuby classes that access a server are designed to create a factory object first, e.g.
server = Bio::Blast.remote(...)
result = server.query(...)
server = Bio::KEGG::API.new
result = server.get_genes_by_pathway(...)
However, Bio::PubMed is not.
result = Bio::PubMed.esearch(...)
I think this was caused only by a historical reason. Should I change this design to unify them?
server = Bio::PubMed.new
result = server.esearch(...)
Or provide both ways - what is the best way to do this (to define methods and to make them also available as class methods)?
def esearch(args)
  # real codes
end
def self.esearch(args)
  self.new.esearch(args)
end
Toshiaki > Toshiaki Katayama wrote: > >> By the way, currently efetch method returns the following error. >> >> % ruby lib/bio/io/pubmed.rb >> : >> --- Retrieve PubMed entry by E-Utils --- >> Wed Nov 21 00:23:20 +0900 2007 >> 1: id: 16381885 Error occurred: PubMed article server is not avaliable >> Wed Nov 21 00:23:23 +0900 2007 >> 1: id: 16381885 Error occurred: PubMed article server is not avaliable >> >> Is this a temporal problem?
From kpatil at science.uva.nl Tue Nov 6 10:00:46 2007 From: kpatil at science.uva.nl (Kaustubh Patil) Date: Tue, 06 Nov 2007 11:00:46 +0100 Subject: [BioRuby] count parameter in Bio::PubMed.esearch Message-ID: <47303B4E.8020103@staff.science.uva.nl> Hi, Here is a suggestion/feature for Bio::PubMed.esearch. Currently it is not possible to use rettype=count (through options hash) in Bio::PubMed.esearch. To get this feature replace the following line in pubmed.rb (approx. line 97)
result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
by
if(hash['rettype']=="count")
  result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
  result = result[0]
else
  result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
end
and it will return the count as a string, which can be easily converted to an integer by "result.to_i" I hope it is useful.
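As a rough, self-contained illustration of the suggestion above, the branch on rettype can be tried against a small XML fragment without contacting NCBI. The method name parse_esearch and the SAMPLE_XML text are made up for this sketch (the fragment is hand-written, not a real E-Utils response), so treat it as an assumption-laden example rather than the code in pubmed.rb.

```ruby
# Illustrative only: parse_esearch is a hypothetical helper and SAMPLE_XML
# is a hand-written fragment, not an actual NCBI esearch response.
SAMPLE_XML = <<~XML
  <eSearchResult>
    <Count>2</Count>
    <IdList>
      <Id>17989981</Id>
      <Id>17989975</Id>
    </IdList>
  </eSearchResult>
XML

# Returns the hit count as an Integer when rettype is "count",
# otherwise the list of ids as an Array of Strings.
def parse_esearch(xml, options = {})
  if options["rettype"] == "count"
    count = xml[/<Count>(.*?)<\/Count>/m, 1]
    count ? count.to_i : 0
  else
    xml.scan(/<Id>(.*?)<\/Id>/m).flatten
  end
end
```

With this fragment, parse_esearch(SAMPLE_XML, "rettype" => "count") gives the integer 2, and parse_esearch(SAMPLE_XML) gives the two ids as strings. Converting the count inside the helper is a design choice of this sketch; the suggestion above returns it as a string and leaves the to_i to the caller.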
Cheers, Kaustubh Patil PS: for more details on Entrez esearch parameters, please refer to; http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html From kpatil at science.uva.nl Tue Nov 6 09:54:00 2007 From: kpatil at science.uva.nl (Kaustubh Patil) Date: Tue, 06 Nov 2007 10:54:00 +0100 Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch Message-ID: <473039B8.1060600@staff.science.uva.nl> Hi, I would like to thank you for the BioRuby library, it is a very useful tool. I am doing some literature mining using Ruby and I use PubMed as my source. Here is some background for my question; It is not possible to search PubMed with logical operators, e.g. HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it returns empty result). It is due to the url encoding (i.e. CGI.escape) of the search term (approx. line 89 in pubmed.rb). If we remove this url encoding it is possible to make such queries. Now my question is, is it safe to remove this CGI.escape ? Thank you and regards, Kaustubh Patil From georgkam at gmail.com Thu Nov 8 07:09:47 2007 From: georgkam at gmail.com (George) Date: Thu, 08 Nov 2007 10:09:47 +0300 Subject: [BioRuby] English translation Message-ID: <4732B63B.8030702@gmail.com> Hi Nakao, Please how can i translate your blog to English? Thanks George From jan.aerts at bbsrc.ac.uk Thu Nov 8 09:06:02 2007 From: jan.aerts at bbsrc.ac.uk (Jan Aerts) Date: Thu, 08 Nov 2007 09:06:02 +0000 Subject: [BioRuby] biographics In-Reply-To: <4732B309.2050008@gmail.com> References: <4732B309.2050008@gmail.com> Message-ID: <1194512762.6300.19.camel@rilxvm05> Hey George. Thanks again for your interest in using Bio::Graphics. Concerning your first question: I'm trying to implement the notion of subfeatures in Bio::Graphics at the moment. I think that would serve your purpose. Unfortunately, this requires some refactoring of one of the core-classes in bioruby itself: Bio::Feature. 
I'm waiting for the big guys at bioruby for their ideas on implementing that. So at the moment, the best way of displaying this is to either display the domains separately, or to use the spliced glyph: even though they're not exons, this would at least link them up later. Do you want to display that protein in its genomic environment as well? Or do you just want to have the protein on its own with the domains? Could you send us a mockup of how you'd like to have this type of information (i.e. proteins and their domains) represented? Just a simple drawing will do. I haven't had to do this type of visualization yet myself, so would be interested in how you experts would like to do that. Concerning your second question: it looks like you're referencing a version of the library that I sent out a while ago on the mailing list. All code development is now run via rubyforge. The moment I put it on rubyforge the namespace was changed from BioExt (*bio*ruby *ext*ensions) to just Bio. Did you install a version via rubyforge (i.e. following the instructions on bio-graphics.rubyforge.org)? If so: change all references to BioExt::Graphics to Bio::Graphics. So the line my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600) would become my_panel = Bio::Graphics::Panel.new(1000, 1200, false, 1, 600) jan. PS: I've CC'd this reply to the bioruby mailing list if that's OK... On Thu, 2007-11-08 at 09:56 +0300, George wrote: > Hi Dr Jan. > > I have a chado based database system running on ruby on rails for > storing sequence and annotation data. > The Feature table contains the biological sequences represented as > features and the Feature location table contains the locations or bio > coordinates for each feature. > Let me explain with an example, a protein sequence is a feature. call it > prot_A. Our Prot_A can have domains A1, A2, etc. Now these domains are > actually features by themselves but they happen to be located within Prot_A. 
> > So in the feature table i have Prot_A, Domain A1, A2. > > In the Feature locations table call it Featureloc, (chado style) > > --------------------------------------------- > featureloc_id| featuresrc_id |fmin |fmax| > --------------------------------------------- > 1 null 1 200 > 2 1 1 20 > 3 1 30 60 > ---------------------------------------------- > > My aim is to represent these features graphically such that a user can > view a feature with its domains. > I would like to generate simple graphics for these features from a gff > formatted file which can be created on the fly from the database tables. > Any idea on how i can do that in rails and using the bio-graphics module? > > Secondly am getting the error > "F:/Netbeans_folder/vargene/lib/biographics.rb:6: uninitialized constant > BioExt (NameError) when i try to access the Bioext::Graphics::Panel.new > method while running the following code. > > require 'stringio' > require 'base64' > gem 'bio-graphics' > require 'bio-graphics' > > my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600) > > #Create and configure tracks > track_SNP = my_panel.add_track('SNP') > track_gene = my_panel.add_track('gene') > track_transcript = my_panel.add_track('transcript') > > track_SNP.feature_colour = [1,0,0] > track_SNP.feature_glyph = 'triangle' > track_gene.feature_glyph = 'directed_spliced' > track_transcript.feature_glyph = 'spliced' > track_transcript.feature_colour = [0,0.5,0] > > # Add data to tracks > DATA.each do |line| > line.chomp! 
> ref, type, name, location, link = line.split(/\s+/) > if link == '' > link = nil > end > if type == 'SNP' > track_SNP.add_feature(name, location, link) > elsif type == 'gene' > track_gene.add_feature(name, location, link) > elsif type == 'transcript' > track_transcript.add_feature(name, location, link) > end > end > > # And draw > my_panel.draw('c:/my_panel.png') > > __END__ > chr1 gene CYP2D6 complement(80..120) > chr1 gene ALDH 100..449 > chr1 SNP rs1234 107 > chr1 gene bla complement(400..430) > chr1 SNP rs9876 44 > chr1 gene some_gene > complement(join(170..231,264..299,350..360,409..445)) > chr1 transcript transcript1 join(250..300,390..425) > chr1 transcript transcript2 253..330 > chr1 transcript transcript3 266..344 > chr1 transcript transcript4 > complement(join(410..430,239..286,129..151)) > > Is the Bioext module really available within the current implementation > of the biographics gem? > > Thanks > > George > -- Dr Jan Aerts Bioinformatics Group Roslin Institute Roslin EH25 9PS Scotland, UK tel: +44 131 527 4198 skype: aerts_ri website: http://saaientist.blogspot.com ----...and the obligatory disclaimer---- Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592). Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013. The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute. 
From ngoto at gen-info.osaka-u.ac.jp Fri Nov 9 12:30:10 2007 From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO) Date: Fri, 9 Nov 2007 21:30:10 +0900 Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch In-Reply-To: <473039B8.1060600@staff.science.uva.nl> References: <473039B8.1060600@staff.science.uva.nl> Message-ID: <20071109123012.8128D1CBC408@idnmail.gen-info.osaka-u.ac.jp> Hi, On Tue, 06 Nov 2007 10:54:00 +0100 Kaustubh Patil wrote: > Hi, > > I would like to thank you for the BioRuby library, it is a very useful > tool. I am doing some literature mining using Ruby and I use PubMed as > my source. Here is some background for my question; > > It is not possible to search PubMed with logical operators, e.g. > HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it > returns empty result). Probably you mean
Bio::PubMed.esearch("HIV AND drug")
Bio::PubMed.esearch("geneA OR geneB")
More complicated example:
Bio::PubMed.esearch("((p53 AND apoptosis) 2007/11[dp]) OR bioperl")
You can use the same search terms as in an NCBI PubMed search, without worrying about URL encoding. > It is due to the url encoding (i.e. CGI.escape) of the search term > (approx. line 89 in pubmed.rb). If we remove this url encoding it is > possible to make such queries. > > Now my question is, is it safe to remove this CGI.escape ? I think it is unsafe and should not be removed. -- Naohisa Goto ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp From n at bioruby.org Fri Nov 9 14:30:24 2007 From: n at bioruby.org (Mitsuteru Nakao) Date: Fri, 9 Nov 2007 23:30:24 +0900 Subject: [BioRuby] English translation In-Reply-To: <4732B63B.8030702@gmail.com> References: <4732B63B.8030702@gmail.com> Message-ID: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> Hi George, Of course OK. Please let me know the URL of my blog you mention. :-) On 11/8/07, George wrote: > Hi Nakao, > Please how can i translate your blog to English?
Thanks Mitsuteru - Mitsuteru Nakao mn at kazusa.or.jp / n at bioruby.org From georgkam at gmail.com Sat Nov 10 08:39:24 2007 From: georgkam at gmail.com (George Githinji) Date: Sat, 10 Nov 2007 11:39:24 +0300 Subject: [BioRuby] English translation In-Reply-To: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> References: <4732B63B.8030702@gmail.com> <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> Message-ID: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com> Hi Nakao The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/ On Nov 9, 2007 5:30 PM, Mitsuteru Nakao wrote: > Hi George, > > Of course OK. > Please let me know the URL of my blog you mention. :-) > > On 11/8/07, George wrote: > > Hi Nakao, > > Please how can i translate your blog to English? > > Thanks > Mitsuteru > - > Mitsuteru Nakao > mn at kazusa.or.jp / n at bioruby.org > -- --------------- Sincerely George Skype: george_g2 Website: http://biorelated.wordpress.com/ From ktym at hgc.jp Sat Nov 10 08:40:09 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sat, 10 Nov 2007 17:40:09 +0900 Subject: [BioRuby] count parameter in Bio::PubMed.esearch In-Reply-To: <47303B4E.8020103@staff.science.uva.nl> References: <47303B4E.8020103@staff.science.uva.nl> Message-ID: Hi Kaustubh, Thank you for your suggestion. I applied your changes to the CVS. During this process, I found that the previous fix applied by Jan was wrong. Developers, please do the test before you commit your changes. :) The change should be made to the Bio::PubMed.query method, however, the search method is also needed to be rewritten because the HTML structure returned by NCBI was reformatted. Anyway, in Bio::PubMed module, use of the esearch/efetch methods pair is strongly recommended compared to the search/query methods pair. 
bioruby> Bio::PubMed.search("(genome AND analysis) OR bioinformatics)") ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368"] bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)") ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368", "17988176", "17988086", "17987666", "17987374", "17987257", "17987048", "17986781", "17986522", "17986471", "17986460", "17986440", "17986356", "17986355", "17986329", "17986320", "17986282", "17986185", "17986079", "17985162", "17984568", "17984549", "17984548", "17984520", "17984228", "17984226", "17984208", "17984205", "17984085", "17984084", "17984080", "17983847", "17983807", "17983802", "17983573", "17983493", "17983269", "17983268", "17983157", "17982457", "17982456", "17982442", "17982427", "17982176", "17982123", "17981990", "17981981", "17981974", "17981891", "17981844", "17981816", "17981801", "17981746", "17981579", "17981546", "17981477", "17981060", "17981052", "17980519", "17980517", "17980477", "17980146", "17980047", "17980028", "17980019", "17979886", "17979725", "17979297", "17979181", "17978887", "17978880", "17978572", "17978498", "17978310", "17978184", "17978179", "17977886", "17977881", "17977850", "17977831", "17977670"] bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)", {'rettype' => 'count'}) ==> 286139 Regards, Toshiaki Katayama On 2007/11/06, at 19:00, Kaustubh Patil wrote: > Hi, > > Here is a suggestion/feature for Bio::PubMed.esearch. > > Currently it is not possible to use rettype=count (through options hash) in Bio::PubMed.esearch. 
> > To get this feature replace the following line in pubmed.rb (approx. line 97) > >
> result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
> > by > >
> if(hash['rettype']=="count")
>   result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
>   result = result[0]
> else
>   result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
> end
> > > and it will return the count as a string, which can be easily converted to an integer by "result.to_i" > > I hope it is useful. > > Cheers, > Kaustubh Patil > > PS: for more details on Entrez esearch parameters, please refer to; > > http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Sun Nov 11 14:10:34 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Sun, 11 Nov 2007 23:10:34 +0900 Subject: [BioRuby] English translation In-Reply-To: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com> References: <4732B63B.8030702@gmail.com> <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com> <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com> Message-ID: Hi George, Which did you mean? 1. you just want to read his blog in English 2. you want to translate his blog and make it publicly available In the case of 1, you can use free web translators http://www.google.com/language_tools http://babelfish.altavista.com/ http://www.worldlingo.com/en/products_services/worldlingo_translator.html http://www.freetranslation.com/ http://www.excite.co.jp/world/url/ The quality of those machine translations is not good, though. In the case of 2, you can do it freely as Mitsuteru wrote. Toshiaki On 2007/11/10, at 17:39, George Githinji wrote: > Hi Nakao > The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/ > > On Nov 9, 2007 5:30 PM, Mitsuteru Nakao wrote: > >> Hi George, >> >> Of course OK. >> Please let me know the URL of my blog you mention.
:-) >> >> On 11/8/07, George wrote: >>> Hi Nakao, >>> Please how can i translate your blog to English? >> >> Thanks >> Mitsuteru >> - >> Mitsuteru Nakao >> mn at kazusa.or.jp / n at bioruby.org >> > > > > -- > --------------- > Sincerely > George > > Skype: george_g2 > Website: http://biorelated.wordpress.com/ > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From ktym at hgc.jp Sun Nov 11 15:12:44 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Mon, 12 Nov 2007 00:12:44 +0900 Subject: [BioRuby] transcription factor binding site identification In-Reply-To: <01bb01c81f21$d871d620$0500a8c0@berndhome> References: <01bb01c81f21$d871d620$0500a8c0@berndhome> Message-ID: <82F3F8EA-81DC-4B60-9715-8E968F123975@hgc.jp> Hi, If you want to search with TRANSFAC motifs, you can use the tfscan command in the EMBOSS package. Otherwise, you may need to define your own algorithm to search for your motif. If your motif is in profile format, you need to develop a profile search method. If your motif is simple and can be converted to a regexp, the task would be relatively easy.
# to find all occurrences
results = seq.scan(regexp)
# to find the positions of all matches (starting at -1 so that a match at position 0 is not skipped)
pos = -1
while pos = seq.index(regexp, pos + 1)
  puts pos
end
You may also be interested in the Bio::Sequence#window_search method. Thanks, Toshiaki On 2007/11/05, at 5:32, Bernd Jagla wrote: > Hi there, > > > > Is it possible with bioruby/ruby to scan a nucleotide sequence and search > for binding sites of TFs? > > > > How would I do this? (I looked in the documentation but couldn't find it.)
> > > > Thanks, > > > > Bernd > > _______________________________________________ > BioRuby mailing list > BioRuby at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioruby From kpatil at science.uva.nl Mon Nov 12 11:09:33 2007 From: kpatil at science.uva.nl (Kaustubh Patil) Date: Mon, 12 Nov 2007 12:09:33 +0100 Subject: [BioRuby] Bio::PubMed efetch xml support and other options Message-ID: <4738346D.4060206@staff.science.uva.nl> Hi, XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method; # Kaustubh Patil: 6 Nov. 2007 # options hash here is different than options hash in esearch def self.efetch(ids, hash = {} ) return [] if ids.empty? # default options hash['retmode'] = 'xml' unless hash['retmode'] hash['rettype'] = 'medline' unless hash['rettype'] # create options array in required format opts = [] hash.each do |k, v| opts << "#{k}=#{v}" end host = "eutils.ncbi.nlm.nih.gov" path = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&#{opts.join('&')}&id=" ids = ids.join(",") http = Net::HTTP.new(host) response, = http.get(path + ids) result = response.body if(hash['retmode']=='text') result = result.split(/\n\n+/) end return result end I hope it is useful. Cheers, Kaustubh PS: for details of entrez efetch parameters http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html From jan.aerts at bbsrc.ac.uk Wed Nov 14 20:05:50 2007 From: jan.aerts at bbsrc.ac.uk (jan aerts (RI)) Date: Wed, 14 Nov 2007 20:05:50 -0000 Subject: [BioRuby] named arguments Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk> Hi staff, We think we're getting to a good workable version of Bio::Graphics. However, we also bumped into a distinctive feature of a library like Bio::Graphics: its methods have to be highly configurable. 
What should be the colour of the genes, what glyph should be used (spliced, a line, ...), what's the label that should be displayed, ... As a result, the argument lists for many of the methods become unwieldingly long and cumbersome to use. This is most apparent when the user of the library wants to use all default values for a method, except one which happens to be the last one in the argument list. The user has to write code like this (a little bit exaggerated, but still...): picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green) A good workaround is to use named parameter lists, which would make the previous code look like: picture.add_gene(:feature => my_gene, :colour => :green) which is much more readable. However, I'm a bit squeemish of doing it this way, because it would be a different paradigm than the one that bioruby uses. What do you guys think about integration of bioruby and Bio::Graphics somewhere in the future? Would the fact that we'd implement named argument lists in Bio::Graphics make integration into the bioruby toolkit difficult/impossible/not a good idea? Really looking forward to your comments. jan. From ktym at hgc.jp Thu Nov 15 07:44:03 2007 From: ktym at hgc.jp (Toshiaki Katayama) Date: Thu, 15 Nov 2007 16:44:03 +0900 Subject: [BioRuby] Bio::PubMed efetch xml support and other options In-Reply-To: <4738346D.4060206@staff.science.uva.nl> References: <4738346D.4060206@staff.science.uva.nl> Message-ID: <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> Hi Patil, On 2007/11/12, at 20:09, Kaustubh Patil wrote: > XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method; Enhancement to accept retmode=xml sounds good idea, so I just committed efetch2 and esearch2 methods which can be better replacements for the efetch and esearch methods. 
Both methods are able to accept any E-Utils options as a hash.

I will remove the suffix "2" from these methods if the following incompatibility can be accepted.

* changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility
  currently all of
    1. efetch("123")
    2. efetch("123", "456")
    3. efetch(["123", "456"])
  are accepted but 2. will be unavailable.

Other notes:

* default value for the retmode option remains "text" for backward compatibility
* both methods are rewritten to use Bio::Command.post_form to make the code clear
* Bio::FlatFile is updated to accept the recent MEDLINE entry format (UI -> PMID)

  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)")
  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"retmax" => "500"})
  puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"rettype" => "count"})

  puts Bio::PubMed.efetch2("10592173")
  puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})

Thanks,
Toshiaki Katayama

From ktym at hgc.jp Thu Nov 15 08:51:29 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 17:51:29 +0900
Subject: [BioRuby] named arguments
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <38F1FC51-FE7D-4F85-B7D1-DC4B5777E1E6@hgc.jp>

Jan,

There are several methods which accept a hash as the last argument, so you are OK to proceed with it.

Toshiaki

On 2007/11/15, at 5:05, jan aerts ((RI)) wrote:
> Hi staff,
>
> We think we're getting to a good workable version of Bio::Graphics. However, we also bumped into a distinctive feature of a library like Bio::Graphics: its methods have to be highly configurable. What should be the colour of the genes, what glyph should be used (spliced, a line, ...), what's the label that should be displayed, ... As a result, the argument lists for many of the methods become unwieldy and cumbersome to use.
> This is most apparent when the user of the library wants to use all default values for a method, except one which happens to be the last one in the argument list. The user has to write code like this (a little bit exaggerated, but still...):
>
>   picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green)
>
> A good workaround is to use named parameter lists, which would make the previous code look like:
>
>   picture.add_gene(:feature => my_gene, :colour => :green)
>
> which is much more readable.
>
> However, I'm a bit squeamish about doing it this way, because it would be a different paradigm from the one that bioruby uses. What do you guys think about integration of bioruby and Bio::Graphics somewhere in the future? Would the fact that we'd implement named argument lists in Bio::Graphics make integration into the bioruby toolkit difficult/impossible/not a good idea?
>
> Really looking forward to your comments.
> jan.

From jan.aerts at bbsrc.ac.uk Mon Nov 19 10:35:59 2007
From: jan.aerts at bbsrc.ac.uk (Jan Aerts)
Date: Mon, 19 Nov 2007 10:35:59 +0000
Subject: [BioRuby] [Fwd: Using BioRuby for parsing a .ptt file]
Message-ID: <1195468559.25265.11.camel@rilxvm05>

A post from Abhik Khanra. Could anyone help him out?

Thanks,
jan.

-------- Forwarded Message --------
> From: Abhik Khanra
> To: jan.aerts at bbsrc.ac.uk
> Subject: Using BioRuby for parsing a .ptt file
> Date: Sat, 10 Nov 2007 07:18:05 +0530
>
> Hi.
>
> I came across your blog recently. It is a really good source of information.
>
> I have a query and have posted the same in the BioRuby mailing-list too.
> It's just that I'm in a time-crunch. Hence I'm sending it to you as well.
> Hope that would not be a problem for you.
>
> I'm working on a sample visualization application and leveraging
> BioRuby for extracting target sequence origin and endpoints from BLAST
> results. I obtained an example of this from the BioRuby tutorial.
> Could you please let me know if there is any similar example using BioRuby for
> extracting useful information from parsing a .ptt file?
>
> Thanks
> Abhik

--
Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in Scotland (registered number SC157100) and a Scottish Charity (registered number SC023592). Our registered office is at Roslin, Midlothian, EH25 9PS. VAT registration number 847380013.

The information contained in this e-mail (including any attachments) is confidential and is intended for the use of the addressee only. The opinions expressed within this e-mail (including any attachments) are the opinions of the sender and do not necessarily constitute those of Roslin Institute (Edinburgh) ("the Institute") unless specifically stated by a sender who is duly authorised to do so on behalf of the Institute.

From ktym at hgc.jp Tue Nov 20 15:38:32 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 00:38:32 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <4742EEE5.90400@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl> <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> <473C1708.9020306@staff.science.uva.nl> <4742EEE5.90400@staff.science.uva.nl>
Message-ID: <577009FA-1E77-493A-A036-B7939230345A@hgc.jp>

Hi Kaustubh,

I've just committed a change that makes Bio::PubMed.efetch and esearch wait for 3 seconds between consecutive queries.

I also renamed efetch2 and esearch2 (the newer versions, which accept E-Utils options as a hash) to efetch and esearch (the names of the old versions).

The new version of the efetch method breaks backward compatibility: the old version could accept a list of IDs as variable-length arguments.

>>>> 1. efetch("123")             --> OK
>>>> 2. efetch("123", "456")      --> NG
>>>> 3. efetch(["123", "456"])    --> OK

Here, the PubMed IDs can be (an array of) strings or numerics.
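
As a rough illustration of the call forms listed above (a sketch only, not the code committed to BioRuby): the helper names `normalize_ids` and `efetch_params` are hypothetical, and the option names are simply the E-Utils parameters already mentioned in this thread.

```ruby
# Hypothetical helpers, for illustration only (not part of BioRuby).

# Accept "123", 123, or ["123", 456] and return a comma-separated string.
def normalize_ids(ids)
  Array(ids).flatten.map { |id| id.to_s.strip }.reject { |id| id.empty? }.join(",")
end

# Merge user-supplied E-Utils options over the defaults without
# modifying the hash passed in by the caller.
def efetch_params(ids, hash = {})
  defaults = { "db" => "pubmed", "tool" => "bioruby", "retmode" => "text" }
  defaults.merge(hash).merge("id" => normalize_ids(ids))
end

puts efetch_params("123").inspect
puts efetch_params(["123", 456], { "retmode" => "xml" }).inspect
```

With a signature like this, a second positional string would be taken as the options hash, which is why call form 2 is no longer accepted.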
By the way, currently the efetch method returns the following error.

% ruby lib/bio/io/pubmed.rb
 :
--- Retrieve PubMed entry by E-Utils ---
Wed Nov 21 00:23:20 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable
Wed Nov 21 00:23:23 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable

Is this a temporary problem? I believe efetch2 was working when I implemented it.

Regards,
Toshiaki Katayama

On 2007/11/20, at 23:27, Kaustubh Patil wrote:
> Hi Toshiaki,
>
> Thanks for your email. Please find my answers embedded below;
>
> Thanks,
> kaustubh
>
> Toshiaki Katayama wrote:
>
>> Hi Kaustubh,
>>
>> On 2007/11/15, at 18:53, Kaustubh Patil wrote:
>>
>>
>>> Hi Toshiaki,
>>>
>>> Thank you very much for the improvements. There are some other desirable improvements;
>>>
>>> 1. PubMed has some timing restrictions on two consecutive queries. So it will be very nice if it can be implemented inside a function, like, esearch/efetch.
>>>
>>
>> How about having the following method and calling it within the efetch and esearch methods before Bio::Command.post_form?
>>
>> --------------------------------------------------
>> # Make no more than one request every 3 seconds.
>> @@ncbi_interval = 3
>> @@last_accessed = nil
>>
>> def wait_access
>>   if @@last_accessed
>>     duration = Time.now - @@last_accessed
>>     # sleep only when the previous request was too recent
>>     if duration < @@ncbi_interval
>>       sleep @@ncbi_interval - duration
>>     end
>>   end
>>   # record the time of every request, not just the first one
>>   @@last_accessed = Time.now
>> end
>> --------------------------------------------------
>>
> This could be a very good and quick implementation. In fact I use something similar for my usage now.
>
>> By the way, NCBI also has another restriction:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
>>
>>> Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests.
>>>
>>> Do you think this should also be taken care of automatically?
>>>
>>>
> I am aware of those restrictions.
> It will be very nice if this can be taken care of automatically. There is a very good library for accessing/using Medline through R, called MedlineR (btw currently it's not downloadable as their server is down). MedlineR handles this automatically.
>
> There is another improvement I am thinking about. It is not possible to fetch a large number of documents in one go. I suppose this is mainly because of the practical restrictions on URL length, e.g. IE supports max 2,048 characters (although I am not aware whether PubMed imposes any limits). It will be useful (under some conditions) to cut the fetches into a number of parts and then return the combined result. What do you think?
>
>>> 2. Mapping terms to MeSH (I couldn't find this!).
>>>
>>
>>
>> I'm not sure how to accomplish this.
>>
> I will do a bit more research on this and then get back to you.
>
>>
>>
>>> I will post other comments as I recollect them. I have another question (though this is not a very appropriate place for it);
>>>
>>> Is there any Ruby library which can do some basic text mining tasks, like tokenization, sentence boundary discrimination etc.?
>>>
>>
>> I think yes, but I'm not doing text mining for now, sorry ;-)
>>
> I haven't found a Ruby library for that yet. I will keep on searching.
>
> Cheers,
> Kaustubh
>
>> Thanks,
>> Toshiaki
>>
>>
>>
>>> Cheers,
>>> Kaustubh
>>>
>>> Toshiaki Katayama wrote:
>>>
>>>
>>>> Hi Patil,
>>>>
>>>> On 2007/11/12, at 20:09, Kaustubh Patil wrote:
>>>>
>>>>> XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method;
>>>>>
>>>> Enhancement to accept retmode=xml sounds like a good idea, so I just committed efetch2 and esearch2 methods, which can be better replacements for the efetch and esearch methods.
>>>>
>>>> Both methods are able to accept any E-Utils options as a hash.
>>>>
>>>> I will remove the suffix "2" from these methods if the following incompatibility can be accepted.
>>>>
>>>> * changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility
>>>>   currently all of
>>>>     1. efetch("123")
>>>>     2. efetch("123", "456")
>>>>     3. efetch(["123", "456"])
>>>>   are accepted but 2. will be unavailable.
>>>>
>>>> Other notes:
>>>>
>>>> * default value for the retmode option remains "text" for backward compatibility
>>>> * both methods are rewritten to use Bio::Command.post_form to make the code clear
>>>> * Bio::FlatFile is updated to accept the recent MEDLINE entry format (UI -> PMID)
>>>>
>>>>
>>>>   puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)")
>>>>   puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"retmax" => "500"})
>>>>   puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"rettype" => "count"})
>>>>
>>>>   puts Bio::PubMed.efetch2("10592173")
>>>>   puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})
>>>>
>>>>
>>>> Thanks,
>>>> Toshiaki Katayama
>>>>
>>
>>
>>

From ktym at hgc.jp Tue Nov 20 16:20:23 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 01:20:23 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <47430436.3070005@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl> <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp> <473C1708.9020306@staff.science.uva.nl> <4742EEE5.90400@staff.science.uva.nl> <577009FA-1E77-493A-A036-B7939230345A@hgc.jp> <47430436.3070005@staff.science.uva.nl>
Message-ID: <7787216D-B5E7-4EC3-B467-C62489CFDD4C@hgc.jp>

Hi,

On 2007/11/21, at 0:58, Kaustubh Patil wrote:
> The problem was temporary (solved by now). I guess it was part of maintenance. Thank you.
I've confirmed the tests are now working.

Another issue:

Most of the BioRuby classes which access a server are designed to create a factory object first, e.g.

  server = Bio::Blast.remote(...)
  result = server.query(...)

  server = Bio::KEGG::API.new
  result = server.get_genes_by_pathway(...)

However, Bio::PubMed is not.

  result = Bio::PubMed.esearch(...)

I think this is only for historical reasons. Should I change this design to make it consistent?

  server = Bio::PubMed.new
  result = server.esearch(...)

Or should I provide both ways? What is the best way to do this (to define instance methods and also make them available as class methods)?

  def esearch(args)
    # real code
  end

  def self.esearch(args)
    self.new.esearch(args)
  end

Toshiaki

> Toshiaki Katayama wrote:
>
>> By the way, currently the efetch method returns the following error.
>>
>> % ruby lib/bio/io/pubmed.rb
>>  :
>> --- Retrieve PubMed entry by E-Utils ---
>> Wed Nov 21 00:23:20 +0900 2007
>> 1: id: 16381885 Error occurred: PubMed article server is not avaliable
>> Wed Nov 21 00:23:23 +0900 2007
>> 1: id: 16381885 Error occurred: PubMed article server is not avaliable
>>
>> Is this a temporary problem?