From mkikkawa at gmail.com  Sun Nov  4 01:01:14 2007
From: mkikkawa at gmail.com (Masahide Kikkawa)
Date: Sun, 4 Nov 2007 15:01:14 +0900
Subject: [BioRuby] pubmed bug?
Message-ID: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com>

Hi,
  I'm new to the BioRuby mailing list. A while ago, I reported a BioRuby
bug on RubyForge. It seems the bug was not fixed in the new BioRuby
release (1.1).
  Could someone take a look at the following report?
http://rubyforge.org/tracker/index.php?func=detail&aid=11736&group_id=769&atid=3037

Thanks in advance.
---------------------------------------------------------------
Masahide Kikkawa, M.D., Ph. D.
Professor
Structural Biology
Graduate School of Science
Kyoto University
Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
JAPAN
http://structure.biophys.kyoto-u.ac.jp/
Tel: +81-75-753-9421
FAX: +81-75-753-4218
---------------------------------------------------------------


From jan.aerts at bbsrc.ac.uk  Sun Nov  4 06:51:47 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Sun, 4 Nov 2007 11:51:47 -0000
Subject: [BioRuby] pubmed bug?
References: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com>
Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk>

Hi Masahide.

Sorry about not spotting this earlier. The bug fix has been committed to CVS now.

@Toshiaki: could you set the status of the bug report on rubyforge to "Closed"?

jan.



_______________________________________________
BioRuby mailing list
BioRuby at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioruby


From baj2107 at columbia.edu  Sun Nov  4 15:32:39 2007
From: baj2107 at columbia.edu (Bernd Jagla)
Date: Sun, 4 Nov 2007 15:32:39 -0500
Subject: [BioRuby] transcription factor binding site identification
Message-ID: <01bb01c81f21$d871d620$0500a8c0@berndhome>

Hi there,

Is it possible with bioruby/ruby to scan a nucleotide sequence and search
for binding sites of TFs?

How would I do this? (I looked in the documentation but couldn't find it.)

Thanks,
Bernd


From ktym at hgc.jp  Sun Nov  4 21:03:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 5 Nov 2007 11:03:34 +0900
Subject: [BioRuby] pubmed bug?
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com>
	<1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <97798D54-E2FC-43B1-A8A0-17905E39AEB1@hgc.jp>

Kikkawa-san,

I'm sorry; I have never checked the tracker on RubyForge,
as I only use the site to distribute our BioRuby gem package.

Jan, thanks for the fix. I changed the status to closed.

Regards,
Toshiaki Katayama



From kpatil at science.uva.nl  Tue Nov  6 05:00:46 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Tue, 06 Nov 2007 11:00:46 +0100
Subject: [BioRuby] count parameter in Bio::PubMed.esearch
Message-ID: <47303B4E.8020103@staff.science.uva.nl>

Hi,

Here is a suggestion/feature for Bio::PubMed.esearch.

Currently it is not possible to use rettype=count (through the options hash)
in Bio::PubMed.esearch.

To get this feature replace the following line in pubmed.rb (approx. 
line 97)

result = result.scan(/<Id>(.*?)<\/Id>/m).flatten

by

if hash['rettype'] == 'count'
  result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
  result = result[0]
else
  result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
end


and it will return the count as a string, which can easily be converted
to an integer with result.to_i.
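A minimal, self-contained sketch of the same idea as a standalone helper, so the two branches can be tried without calling NCBI. The sample XML is a hypothetical, trimmed-down esearch response, and the helper name parse_esearch is made up for illustration. Like result[0] in the patch, it assumes the first <Count> element is the overall total (later <Count> elements belong to individual query terms), and unlike the patch it converts the count to an integer directly.

```ruby
# Hypothetical, trimmed-down esearch response used only for illustration.
SAMPLE_ESEARCH = '<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>' \
  '<IdList><Id>17989981</Id><Id>17989975</Id></IdList>' \
  '<TranslationStack><TermSet><Term>genome[All Fields]</Term><Count>1234</Count>' \
  '</TermSet></TranslationStack></eSearchResult>'

# Return the hit count (as an Integer) for rettype 'count', otherwise the list of PMIDs.
def parse_esearch(xml, rettype = nil)
  if rettype == 'count'
    # String#[] with a capture index returns the first match only,
    # i.e. the overall total rather than the per-term counts further down.
    xml[/<Count>(.*?)<\/Count>/m, 1].to_i
  else
    xml.scan(/<Id>(.*?)<\/Id>/m).flatten
  end
end

p parse_esearch(SAMPLE_ESEARCH, 'count')  # prints 2
p parse_esearch(SAMPLE_ESEARCH)           # prints ["17989981", "17989975"]
```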

I hope it is useful.

Cheers,
Kaustubh Patil

PS: for more details on the Entrez esearch parameters, please refer to:

http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

From kpatil at science.uva.nl  Tue Nov  6 04:54:00 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Tue, 06 Nov 2007 10:54:00 +0100
Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch
Message-ID: <473039B8.1060600@staff.science.uva.nl>

Hi,

I would like to thank you for the BioRuby library; it is a very useful
tool. I am doing some literature mining in Ruby, using PubMed as
my source. Here is some background for my question:

It is not possible to search PubMed with logical operators, e.g.
HIV+AND+drug or geneA+OR+geneB, using Bio::PubMed.esearch (it
returns an empty result).

This is due to the URL encoding (i.e. CGI.escape) of the search term
(approx. line 89 in pubmed.rb). If we remove this URL encoding, it is
possible to make such queries.

Now my question is, is it safe to remove this CGI.escape ?

Thank you and regards,
Kaustubh Patil


From georgkam at gmail.com  Thu Nov  8 02:09:47 2007
From: georgkam at gmail.com (George)
Date: Thu, 08 Nov 2007 10:09:47 +0300
Subject: [BioRuby] English translation
Message-ID: <4732B63B.8030702@gmail.com>

Hi Nakao,
How can I translate your blog into English?

Thanks

George

From jan.aerts at bbsrc.ac.uk  Thu Nov  8 04:06:02 2007
From: jan.aerts at bbsrc.ac.uk (Jan Aerts)
Date: Thu, 08 Nov 2007 09:06:02 +0000
Subject: [BioRuby] biographics
In-Reply-To: <4732B309.2050008@gmail.com>
References: <4732B309.2050008@gmail.com>
Message-ID: <1194512762.6300.19.camel@rilxvm05>

Hey George.

Thanks again for your interest in using Bio::Graphics.

Concerning your first question: I'm trying to implement the notion of
subfeatures in Bio::Graphics at the moment. I think that would serve
your purpose. Unfortunately, this requires some refactoring of one of
the core classes in BioRuby itself: Bio::Feature. I'm waiting for the
big guys at bioruby for their ideas on implementing that. So at the
moment, the best way of displaying this is to either display the domains
separately, or to use the spliced glyph: even though they're not exons,
this would at least link them up later.
Do you want to display that protein in its genomic environment as well?
Or do you just want to have the protein on its own with the domains?
Could you send us a mockup of how you'd like to have this type of
information (i.e. proteins and their domains) represented? Just a simple
drawing will do. I haven't had to do this type of visualization
myself yet, so I would be interested in how you experts would like to do it.

Concerning your second question: it looks like you're referencing a
version of the library that I sent out a while ago on the mailing list.
All code development is now run via rubyforge. The moment I put it on
rubyforge the namespace was changed from BioExt (*bio*ruby *ext*ensions)
to just Bio. Did you install a version via rubyforge (i.e. following the
instructions on bio-graphics.rubyforge.org)? If so: change all
references to BioExt::Graphics to Bio::Graphics.
So the line
  my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)
would become
  my_panel = Bio::Graphics::Panel.new(1000, 1200, false, 1, 600)

jan.

PS: I've CC'd this reply to the bioruby mailing list if that's OK...

On Thu, 2007-11-08 at 09:56 +0300, George wrote:
> Hi Dr Jan.
> 
> I have a chado based database system running on ruby on rails for 
> storing sequence and annotation data.
> The Feature table contains the biological sequences represented as 
> features and the Feature location table contains the locations or bio 
> coordinates for each feature.
> Let me explain with an example: a protein sequence is a feature; call it
> Prot_A. Prot_A can have domains A1, A2, etc. These domains are features
> in their own right, but they happen to be located within Prot_A.
> 
> So in the feature table I have Prot_A, domain A1 and domain A2.
> 
> In the Feature locations table call it Featureloc, (chado style)
> 
> ----------------------------------------------
> featureloc_id | featuresrc_id | fmin | fmax
> ----------------------------------------------
> 1             | null          | 1    | 200
> 2             | 1             | 1    | 20
> 3             | 1             | 30   | 60
> ----------------------------------------------
> 
> My aim is to represent these features graphically such that a user can 
> view a feature with its domains.
> I would like to generate simple graphics for these features from a GFF
> file that can be created on the fly from the database tables.
> Any idea how I can do that in Rails using the bio-graphics module?
> 
> Secondly, I am getting the error
> "F:/Netbeans_folder/vargene/lib/biographics.rb:6: uninitialized constant
> BioExt (NameError)" when I try to call the BioExt::Graphics::Panel.new
> method while running the following code.
> 
> require 'stringio'
> require 'base64'
> gem 'bio-graphics'
> require 'bio-graphics'
>
> my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)
>
> # Create and configure tracks
> track_SNP = my_panel.add_track('SNP')
> track_gene = my_panel.add_track('gene')
> track_transcript = my_panel.add_track('transcript')
>
> track_SNP.feature_colour = [1,0,0]
> track_SNP.feature_glyph = 'triangle'
> track_gene.feature_glyph = 'directed_spliced'
> track_transcript.feature_glyph = 'spliced'
> track_transcript.feature_colour = [0,0.5,0]
>
> # Add data to tracks
> DATA.each do |line|
>   line.chomp!
>   ref, type, name, location, link = line.split(/\s+/)
>   if link == ''
>     link = nil
>   end
>   if type == 'SNP'
>     track_SNP.add_feature(name, location, link)
>   elsif type == 'gene'
>     track_gene.add_feature(name, location, link)
>   elsif type == 'transcript'
>     track_transcript.add_feature(name, location, link)
>   end
> end
>
> # And draw
> my_panel.draw('c:/my_panel.png')
>
> __END__
> chr1  gene        CYP2D6      complement(80..120)
> chr1  gene        ALDH        100..449
> chr1  SNP         rs1234      107
> chr1  gene        bla         complement(400..430)
> chr1  SNP         rs9876      44
> chr1  gene        some_gene   complement(join(170..231,264..299,350..360,409..445))
> chr1  transcript  transcript1 join(250..300,390..425)
> chr1  transcript  transcript2 253..330
> chr1  transcript  transcript3 266..344
> chr1  transcript  transcript4 complement(join(410..430,239..286,129..151))
> 
> Is the BioExt module really available in the current version
> of the bio-graphics gem?
> 
> Thanks
> 
> George
> 
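On the first question, one way to produce the GFF file George describes from the featureloc rows is sketched below. This is only an illustration under stated assumptions: the rows are hard-coded stand-ins for the table above, the names FEATURELOC_ROWS and rows_to_gff3 are hypothetical, the feature types use the Sequence Ontology terms polypeptide and polypeptide_domain, and each domain is linked to its protein with a GFF3 Parent attribute. Chado stores fmin as a 0-based (interbase) coordinate, so the sketch adds 1 by default; pass false if your table already holds 1-based starts, as the example data appears to.

```ruby
# Hypothetical rows mirroring the featureloc table above:
# [featureloc_id, name, type, parent featureloc_id (nil if top level), fmin, fmax]
FEATURELOC_ROWS = [
  [1, 'Prot_A', 'polypeptide',        nil, 1,  200],
  [2, 'A1',     'polypeptide_domain', 1,   1,  20],
  [3, 'A2',     'polypeptide_domain', 1,   30, 60]
]

# Build GFF3 text in which each domain points at its protein via Parent=.
# With interbase = true, fmin is treated as 0-based and shifted to a 1-based GFF start.
def rows_to_gff3(rows, seqid, interbase = true)
  offset = interbase ? 1 : 0
  names = {}
  rows.each { |id, name, *_rest| names[id] = name }

  lines = ['##gff-version 3']
  rows.each do |_id, name, type, parent_id, fmin, fmax|
    attributes = "ID=#{name}"
    attributes += ";Parent=#{names.fetch(parent_id)}" if parent_id
    # columns: seqid, source, type, start, end, score, strand, phase, attributes
    lines << [seqid, '.', type, fmin + offset, fmax, '.', '.', '.', attributes].join("\t")
  end
  lines.join("\n") + "\n"
end

puts rows_to_gff3(FEATURELOC_ROWS, 'Prot_A', false)
```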
-- 

Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198
skype: aerts_ri
website: http://saaientist.blogspot.com

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in
Scotland (registered number SC157100) and a Scottish Charity (registered
number SC023592). Our registered office is at Roslin, Midlothian, EH25
9PS. VAT registration number 847380013.

The information contained in this e-mail (including any attachments) is
confidential and is intended for the use of  the addressee only. The
opinions expressed within this e-mail (including any attachments) are
the opinions of the sender and do not necessarily constitute those of
Roslin Institute (Edinburgh) ("the Institute") unless specifically
stated by a sender who is duly authorised to do so on behalf of the
Institute.


From ngoto at gen-info.osaka-u.ac.jp  Fri Nov  9 07:30:10 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Fri, 9 Nov 2007 21:30:10 +0900
Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch
In-Reply-To: <473039B8.1060600@staff.science.uva.nl>
References: <473039B8.1060600@staff.science.uva.nl>
Message-ID: <20071109123012.8128D1CBC408@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Tue, 06 Nov 2007 10:54:00 +0100
Kaustubh Patil <kpatil at science.uva.nl> wrote:

> Hi,
> 
> I would like to thank you for the BioRuby library, it is a very useful 
> tool. I am doing some literature mining using Ruby and I use PubMed as 
> my source. Here is some background for my question;
> 
> It is not possible to search PubMed with logical operators, e.g.  
> HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it 
> returns empty result).

Probably you mean

 Bio::PubMed.esearch("HIV AND drug")
 Bio::PubMed.esearch("geneA OR geneB")

More complicated example:

 Bio::PubMed.esearch("((p53 AND apoptosis) 2007/11[dp]) OR bioperl")

You can use the same search terms as in the NCBI PubMed search,
without worrying about URL encoding.

> It is due to the url encoding (i.e. CGI.escape) of the search term 
> (approx. line 89 in pubmed.rb). If we remove this url encoding it is 
> possible to make such queries.
> 
> Now my question is, is it safe to remove this CGI.escape ?

I think it is unsafe and should not be removed.
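The reason is that CGI.escape already produces the '+' form: it turns each space into '+', which NCBI reads as a space. A query that has been URL-encoded by hand is therefore encoded a second time, and each literal '+' becomes %2B, so the server no longer sees the operators as separate words. A minimal demonstration with Ruby's standard cgi library (no network access involved):

```ruby
require 'cgi'

# The query exactly as you would type it into the PubMed search box.
plain_query = 'HIV AND drug'
# The same query with the spaces already replaced by '+', as in the question.
pre_encoded = 'HIV+AND+drug'

# Spaces become '+', which is what the server expects on the wire.
puts CGI.escape(plain_query)  # prints HIV+AND+drug

# Escaping an already-encoded string turns each '+' into %2B,
# i.e. a literal plus sign instead of spaces around AND.
puts CGI.escape(pre_encoded)  # prints HIV%2BAND%2Bdrug
```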

-- 
Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp

From n at bioruby.org  Fri Nov  9 09:30:24 2007
From: n at bioruby.org (Mitsuteru Nakao)
Date: Fri, 9 Nov 2007 23:30:24 +0900
Subject: [BioRuby] English translation
In-Reply-To: <4732B63B.8030702@gmail.com>
References: <4732B63B.8030702@gmail.com>
Message-ID: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>

Hi George,

Of course, that's OK.
Please let me know the URL of the blog you mean. :-)

On 11/8/07, George <georgkam at gmail.com> wrote:
> Hi Nakao,
> Please how can i translate your blog to English?

Thanks
Mitsuteru
-
Mitsuteru Nakao
mn at kazusa.or.jp / n at bioruby.org

From georgkam at gmail.com  Sat Nov 10 03:39:24 2007
From: georgkam at gmail.com (George Githinji)
Date: Sat, 10 Nov 2007 11:39:24 +0300
Subject: [BioRuby] English translation
In-Reply-To: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
References: <4732B63B.8030702@gmail.com>
	<90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
Message-ID: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>

Hi Nakao
The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/



-- 
---------------
Sincerely
George

Skype: george_g2
Website: http://biorelated.wordpress.com/

From ktym at hgc.jp  Sat Nov 10 03:40:09 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 10 Nov 2007 17:40:09 +0900
Subject: [BioRuby] count parameter in Bio::PubMed.esearch
In-Reply-To: <47303B4E.8020103@staff.science.uva.nl>
References: <47303B4E.8020103@staff.science.uva.nl>
Message-ID: <CE62D4B8-EC1B-4454-A8C9-D4CB756129AF@hgc.jp>

Hi Kaustubh,

Thank you for your suggestion. I have applied your changes in CVS.

During this process, I found that the previous fix applied by Jan was wrong.
Developers, please test before you commit your changes. :)

The change should have been made to the Bio::PubMed.query method;
however, the search method also needs to be rewritten,
because NCBI has reformatted the HTML it returns.

In any case, in the Bio::PubMed module, the esearch/efetch pair of methods is
strongly recommended over the search/query pair.


bioruby> Bio::PubMed.search("(genome AND analysis) OR bioinformatics)")
  ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368"]

bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)")
  ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368", "17988176", "17988086", "17987666", "17987374", "17987257", "17987048", "17986781", "17986522", "17986471", "17986460", "17986440", "17986356", "17986355", "17986329", "17986320", "17986282", "17986185", "17986079", "17985162", "17984568", "17984549", "17984548", "17984520", "17984228", "17984226", "17984208", "17984205", "17984085", "17984084", "17984080", "17983847", "17983807", "17983802", "17983573", "17983493", "17983269", "17983268", "17983157", "17982457", "17982456", "17982442", "17982427", "17982176", "17982123", "17981990", "17981981", "17981974", "17981891", "17981844", "17981816", "17981801", "17981746", "17981579", "17981546", "17981477", "17981060", "17981052", "17980519", "17980517", "17980477", "17980146", "17980047", "17980028", "17980019", "17979886", "17979725", "17979297", "17979181", "17978887", "17978880", "17978572", "17978498", "17978310", "17978184", "17978179", "17977886", "17977881", "17977850", "17977831", "17977670"]

bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)", {'rettype' => 'count'})
  ==> 286139

Regards,
Toshiaki Katayama




From ktym at hgc.jp  Sun Nov 11 09:10:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sun, 11 Nov 2007 23:10:34 +0900
Subject: [BioRuby] English translation
In-Reply-To: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
References: <4732B63B.8030702@gmail.com>
	<90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
	<55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
Message-ID: <C4373E4C-539E-44D7-89C8-D5DAE761260F@hgc.jp>

Hi George,

Which did you mean?

1. you just want to read his blog in English
2. you want to translate his blog and make it publicly available

In case 1, you can use a free web translator, such as:

http://www.google.com/language_tools
http://babelfish.altavista.com/
http://www.worldlingo.com/en/products_services/worldlingo_translator.html
http://www.freetranslation.com/
http://www.excite.co.jp/world/url/

The quality of these machine translations is not very good, though.

In case 2, you are free to do so, as Mitsuteru wrote.

Toshiaki



From ktym at hgc.jp  Sun Nov 11 10:12:44 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 12 Nov 2007 00:12:44 +0900
Subject: [BioRuby] transcription factor binding site identification
In-Reply-To: <01bb01c81f21$d871d620$0500a8c0@berndhome>
References: <01bb01c81f21$d871d620$0500a8c0@berndhome>
Message-ID: <82F3F8EA-81DC-4B60-9715-8E968F123975@hgc.jp>

Hi,

If you want to search with TRANSFAC motifs, you can use the tfscan command in the EMBOSS package.

Otherwise, you may need to implement your own algorithm to search for your motif.
If your motif is in profile format, you will need to develop a profile search method.
If your motif is simple and can be converted to a regular expression, the task is relatively easy.

  # to find all (non-overlapping) occurrences
  results = seq.scan(regexp)

  # to find the start position of every match, including overlapping ones
  # (start at -1 so that a match at position 0 is not skipped)
  pos = -1
  while pos = seq.index(regexp, pos + 1)
    puts pos
  end

You may also be interested in the Bio::Sequence#window_search method.
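For the regular-expression route, a consensus motif written with IUPAC ambiguity codes can be turned into a Regexp mechanically. The sketch below is plain Ruby with no BioRuby dependency; the helper names motif_to_regexp and motif_positions, the IUPAC table constant, and the example sequence are hypothetical. It assumes a lower-case sequence and scans the given strand only, so the reverse complement would need a second pass.

```ruby
# IUPAC nucleotide codes mapped to regexp fragments (lower case only).
IUPAC = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't',
  'r' => '[ag]', 'y' => '[ct]', 's' => '[cg]', 'w' => '[at]',
  'k' => '[gt]', 'm' => '[ac]', 'b' => '[cgt]', 'd' => '[agt]',
  'h' => '[act]', 'v' => '[acg]', 'n' => '[acgt]'
}

# Convert a consensus motif such as "tgasgtca" to a Regexp;
# an unknown letter raises KeyError.
def motif_to_regexp(motif)
  Regexp.new(motif.downcase.chars.map { |c| IUPAC.fetch(c) }.join)
end

# 0-based start positions of every match, overlapping matches included.
def motif_positions(seq, regexp)
  positions = []
  pos = -1
  while (pos = seq.index(regexp, pos + 1))
    positions << pos
  end
  positions
end

seq = 'ttgacgtcatgacgtca'  # hypothetical sequence
p motif_positions(seq, motif_to_regexp('tgasgtca'))  # prints [1, 9]
```

Note the difference from String#scan: "aaaa".scan(/aa/) finds two non-overlapping hits, while motif_positions("aaaa", /aa/) reports all three start positions.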

Thanks,
Toshiaki



From kpatil at science.uva.nl  Mon Nov 12 06:09:33 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Mon, 12 Nov 2007 12:09:33 +0100
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
Message-ID: <4738346D.4060206@staff.science.uva.nl>

Hi,

XML is very nice for searching etc. PubMed documents can be fetched in 
various formats, including xml. I have changed the efetch method in 
Bio::PubMed class in order to implement this. Here is the modified method;

# Kaustubh Patil: 6 Nov. 2007
# The options hash here is different from the options hash in esearch.
# Net::HTTP comes from Ruby's 'net/http' standard library.
    def self.efetch(ids, hash = {})
      return [] if ids.empty?

      # default options
      hash['retmode'] ||= 'xml'
      hash['rettype'] ||= 'medline'

      # turn the options into "key=value" strings for the query
      opts = hash.map { |k, v| "#{k}=#{v}" }

      host = "eutils.ncbi.nlm.nih.gov"
      path = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&#{opts.join('&')}&id="

      ids = ids.join(",")

      http = Net::HTTP.new(host)
      response, = http.get(path + ids)
      result = response.body

      # plain-text records are separated by blank lines
      if hash['retmode'] == 'text'
        result = result.split(/\n\n+/)
      end

      return result
    end
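The method above interpolates option values into the URL as-is. A sketch of the same URL construction with every key and value escaped, using URI.encode_www_form from Ruby's standard uri library; the helper name efetch_url is hypothetical and this is not BioRuby's code. It keeps the same defaults (retmode=xml, rettype=medline) and only builds the URL, without sending a request.

```ruby
require 'uri'

# Build an efetch URL with the same defaults as the method above,
# letting caller-supplied options override them.
def efetch_url(ids, opts = {})
  params = { 'tool' => 'bioruby', 'db' => 'pubmed',
             'retmode' => 'xml', 'rettype' => 'medline' }.merge(opts)
  params['id'] = Array(ids).join(',')
  # encode_www_form escapes each key and value, e.g. ',' becomes %2C
  'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?' + URI.encode_www_form(params)
end

puts efetch_url(['10592173', '14693808'])
puts efetch_url('10592173', 'retmode' => 'text')
```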


I hope it is useful.

Cheers,
Kaustubh

PS: for details of the Entrez efetch parameters, see
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html


From jan.aerts at bbsrc.ac.uk  Wed Nov 14 15:05:50 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 14 Nov 2007 20:05:50 -0000
Subject: [BioRuby] named arguments
Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>

Hi staff,

We think we're getting to a good, workable version of Bio::Graphics. However, we have also run into a characteristic problem of a library like Bio::Graphics: its methods have to be highly configurable. What colour should the genes be, which glyph should be used (spliced, a line, ...), which label should be displayed, ... As a result, the argument lists of many methods become unwieldy and cumbersome to use. This is most apparent when the user wants to use all default values for a method except one, which happens to be the last one in the argument list. The user then has to write code like this (a little exaggerated, but still...):

  picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green)

A good workaround is to use named parameter lists, which would make the previous code look like:

  picture.add_gene(:feature => my_gene, :colour => :green)

which is much more readable.

However, I'm a bit squeamish about doing it this way, because it would be a different paradigm from the one BioRuby uses. What do you think about integrating Bio::Graphics into BioRuby at some point in the future? Would implementing named argument lists in Bio::Graphics make that integration difficult, impossible, or a bad idea?
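A minimal sketch of the options-hash idiom being proposed, assuming a hypothetical Picture class: the names add_gene, :feature, :glyph, :colour and :label are illustrative only and are not the real Bio::Graphics API. Every setting has a default, the caller names only what changes, and misspelt option names are rejected instead of being silently ignored.

```ruby
# Hypothetical stand-in for a Bio::Graphics drawing class (not its real API).
class Picture
  GENE_DEFAULTS = { :glyph => :generic, :colour => :blue, :label => true }

  # Returns the effective settings for one gene.
  def add_gene(options = {})
    raise ArgumentError, ':feature is required' unless options.key?(:feature)
    unknown = options.keys - GENE_DEFAULTS.keys - [:feature]
    raise ArgumentError, "unknown option(s): #{unknown.inspect}" unless unknown.empty?
    # The argument to merge wins, so caller-supplied values override the defaults.
    GENE_DEFAULTS.merge(options)
  end
end

picture = Picture.new
p picture.add_gene(:feature => 'my_gene', :colour => :green)
```

Because Ruby collects trailing key => value pairs into a single hash argument, the call site reads like named parameters while the method still takes one ordinary positional argument.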

Really looking forward to your comments.
jan.


From ktym at hgc.jp  Thu Nov 15 02:44:03 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 16:44:03 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <4738346D.4060206@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
Message-ID: <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>

Hi Patil,

On 2007/11/12, at 20:09, Kaustubh Patil wrote:
> XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method;

Accepting retmode=xml sounds like a good idea, so I have just committed efetch2 and esearch2 methods, which can be better replacements for the efetch and esearch methods.

Both methods accept any E-Utils options as a hash.

I will remove the "2" suffix from these methods if the following incompatibility is acceptable.

* changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility:
  currently all of
   1. efetch("123")
   2. efetch("123", "456")
   3. efetch(["123", "456"])
  are accepted, but 2. will no longer be available.

Other notes:

* the default value of the retmode option remains "text" for backward compatibility
* both methods have been rewritten to use Bio::Command.post_form to make the code clearer
* Bio::FlatFile has been updated to accept the recent MEDLINE entry format (UI -> PMID)


puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics")
puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics", {"retmax" => "500"})
puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics", {"rettype" => "count"})

puts Bio::PubMed.efetch2("10592173")
puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})


Thanks,
Toshiaki Katayama


From ktym at hgc.jp  Thu Nov 15 03:51:29 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 17:51:29 +0900
Subject: [BioRuby] named arguments
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <38F1FC51-FE7D-4F85-B7D1-DC4B5777E1E6@hgc.jp>

Jan,

Several methods in BioRuby already accept a hash as the last argument, so it is fine to proceed with it.

Toshiaki



From jan.aerts at bbsrc.ac.uk  Mon Nov 19 05:35:59 2007
From: jan.aerts at bbsrc.ac.uk (Jan Aerts)
Date: Mon, 19 Nov 2007 10:35:59 +0000
Subject: [BioRuby] [Fwd: Using BioRuby for parsing a .ptt file]
Message-ID: <1195468559.25265.11.camel@rilxvm05>

A post from Abhik Khanra.

Could anyone help him out?

Thanks,
jan.
-------- Forwarded Message --------
> From: Abhik Khanra <abhik.rk at gmail.com>
> To: jan.aerts at bbsrc.ac.uk
> Subject: Using BioRuby for parsing a .ptt file
> Date: Sat, 10 Nov 2007 07:18:05 +0530
> 
> Hi.
> 
> I came across your blog recently. It is a really good source of information.
> 
> I have a question, which I have also posted to the BioRuby mailing list.
> It's just that I'm short on time, so I'm sending it to you as well.
> I hope that is not a problem for you.
> 
> I'm working on a sample visualization application and using
> BioRuby to extract the start and end positions of target sequences from BLAST
> results. I took an example of this from the BioRuby tutorial.
> Could you please let me know if there is a similar example of using BioRuby to
> extract useful information by parsing a .ptt file?
> 
> Thanks
> Abhik
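A possible starting point for the .ptt question, as a plain-Ruby sketch that does not use BioRuby. It assumes the usual NCBI protein-table layout: a title line, a protein-count line, a tab-separated column header beginning with "Location", then one row per protein with nine tab-separated columns and the location written as start..end. The names PttEntry and parse_ptt are hypothetical, and the sample table is invented for illustration.

```ruby
# One protein row from a .ptt table (aa_length is the "Length" column).
PttEntry = Struct.new(:start, :stop, :strand, :aa_length, :pid, :gene,
                      :synonym, :code, :cog, :product)

# Skip everything up to and including the "Location" column header,
# then split each remaining non-empty line on tabs (assumed 9 columns).
def parse_ptt(text)
  lines = text.lines.map(&:chomp)
  header = lines.index { |l| l.start_with?("Location\t") }
  raise ArgumentError, 'no "Location" column header found' unless header

  lines[(header + 1)..-1].reject(&:empty?).map do |line|
    location, strand, length, pid, gene, synonym, code, cog, product = line.split("\t", 9)
    start, stop = location.split('..').map(&:to_i)
    PttEntry.new(start, stop, strand, length.to_i, pid, gene, synonym, code, cog, product)
  end
end

# Hypothetical two-protein table, for illustration only.
SAMPLE_PTT = [
  'Example genome, complete sequence - 1..5000',
  '2 proteins',
  "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct",
  "100..402\t+\t100\t0000001\tabcA\tEX0001\t-\t-\thypothetical protein",
  "900..1502\t-\t200\t0000002\tabcB\tEX0002\t-\t-\tputative transporter"
].join("\n") + "\n"

parse_ptt(SAMPLE_PTT).each do |e|
  puts "#{e.gene}\t#{e.start}..#{e.stop}\t#{e.strand}\t#{e.product}"
end
```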
-- 
Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198


From ktym at hgc.jp  Tue Nov 20 10:38:32 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 00:38:32 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <4742EEE5.90400@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
	<6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>
	<473C1708.9020306@staff.science.uva.nl>
	<BC34793F-F6C2-4842-8716-5B3E6BB73F99@hgc.jp>
	<4742EEE5.90400@staff.science.uva.nl>
Message-ID: <577009FA-1E77-493A-A036-B7939230345A@hgc.jp>

Hi Kaustubh,

I've just committed a change that makes Bio::PubMed.efetch and esearch wait 3 seconds between consecutive queries.
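
As a minimal sketch of this kind of throttling (assuming nothing about how it is wired into Bio::PubMed), the hypothetical `RequestThrottle` class below sleeps only as long as needed to keep consecutive calls at least a given number of seconds apart. It uses a monotonic clock rather than `Time.now`, so wall-clock adjustments cannot shorten the wait.

```ruby
# Hypothetical helper, not BioRuby's implementation: enforces a minimum
# interval (in seconds) between successive calls to #wait.
class RequestThrottle
  def initialize(interval)
    @interval = interval
    @last_accessed = nil
  end

  # Sleeps only as long as needed so that consecutive calls are at least
  # @interval seconds apart, then records the time of this access.
  def wait
    if @last_accessed
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @last_accessed
      sleep(@interval - elapsed) if elapsed < @interval
    end
    @last_accessed = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Calling `throttle.wait` on a shared `RequestThrottle.new(3)` before every E-Utils request would keep requests at most one per 3 seconds.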

I also renamed efetch2 and esearch2 (the newer versions, which accept E-Utils options as a hash) to efetch and esearch, replacing the old versions. The new efetch breaks backward compatibility: the old version also accepted a list of IDs as variable-length arguments, which is no longer supported.

>>>>  1. efetch("123") --> OK
>>>>  2. efetch("123", "456") --> NG
>>>>  3. efetch(["123", "456"]) --> OK

Here, the PubMed IDs can be given as strings or numbers, or as an array of them.
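
A minimal sketch of how these accepted forms could be normalized into the comma-separated list sent as the `id` parameter; `normalize_pubmed_ids` is a hypothetical name, not a BioRuby method.

```ruby
# Hypothetical helper (not BioRuby code): accepts a single ID or an array
# of IDs, as strings or integers, and joins them with commas.
def normalize_pubmed_ids(ids)
  Array(ids).map(&:to_s).join(",")
end
```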

By the way, currently efetch method returns the following error.

% ruby lib/bio/io/pubmed.rb
   :
--- Retrieve PubMed entry by E-Utils ---
Wed Nov 21 00:23:20 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable
Wed Nov 21 00:23:23 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable

Is this a temporary problem?
I believe efetch2 was working when I implemented it.

Regards,
Toshiaki Katayama

On 2007/11/20, at 23:27, Kaustubh Patil wrote:
> Hi Toshiaki,
>
> Thanks for your email. Please find my answers embedded below;
>
> Thanks,
> kaustubh
>
> Toshiaki Katayama wrote:
>
>> Hi Kaustubh,
>>
>> On 2007/11/15, at 18:53, Kaustubh Patil wrote:
>>
>>  
>>> Hi Toshiaki,
>>>
>>> Thank you very much for the improvements. There are some other desirable improvements;
>>>
>>> 1. PubMed has some timing restrictions on two consecutive queries, so it would be very nice if this could be implemented inside functions like esearch/efetch.
>>>    
>>
>> How about having the following method and calling it within the efetch and esearch methods before Bio::Command.post_form?
>>
>> --------------------------------------------------
>>  # Make no more than one request every 3 seconds.
>>  @@ncbi_interval = 3
>>  @@last_accessed = nil
>>
>>  def wait_access
>>    if @@last_accessed
>>      duration = Time.now - @@last_accessed
>>      if duration < @@ncbi_interval
>>        sleep @@ncbi_interval - duration
>>      end
>>    end
>>    @@last_accessed = Time.now
>>  end
>> --------------------------------------------------
>>  
> This could be a very good and quick implementation. In fact, I currently use something similar myself.
>
>> By the way, NCBI also have another restriction:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
>>  
>>> Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests.
>>>
>>> Do you think this should also be taken care of automatically?
>>>
>>>    
> I am aware of those restrictions. It would be very nice if this could be taken care of automatically. There is a very good library for accessing Medline from R, called MedlineR (by the way, it is currently not downloadable because their server is down). MedlineR handles this automatically.
>
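An automatic check along these lines could be sketched as follows. `ncbi_off_peak?` is a hypothetical helper; it assumes a fixed UTC offset for US Eastern Time (EST, `-05:00`, by default) and does not handle daylight saving time on its own.

```ruby
# Hypothetical helper: true on weekends, or between 9 pm and 5 am US
# Eastern Time on weekdays. Assumes a fixed Eastern offset: "-05:00" (EST)
# by default; pass "-04:00" during daylight saving time.
def ncbi_off_peak?(time = Time.now, eastern_offset = "-05:00")
  t = time.getlocal(eastern_offset)
  t.saturday? || t.sunday? || t.hour >= 21 || t.hour < 5
end
```

A batch script could check `ncbi_off_peak?` before starting a series of more than 100 requests.
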
> There is another improvement I am thinking about. It is not possible to fetch a large number of documents in one go. I suppose this is mainly because of practical restrictions on URL length, e.g. IE supports at most 2,048 characters (although I don't know whether PubMed imposes any limit). It would be useful (under some conditions) to split the fetch into several parts and then return the combined result. What do you think?
>
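The splitting idea could look like the following sketch. `fetch_in_batches` and its default batch size of 200 are hypothetical choices, not NCBI limits or BioRuby API; the block is where one request per batch would be made.

```ruby
# Hypothetical helper, not part of BioRuby: splits a long ID list into
# batches, hands each batch to the given block (e.g. one efetch request
# per batch), and concatenates the per-batch results.
def fetch_in_batches(ids, batch_size = 200)
  raise ArgumentError, "batch_size must be positive" unless batch_size > 0
  ids.each_slice(batch_size).flat_map { |batch| yield batch }
end
```

For example, `fetch_in_batches(ids) { |batch| Bio::PubMed.efetch(batch) }` would issue one efetch call per batch, assuming efetch returns an Array of entries.
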
>>> 2. Mapping terms to MeSH (I couldn't find this!).
>>>    
>>
>>  
>> I'm not sure how to accomplish this.
>>  
> I will do a bit more research on this and then get back to you.
>
>>
>>  
>>> I will post other comments as I recall them. I have another question (though this is not really the appropriate place for it):
>>>
>>> Is there any Ruby library which can do some basic text mining tasks, like tokenization, sentence boundary detection, etc.?
>>>    
>>
>> I think yes, but I'm not doing text mining for now, sorry ;-)
>>  
> I haven't found a Ruby library for that yet. I will keep searching.
>
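For very basic needs, a naive regex-based sketch in plain Ruby is shown below. It is a stand-in rather than a text-mining library: the sentence splitter will, for example, wrongly break after abbreviations such as "e.g." when the next word is capitalized.

```ruby
# Naive, regex-based sketch; not a substitute for a real NLP library.
# Splits after ".", "!" or "?" when followed by whitespace and an
# uppercase letter, so abbreviations like "e.g. HIV" are mis-split.
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+(?=[A-Z])/)
end

# Word tokens are runs of word characters; each punctuation mark becomes
# its own token.
def tokenize(sentence)
  sentence.scan(/\w+|[^\w\s]/)
end
```
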
> Cheers,
> Kaustubh
>
>> Thanks,
>> Toshiaki
>>
>>
>>  
>>> Cheers,
>>> Kaustubh
>>>
>>> Toshiaki Katayama wrote:
>>>
>>>    
>>>> Hi Patil,
>>>>
>>>> On 2007/11/12, at 20:09, Kaustubh Patil wrote:
>>>>      
>>>>> XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method;
>>>>>          
>>>> Accepting retmode=xml sounds like a good idea, so I have just committed efetch2 and esearch2 methods, which could be better replacements for the efetch and esearch methods.
>>>>
>>>> Both methods are able to accept any E-Utils options as a hash.
>>>>
>>>> I will remove the suffix "2" from these methods if the following incompatibility is acceptable.
>>>>
>>>> * changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility
>>>> currently all of
>>>>  1. efetch("123")
>>>>  2. efetch("123", "456")
>>>>  3. efetch(["123", "456"])
>>>> are accepted but 2. will be unavailable.
>>>>
>>>> Other notes:
>>>>
>>>> * default value for the retmode option remains "text" for the backward compatibility
>>>> * both methods are rewritten to use Bio::Command.post_form to make the code clear
>>>> * Bio::FlatFile is updated to accept recent MEDLINE entry format (UI -> PMID)
>>>>
>>>>
>>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)")
>>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"retmax" => "500"})
>>>> puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics)", {"rettype" => "count"})
>>>>
>>>> puts Bio::PubMed.efetch2("10592173")
>>>> puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})
>>>>
>>>>
>>>> Thanks,
>>>> Toshiaki Katayama
>>>>
>>>> _______________________________________________
>>>> BioRuby mailing list
>>>> BioRuby at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioruby


From ktym at hgc.jp  Tue Nov 20 11:20:23 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 01:20:23 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <47430436.3070005@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
	<6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>
	<473C1708.9020306@staff.science.uva.nl>
	<BC34793F-F6C2-4842-8716-5B3E6BB73F99@hgc.jp>
	<4742EEE5.90400@staff.science.uva.nl>
	<577009FA-1E77-493A-A036-B7939230345A@hgc.jp>
	<47430436.3070005@staff.science.uva.nl>
Message-ID: <7787216D-B5E7-4EC3-B467-C62489CFDD4C@hgc.jp>

Hi,

On 2007/11/21, at 0:58, Kaustubh Patil wrote:
> The problem was temporary (solved by now). I guess it was part of maintenance.

Thank you. I've confirmed the tests are now working.

Another issue:

Most of the BioRuby classes that access a server are designed to create a factory object first, e.g.

  server = Bio::Blast.remote(...)
  result = server.query(...)

  server = Bio::KEGG::API.new
  result = server.get_genes_by_pathway(...)

However, Bio::PubMed is not.

  result = Bio::PubMed.esearch(...)

I think this is only for historical reasons. Should I change the design to unify them?

  server = Bio::PubMed.new
  result = server.esearch(...)

Or should I provide both ways? What is the best way to do this (i.e., define instance methods and also make them available as class methods)?

  def esearch(args)
    # real codes
  end

  def self.esearch(args)
    self.new.esearch(args)
  end
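
One common Ruby way to offer both styles is to keep the real code in instance methods and generate delegating class methods with `define_singleton_method`. `PubMedClient` below is a hypothetical stand-in, not Bio::PubMed, and its methods just return their inputs instead of contacting NCBI.

```ruby
# Hypothetical stand-in for a server-access class (not BioRuby's
# Bio::PubMed): the real work lives in instance methods.
class PubMedClient
  def esearch(term, options = {})
    # A real client would send an HTTP request here.
    { "term" => term }.merge(options)
  end

  def efetch(ids, options = {})
    Array(ids).map(&:to_s)
  end

  # Expose each instance method as a class method that delegates to a
  # fresh instance, so both PubMedClient.esearch(...) and
  # PubMedClient.new.esearch(...) work.
  [:esearch, :efetch].each do |name|
    define_singleton_method(name) do |*args, &block|
      new.public_send(name, *args, &block)
    end
  end
end
```

The design keeps a single implementation; the class-level form is only a thin convenience wrapper, so factory-style usage stays consistent with the other server classes.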

Toshiaki

> Toshiaki Katayama wrote:
>
>> By the way, currently efetch method returns the following error.
>>
>> % ruby lib/bio/io/pubmed.rb
>>   :
>> --- Retrieve PubMed entry by E-Utils ---
>> Wed Nov 21 00:23:20 +0900 2007
>> 1: id: 16381885 Error occurred: PubMed article server is not avaliable
>> Wed Nov 21 00:23:23 +0900 2007
>> 1: id: 16381885 Error occurred: PubMed article server is not avaliable
>>
>> Is this a temporal problem?


From baj2107 at columbia.edu  Sun Nov  4 20:32:39 2007
From: baj2107 at columbia.edu (Bernd Jagla)
Date: Sun, 4 Nov 2007 15:32:39 -0500
Subject: [BioRuby] transcription factor binding site identification
Message-ID: <01bb01c81f21$d871d620$0500a8c0@berndhome>

Hi there,

Is it possible with bioruby/ruby to scan a nucleotide sequence and search
for binding sites of TFs?

How would I do this? (I looked in the documentation but couldn't find it.)

Thanks,

Bernd


From ktym at hgc.jp  Mon Nov  5 02:03:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 5 Nov 2007 11:03:34 +0900
Subject: [BioRuby] pubmed bug?
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <3F329B3D-4338-4B55-823A-0943E081D148@gmail.com>
	<1F16910BB8546C4DA5526FABB0C98D09AA99CB@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <97798D54-E2FC-43B1-A8A0-17905E39AEB1@hgc.jp>

Kikkawa-san,

I'm sorry; I had never used the tracker on RubyForge,
as I only used the site to distribute our BioRuby gem package.

Jan, thanks for the fix. I changed the status to closed.

Regards,
Toshiaki Katayama

On 2007/11/04, at 20:51, jan aerts (RI) wrote:

> Hi Masahide.
>
> Sorry about not spotting this earlier. The bug fix has been committed to CVS now.
>
> @Toshiaki: could you set the status of the bug report on rubyforge to "Closed"?
>
> jan.
>
>
> -----Original Message-----
> From: bioruby-bounces at lists.open-bio.org on behalf of Masahide Kikkawa
> Sent: Sun 04/11/2007 06:01
> To: bioruby at lists.open-bio.org
> Subject: [BioRuby] pubmed bug?
>
> Hi,
>   I'm new to bioruby mailing list. While ago, I reported a bug of  
> bioruby to rubyforge. Seems like the bug was not fixed in the new  
> bioruby release (1.1).
>   Could someone take a look the following report?
> http://rubyforge.org/tracker/index.php?func=detail&aid=11736&group_id=769&atid=3037
>
> Thanks in advance.
> ---------------------------------------------------------------
> Masahide Kikkawa, M.D., Ph. D.
> Professor
> Structural Biology
> Graduate School of Science
> Kyoto University
> Oiwake, Kitashirakawa, Sakyo-ku, Kyoto, 606-8502
> JAPAN
> http://structure.biophys.kyoto-u.ac.jp/
> Tel: +81-75-753-9421
> FAX: +81-75-753-4218
> ---------------------------------------------------------------
>
>
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby


From kpatil at science.uva.nl  Tue Nov  6 10:00:46 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Tue, 06 Nov 2007 11:00:46 +0100
Subject: [BioRuby] count parameter in Bio::PubMed.esearch
Message-ID: <47303B4E.8020103@staff.science.uva.nl>

Hi,

Here is a suggestion/feature for Bio::PubMed.esearch.

Currently it is not possible to use rettype=count (through the options hash) 
in Bio::PubMed.esearch.

To get this feature replace the following line in pubmed.rb (approx. 
line 97)

result = result.scan(/<Id>(.*?)<\/Id>/m).flatten

by

if hash['rettype'] == "count"
  result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
  result = result[0]
else
  result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
end


and it will return the count as a string, which can be easily converted 
to an integer by "result.to_i"
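
The suggested change can be tried in isolation with the sketch below. `parse_esearch` is a hypothetical helper, and the XML in the test is a hand-written fragment rather than real NCBI output; like the suggestion above, it assumes the first `<Count>` element holds the total hit count. Unlike the patch, it converts the count to an Integer directly.

```ruby
# Hypothetical helper mirroring the suggested change: with
# rettype == "count" return the first <Count> value as an Integer
# (nil if absent), otherwise the list of <Id> values as Strings.
def parse_esearch(xml, rettype = nil)
  if rettype == "count"
    count = xml[/<Count>(.*?)<\/Count>/m, 1]
    count && count.to_i
  else
    xml.scan(/<Id>(.*?)<\/Id>/m).flatten
  end
end
```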

I hope it is useful.

Cheers,
Kaustubh Patil

PS: for more details on Entrez esearch parameters, please refer to;

http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

From kpatil at science.uva.nl  Tue Nov  6 09:54:00 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Tue, 06 Nov 2007 10:54:00 +0100
Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch
Message-ID: <473039B8.1060600@staff.science.uva.nl>

Hi,

I would like to thank you for the BioRuby library; it is a very useful 
tool. I am doing some literature mining using Ruby and I use PubMed as 
my source. Here is some background for my question:

It is not possible to search PubMed with logical operators, e.g.  
HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it 
returns empty result).

It is due to the url encoding (i.e. CGI.escape) of the search term 
(approx. line 89 in pubmed.rb). If we remove this url encoding it is 
possible to make such queries.

So my question is: is it safe to remove this CGI.escape?

Thank you and regards,
Kaustubh Patil


From georgkam at gmail.com  Thu Nov  8 07:09:47 2007
From: georgkam at gmail.com (George)
Date: Thu, 08 Nov 2007 10:09:47 +0300
Subject: [BioRuby] English translation
Message-ID: <4732B63B.8030702@gmail.com>

Hi Nakao,
How can I translate your blog into English?

Thanks

George


From jan.aerts at bbsrc.ac.uk  Thu Nov  8 09:06:02 2007
From: jan.aerts at bbsrc.ac.uk (Jan Aerts)
Date: Thu, 08 Nov 2007 09:06:02 +0000
Subject: [BioRuby] biographics
In-Reply-To: <4732B309.2050008@gmail.com>
References: <4732B309.2050008@gmail.com>
Message-ID: <1194512762.6300.19.camel@rilxvm05>

Hey George.

Thanks again for your interest in using Bio::Graphics.

Concerning your first question: I'm trying to implement the notion of
subfeatures in Bio::Graphics at the moment. I think that would serve
your purpose. Unfortunately, this requires some refactoring of one of
the core-classes in bioruby itself: Bio::Feature. I'm waiting for the
big guys at bioruby for their ideas on implementing that. So at the
moment, the best way of displaying this is to either display the domains
separately, or to use the spliced glyph: even though they're not exons,
this would at least link them up later.
Do you want to display that protein in its genomic environment as well?
Or do you just want to have the protein on its own with the domains?
Could you send us a mockup of how you'd like to have this type of
information (i.e. proteins and their domains) represented? Just a simple
drawing will do. I haven't had to do this type of visualization myself
yet, so I would be interested in how you experts would like to do that.

Concerning your second question: it looks like you're referencing a
version of the library that I sent out a while ago on the mailing list.
All code development is now run via rubyforge. The moment I put it on
rubyforge the namespace was changed from BioExt (*bio*ruby *ext*ensions)
to just Bio. Did you install a version via rubyforge (i.e. following the
instructions on bio-graphics.rubyforge.org)? If so: change all
references to BioExt::Graphics to Bio::Graphics.
So the line
  my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)
would become
  my_panel = Bio::Graphics::Panel.new(1000, 1200, false, 1, 600)

jan.

PS: I've CC'd this reply to the bioruby mailing list if that's OK...

On Thu, 2007-11-08 at 09:56 +0300, George wrote:
> Hi Dr Jan.
> 
> I have a Chado-based database system running on Ruby on Rails for 
> storing sequence and annotation data.
> The Feature table contains the biological sequences represented as 
> features, and the Feature location table contains the locations or 
> biological coordinates for each feature.
> Let me explain with an example: a protein sequence is a feature; call it 
> Prot_A. Prot_A can have domains A1, A2, etc. These domains are 
> actually features themselves, but they happen to be located within Prot_A.
> 
> So in the feature table I have Prot_A, Domain A1, A2.
> 
> In the feature locations table (call it Featureloc, Chado style):
> 
> ---------------------------------------------
> featureloc_id | featuresrc_id | fmin | fmax
> ---------------------------------------------
> 1             | null          | 1    | 200
> 2             | 1             | 1    | 20
> 3             | 1             | 30   | 60
> ---------------------------------------------
> 
> My aim is to represent these features graphically such that a user can 
> view a feature with its domains.
> I would like to generate simple graphics for these features from  a gff 
> formatted file which can be created on the fly from the database tables. 
> Any idea on how i can do that in rails and using the bio-graphics module?
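
Since the question is how to produce GFF from the Featureloc rows, here is a minimal sketch. The `FeatureRow` struct (with `name` and `type` fields joined in from the feature table), the `polypeptide`/`polypeptide_domain` type names, and the treatment of `fmin` as Chado's 0-based interbase coordinate are all assumptions; how the resulting file is fed to bio-graphics is not covered here.

```ruby
# Hypothetical rows shaped like the Featureloc example, joined with a
# feature name and type from the feature table (assumed fields).
FeatureRow = Struct.new(:id, :name, :type, :src_id, :fmin, :fmax)

# Emits GFF3 text: each feature is placed on its source feature (or on
# itself when it has none). Assumes Chado interbase coordinates, so the
# 1-based GFF3 start column is fmin + 1 and the end column is fmax.
def rows_to_gff3(rows)
  by_id = rows.to_h { |r| [r.id, r] }
  lines = ["##gff-version 3"]
  rows.each do |r|
    seqid = r.src_id ? by_id.fetch(r.src_id).name : r.name
    lines << [seqid, ".", r.type, r.fmin + 1, r.fmax, ".", ".", ".", "ID=#{r.name}"].join("\t")
  end
  lines.join("\n") + "\n"
end
```

In Rails, the output string could be written to a temporary file or sent directly from a controller action.
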
> 
> Second, I am getting the error
> "F:/Netbeans_folder/vargene/lib/biographics.rb:6: uninitialized constant
> BioExt (NameError)" when I try to call the BioExt::Graphics::Panel.new
> method while running the following code.
> 
> require 'stringio'
> require 'base64'
> gem 'bio-graphics'
> require 'bio-graphics'
>
> my_panel = BioExt::Graphics::Panel.new(1000, 1200, false, 1, 600)
>
> # Create and configure tracks
> track_SNP = my_panel.add_track('SNP')
> track_gene = my_panel.add_track('gene')
> track_transcript = my_panel.add_track('transcript')
>
> track_SNP.feature_colour = [1,0,0]
> track_SNP.feature_glyph = 'triangle'
> track_gene.feature_glyph = 'directed_spliced'
> track_transcript.feature_glyph = 'spliced'
> track_transcript.feature_colour = [0,0.5,0]
>
> # Add data to tracks
> DATA.each do |line|
>   line.chomp!
>   ref, type, name, location, link = line.split(/\s+/)
>   if link == ''
>     link = nil
>   end
>   if type == 'SNP'
>     track_SNP.add_feature(name, location, link)
>   elsif type == 'gene'
>     track_gene.add_feature(name, location, link)
>   elsif type == 'transcript'
>     track_transcript.add_feature(name, location, link)
>   end
> end
>
> # And draw
> my_panel.draw('c:/my_panel.png')
>
> __END__
> chr1  gene        CYP2D6      complement(80..120)
> chr1  gene        ALDH        100..449
> chr1  SNP         rs1234      107
> chr1  gene        bla         complement(400..430)
> chr1  SNP         rs9876      44
> chr1  gene        some_gene   complement(join(170..231,264..299,350..360,409..445))
> chr1  transcript  transcript1 join(250..300,390..425)
> chr1  transcript  transcript2 253..330
> chr1  transcript  transcript3 266..344
> chr1  transcript  transcript4 complement(join(410..430,239..286,129..151))
> 
> Is the BioExt module really available in the current version 
> of the bio-graphics gem?
> 
> Thanks
> 
> George
> 
-- 

Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198
skype: aerts_ri
website: http://saaientist.blogspot.com


From ngoto at gen-info.osaka-u.ac.jp  Fri Nov  9 12:30:10 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Fri, 9 Nov 2007 21:30:10 +0900
Subject: [BioRuby] use of CGI.escape in Bio::Pubmed.esearch
In-Reply-To: <473039B8.1060600@staff.science.uva.nl>
References: <473039B8.1060600@staff.science.uva.nl>
Message-ID: <20071109123012.8128D1CBC408@idnmail.gen-info.osaka-u.ac.jp>

Hi,

On Tue, 06 Nov 2007 10:54:00 +0100
Kaustubh Patil <kpatil at science.uva.nl> wrote:

> Hi,
> 
> I would like to thank you for the BioRuby library, it is a very useful 
> tool. I am doing some literature mining using Ruby and I use PubMed as 
> my source. Here is some background for my question;
> 
> It is not possible to search PubMed with logical operators, e.g.  
> HIV+AND+drug or geneA+OR+geneB etc. using Bio::PubMed.esearch (it 
> returns empty result).

Probably you mean

 Bio::PubMed.esearch("HIV AND drug")
 Bio::PubMed.esearch("geneA OR geneB")

More complicated example:

 Bio::PubMed.esearch("((p53 AND apoptosis) 2007/11[dp]) OR bioperl")

You can use the same search terms as in the NCBI PubMed search,
without worrying about URL encoding.

> It is due to the url encoding (i.e. CGI.escape) of the search term 
> (approx. line 89 in pubmed.rb). If we remove this url encoding it is 
> possible to make such queries.
> 
> Now my question is, is it safe to remove this CGI.escape ?

I think it is unsafe and should not be removed.
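
The double-encoding effect behind this advice can be seen with Ruby's standard `cgi` library: a plain query with spaces is encoded once into the `+` form the server expects, while a query that already contains `+` is encoded again, so the server would receive literal plus signs instead of spaces (which would plausibly explain the empty results).

```ruby
require "cgi"

query = "HIV AND drug"
pre_encoded_query = "HIV+AND+drug"

# Encoded once: spaces become "+", which the server decodes back into
# spaces, so PubMed sees the AND operator.
once = CGI.escape(query)                 # => "HIV+AND+drug"

# Already encoded by hand, then encoded again: each "+" becomes "%2B",
# which the server decodes into a literal "+" instead of a space.
twice = CGI.escape(pre_encoded_query)    # => "HIV%2BAND%2Bdrug"

puts once, twice
```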

-- 
Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp


From n at bioruby.org  Fri Nov  9 14:30:24 2007
From: n at bioruby.org (Mitsuteru Nakao)
Date: Fri, 9 Nov 2007 23:30:24 +0900
Subject: [BioRuby] English translation
In-Reply-To: <4732B63B.8030702@gmail.com>
References: <4732B63B.8030702@gmail.com>
Message-ID: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>

Hi George,

Of course, that's OK.
Please let me know the URL of the blog you mean. :-)

On 11/8/07, George <georgkam at gmail.com> wrote:
> Hi Nakao,
> Please how can i translate your blog to English?

Thanks
Mitsuteru
-
Mitsuteru Nakao
mn at kazusa.or.jp / n at bioruby.org


From georgkam at gmail.com  Sat Nov 10 08:39:24 2007
From: georgkam at gmail.com (George Githinji)
Date: Sat, 10 Nov 2007 11:39:24 +0300
Subject: [BioRuby] English translation
In-Reply-To: <90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
References: <4732B63B.8030702@gmail.com>
	<90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
Message-ID: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>

Hi Nakao
The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/

On Nov 9, 2007 5:30 PM, Mitsuteru Nakao <n at bioruby.org> wrote:

> Hi George,
>
> Of course OK.
> Please let me know the URL of my blog you mention. :-)
>
> On 11/8/07, George <georgkam at gmail.com> wrote:
> > Hi Nakao,
> > Please how can i translate your blog to English?
>
> Thanks
> Mitsuteru
> -
> Mitsuteru Nakao
> mn at kazusa.or.jp / n at bioruby.org
>


-- 
---------------
Sincerely
George

Skype: george_g2
Website: http://biorelated.wordpress.com/


From ktym at hgc.jp  Sat Nov 10 08:40:09 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 10 Nov 2007 17:40:09 +0900
Subject: [BioRuby] count parameter in Bio::PubMed.esearch
In-Reply-To: <47303B4E.8020103@staff.science.uva.nl>
References: <47303B4E.8020103@staff.science.uva.nl>
Message-ID: <CE62D4B8-EC1B-4454-A8C9-D4CB756129AF@hgc.jp>

Hi Kaustubh,

Thank you for your suggestion. I applied your changes to the CVS.

During this process, I found that the previous fix applied by Jan was wrong.
Developers, please run the tests before you commit your changes. :)

The change should be made to the Bio::PubMed.query method;
however, the search method also needs to be rewritten
because NCBI has reformatted the structure of the HTML it returns.

Anyway, in the Bio::PubMed module, the esearch/efetch method pair is
strongly recommended over the search/query method pair.


bioruby> Bio::PubMed.search("(genome AND analysis) OR bioinformatics)")
  ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368"]

bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)")
  ==> ["17989981", "17989975", "17989954", "17989953", "17989781", "17989717", "17989252", "17989247", "17989233", "17989226", "17989095", "17989092", "17989061", "17989054", "17988782", "17988704", "17988577", "17988401", "17988398", "17988368", "17988176", "17988086", "17987666", "17987374", "17987257", "17987048", "17986781", "17986522", "17986471", "17986460", "17986440", "17986356", "17986355", "17986329", "17986320", "17986282", "17986185", "17986079", "17985162", "17984568", "17984549", "17984548", "17984520", "17984228", "17984226", "17984208", "17984205", "17984085", "17984084", "17984080", "17983847", "17983807", "17983802", "17983573", "17983493", "17983269", "17983268", "17983157", "17982457", "17982456", "17982442", "17982427", "17982176", "17982123", "17981990", "17981981", "17981974", "17981891", "17981844", "17981816", "17981801", "17981746", "17981579", "17981546", "17981477", "17981060", "17981052", "17980519", "17980517", "17980477", "17980146", "17980047", "17980028", "17980019", "17979886", "17979725", "17979297", "17979181", "17978887", "17978880", "17978572", "17978498", "17978310", "17978184", "17978179", "17977886", "17977881", "17977850", "17977831", "17977670"]

bioruby> Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics)", {'rettype' => 'count'})
  ==> 286139

Regards,
Toshiaki Katayama


On 2007/11/06, at 19:00, Kaustubh Patil wrote:

> Hi,
>
> Here is a suggestion/feature for Bio::PubMed.esearch.
>
> Currently it is not possible to use rettype=count (through options hash) in Bio::PubMed.esearch.
>
> To get this feature replace the following line in pubmed.rb (approx. line 97)
>
> result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
>
> by
>
> if(hash['rettype']=="count")
>        result = result.scan(/<Count>(.*?)<\/Count>/m).flatten
>        result = result[0]
> else
>        result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
> end
>
>
> and it will return the count as a string, which can be easily converted to an integer by "result.to_i"
>
> I hope it is useful.
>
> Cheers,
> Kaustubh Patil
>
> PS: for more details on Entrez esearch parameters, please refer to;
>
> http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby


From ktym at hgc.jp  Sun Nov 11 14:10:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sun, 11 Nov 2007 23:10:34 +0900
Subject: [BioRuby] English translation
In-Reply-To: <55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
References: <4732B63B.8030702@gmail.com>
	<90ca35f70711090630h26c32d1ejec05cb419ebfe4@mail.gmail.com>
	<55915f820711100039o54904cf5o94dcd8690e60c1b3@mail.gmail.com>
Message-ID: <C4373E4C-539E-44D7-89C8-D5DAE761260F@hgc.jp>

Hi George,

Which did you mean?

1. you just want to read his blog in English
2. you want to translate his blog and make it publicly available

In the case of 1, you can use free web translators:

http://www.google.com/language_tools
http://babelfish.altavista.com/
http://www.worldlingo.com/en/products_services/worldlingo_translator.html
http://www.freetranslation.com/
http://www.excite.co.jp/world/url/

The quality of these machine translations is not good, though.

In the case of 2, you can do it freely as Mitsuteru wrote.

Toshiaki

On 2007/11/10, at 17:39, George Githinji wrote:

> Hi Nakao
> The blog address is: http://bioruby.g.hatena.ne.jp/nakao_mitsuteru/
>
> On Nov 9, 2007 5:30 PM, Mitsuteru Nakao <n at bioruby.org> wrote:
>
>> Hi George,
>>
>> Of course OK.
>> Please let me know the URL of my blog you mention. :-)
>>
>> On 11/8/07, George <georgkam at gmail.com> wrote:
>>> Hi Nakao,
>>> Please how can i translate your blog to English?
>>
>> Thanks
>> Mitsuteru
>> -
>> Mitsuteru Nakao
>> mn at kazusa.or.jp / n at bioruby.org
>>
>
>
>
> -- 
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Website: http://biorelated.wordpress.com/
> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby


From ktym at hgc.jp  Sun Nov 11 15:12:44 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Mon, 12 Nov 2007 00:12:44 +0900
Subject: [BioRuby] transcription factor binding site identification
In-Reply-To: <01bb01c81f21$d871d620$0500a8c0@berndhome>
References: <01bb01c81f21$d871d620$0500a8c0@berndhome>
Message-ID: <82F3F8EA-81DC-4B60-9715-8E968F123975@hgc.jp>

Hi,

If you want to search with TRANSFAC motifs, you can use the tfscan command in the EMBOSS package.

Otherwise, you may need to define your own algorithm to search for your motif.
If your motif is in profile format, you would need to develop a profile search method.
If your motif is simple and can be converted to a regular expression, the task would be relatively easy.

  # to find all (non-overlapping) occurrences
  results = seq.scan(regexp)

  # to find the start position of every match, including overlapping ones
  pos = 0
  while pos = seq.index(regexp, pos)
    puts pos
    pos += 1
  end
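
Extending the regexp approach, the sketch below builds a Regexp from an IUPAC consensus string and scans both strands, reporting overlapping matches through a zero-width lookahead. The helper names are hypothetical, and the code works on plain Ruby Strings; a site that is its own reverse complement is reported once per strand.

```ruby
# IUPAC nucleotide codes mapped to regexp fragments.
IUPAC = {
  "A" => "A", "C" => "C", "G" => "G", "T" => "T",
  "R" => "[AG]", "Y" => "[CT]", "S" => "[CG]", "W" => "[AT]",
  "K" => "[GT]", "M" => "[AC]", "B" => "[CGT]", "D" => "[AGT]",
  "H" => "[ACT]", "V" => "[ACG]", "N" => "[ACGT]"
}.freeze

def consensus_to_regexp(consensus)
  Regexp.new(consensus.upcase.chars.map { |c| IUPAC.fetch(c) }.join)
end

# A zero-width lookahead lets every start position match, so
# overlapping sites are reported too.
def match_positions(seq, regexp)
  positions = []
  seq.scan(/(?=#{regexp.source})/) { positions << Regexp.last_match.begin(0) }
  positions
end

# Returns sorted [start, strand] pairs in 0-based forward-strand coordinates.
def scan_both_strands(seq, consensus)
  seq = seq.upcase
  regexp = consensus_to_regexp(consensus)
  len = consensus.length
  revcomp = seq.reverse.tr("ACGT", "TGCA")
  fwd_hits = match_positions(seq, regexp).map { |p| [p, "+"] }
  rev_hits = match_positions(revcomp, regexp).map { |p| [seq.length - p - len, "-"] }
  (fwd_hits + rev_hits).sort
end
```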

You may also be interested in the Bio::Sequence#window_search method.
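
For motifs given as a profile, a minimal position weight matrix (PWM) scan could look like this. It assumes per-position base counts (for example, taken from a motif database), a pseudocount of 1 and a uniform background of 0.25; these defaults and any score threshold are illustrative, not recommended values, and only the forward strand is scanned.

```ruby
BASES = %w[A C G T].freeze

# Converts per-position base counts (one Hash per motif position) into
# log2-odds scores against a uniform background; the pseudocount avoids log(0).
def pwm_from_counts(counts, pseudocount: 1.0, background: 0.25)
  counts.map do |col|
    total = BASES.sum { |b| col.fetch(b, 0) } + pseudocount * BASES.size
    BASES.to_h { |b| [b, Math.log2((col.fetch(b, 0) + pseudocount) / total / background)] }
  end
end

# Slides the matrix along the forward strand and returns [position, score]
# for every window scoring at or above the threshold. Bases other than
# A/C/G/T (e.g. N) score -Infinity, so windows containing them never pass.
def pwm_scan(seq, pwm, threshold)
  seq = seq.upcase
  width = pwm.size
  (0..seq.size - width).filter_map do |i|
    score = pwm.each_with_index.sum { |col, j| col.fetch(seq[i + j], -Float::INFINITY) }
    [i, score] if score >= threshold
  end
end
```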

Thanks,
Toshiaki

On 2007/11/05, at 5:32, Bernd Jagla wrote:

> Hi there,
>
> Is it possible with bioruby/ruby to scan a nucleotide sequence and search
> for binding sites of TFs?
>
> How would I do this? (I looked in the documentation but couldn't find it.)
>
> Thanks,
>
> Bernd
>


From kpatil at science.uva.nl  Mon Nov 12 11:09:33 2007
From: kpatil at science.uva.nl (Kaustubh Patil)
Date: Mon, 12 Nov 2007 12:09:33 +0100
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
Message-ID: <4738346D.4060206@staff.science.uva.nl>

Hi,

XML is very nice for searching etc. PubMed documents can be fetched in 
various formats, including XML. I have changed the efetch method in the 
Bio::PubMed class in order to implement this. Here is the modified method:

require 'net/http'
require 'cgi'

# Kaustubh Patil: 6 Nov. 2007
# options hash here is different than options hash in esearch
    def self.efetch(ids, hash = {})
      return [] if ids.empty?

      # default options, merged into a copy so the caller's hash is not modified
      opts = { 'retmode' => 'xml', 'rettype' => 'medline' }.merge(hash)

      # create the query string in the required key=value format,
      # URL-escaping each value
      query = opts.map { |k, v| "#{k}=#{CGI.escape(v.to_s)}" }.join('&')

      host = "eutils.ncbi.nlm.nih.gov"
      path = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&#{query}&id="

      ids = ids.join(",")

      http = Net::HTTP.new(host)
      response, = http.get(path + ids)
      result = response.body

      # plain-text MEDLINE records are separated by blank lines
      if opts['retmode'] == 'text'
        result = result.split(/\n\n+/)
      end

      return result
    end
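Since the default above is retmode=xml, the result can be searched with an XML parser. A minimal sketch using REXML from the Ruby standard distribution, assuming the usual PubMed efetch structure (PubmedArticleSet / PubmedArticle / MedlineCitation with PMID and Article/ArticleTitle elements); the helper name `pmids_and_titles` is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: returns [[pmid, title], ...] from efetch XML output.
# A missing element yields nil in its slot instead of raising.
def pmids_and_titles(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//PubmedArticle').map do |article|
    pmid  = REXML::XPath.first(article, 'MedlineCitation/PMID')
    title = REXML::XPath.first(article, 'MedlineCitation/Article/ArticleTitle')
    [pmid && pmid.text, title && title.text]
  end
end
```

Usage would be along the lines of `pmids_and_titles(Bio::PubMed.efetch(ids))`, with the exact call depending on which efetch version is installed.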


I hope it is useful.

Cheers,
Kaustubh

PS: for details of the Entrez efetch parameters, see
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchlit_help.html


From jan.aerts at bbsrc.ac.uk  Wed Nov 14 20:05:50 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Wed, 14 Nov 2007 20:05:50 -0000
Subject: [BioRuby] named arguments
Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>

Hi staff,

We think we're getting to a good workable version of Bio::Graphics. However, we also bumped into a distinctive feature of a library like Bio::Graphics: its methods have to be highly configurable. What should be the colour of the genes, what glyph should be used (spliced, a line, ...), what's the label that should be displayed, ... As a result, the argument lists for many of the methods become long, unwieldy and cumbersome to use. This is most apparent when the user of the library wants to use all default values for a method except one, which happens to be the last one in the argument list. The user then has to write code like this (a little bit exaggerated, but still...):

  picture.add_gene(my_gene, nil, nil, [], nil, nil, [], [], nil, {}, [], :green)

A good workaround is to use named parameter lists, which would make the previous code look like:

  picture.add_gene(:feature => my_gene, :colour => :green)

which is much more readable.
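A named-parameter list of this kind is usually implemented in Ruby as a trailing options hash merged over a hash of defaults. The sketch below is hypothetical (the `Picture` class, its option names and default values are illustrations, not the Bio::Graphics API); it also rejects unknown keys so that a typo such as `:color` fails loudly instead of being silently ignored.

```ruby
# Hypothetical sketch, not the Bio::Graphics API: options hash with defaults.
class Picture
  DEFAULT_GENE_OPTIONS = {
    :feature => nil,
    :glyph   => :generic,
    :colour  => :blue,
    :label   => nil
  }.freeze

  attr_reader :genes

  def initialize
    @genes = []
  end

  # Callers name only the settings they want to change.
  def add_gene(options = {})
    unknown = options.keys - DEFAULT_GENE_OPTIONS.keys
    raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

    settings = DEFAULT_GENE_OPTIONS.merge(options)
    raise ArgumentError, ":feature is required" if settings[:feature].nil?

    @genes << settings
    settings
  end
end
```

With this, `picture.add_gene(:feature => my_gene, :colour => :green)` keeps every other setting at its default.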

However, I'm a bit squeamish about doing it this way, because it would be a different paradigm from the one that bioruby uses. What do you guys think about integrating bioruby and Bio::Graphics somewhere in the future? Would implementing named argument lists in Bio::Graphics make integration into the bioruby toolkit difficult/impossible/not a good idea?

Really looking forward to your comments.
jan.


From ktym at hgc.jp  Thu Nov 15 07:44:03 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 16:44:03 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <4738346D.4060206@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
Message-ID: <6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>

Hi Patil,

On 2007/11/12, at 20:09, Kaustubh Patil wrote:
> XML is very nice for searching etc. PubMed documents can be fetched in various formats, including xml. I have changed the efetch method in Bio::PubMed class in order to implement this. Here is the modified method;

An enhancement to accept retmode=xml sounds like a good idea, so I have just committed efetch2 and esearch2 methods, which can serve as better replacements for the efetch and esearch methods.

Both methods accept any E-Utils options as a hash.

I will remove the suffix "2" from these methods if the following incompatibility is acceptable.

* changing efetch(*ids) to efetch(ids, hash = {}) breaks compatibility
  currently all of
   1. efetch("123")
   2. efetch("123", "456")
   3. efetch(["123", "456"])
  are accepted but 2. will be unavailable.

Other notes:

* the default value for the retmode option remains "text" for backward compatibility
* both methods are rewritten to use Bio::Command.post_form to make the code clearer
* Bio::FlatFile is updated to accept the recent MEDLINE entry format (UI -> PMID)


puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics")
puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics", {"retmax" => "500"})
puts Bio::PubMed.esearch2("(genome AND analysis) OR bioinformatics", {"rettype" => "count"})

puts Bio::PubMed.efetch2("10592173")
puts Bio::PubMed.efetch2(["10592173", "14693808"], {"retmode" => "xml"})


Thanks,
Toshiaki Katayama


From ktym at hgc.jp  Thu Nov 15 08:51:29 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 15 Nov 2007 17:51:29 +0900
Subject: [BioRuby] named arguments
In-Reply-To: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A03@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <38F1FC51-FE7D-4F85-B7D1-DC4B5777E1E6@hgc.jp>

Jan,

There are already several methods in BioRuby that accept a hash as the last argument, so it is OK to proceed with it.

Toshiaki



From jan.aerts at bbsrc.ac.uk  Mon Nov 19 10:35:59 2007
From: jan.aerts at bbsrc.ac.uk (Jan Aerts)
Date: Mon, 19 Nov 2007 10:35:59 +0000
Subject: [BioRuby] [Fwd: Using BioRuby for parsing a .ptt file]
Message-ID: <1195468559.25265.11.camel@rilxvm05>

A post from Abhik Khanra.

Could anyone help him out?

Thanks,
jan.
-------- Forwarded Message --------
> From: Abhik Khanra <abhik.rk at gmail.com>
> To: jan.aerts at bbsrc.ac.uk
> Subject: Using BioRuby for parsing a .ptt file
> Date: Sat, 10 Nov 2007 07:18:05 +0530
> 
> Hi.
> 
> I came across your blog recently. It is a really good source of information.
> 
> I have a query and have posted the same in the BioRuby mailing-list too.
> It's just that I'm in a time-crunch, hence I'm sending it to you as well.
> I hope that is not a problem for you.
> 
> I'm working on a sample visualization application and leveraging
> BioRuby for extracting target sequence origin and endpoints from BLAST
> results. I obtained an example of this from the BioRuby tutorial.
> Could you please let me know if there is any similar example using BioRuby for
> extracting useful information from parsing a .ptt file?
> 
> Thanks
> Abhik
-- 
Dr Jan Aerts
Bioinformatics Group
Roslin Institute
Roslin EH25 9PS
Scotland, UK
tel: +44 131 527 4198

----...and the obligatory disclaimer----
Roslin Institute is a company limited by guarantee, registered in
Scotland (registered number SC157100) and a Scottish Charity (registered
number SC023592). Our registered office is at Roslin, Midlothian, EH25
9PS. VAT registration number 847380013.

The information contained in this e-mail (including any attachments) is
confidential and is intended for the use of  the addressee only. The
opinions expressed within this e-mail (including any attachments) are
the opinions of the sender and do not necessarily constitute those of
Roslin Institute (Edinburgh) ("the Institute") unless specifically
stated by a sender who is duly authorised to do so on behalf of the
Institute.
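As a starting point for Abhik's question, a .ptt (NCBI protein table) file is simple enough to read with plain Ruby. This is a hypothetical sketch, not a BioRuby class, and it assumes the usual layout: title lines, a tab-separated header row beginning with "Location", then one protein per line with the columns Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product.

```ruby
# Hypothetical .ptt parser (not part of BioRuby); column layout is assumed.
PttRecord = Struct.new(:start, :stop, :strand, :length, :pid, :gene,
                       :synonym, :code, :cog, :product)

def parse_ptt(text)
  lines = text.lines.map(&:chomp)
  header_index = lines.index { |l| l.start_with?("Location\t") }
  raise ArgumentError, "no Location header row found" unless header_index

  lines[(header_index + 1)..-1].reject(&:empty?).map do |line|
    location, strand, length, pid, gene, synonym, code, cog, product =
      line.split("\t", 9)
    # Location is written as "start..stop"
    start, stop = location.split("..").map(&:to_i)
    PttRecord.new(start, stop, strand, length.to_i, pid, gene,
                  synonym, code, cog, product)
  end
end
```

Each record then exposes fields such as `start`, `stop`, `strand` and `product`, which could feed a visualization in the same way as the BLAST hit coordinates.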


From ktym at hgc.jp  Tue Nov 20 15:38:32 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 00:38:32 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <4742EEE5.90400@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
	<6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>
	<473C1708.9020306@staff.science.uva.nl>
	<BC34793F-F6C2-4842-8716-5B3E6BB73F99@hgc.jp>
	<4742EEE5.90400@staff.science.uva.nl>
Message-ID: <577009FA-1E77-493A-A036-B7939230345A@hgc.jp>

Hi Kaustubh,

I've just committed a change that makes Bio::PubMed.efetch and esearch wait 3 seconds between consecutive queries.
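A throttle of that kind can also be combined with splitting a long ID list into batches, as Kaustubh suggests later in this thread. The following is a hypothetical sketch, not the code committed to BioRuby: `ThrottledFetcher`, its batch size and the injectable clock/sleeper are assumptions, and the block passed to `fetch_all` stands in for one real E-Utils request.

```ruby
# Hypothetical helper (not part of BioRuby): waits so that consecutive
# requests are at least `interval` seconds apart, and splits IDs into batches.
class ThrottledFetcher
  def initialize(interval: 3.0, batch_size: 200,
                 sleeper: ->(s) { sleep(s) }, clock: -> { Time.now })
    @interval = interval
    @batch_size = batch_size
    @sleeper = sleeper
    @clock = clock
    @last_accessed = nil
  end

  # Sleep only for the part of the interval that has not yet elapsed,
  # then record the time of this access.
  def wait_access
    if @last_accessed
      elapsed = @clock.call - @last_accessed
      @sleeper.call(@interval - elapsed) if elapsed < @interval
    end
    @last_accessed = @clock.call
  end

  # Yields each batch (an Array of ID strings) to the block, which is
  # expected to perform one request; returns the results in order.
  def fetch_all(ids)
    Array(ids).map(&:to_s).each_slice(@batch_size).map do |batch|
      wait_access
      yield batch
    end
  end
end
```

Injecting the clock and sleeper keeps the timing logic testable without actually waiting; in normal use the defaults call `Time.now` and `sleep`.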

I also renamed efetch2 and esearch2 (the newer versions, which accept E-Utils options as a hash) to efetch and esearch, replacing the old versions. The new efetch breaks backward compatibility: the old one also accepted a list of IDs as variable-length arguments.

  1. efetch("123")          --> OK
  2. efetch("123", "456")   --> no longer accepted
  3. efetch(["123", "456"]) --> OK

The PubMed IDs can be given as strings or numbers, either singly or as an array.

By the way, the efetch method currently returns the following error:

% ruby lib/bio/io/pubmed.rb
   :
--- Retrieve PubMed entry by E-Utils ---
Wed Nov 21 00:23:20 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable
Wed Nov 21 00:23:23 +0900 2007
1: id: 16381885 Error occurred: PubMed article server is not avaliable

Is this a temporary problem?
I believe efetch2 was working when I implemented it.

Regards,
Toshiaki Katayama

On 2007/11/20, at 23:27, Kaustubh Patil wrote:
> Hi Toshiaki,
>
> Thanks for your email. Please find my answers embedded below;
>
> Thanks,
> kaustubh
>
> Toshiaki Katayama wrote:
>
>> Hi Kaustubh,
>>
>> On 2007/11/15, at 18:53, Kaustubh Patil wrote:
>>
>>  
>>> Hi Toshiaki,
>>>
>>> Thank you very much for the improvements. There are some other desirable improvements:
>>>
>>> 1. PubMed has some timing restrictions on two consecutive queries. So it would be very nice if this could be handled inside functions like esearch/efetch.
>>>    
>>
>> How about having the following method and calling it within the efetch and esearch methods before Bio::Command.post_form?
>>
>> --------------------------------------------------
>>  # Make no more than one request every 3 seconds.
>>  @@ncbi_interval = 3
>>  @@last_accessed = nil
>>
>>  def wait_access
>>    if @@last_accessed
>>      duration = Time.now - @@last_accessed
>>      # sleep only for the part of the interval that has not yet elapsed
>>      if duration < @@ncbi_interval
>>        sleep @@ncbi_interval - duration
>>      end
>>    end
>>    # record the time of this access for the next call
>>    @@last_accessed = Time.now
>>  end
>> --------------------------------------------------
>>  
> This could be a very good and quick implementation. In fact I use something similar for my usage now.
>
>> By the way, NCBI also have another restriction:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/utils/utils_index.html
>>  
>>> Run retrieval scripts on weekends or between 9 pm and 5 am Eastern Time weekdays for any series of more than 100 requests.
>>>
>>> Do you think this should also be taken care of automatically?
>>>
>>>    
> I am aware of those restrictions. It would be very nice if this could be taken care of automatically. There is a very good library for accessing/using Medline through R, called MedlineR (btw, currently it's not downloadable as their server is down). MedlineR handles this automatically.
>
> There is another improvement I am thinking about. It is not possible to fetch a large number of documents in one go. I suppose this is mainly because of practical restrictions on URL length, e.g. IE supports a maximum of 2,048 characters (although I am not aware whether PubMed imposes any limits). It would be useful (under some conditions) to cut the fetches into a number of parts and then return the combined result. What do you think?
>
>>> 2. Mapping terms to MeSH (I couldn't find this!).
>>>    
>>
>>  
>> I'm not sure how to accomplish this.
>>  
> I will do bit more research on this and then get back to you.
>
>>
>>  
>>> I will post other comments as I recollect them. I have another question (though it is not very appropriate place for it);
>>>
>>> Is there any Ruby library which can do some basic text mining tasks, like, tokenization, sentence boundary discrimination etc. ?
>>>    
>>
>> I think yes, but I'm not doing text mining for now, sorry ;-)
>>  
> I haven't found a Ruby library for that yet. I will keep on searching.
>
> Cheers,
> Kaustubh
>
>> Thanks,
>> Toshiaki
>>
>>
>>  
>>> Cheers,
>>> Kaustubh
>>>


From ktym at hgc.jp  Tue Nov 20 16:20:23 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Wed, 21 Nov 2007 01:20:23 +0900
Subject: [BioRuby] Bio::PubMed efetch xml support and other options
In-Reply-To: <47430436.3070005@staff.science.uva.nl>
References: <4738346D.4060206@staff.science.uva.nl>
	<6FA5536D-C4FF-49A5-AF68-ABA3E77C70FA@hgc.jp>
	<473C1708.9020306@staff.science.uva.nl>
	<BC34793F-F6C2-4842-8716-5B3E6BB73F99@hgc.jp>
	<4742EEE5.90400@staff.science.uva.nl>
	<577009FA-1E77-493A-A036-B7939230345A@hgc.jp>
	<47430436.3070005@staff.science.uva.nl>
Message-ID: <7787216D-B5E7-4EC3-B467-C62489CFDD4C@hgc.jp>

Hi,

On 2007/11/21, at 0:58, Kaustubh Patil wrote:
> The problem was temporary (solved by now). I guess it was part of maintenance.

Thank you. I've confirmed the tests are now working.

Another issue:

Most of the BioRuby classes that access a server are designed to create a factory object first, e.g.

  server = Bio::Blast.remote(...)
  result = server.query(...)

  server = Bio::KEGG::API.new
  result = server.get_genes_by_pathway(...)

However, Bio::PubMed is not.

  result = Bio::PubMed.esearch(...)

I think this is only for historical reasons. Should I change this design to make it consistent?

  server = Bio::PubMed.new
  result = server.esearch(...)

Or should I provide both? What is the best way to do this (to define instance methods and also make them available as class methods)?

  def esearch(args)
    # real codes
  end

  def self.esearch(args)
    self.new.esearch(args)
  end
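One way to provide both styles without writing each class method by hand is to generate the class-level delegators in a loop with `define_singleton_method`. This is a hypothetical sketch: `PubMedClient` stands in for Bio::PubMed, and the method bodies are placeholders rather than real E-Utils calls.

```ruby
# Hypothetical stand-in for Bio::PubMed: real work lives in instance
# methods; class methods are generated to delegate to a fresh instance.
class PubMedClient
  def esearch(term, options = {})
    # placeholder for the real E-Utils esearch request
    [term, options]
  end

  def efetch(ids, options = {})
    # placeholder for the real E-Utils efetch request
    [Array(ids).map(&:to_s), options]
  end

  # Inside the generated method, self is the class, so `new` builds an instance.
  [:esearch, :efetch].each do |name|
    define_singleton_method(name) do |*args, &block|
      new.public_send(name, *args, &block)
    end
  end
end
```

Both `PubMedClient.new.esearch("genome")` and `PubMedClient.esearch("genome")` then work, and a new method only needs to be added to the list to gain a class-level form.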

Toshiaki
