From F.Schwach at uea.ac.uk  Mon Dec 10 12:21:31 2007
From: F.Schwach at uea.ac.uk (Schwach Frank Dr (CMP))
Date: Mon, 10 Dec 2007 17:21:31 -0000
Subject: [BioRuby] using Bio::FlatFileIndex
Message-ID: <E6E2C9D1BCA8084CB8160442A214D4969A3CC0@UEAEXCHCLUS01.UEA.AC.UK>


Hi,

I need to retrieve sequences from FASTA files. In Perl I used to do this with Bio::DB::Fasta. At first I couldn't find an equivalent in BioRuby and was about to give up and use Perl for this purpose when I found Bio::FlatFileIndex.
Unfortunately, this class is not very well documented (unless I missed something). I think I can more or less figure out most of it from the code and the comments in the rdoc (http://bioruby.org/rdoc/classes/Bio/FlatFileIndex.html) but it would really be great to have some examples from people who are more familiar with this class, especially since I am relatively new to Ruby still.

What I want to do is simply:

1) Build an index for a directory containing a few fasta files
2) In a Rails App (or any other Ruby script): retrieve sequences by their accessions and update the index if the fasta db is updated by the user.

Some of the questions I have are:
What are the options that I can pass to the makeindex method?
In Bioperl it is possible to retrieve a subsequence straight away like this:

 my $seq_db_obj = Bio::DB::Fasta->new($path_to_db); 
 my $seq = $seq_db_obj->seq($accession, $start, $end) ; # retrieve (sub)sequence from the database

Can I do this in Ruby too or would I retrieve the entire sequence and then get the subsequence from that?

Any help and examples welcome!
Thanks a lot!


From jan.aerts at bbsrc.ac.uk  Mon Dec 10 15:43:24 2007
From: jan.aerts at bbsrc.ac.uk (jan aerts (RI))
Date: Mon, 10 Dec 2007 20:43:24 -0000
Subject: [BioRuby] rcov
Message-ID: <1F16910BB8546C4DA5526FABB0C98D09AA9A49@ebre2ksrv1.ebrc.bbsrc.ac.uk>

Just had a look at the test coverage for bioruby at http://swdev.cbri.umn.edu/rcov-bioruby20070405/

In case we've got time to spare: it would be good to get the coverage up... Just to remind everyone :-)

jan.


From ngoto at gen-info.osaka-u.ac.jp  Tue Dec 11 09:59:52 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 11 Dec 2007 23:59:52 +0900
Subject: [BioRuby] using Bio::FlatFileIndex
In-Reply-To: <E6E2C9D1BCA8084CB8160442A214D4969A3CC0@UEAEXCHCLUS01.UEA.AC.UK>
References: <E6E2C9D1BCA8084CB8160442A214D4969A3CC0@UEAEXCHCLUS01.UEA.AC.UK>
Message-ID: <20071211145953.30AF51CBC411@idnmail.gen-info.osaka-u.ac.jp>

Hi,

Indexes can be generated with the command-line application br_bioflat.rb
or from within a Ruby script.

Example: create an index from the command line:

% br_bioflat.rb --create --type flat --location /home/xx/dbidx \
  --dbname test --files /home/xx/test01.fst /home/xx/test02.fst

equivalent ruby script:

  require 'bio'
  is_bdb = nil # is_bdb = Bio::FlatFileIndex::MAGIC_BDB for BDB index
  dbname = '/home/xx/dbidx/test'
  format = nil # file format is automatically determined
  options = {}
  files = ['/home/xx/test01.fst', '/home/xx/test02.fst' ]
  Bio::FlatFileIndex.makeindex(is_bdb, dbname, format, options, *files)
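
For the "directory containing a few fasta files" case, the file list
can be collected with Dir.glob before calling makeindex. A minimal
sketch, assuming the files end in .fst, .fa or .fasta; fasta_files_in
is a hypothetical helper, not part of BioRuby:

```ruby
# Hypothetical helper (not part of BioRuby): list the FASTA files in a
# directory. The extensions are an assumption; adjust them to your data.
def fasta_files_in(dir, extensions = %w[fst fa fasta])
  pattern = File.join(dir, "*.{#{extensions.join(',')}}")
  Dir.glob(pattern).sort
end

files = fasta_files_in('/home/xx')
puts files

# The list can then be passed on as in the script above:
#   Bio::FlatFileIndex.makeindex(nil, '/home/xx/dbidx/test', nil, {}, *files)
```

Sorting the list keeps the file order stable between runs, which makes
it easier to compare against the files already in the index.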

As Bio::FlatFileIndex was first written in 2002 and is
very old, the API is ugly. In addition, its internal structure
is too complicated. It may be rewritten and the API might
be changed in the future.

Add files to the index:

% br_bioflat.rb --update --location /home/xx/dbidx \
  --dbname test --files /home/xx/test03.fst /home/xx/test04.fst

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  options = {}
  files = ['/home/xx/test03.fst', '/home/xx/test04.fst' ]
  Bio::FlatFileIndex::update_index(dbname, nil, options, *files)

Re-read all files and re-generate the index:

% br_bioflat.rb --update --location /home/xx/dbidx \
  --dbname test --renew

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  options = {}
  options['renew'] = true
  Bio::FlatFileIndex::update_index(dbname, nil, options, [])

Note that adding files to or updating a flat index (without BDB)
is very slow because it actually rebuilds the whole index.
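
Because the update is slow, a Rails app may want to trigger it only
when a FASTA file is newer than the index. A minimal sketch, assuming
the modification time of some file or directory written by the indexer
can serve as the build time; index_stale? and the marker path are
hypothetical, not part of BioRuby:

```ruby
# Hypothetical helper (not part of BioRuby): true if the marker is
# missing or any source file was modified after it.
def index_stale?(index_marker, files)
  return true unless File.exist?(index_marker)
  built_at = File.mtime(index_marker)
  files.any? { |f| File.mtime(f) > built_at }
end

files = Dir.glob('/home/xx/*.fst')
if index_stale?('/home/xx/dbidx/test', files)
  puts 'index needs to be updated'
end
```

When the check returns true, the update_index call shown above can be
run, ideally outside the web request because of the rebuild time.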


Retrieving sequences in the index:

% br_bioflat.rb --location /home/xx/dbidx --dbname test M12963

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  key = 'M12963'
  idx = Bio::FlatFileIndex.open(dbname)
  results = idx.search(key)
  results.each do |str|
    print str
  end
  idx.close

'results' is a Bio::FlatFileIndex::Results object.
Each search result is a string.

(For more information, please see RDoc
http://bioruby.org/rdoc/classes/Bio/FlatFileIndex/Results.html )

If you want a subsequence of FASTA-formatted data, retrieve the
whole entry and slice its sequence, for example,

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  key = 'M12963'
  idx = Bio::FlatFileIndex.open(dbname)
  result = idx.search(key)
  result.each do |str|
    ent = Bio::FastaFormat.new(str)
    # [0..100] is a 0-based, inclusive range: the first 101 residues
    # for nucleic acid sequence
    puts ent.naseq[0..100]
    # for amino acid sequence
    puts ent.aaseq[0..100]
    # nucleic or amino acid sequence
    puts ent.seq[0..100]
  end
  idx.close
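
To mimic BioPerl's seq($accession, $start, $end), which takes 1-based
inclusive coordinates, the slice indices have to be shifted by one.
A minimal sketch on a plain string; subseq_1based is a hypothetical
helper, not a BioRuby method:

```ruby
# Hypothetical helper (not part of BioRuby): return positions start..stop
# of a sequence string, counting from 1 and including both ends.
def subseq_1based(sequence, start, stop)
  raise ArgumentError, 'start must be >= 1' if start < 1
  raise ArgumentError, 'stop must be >= start' if stop < start
  # Ruby ranges on strings are 0-based, so shift both ends by one.
  # to_s turns the nil returned for an out-of-range start into ''.
  sequence[(start - 1)..(stop - 1)].to_s
end

seq = 'ATGCATGCAT'
puts subseq_1based(seq, 2, 5)   # TGCA
```

The same helper could be applied to the string returned by ent.seq in
the loop above. The whole entry is still read first; only the slicing
differs.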

Please see the OBDA flat file indexing specifications
for the philosophy and internal structure of the index.

http://code.open-bio.org/cgi/viewcvs.cgi/obda-specs/flatfile/?cvsroot=obf-common

Thanks,

Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp


> _______________________________________________
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby

From yjchenx at gmail.com  Wed Dec 12 20:54:04 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 17:54:04 -0800
Subject: [BioRuby] Parse big PDB use up all memory
Message-ID: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>

This is what I did:

require 'bio'
serv = Bio::Fetch.new()
entry = serv.fetch('pdb', '1w6k')
pdb = Bio::PDB.new(entry)

The last step uses up all memory and quits.
The PDB file is quite big and I only need the information from the header.
Is it possible to do something like this?

pdb = Bio::PDB.new(entry[0...40000])

Thanks for the help

From yjchenx at gmail.com  Wed Dec 12 23:50:29 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 20:50:29 -0800
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
Message-ID: <f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>

Thank you for the hint on retrieving only the header.

I am using the default Ruby on Mac OS X 10.5.
Here is the output of 'ruby -v'

ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]

And bioruby is 1.1.0 from gems.

I will test it on Linux and see.

Yen-Ju

On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
> Hi,
>
> Could you give some more details on what system and ruby/bioruby
> version you are running? The same script uses less than 20MB on my
> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
> seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB
> files out there so if you're having trouble with this one then others
> will certainly be a problem.
>
> In answer to your second question, yes you should be able to just
> extract the header (everything up to the ATOM records). But if you're
> really running out of memory just parsing that file then I suspect you
> have deeper issues. Anyway, the sample below works for me for parsing
> the header from 1w6k:
>
> require 'bio'
>
> serv = Bio::Fetch.new
> entry = serv.fetch('pdb','1w6k')
>
> header = ''
> entry.each do |l|
>    break if l.match(/^ATOM/)
>    header << l
> end
>
> pdb = Bio::PDB.new(header)
> p pdb.accession

From yjchenx at gmail.com  Thu Dec 13 00:22:36 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 21:22:36 -0800
Subject: [BioRuby] Detect error from Bio::Fetch
Message-ID: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>

This is the script I run:

require 'bio'
serv = Bio::Fetch.new()
entry = serv.fetch('swissprot', 'not_existing_id')
swissprot = Bio::SwissProt.new(entry)
p swissprot.entry_name # <== Error raises here

The problem is that Bio::Fetch does not raise an exception or otherwise
signal that it cannot find the entry in the database. An error
shows up only at 'swissprot.entry_name'. It would be nice to detect
the error earlier, either in Bio::Fetch#fetch or in
Bio::SwissProt.new.
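
In the meantime the fetched text can be checked before it is parsed.
A minimal sketch, assuming that a valid SwissProt flat-file entry
begins with an "ID" line and that a failed fetch returns something
else (an empty string or an error message); EntryNotFound and
ensure_swissprot_entry! are hypothetical names, not part of BioRuby:

```ruby
# Hypothetical error class (not part of BioRuby).
class EntryNotFound < StandardError; end

# Returns the text unchanged if it starts with a SwissProt "ID" line,
# otherwise raises EntryNotFound.
def ensure_swissprot_entry!(text, id)
  unless text.is_a?(String) && text =~ /\AID\s+\S+/
    raise EntryNotFound, "no SwissProt entry returned for #{id.inspect}"
  end
  text
end

begin
  ensure_swissprot_entry!('', 'not_existing_id')
rescue EntryNotFound => e
  puts e.message
end
```

In the script above this would wrap the result of serv.fetch, so the
failure is reported with the offending ID rather than at
'swissprot.entry_name'.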

Yen-Ju

From alexg at kuicr.kyoto-u.ac.jp  Thu Dec 13 00:22:59 2007
From: alexg at kuicr.kyoto-u.ac.jp (Alex Gutteridge)
Date: Thu, 13 Dec 2007 14:22:59 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
	<f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
Message-ID: <20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>

Yup, I see the same behavior on linux and osx. Bio::PDB.new kills irb  
but runs fine in a script. Thanks for the bug report. I'll see if I  
can identify what's going on.

AlexG

On 13 Dec 2007, at 14:11, Yen-Ju Chen wrote:

> I did a quick test and found the problem is that I ran it in irb.
> If I run it in script, like 'ruby test.rb', then it works fine.
>
> Yen-Ju

Alex Gutteridge

Bioinformatics Center
Kyoto University


From yjchenx at gmail.com  Thu Dec 13 00:11:33 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 21:11:33 -0800
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
Message-ID: <f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>

I did a quick test and found that the problem only occurs when I run it in irb.
If I run it as a script, like 'ruby test.rb', then it works fine.

Yen-Ju


From alexg at kuicr.kyoto-u.ac.jp  Wed Dec 12 22:49:04 2007
From: alexg at kuicr.kyoto-u.ac.jp (Alex Gutteridge)
Date: Thu, 13 Dec 2007 12:49:04 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
Message-ID: <16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>

Hi,

Could you give some more details on what system and ruby/bioruby  
version you are running? The same script uses less than 20MB on my  
machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't  
seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB  
files out there so if you're having trouble with this one then others  
will certainly be a problem.

In answer to your second question, yes you should be able to just  
extract the header (everything up to the ATOM records). But if you're  
really running out of memory just parsing that file then I suspect you  
have deeper issues. Anyway, the sample below works for me for parsing  
the header from 1w6k:

require 'bio'

serv = Bio::Fetch.new
entry = serv.fetch('pdb','1w6k')

header = ''
# each_line iterates line by line (String#each is unavailable on newer Rubies)
entry.each_line do |l|
  break if l.match(/^ATOM/)
  header << l
end

pdb = Bio::PDB.new(header)
p pdb.accession
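
The same cut can also be made with a single slice, which answers the
entry[0...n] question by computing n instead of guessing it. A minimal
sketch on a plain string, assuming as above that the header is
everything before the first ATOM record; pdb_header is a hypothetical
helper, not part of BioRuby:

```ruby
# Hypothetical helper (not part of BioRuby): everything before the first
# line that starts with ATOM, or the whole text if there is no such line.
def pdb_header(text)
  cut = text.index(/^ATOM/)   # ^ matches at the start of any line in Ruby
  cut ? text[0...cut] : text
end

sample = "HEADER    EXAMPLE\nTITLE     TOY ENTRY\nATOM      1  N   MET A   1\nEND\n"
puts pdb_header(sample)
```

The returned string can be handed to Bio::PDB.new in the same way as
the header variable above.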


Alex Gutteridge

Bioinformatics Center
Kyoto University


From odo at mac.com  Thu Dec 13 03:23:48 2007
From: odo at mac.com (Florian Odronitz)
Date: Thu, 13 Dec 2007 09:23:48 +0100
Subject: [BioRuby] Proton Nomenclature in PDB
In-Reply-To: <mailman.388.1197521447.847.bioruby@lists.open-bio.org>
References: <mailman.388.1197521447.847.bioruby@lists.open-bio.org>
Message-ID: <3A227C17-9C34-42BF-80C6-B96467573291@mac.com>

Hi,

I am using Bio::PDB in my NMR-related software project. I ran into
a problem with the naming of protons generated by PyMol and MolMol
and wrote a method to rename the protons according to the BMRB
nomenclature (http://www.bmrb.wisc.edu/ref_info/statsel.htm).
If anyone thinks this could be useful to others, I would like to
contribute it to BioRuby. Or is it too specific? Maybe I could do it
in a more general way, since it also involves things like bonding, which
are, to my understanding, not implemented yet. Who would be the right
person to talk to?

Thanks,
Florian


From ktym at hgc.jp  Fri Dec 14 12:20:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 15 Dec 2007 02:20:34 +0900
Subject: [BioRuby] BioRuby 1.2.0 is released
Message-ID: <AA76D6A5-6C52-465C-BC93-6C06CE9F51F2@hgc.jp>

Hi all,

I just released the BioRuby 1.2.0 at http://bioruby.org/archive/bioruby-1.2.0.tar.gz

  http://bioruby.org/
  http://bioruby.org/rdoc/
  http://rubyforge.org/projects/bioruby/
  http://raa.ruby-lang.org/project/bioruby/

I also put the RubyGems package on RubyForge as always.

  % sudo gem update bio

Here is a brief summary of updates snipped from the ChangeLog file.

        * BioRuby 1.2.0 released

          * BioRuby shell is improved
            * file save functionality is fixed
            * deprecated require_gem is changed to gem to suppress warnings
            * deprecated end_form_tag is rewritten to suppress warnings
            * images for Rails shell are separated to the bioruby directory
            * spinner is shown during the evaluation
            * background image in the textarea is removed for the visibility
          * Bio::Blast is fixed to parse -m 8 formatted result correctly
          * Bio::PubMed is rewritten to enhance its functionality
            * e.g. 'rettype' => 'count' and 'retmode' => 'xml' are available
          * Bio::FlatFile is improved to accept recent MEDLINE format
          * Bio::KEGG::COMPOUND is enhanced to utilize REMARK field
          * Bio::KEGG::API is fixed to skip filter when the value is Fixnum
          * A number of minor bug fixes

Hope you enjoy.

Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614
fax://+81-3-5449-5434
http://www.hgc.jp/ (Human Genome Center)
http://bioruby.org/ (BioRuby project)
http://das.hgc.jp/ (KEGG DAS)
http://www.genome.jp/kegg/soap/ (KEGG API)


From raoul.bonnal at itb.cnr.it  Fri Dec 14 08:50:30 2007
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri, 14 Dec 2007 14:50:30 +0100
Subject: [BioRuby] FlatFile loading genbank, the last entry is a fake
In-Reply-To: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>
References: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>
Message-ID: <1197640230.10347.15.camel@Graco>

Download the GenBank file for AJ561198 from NCBI and load it with

require 'bio'

data = Bio::FlatFile.auto("AJ561198.gb")

data.each_entry do |entry|
  puts entry.entry_id
end

You get

AJ561198
nil

I think the parser sees the "\n" at the end of the GenBank file (after
"//\n") and thinks there is another entry, which is wrong.
Deleting the last line makes it work.
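
As a stopgap the trailing blank lines can be removed in code rather
than by hand. A minimal sketch, assuming the stray entry is indeed
caused by the blank line after the final "//";
strip_trailing_blank_lines is a hypothetical helper, not part of
BioRuby:

```ruby
# Hypothetical helper (not part of BioRuby): replace any run of
# whitespace at the very end of the text with a single newline.
def strip_trailing_blank_lines(text)
  text.sub(/\s+\z/, "\n")
end

raw = "LOCUS       AJ561198\n//\n\n"
cleaned = strip_trailing_blank_lines(raw)
p cleaned   # "LOCUS       AJ561198\n//\n"
```

The cleaned text would be written back to the file before opening it
with Bio::FlatFile.auto.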

--
Ra


From ktym at hgc.jp  Fri Dec 14 17:31:11 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 15 Dec 2007 07:31:11 +0900
Subject: [BioRuby] Fwd:  BioRuby 1.2.0 is released
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A53@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <2D2BADE4-A31A-4356-9820-FC700AEE903C@hgc.jp>

Hi all,

Does anybody have the same problem on Linux/Windows?

Toshiaki

Begin forwarded message:

> From: "jan aerts (RI)" <jan.aerts at bbsrc.ac.uk>
> Date: 2007-12-15 5:50:42 JST
> To: "Toshiaki Katayama" <ktym at hgc.jp>
> Cc: <n at bioruby.org>
> Subject: RE: [BioRuby] BioRuby 1.2.0 is released
>
> Ubuntu 7.10 (Gutsy Gibbon).
> ruby 1.8.6
> soap4r 1.5.5-1 (apt-get package)
>
> j.
>
>
> -----Original Message-----
> From: Toshiaki Katayama [mailto:ktym at hgc.jp]
> Sent: Fri 14/12/2007 18:42
> To: jan aerts (RI)
> Cc: n at bioruby.org
> Subject: Re: [BioRuby] BioRuby 1.2.0 is released
>
> Jan,
>
> In my environment (OS X Leopard), I have no errors on all tests in BioRuby 1.2.0 with Ruby 1.8.6
> What kind of environment do you use?
>
> Regards,
> Toshiaki
>
> On 2007/12/15, at 3:28, jan aerts (RI) wrote:
>
>> Thanks T.
>>
>> Good to see a new release is out.
>>
>> I noticed that the test/functional/bio/io/test_soapwsdl.rb test returned errors. All 4 tests in that testfile give the following error:
>>
>> NoMethodError: undefined method `location=' for nil:NilClass
>>   /usr/lib/ruby/1.8/wsdl/xmlSchema/importer.rb:31:in `import'
>>   /usr/lib/ruby/1.8/wsdl/importer.rb:18:in `import'
>>   /usr/lib/ruby/1.8/soap/wsdlDriver.rb:124:in `import'
>>   /usr/lib/ruby/1.8/soap/wsdlDriver.rb:28:in `initialize'
>>   ../../../../lib/bio/io/soapwsdl.rb:63:in `new'
>>   ../../../../lib/bio/io/soapwsdl.rb:63:in `create_driver'
>>   ../../../../lib/bio/io/soapwsdl.rb:57:in `initialize'
>>   ./test_soapwsdl.rb:25:in `new'
>>   ./test_soapwsdl.rb:25:in `setup'
>>
>> jan.


From ngoto at gen-info.osaka-u.ac.jp  Tue Dec 18 08:55:57 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 18 Dec 2007 22:55:57 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
	<f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
	<20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>
Message-ID: <20071218135558.4880D1CBC43F@idnmail.gen-info.osaka-u.ac.jp>

Hi,

Objects inside a Bio::PDB object often refer to other objects
in the same Bio::PDB object, and this might cause
infinite recursion in Bio::PDB#inspect.

Defining a customized Bio::PDB#inspect seems to prevent
the memory exhaustion problem.

  class Bio::PDB
    # returns a string containing human-readable representation
    # of this object.
    def inspect
      "#<#{self.class.to_s} entry_id=#{entry_id.inspect}>"
    end
  end

I also defined Bio::PDB::(Model|Chain|Residue)#inspect 
like above, and committed them into CVS.
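
The effect can be seen without BioRuby. A minimal sketch with two toy
classes that refer to each other (ToyResidue and ToyAtom are
hypothetical stand-ins, not the Bio::PDB classes); each defines a
short inspect so that irb prints one line instead of walking the
whole object graph:

```ruby
# Toy stand-in for an atom that keeps a back reference to its residue.
class ToyAtom
  attr_reader :name, :residue

  def initialize(name, residue)
    @name = name
    @residue = residue
  end

  # Short summary: does not descend into the residue.
  def inspect
    "#<#{self.class} name=#{name.inspect}>"
  end
end

# Toy stand-in for a residue that owns a list of atoms.
class ToyResidue
  attr_reader :id, :atoms

  def initialize(id)
    @id = id
    @atoms = []
  end

  def add_atom(name)
    @atoms << ToyAtom.new(name, self)
    self
  end

  # Short summary: reports the atom count instead of every atom.
  def inspect
    "#<#{self.class} id=#{id.inspect} atoms=#{atoms.size}>"
  end
end

res = ToyResidue.new('ALA1').add_atom('N').add_atom('CA')
p res   # #<ToyResidue id="ALA1" atoms=2>
```

irb calls inspect on the value of every expression it evaluates, which
would explain why the problem shows up there and not in a script.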

Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp

On Thu, 13 Dec 2007 14:22:59 +0900
Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:

> Yup, I see the same behavior on linux and osx. Bio::PDB.new kills irb  
> but runs fine in a script. Thanks for the bug report. I'll see if I  
> can identify what's going on.
> 
> AlexG


From yjchen at reciprocallattice.com  Tue Dec 18 16:54:34 2007
From: yjchen at reciprocallattice.com (Yen-Ju Chen)
Date: Tue, 18 Dec 2007 13:54:34 -0800
Subject: [BioRuby] A Rails application with BioRuby
Message-ID: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>

Hi,
  I am working on a Rails application that uses BioRuby to collect references
and database entries.
  You can find the application (no source code yet) at
journalclub.reciprocallattice.com
  It is still at an early stage. I use it personally and figured it would be
interesting to have more users.
  If you want to join, please write to me in private so that it does not
pollute the BioRuby mailing list.
  I don't know how many users the application can take. Please see the
website for more details.

  These are the things related to BioRuby:
  * The BibTeX output from Reference lacks the abstract.
  * It would be nice to be able to output RIS format for EndNote and
Reference Manager.
  * Is it possible to get the DOI from PubMed?
  * BioRuby can fetch entries from many databases through BioFetch,
    but cannot parse some of them, like Pfam, Prosite, etc.
  * It is not clear what the BioFetch database names mean, for example: rn, rp,
str, pr.
    I am in structural biology, and many of these abbreviations are not obvious.

  If I have a chance to write code for these missing features, I will submit
it back to BioRuby.
  Have fun.

  Yen-Ju

From sgujja at broad.mit.edu  Wed Dec 19 11:03:24 2007
From: sgujja at broad.mit.edu (Sharvari Gujja)
Date: Wed, 19 Dec 2007 11:03:24 -0500
Subject: [BioRuby] how to retrieve a genbank record by GI
Message-ID: <476940CC.6000803@broad.mit.edu>

Hi all,

I am new to Ruby and Bioruby and am amazed at how simple and yet 
powerful it is.

I am trying to access a genbank record (NCBI) by GI number. I have tried 
Bio::Fetch, Bio::Registry but none seems to work.

Any help is appreciated.

Thanks
-S

From robert.citek at gmail.com  Wed Dec 19 14:39:01 2007
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 19 Dec 2007 13:39:01 -0600
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <476940CC.6000803@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>
Message-ID: <4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>

On Dec 19, 2007 10:03 AM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
> I am new to Ruby and Bioruby and am amazed at how simple and yet
> powerful is it.
>
> I am trying to access a genbank record (NCBI) by GI number. I have tried
> Bio::Fetch, Bio::Registry but none seems to work.

Can you give an example of what you've tried?  Also, what system
are you running bioruby on, e.g. Windows XP, Cygwin in Windows, Ubuntu
Linux, Mac OS X, Solaris?  What version of bioruby?

Regards,
- Robert

From robert.citek at gmail.com  Wed Dec 19 15:46:07 2007
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 19 Dec 2007 14:46:07 -0600
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <4769756B.3080406@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>
	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>
	<4769756B.3080406@broad.mit.edu>
Message-ID: <4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>

On Dec 19, 2007 1:47 PM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
> Robert Citek wrote:
> > Can you give an example of what you've tried?  Also, on what system
> > are you running bioruby on, e.g. Windows XP, Cygwin in Windows, Ubuntu
> > Linux, Mac OS X, Solaris?  What version of bioruby?
>
> I have tried:
>
> reg = Bio::Registry.new
> serv = reg.get_database('genbank')
> puts  serv.get_by_id('J00231')
>
>
> puts Bio::Fetch.query('genbank','185041')
>
> server = Bio::Fetch.new()
> #server = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch')
> puts server.fetch('genbank','J00231','html')
>
> entry = Bio::DBGET.bget("AF139016")
>
> gb = Bio::GenBank.new(Bio::Fetch.query('gb', 'J00231'))
> puts gb.read
>
> And running on Windows XP. Ruby 1.8.6

I also get errors:

$ ruby -rbio -e 'reg = Bio::Registry.new'
/usr/lib/ruby/1.8/net/http.rb:560:in `initialize': No route to host -
connect(2) (Errno::EHOSTUNREACH)
        from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
        from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
        from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
        from /usr/lib/ruby/1.8/bio/io/registry.rb:190:in `read_remote'
        from /usr/lib/ruby/1.8/bio/io/registry.rb:126:in `initialize'
        from -e:1:in `new'
        from -e:1

$ ruby -v
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]

$ lsb_release  -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 7.10
Release:        7.10
Codename:       gutsy

Unfortunately, I don't know how to display what version of bioruby I'm
using.  I guess I'm too new to ruby, let alone bioruby, to be of any
help.  Anyone have a working example?  Unfortunately, my connection to
bioruby.org doesn't work (I suspect our 'Net connection is snafu'ed).

Regards,
- Robert

From ktym at hgc.jp  Thu Dec 20 02:41:12 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 20 Dec 2007 16:41:12 +0900
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
Message-ID: <C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>

Hi Yen-Ju,

On 2007/12/19, at 6:54, Yen-Ju Chen wrote:

> Hi,
>  I am working on a rails application using BioRuby to collect references
> and database entries.
>  You can find the application (not source code yet) at
> journalclub.reciprocallattice.com

Cool.


>  It is still at early stage. I use it personally and figure it would be
> interesting to have more users.
>  If you want to join, please write to me in private so that it will not
> pollute BioRuby maillist.
>  I don't know how many users the application can take. Please see the
> website for more details.
>
>  These are things related to BioRuby,
>  * The output from Reference to BibTex format lacks abstract.
>  * It would be nice to be able to output to RIS format for EndNote and
> ReferenceManager.


If you could provide a patch for them, I'll include it in BioRuby.


>  * Is it possible to get DOI from PubMed ?

  entry = Bio::PubMed.query(16946072)
  doi = entry[/AID - (\S+) \[doi\]/, 1]


or you can extend the Bio::MEDLINE class to add the doi method


  class Bio::MEDLINE
    attr_reader :pubmed

    def doi
      @pubmed['AID'][/(\S+) \[doi\]/, 1]
    end
  end

  entry = Bio::PubMed.query(16946072)
  medline = Bio::MEDLINE.new(entry)
  doi = medline.doi
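
For trying the regex without contacting PubMed, here is a minimal offline sketch. The `sample` text is a hand-made fragment in MEDLINE layout (an assumption, not a real download) that reuses the pii and doi values from the XML output quoted further down; `aid_value` is a hypothetical helper and not part of BioRuby.

```ruby
# Hand-made MEDLINE-style fragment (assumed layout: 4-character tag, "- ", value).
sample = <<~MEDLINE
  PMID- 16946072
  AID - 313/5791/1295 [pii]
  AID - 10.1126/science.1131542 [doi]
MEDLINE

# Hypothetical helper: return the AID value that carries the given type label,
# or nil when the record has no such identifier.
def aid_value(medline_text, type)
  medline_text[/^AID - (\S+) \[#{Regexp.escape(type)}\]/, 1]
end

doi = aid_value(sample, 'doi')
pii = aid_value(sample, 'pii')
puts doi
puts pii
```

Because `String#[]` with a regexp returns nil when nothing matches, a record without a DOI gives nil rather than an exception.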


or utilize the XML format of the PubMed output


  entry_xml = Bio::PubMed.efetch(16946072, {"retmode" => "xml"})

           :
        <ArticleIdList>
            <ArticleId IdType="pii">313/5791/1295</ArticleId>
            <ArticleId IdType="doi">10.1126/science.1131542</ArticleId>
            <ArticleId IdType="pubmed">16946072</ArticleId>
        </ArticleIdList>
           :

then extract DOI ID

  require 'rexml/document'
  pubmed = REXML::Document.new(entry_xml)
  doi = pubmed.elements['//ArticleId[@IdType="doi"]'].get_text
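
One caveat with the one-liner: if an article has no DOI, `elements[...]` returns nil and the chained call fails. A small offline sketch that collects all ids into a Hash instead; the XML is just the fragment shown above, and the variable names are illustrative only.

```ruby
require 'rexml/document'

# The ArticleIdList fragment from the efetch output shown above.
entry_xml = <<~XML
  <ArticleIdList>
    <ArticleId IdType="pii">313/5791/1295</ArticleId>
    <ArticleId IdType="doi">10.1126/science.1131542</ArticleId>
    <ArticleId IdType="pubmed">16946072</ArticleId>
  </ArticleIdList>
XML

doc = REXML::Document.new(entry_xml)

# Collect every ArticleId into a Hash keyed by its IdType attribute.
ids = {}
doc.elements.each('//ArticleId') do |el|
  ids[el.attributes['IdType']] = el.text
end

# A missing id type yields nil from the Hash lookup instead of raising.
puts ids['doi'].inspect
puts ids['pmc'].inspect
```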


>  * BioRuby can get information from many databases through biofetch,
>    but not processing them, like Pfam, Prosite, etc.

You can process them by appropriate corresponding classes. For example,

  cyclins = Bio::Fetch.query('prosite', 'PS00292')
  prosite = Bio::PROSITE.new(cyclins)

  prosite.entry_id
  # ==> "PS00292"

  prosite.definition
  # ==> "Cyclins signature."

  prosite.pattern
  # ==> "R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]."

  prosite.re
  # ==> /R.{2}[LIVMSA].{2}[FYWS][LIVM].{8}[LIVMFC].{4}[LIVMFYA].{2}[STAGC][LIVMFYQ].[LIVMFYC][LIVMFY]D[RKH][LIVMFYW]/i
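
To give a rough idea of what a pattern-to-regexp conversion involves, here is a standalone sketch. It is an illustration only and not BioRuby's actual implementation; `prosite_to_regexp` is a made-up name, and the handled syntax (x, x(n), x(n,m), [..], {..}, leading < and trailing >) is an assumption covering the common PROSITE cases.

```ruby
# Hypothetical converter from a PROSITE pattern string to a Ruby Regexp.
def prosite_to_regexp(pattern)
  body = pattern.strip.sub(/\.\z/, '')                  # drop the terminating period
  parts = body.split('-').map do |element|
    element = element.sub(/\A</, '^').sub(/>\z/, '$')   # N-/C-terminal anchors
    element = element.gsub(/\{([A-Z]+)\}/) { "[^#{$1}]" }        # {AB} = anything but A or B
    element = element.gsub(/\((\d+(?:,\d+)?)\)/) { "{#{$1}}" }   # (2) or (2,4) repeats
    element.gsub('x', '.')                              # x = any residue
  end
  Regexp.new(parts.join, Regexp::IGNORECASE)
end

cyclin_pattern = 'R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-' \
                 '[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].'
re = prosite_to_regexp(cyclin_pattern)
puts re.source
```

For the cyclin pattern this produces the same regexp source as the `prosite.re` output above.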

 
>  * it is not clear what's the database from biofetch, for example: rn, rp,
> str, pr.
>    I am in structural biology. Many of these abbreviation is not obvious.

In BioRuby, the default BioFetch server is implemented as a proxy for the DBGET system through KEGG API.
So, please refer to the abbreviation field in the DBGET manual at

  http://www.genome.jp/dbget/

and also note that the DBGET service for GenBank (gb) database is no longer available.


Regards,
Toshiaki Katayama


From ktym at hgc.jp  Thu Dec 20 03:29:48 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 20 Dec 2007 17:29:48 +0900
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
References: <476940CC.6000803@broad.mit.edu>
	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>
	<4769756B.3080406@broad.mit.edu>
	<4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
Message-ID: <B8176F86-1358-47F0-884B-1FA8095141C2@hgc.jp>

Hi Gujja,

On 2007/12/20, at 5:46, Robert Citek wrote:

> On Dec 19, 2007 1:47 PM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
>> Robert Citek wrote:
>>> Can you give an example of what you've tried?  Also, on what system
>>> are you running bioruby on, e.g. Windows XP, Cygwin in Windows, Ubuntu
>>> Linux, Mac OS X, Solaris?  What version of bioruby?
>>
>> I have tried:
>>
>> reg = Bio::Registry.new
>> serv = reg.get_database('genbank')
>> puts  serv.get_by_id('J00231')

Did you set up your "seqdatabase.ini" file as described in the README file?

  http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/README?rev=1.17&cvsroot=bioruby

Otherwise, the 'genbank' database is not supported by OBDA (Bio::Registry) by default.

However, there is another problem.

In BioRuby's default configuration file, 'genbank' refers to the BioFetch server at bioruby.org
and, as I wrote in a separate mail, the current BioFetch server no longer supports the GenBank database.

  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  dbname=genbank

Thus, the above configuration is no longer valid...


>> puts Bio::Fetch.query('genbank','185041')
>>
>> server = Bio::Fetch.new()
>> #server = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch')
>> puts server.fetch('genbank','J00231','html')

Besides, as you can find at another BioFetch server provided by EBI (Dbfetch),

  http://www.ebi.ac.uk/cgi-bin/dbfetch

they don't provide the GenBank database either (because they have EMBL instead).


In conclusion, if you need to fetch a GenBank entry from a remote server,
using NCBI E-Utils is the best way for now.

Unfortunately, we don't have a Bio::NCBI::Eutils class yet,
but it seems that you can temporarily use the Bio::PubMed class to do that.

  Bio::PubMed.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"})
  Bio::PubMed.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})
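
For reference, the request behind such a call can be sketched with the standard library alone. This is a sketch under assumptions: the endpoint is the usual E-utilities efetch URL (current NCBI servers may expect https), `efetch_url` is a hypothetical helper, and `retmode=text` is an assumed default. The snippet only builds the URL; the download itself is left commented out because it needs network access.

```ruby
require 'uri'

# Hypothetical helper: build an efetch URL for one GenBank record.
# The base URL and the default parameters are assumptions.
def efetch_url(id, db: 'nuccore', rettype: 'gb', retmode: 'text',
               base: 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi')
  query = URI.encode_www_form({ 'db' => db, 'rettype' => rettype,
                                'retmode' => retmode, 'id' => id })
  "#{base}?#{query}"
end

url = efetch_url('185041')
puts url

# To actually download the entry (requires network access):
#   require 'net/http'
#   genbank_text = Net::HTTP.get(URI(url))
```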

ESOAP can be an alternative, but it takes quite a long time to read the current
version of the WSDL file and the returned value is not easy to handle.


Regards,
Toshiaki Katayama


>> entry = Bio::DBGET.bget("AF139016")
>>
>> gb = Bio::GenBank.new(Bio::Fetch.query('gb', 'J00231'))
>> puts gb.read
>>
>> And running on Windows XP. Ruby 1.8.6
>
> I also get errors:
>
> $ ruby -rbio -e 'reg = Bio::Registry.new'
> /usr/lib/ruby/1.8/net/http.rb:560:in `initialize': No route to host -
> connect(2) (Errno::EHOSTUNREACH)
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
>        from /usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
>        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
>        from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
>        from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
>        from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
>        from /usr/lib/ruby/1.8/bio/io/registry.rb:190:in `read_remote'
>        from /usr/lib/ruby/1.8/bio/io/registry.rb:126:in `initialize'
>        from -e:1:in `new'
>        from -e:1
>
> $ ruby -v
> ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
>
> $ lsb_release  -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 7.10
> Release:        7.10
> Codename:       gutsy
>
> Unfortunately, I don't know how to display what version of bioruby I'm
> using.  I guess I'm too new to ruby, let alone bioruby, to be of any
> help.  Anyone have a working example?  Unfortunately, my connection to
> bioruby.org doesn't work (I suspect our 'Net connection is snafu'ed).
>
> Regards,
> - Robert


From ktym at hgc.jp  Thu Dec 20 11:54:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 21 Dec 2007 01:54:18 +0900
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <476A8817.9030108@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>	<4769756B.3080406@broad.mit.edu>	<4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
	<B8176F86-1358-47F0-884B-1FA8095141C2@hgc.jp>
	<476A8817.9030108@broad.mit.edu>
Message-ID: <95F14218-A50E-4A18-9ECB-3FC68B4D8DAE@hgc.jp>

Hi Gujja,

On 2007/12/21, at 0:19, Sharvari Gujja wrote:
> On 2007/12/20, at 5:46, Robert Citek wrote:
>>> Unfortunately, I don't know how to display what version of bioruby I'm
>>> using.

You can check the version of BioRuby by

 % ruby -rubygems -rbio -e 'p Bio::BIORUBY_VERSION'
 [1, 2, 0]

or by running the bioruby command like

 % bioruby
 Loading config (/Users/ktym/.bioruby/shell/session/config) ... done
 Loading object (/Users/ktym/.bioruby/shell/session/object) ... done
 Loading history (/Users/ktym/.bioruby/shell/session/history) ... done

 . . . B i o R u b y   i n   t h e   s h e l l . . .

   Version : BioRuby 1.2.0 / Ruby 1.8.6

 bioruby> exit
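
Since the version constant is an Array, a script can also compare it against a minimum release before relying on newer behaviour. A small sketch, in which `installed` stands in for `Bio::BIORUBY_VERSION` so that it runs without BioRuby, and `at_least?` is a hypothetical helper:

```ruby
require 'rubygems' # provides Gem::Version; loaded by default on current Ruby

# Stand-in for Bio::BIORUBY_VERSION, which is an Array such as [1, 2, 0].
installed = [1, 2, 0]

# Hypothetical helper: true when the version array is at least `minimum`.
def at_least?(version_array, minimum)
  Gem::Version.new(version_array.join('.')) >= Gem::Version.new(minimum)
end

puts installed.join('.')
puts at_least?(installed, '1.2.0')
puts at_least?([1, 1, 0], '1.2.0')
```

Comparing through Gem::Version rather than as strings keeps a release like 1.10.0 ordered after 1.2.0.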


> Hi all
>
> Thanks for all your input.
>
> However, can someone explain how to set up the seqdatabase.ini file? I did go through the README file but it does not make much sense to me.

Ah, if you are using Windows, I have no idea as I have never tried.
Instead, you can also put the file on the net as described in:

 http://bioruby.org/rdoc/files/lib/bio/io/registry_rb.html

Anyway, the OBDA is still available in BioRuby but I feel
it is not actively used in other Bio* projects these days.

This situation reminds me of one more way to retrieve a GenBank entry.
If you have installed the EMBOSS suite, you can set up a ~/.embossrc file
to access NCBI like:

DB genbank [
 type: N 
 format: genbank
 method: url
 url: "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=gb&retmode=text&id=%s"
]

and call the entret command with

 Bio::EMBOSS.entret('genbank:185041')


> Also , I have tried
>
> Bio::PubMed.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"})
>
> but this gives me the pubmed entry. I need the genbank format.

If your BioRuby is older than 1.2.0, try updating it first.
In my environment, I got a GenBank entry correctly.
I expect that this way is the most feasible on Windows for now.
I'll prepare the Bio::NCBI::Eutils class for the next release.


> Appreciate your help.
>
> Thanks
> S


Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614
fax://+81-3-5449-5434
http://www.hgc.jp/ (Human Genome Center)
http://bioruby.org/ (BioRuby project)
http://das.hgc.jp/ (KEGG DAS)
http://www.genome.jp/kegg/soap/ (KEGG API)


From yjchen at reciprocallattice.com  Thu Dec 20 14:11:39 2007
From: yjchen at reciprocallattice.com (Yen-Ju Chen)
Date: Thu, 20 Dec 2007 11:11:39 -0800
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
	<C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
Message-ID: <bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>

On 12/19/07, Toshiaki Katayama <ktym at hgc.jp> wrote:
>
> Hi Yen-Ju,
>
> On 2007/12/19, at 6:54, Yen-Ju Chen wrote:
>
> > Hi,
> >  I am working on a rails application using BioRuby to collect references
> > and database entries.
> >  You can find the application (not source code yet) at
> > journalclub.reciprocallattice.com
>
> Cool.
>
>
> >  It is still at early stage. I use it personally and figure it would be
> > interesting to have more users.
> >  If you want to join, please write to me in private so that it will not
> > pollute BioRuby maillist.
> >  I don't know how many users the application can take. Please see the
> > website for more details.
> >
> >  These are things related to BioRuby,
> >  * The output from Reference to BibTex format lacks abstract.
> >  * It would be nice to be able to output to RIS format for EndNote and
> > ReferenceManager.
>
>
> If you could provide a patch for them, I'll include it in BioRuby.


  I will look at the RIS format and supply a patch later.

>  * Is it possible to get DOI from PubMed ?
>
>   entry = Bio::PubMed.query(16946072)
>   doi = entry[/AID - (\S+) \[doi\]/, 1]
>
>
> or you can extend the Bio::MEDLINE class to add the doi method


  Is it possible to have this feature in BioRuby?
  I find that DOIs have become more common recently; even PDB entries have a DOI.
  And it seems to be the only way to have a unique id for an article.
  For example, PubMed and Google Scholar may return the same article with
their own ids (PMID and Google Scholar ID).
  I found that comparing the DOIs is the only way to ensure two entries refer
to the same article.

  [snip]


> >  * BioRuby can get information from many databases through biofetch,
> >    but not processing them, like Pfam, Prosite, etc.
>
> You can process them by appropriate corresponding classes. For example,
>
>   cyclins = Bio::Fetch.query('prosite', 'PS00292')
>   prosite = Bio::PROSITE.new(cyclins)


  Thanks. I hadn't noticed PROSITE in the BioRuby API before.
  Pfam is still missing.
  I will see what I can do about it.


>
>
> >  * it is not clear what's the database from biofetch, for example: rn, rp,
> > str, pr.
> >    I am in structural biology. Many of these abbreviation is not obvious.
>
> In BioRuby, the default BioFetch server is implemented as a proxy for the
> DBGET system through KEGG API.
> So, please refer to the abbreviation field in the DBGET manual at
>
>   http://www.genome.jp/dbget/


  That's a good tip.
  It would also be user-friendly to show them from BioRuby.

  Thanks for this information.

  Yen-Ju

> and also note that the DBGET service for GenBank (gb) database is no longer
> available.
>
>
> Regards,
> Toshiaki Katayama
>
>
>

From ktym at hgc.jp  Fri Dec 21 00:16:06 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 21 Dec 2007 14:16:06 +0900
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
	<C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
	<bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>
Message-ID: <8FADCE93-34C9-468F-99B5-96CCE49D6ECF@hgc.jp>

Hi Yen-Ju,

On 2007/12/21, at 4:11, Yen-Ju Chen wrote:

> >  * Is it possible to get DOI from PubMed ?
>
>   entry = Bio::PubMed.query(16946072)
>   doi = entry[/AID - (\S+) \[doi\]/, 1]
>
>
> or you can extend the Bio::MEDLINE class to add the doi method
>
>
>   Is it possible to have this feature in BioRuby ?
>


I just committed the following changes to the CVS.

  def doi
    @pubmed['AID'][/(\S+) \[doi\]/, 1]
  end

  def pii
    @pubmed['AID'][/(\S+) \[pii\]/, 1]
  end

so that you can use them as

 entry = Bio::PubMed.query(16946072)
 medline = Bio::MEDLINE.new(entry)
 doi = medline.doi
 pii = medline.pii

Regards,
Toshiaki


From ktym at hgc.jp  Sat Dec 29 15:12:11 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sun, 30 Dec 2007 05:12:11 +0900
Subject: [BioRuby] BioRuby 1.2.1 is released
Message-ID: <A231F333-D068-452F-9F8D-1D789A5A4A66@hgc.jp>

Hi all,

I just released BioRuby 1.2.1, which includes a fix for BLAST 2.2.17 output.
Note that this version is not yet Ruby 1.9 compliant.

 http://bioruby.org/archive/bioruby-1.2.1.tar.gz
 http://rubyforge.org/projects/bioruby/

You can see changes at

  http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/ChangeLog?rev=1.79&cvsroot=bioruby


P.S.
Unfortunately, I removed the RAA entry for BioRuby by mistake (I need to sleep now :).
I immediately re-added it as a new project but our history was lost.

  http://raa.ruby-lang.org/project/bioruby/

I took a screenshot of the old admin screen for record.

  http://bioruby.org/tmp/bioruby-deleted-raa.png

Happy holidays!

Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614
fax://+81-3-5449-5434
http://www.hgc.jp/ (Human Genome Center)
http://bioruby.org/ (BioRuby project)
http://das.hgc.jp/ (KEGG DAS)
http://www.genome.jp/kegg/soap/ (KEGG API)


From ngoto at gen-info.osaka-u.ac.jp  Tue Dec 11 14:59:52 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 11 Dec 2007 23:59:52 +0900
Subject: [BioRuby] using Bio::FlatFileIndex
In-Reply-To: <E6E2C9D1BCA8084CB8160442A214D4969A3CC0@UEAEXCHCLUS01.UEA.AC.UK>
References: <E6E2C9D1BCA8084CB8160442A214D4969A3CC0@UEAEXCHCLUS01.UEA.AC.UK>
Message-ID: <20071211145953.30AF51CBC411@idnmail.gen-info.osaka-u.ac.jp>

Hi,

Indexes can be generated with the command-line application br_bioflat.rb
or from within a Ruby script.

Example: create an index from the command line:

% br_bioflat.rb --create --type flat --location /home/xx/dbidx \
  --dbname test --files /home/xx/test01.fst /home/xx/test02.fst

equivalent ruby script:

  require 'bio'
  is_bdb = nil # is_bdb = Bio::FlatFileIndex::MAGIC_BDB for BDB index
  dbname = '/home/xx/dbidx/test'
  format = nil # file format is automatically determined
  options = {}
  files = ['/home/xx/test01.fst', '/home/xx/test02.fst' ]
  Bio::FlatFileIndex.makeindex(is_bdb, dbname, format, options, *files)

As Bio::FlatFileIndex was first written in 2002 and is
very old, the API is ugly. In addition, its internal structure
is too complicated. It may be rewritten and the API might
be changed in the future.
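
For the "directory containing a few fasta files" case, the file list can be gathered with Dir.glob and then splatted into makeindex. A minimal sketch: `fasta_files_in` is a hypothetical helper, the extensions are an assumption, and the demo runs on a throwaway directory so that it works without BioRuby installed.

```ruby
require 'tmpdir'

# Hypothetical helper: list the FASTA files of one directory, sorted.
# The extensions are an assumption; adjust them to your own naming.
def fasta_files_in(dir, exts = %w[fst fa fasta])
  Dir.glob(File.join(dir, "*.{#{exts.join(',')}}")).sort
end

# Demo on a temporary directory holding two FASTA files and one other file.
demo_names = Dir.mktmpdir do |dir|
  %w[test01.fst test02.fst notes.txt].each do |name|
    File.write(File.join(dir, name), ">dummy\nACGT\n")
  end
  fasta_files_in(dir).map { |path| File.basename(path) }
end
p demo_names

# With BioRuby loaded, the list would be passed on like this:
#   files = fasta_files_in('/home/xx')
#   Bio::FlatFileIndex.makeindex(nil, '/home/xx/dbidx/test', nil, {}, *files)
```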

Adds files to the index:

% br_bioflat.rb --update --location /home/xx/dbidx \
  --dbname test --files /home/xx/test03.fst /home/xx/test04.fst

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  options = {}
  files = ['/home/xx/test03.fst', '/home/xx/test04.fst' ]
  Bio::FlatFileIndex::update_index(dbname, nil, options, *files)

Re-read all files and re-generate the index:

% br_bioflat.rb --update --location /home/xx/dbidx \
  --dbname test --renew

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  options = {}
  options['renew'] = true
  Bio::FlatFileIndex::update_index(dbname, nil, options, [])

Note that adding files to or updating a flat index (without BDB)
is very slow because it actually rebuilds the indexes.


Retrieving sequences in the index:

% br_bioflat.rb --location /home/xx/dbidx --dbname test M12963

equivalent ruby script:

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  key = 'M12963'
  idx = Bio::FlatFileIndex.open(dbname)
  results = idx.search(key)
  results.each do |str|
    print str
  end
  idx.close

'results' is a Bio::FlatFileIndex::Results object.
Each search result is a string.

(For more information, please see RDoc
http://bioruby.org/rdoc/classes/Bio/FlatFileIndex/Results.html )

If you want subsequence of fasta formatted data,
for example,

  require 'bio'
  dbname = '/home/xx/dbidx/test'
  key = 'M12963'
  idx = Bio::FlatFileIndex.open(dbname)
  result = idx.search(key)
  result.each do |str|
    ent = Bio::FastaFormat.new(str)
    # for nucleic acid sequence
    puts ent.naseq[0..100]
    # for amino acid sequence
    puts ent.aaseq[0..100]
    # nucleic or amino acid sequence
    puts ent.seq[0..100]
  end
  idx.close
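
Note that `[0..100]` uses 0-based Ruby indexing and returns 101 residues, whereas the BioPerl call quoted below takes 1-based, inclusive start and end positions. A plain-Ruby sketch of the coordinate conversion; `parse_fasta` and `subseq` are hypothetical helpers written for this example, and the record is a made-up toy sequence. (BioRuby sequence objects also seem to offer a `subseq(start, end)` method with 1-based coordinates; please check the RDoc before relying on it.)

```ruby
# Parse one FASTA-formatted string into [definition, sequence].
def parse_fasta(str)
  lines = str.lines.map(&:chomp)
  definition = lines.shift.to_s.sub(/\A>/, '')
  [definition, lines.join.delete(' ')]
end

# 1-based, inclusive coordinates, converted to Ruby's 0-based range.
def subseq(sequence, start, stop)
  sequence[(start - 1)..(stop - 1)]
end

# Toy record: the accession is the key used above, the sequence is made up.
record = ">M12963 toy record (made-up sequence)\nACGTACGTAC\nGGTTAACC\n"
definition, sequence = parse_fasta(record)
puts definition
puts subseq(sequence, 3, 6)
```

As in the example above, the whole entry is still read first and only then sliced.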

Please see the OBDA flat file indexing specifications
for the philosophy and internal structure of the index.

http://code.open-bio.org/cgi/viewcvs.cgi/obda-specs/flatfile/?cvsroot=obf-common

Thanks,

Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp


On Mon, 10 Dec 2007 17:21:31 -0000
"Schwach Frank Dr \(CMP\)" <F.Schwach at uea.ac.uk> wrote:

> 
> Hi,
> 
> I need to retrieve sequences from fasta files. In Perl I used to do this with Bio::DB:fasta but at first I couldn't find an equivalent in Bioruby and was almost about to give up and use Perl for this purpose when I found Bio::FlatFileIndex. 
> Unfortunately, this class is not very well documented (unless I missed something). I think I can more or less figure out most of it from the code and the comments in the rdoc (http://bioruby.org/rdoc/classes/Bio/FlatFileIndex.html) but it would really be great to have some examples from people who are more familiar with this class, especially since I am relatively new to Ruby still.
> 
> What I want to do is simply:
> 
> 1) Build an index for a directory containing a few fasta files
> 2) In a Rails App (or any other Ruby script): retrieve sequences by their accessions and update the index if the fasta db is updated by the user.
> 
> Some of the questions I have are:
> What are the options that I can pass to the makeindex method?
> In Bioperl it is possible to retrieve a subsequence straight away like this:
> 
>  my $seq_db_obj = Bio::DB::Fasta->new($path_to_db); 
>  my $seq = $seq_db_obj->seq($accession, $start, $end) ; # retrieve (sub)sequence from the database
> 
> Can I do this in Ruby too or would I retrieve the entire sequence and then get the subsequence from that?
> 
> Any help and examples welcome!
> Thanks a lot!
> 


From yjchenx at gmail.com  Thu Dec 13 01:54:04 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 17:54:04 -0800
Subject: [BioRuby] Parse big PDB use up all memory
Message-ID: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>

This is what I did:

require 'bio'
serv = Bio::Fetch.new()
entry = serv.fetch('pdb', '1w6k')
pdb = Bio::PDB.new(entry)

The last step uses up all memory and quits.
The pdb file is quite big and I only need the information from the header.
Is it possible to do something like this?

pdb = Bio::PDB.new(entry[0-40000])
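
A note on the slice syntax itself, as a plain-Ruby sketch: `0-40000` is ordinary arithmetic, so `entry[0-40000]` asks for index -40000 rather than a range. The toy `entry` below is made up (it is not real 1w6k data) and `pdb_header` is a hypothetical helper.

```ruby
# Toy stand-in for a fetched PDB entry (made-up records, not real 1w6k data).
entry = "HEADER    TOY ENTRY\nTITLE     MADE-UP EXAMPLE\n" \
        "ATOM      1  N   MET A   1\nATOM      2  CA  MET A   1\n"

# 0-40000 evaluates to -40000, so this looks up a single index from the end.
p entry[0 - 40000]

# String#[start, length] is the slice form: at most the first 40000 characters.
p entry[0, 40000].length

# Cutting at the first ATOM record avoids guessing a character count.
def pdb_header(text)
  header = String.new
  text.each_line do |line|
    break if line.start_with?('ATOM')
    header << line
  end
  header
end

puts pdb_header(entry)
```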

Thanx for the help


From yjchenx at gmail.com  Thu Dec 13 04:50:29 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 20:50:29 -0800
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
Message-ID: <f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>

Thank you for the hint on retrieving only the header.

I am using the default Ruby on Mac OS X 10.5.
Here is the output of 'ruby -v'

ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]

And bioruby is 1.1.0 from gems.

I will test it on Linux and see.

Yen-Ju

On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
> Hi,
>
> Could you give some more details on what system and ruby/bioruby
> version you are running? The same script uses less than 20MB on my
> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
> seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB
> files out there so if you're having trouble with this one then others
> will certainly be a problem.
>
> In answer to your second question, yes you should be able to just
> extract the header (everything up to the ATOM records). But if you're
> really running out of memory just parsing that file then I suspect you
> have deeper issues. Anyway, the sample below works for me for parsing
> the header from 1w6k:
>
> require 'bio'
>
> serv = Bio::Fetch.new
> entry = serv.fetch('pdb','1w6k')
>
> header = ''
> entry.each do |l|
>    break if l.match(/^ATOM/)
>    header << l
> end
>
> pdb = Bio::PDB.new(header)
> p pdb.accession
>
>
> On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
>
> > This is what I did:
> >
> > require 'bio'
> > serv = Bio::Fetch.new()
> > entry = serv.fetch('pdb', '1w6k')
> > pdb = Bio::PDB.new(entry)
> >
> > The last step use up all memory and quit.
> > The pdb file is quite big and I only need the information from header.
> > Is it possible to do something like this ?
> >
> > pdb = Bio::PDB.new(entry[0-40000])
> >
> > Thanx for the help
> >
>
> Alex Gutteridge
>
> Bioinformatics Center
> Kyoto University
>
>
>


From yjchenx at gmail.com  Thu Dec 13 05:22:36 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 21:22:36 -0800
Subject: [BioRuby] Detect error from Bio::Fetch
Message-ID: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>

This is the script I run:

require 'bio'
serv = Bio::Fetch.new()
entry = serv.fetch('swissprot', 'not_existing_id')
swissprot = Bio::SwissProt.new(entry)
p swissprot.entry_name # <== Error raises here

The problem is that Bio::Fetch does not raise an exception or otherwise
signal that it cannot find the entry in the database. An error
shows up only at 'swissprot.entry_name'. It would be nice to detect
the error early on, either in Bio::Fetch.fetch() or
Bio::SwissProt.new().
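
Until then, a workaround on the caller's side might be to check the fetched text before parsing it. A sketch, assuming only that a Swiss-Prot flat-file entry begins with an "ID   " line; `checked_swissprot_text` is a hypothetical helper, and both sample texts are made up rather than real server responses.

```ruby
# Hypothetical guard: return the text if it looks like a Swiss-Prot entry,
# otherwise raise right away instead of failing later inside the parser.
def checked_swissprot_text(text, id)
  unless text.is_a?(String) && text =~ /\AID   \S+/
    raise ArgumentError, "no Swiss-Prot entry returned for #{id.inspect}"
  end
  text
end

# Made-up sample texts; a real server response may look different.
good = "ID   TOY_ENTRY   Reviewed;   4 AA.\nSQ   SEQUENCE   4 AA;\n     MKTA\n//\n"
samples = [good, '', "made-up error text\n"]

samples.each do |text|
  begin
    checked_swissprot_text(text, 'not_existing_id')
    puts 'looks like an entry'
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```

The checked text would then be handed to Bio::SwissProt.new as in the script above.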

Yen-Ju


From alexg at kuicr.kyoto-u.ac.jp  Thu Dec 13 05:22:59 2007
From: alexg at kuicr.kyoto-u.ac.jp (Alex Gutteridge)
Date: Thu, 13 Dec 2007 14:22:59 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
	<f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
Message-ID: <20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>

Yup, I see the same behavior on linux and osx. Bio::PDB.new kills irb  
but runs fine in a script. Thanks for the bug report. I'll see if I  
can identify what's going on.

AlexG

On 13 Dec 2007, at 14:11, Yen-Ju Chen wrote:

> I did a quick test and found the problem is that I ran it in irb.
> If I run it in script, like 'ruby test.rb', then it works fine.
>
> Yen-Ju
>
> On Dec 12, 2007 8:50 PM, Yen-Ju Chen <yjchenx at gmail.com> wrote:
>> Thank you for the hint for retrieve only header.
>>
>> I am using the default Ruby on Mac OS X 10.5.
>> Here is the output of 'ruby -v'
>>
>> ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]
>>
>> And bioruby is 1.1.0 from gems.
>>
>> I will test it on Linux and see.
>>
>> Yen-Ju
>>
>>
>> On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto- 
>> u.ac.jp> wrote:
>>> Hi,
>>>
>>> Could you give some more details on what system and ruby/bioruby
>>> version you are running? The same script uses less than 20MB on my
>>> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
>>> seem so bad. Also 1w6k is biggish, but there are certainly bigger  
>>> PDB
>>> files out there so if you're having trouble with this one then  
>>> others
>>> will certainly be a problem.
>>>
>>> In answer to your second question, yes you should be able to just
>>> extract the header (everything up to the ATOM records). But if  
>>> you're
>>> really running out of memory just parsing that file then I suspect  
>>> you
>>> have deeper issues. Anyway, the sample below works for me for  
>>> parsing
>>> the header from 1w6k:
>>>
>>> require 'bio'
>>>
>>> serv = Bio::Fetch.new
>>> entry = serv.fetch('pdb','1w6k')
>>>
>>> header = ''
>>> entry.each do |l|
>>>   break if l.match(/^ATOM/)
>>>   header << l
>>> end
>>>
>>> pdb = Bio::PDB.new(header)
>>> p pdb.accession
>>>
>>>
>>> On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
>>>
>>>> This is what I did:
>>>>
>>>> require 'bio'
>>>> serv = Bio::Fetch.new()
>>>> entry = serv.fetch('pdb', '1w6k')
>>>> pdb = Bio::PDB.new(entry)
>>>>
>>>> The last step use up all memory and quit.
>>>> The pdb file is quite big and I only need the information from  
>>>> header.
>>>> Is it possible to do something like this ?
>>>>
>>>> pdb = Bio::PDB.new(entry[0-40000])
>>>>
>>>> Thanx for the help
>>>>
>>>
>>> Alex Gutteridge
>>>
>>> Bioinformatics Center
>>> Kyoto University
>>>
>>>
>>>
>>
>

Alex Gutteridge

Bioinformatics Center
Kyoto University


From yjchenx at gmail.com  Thu Dec 13 05:11:33 2007
From: yjchenx at gmail.com (Yen-Ju Chen)
Date: Wed, 12 Dec 2007 21:11:33 -0800
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
Message-ID: <f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>

I did a quick test and found that the problem only occurs when I run it
in irb. If I run it as a script, like 'ruby test.rb', then it works fine.

Yen-Ju

On Dec 12, 2007 8:50 PM, Yen-Ju Chen <yjchenx at gmail.com> wrote:
> Thank you for the hint for retrieve only header.
>
> I am using the default Ruby on Mac OS X 10.5.
> Here is the output of 'ruby -v'
>
> ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]
>
> And bioruby is 1.1.0 from gems.
>
> I will test it on Linux and see.
>
> Yen-Ju
>
>
> On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:
> > Hi,
> >
> > Could you give some more details on what system and ruby/bioruby
> > version you are running? The same script uses less than 20MB on my
> > machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
> > seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB
> > files out there so if you're having trouble with this one then others
> > will certainly be a problem.
> >
> > In answer to your second question, yes you should be able to just
> > extract the header (everything up to the ATOM records). But if you're
> > really running out of memory just parsing that file then I suspect you
> > have deeper issues. Anyway, the sample below works for me for parsing
> > the header from 1w6k:
> >
> > require 'bio'
> >
> > serv = Bio::Fetch.new
> > entry = serv.fetch('pdb','1w6k')
> >
> > header = ''
> > entry.each do |l|
> >    break if l.match(/^ATOM/)
> >    header << l
> > end
> >
> > pdb = Bio::PDB.new(header)
> > p pdb.accession
> >
> >
> > On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
> >
> > > This is what I did:
> > >
> > > require 'bio'
> > > serv = Bio::Fetch.new()
> > > entry = serv.fetch('pdb', '1w6k')
> > > pdb = Bio::PDB.new(entry)
> > >
> > > The last step use up all memory and quit.
> > > The pdb file is quite big and I only need the information from header.
> > > Is it possible to do something like this ?
> > >
> > > pdb = Bio::PDB.new(entry[0-40000])
> > >
> > > Thanx for the help
> > >
> >
> > Alex Gutteridge
> >
> > Bioinformatics Center
> > Kyoto University
> >
> >
> >
>


From alexg at kuicr.kyoto-u.ac.jp  Thu Dec 13 03:49:04 2007
From: alexg at kuicr.kyoto-u.ac.jp (Alex Gutteridge)
Date: Thu, 13 Dec 2007 12:49:04 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
Message-ID: <16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>

Hi,

Could you give some more details on what system and ruby/bioruby  
version you are running? The same script uses less than 20MB on my  
machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't  
seem so bad. Also 1w6k is biggish, but there are certainly bigger PDB  
files out there so if you're having trouble with this one then others  
will certainly be a problem.

In answer to your second question, yes you should be able to just  
extract the header (everything up to the ATOM records). But if you're  
really running out of memory just parsing that file then I suspect you  
have deeper issues. Anyway, the sample below works for me for parsing  
the header from 1w6k:

require 'bio'

serv = Bio::Fetch.new
entry = serv.fetch('pdb','1w6k')

header = ''
entry.each_line do |l|
   break if l.match(/^ATOM/)
   header << l
end

pdb = Bio::PDB.new(header)
p pdb.accession
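
The same header cut can be tried without a network connection. The
sketch below wraps the loop in a small function and runs it on a
made-up three-line sample; pdb_header and the sample text are
illustrative only and are not part of BioRuby.

```ruby
# Return everything in a PDB-format string that precedes the first
# ATOM record (the same cut-off used in the loop above).
def pdb_header(text)
  text.each_line.take_while { |line| line !~ /^ATOM/ }.join
end

# Made-up sample standing in for a real PDB entry.
sample = "HEADER    OXIDOREDUCTASE\n" \
         "TITLE     EXAMPLE ENTRY\n" \
         "ATOM      1  N   MET A   1\n"

puts pdb_header(sample)
```

The result can then be passed to Bio::PDB.new as in the script above.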

On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:

> This is what I did:
>
> require 'bio'
> serv = Bio::Fetch.new()
> entry = serv.fetch('pdb', '1w6k')
> pdb = Bio::PDB.new(entry)
>
> The last step use up all memory and quit.
> The pdb file is quite big and I only need the information from header.
> Is it possible to do something like this ?
>
> pdb = Bio::PDB.new(entry[0-40000])
>
> Thanx for the help
>

Alex Gutteridge

Bioinformatics Center
Kyoto University


From odo at mac.com  Thu Dec 13 08:23:48 2007
From: odo at mac.com (Florian Odronitz)
Date: Thu, 13 Dec 2007 09:23:48 +0100
Subject: [BioRuby] Proton Nomenclature in PDB
In-Reply-To: <mailman.388.1197521447.847.bioruby@lists.open-bio.org>
References: <mailman.388.1197521447.847.bioruby@lists.open-bio.org>
Message-ID: <3A227C17-9C34-42BF-80C6-B96467573291@mac.com>

Hi,

I am using Bio::PDB in my NMR-related software project. I ran into a
problem with the naming of protons generated by PyMol and MolMol, and
wrote a method to rename the protons according to BMRB nomenclature
(http://www.bmrb.wisc.edu/ref_info/statsel.htm).
If anyone thinks this could be useful to others, I would like to
contribute it to BioRuby. Or is it too specific? Maybe I could do it
in a more general way, since it also involves things like bonding,
which, to my understanding, are not implemented yet. Who would be the
right person to talk to?

Thanks,
Florian


From ktym at hgc.jp  Fri Dec 14 17:20:34 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 15 Dec 2007 02:20:34 +0900
Subject: [BioRuby] BioRuby 1.2.0 is released
Message-ID: <AA76D6A5-6C52-465C-BC93-6C06CE9F51F2@hgc.jp>

Hi all,

I just released the BioRuby 1.2.0 at http://bioruby.org/archive/bioruby-1.2.0.tar.gz

  http://bioruby.org/
  http://bioruby.org/rdoc/
  http://rubyforge.org/projects/bioruby/
  http://raa.ruby-lang.org/project/bioruby/

I also put the RubyGems package on RubyForge as always.

  % sudo gem update bio

Here is a brief summary of updates snipped from the ChangeLog file.

        * BioRuby 1.2.0 released

          * BioRuby shell is improved
            * file save functionality is fixed
            * deprecated require_gem is changed to gem to suppress warnings
            * deprecated end_form_tag is rewritten to suppress warnings
            * images for Rails shell are separated to the bioruby directory
            * spinner is shown during the evaluation
            * background image in the textarea is removed for the visibility
          * Bio::Blast is fixed to parse -m 8 formatted result correctly
          * Bio::PubMed is rewritten to enhance its functionality
            * e.g. 'rettype' => 'count' and 'retmode' => 'xml' are available
          * Bio::FlatFile is improved to accept recent MEDLINE format
          * Bio::KEGG::COMPOUND is enhanced to utilize REMARK field
          * Bio::KEGG::API is fixed to skip filter when the value is Fixnum
          * A number of minor bug fixes

Hope you enjoy.

Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614
fax://+81-3-5449-5434
http://www.hgc.jp/ (Human Genome Center)
http://bioruby.org/ (BioRuby project)
http://das.hgc.jp/ (KEGG DAS)
http://www.genome.jp/kegg/soap/ (KEGG API)


From raoul.bonnal at itb.cnr.it  Fri Dec 14 13:50:30 2007
From: raoul.bonnal at itb.cnr.it (Raoul Jean Pierre Bonnal)
Date: Fri, 14 Dec 2007 14:50:30 +0100
Subject: [BioRuby] FlatFile loading genbank, the last entry is a fake
In-Reply-To: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>
References: <f514e4aa0712122122j473bd8f0n17d2746210337f22@mail.gmail.com>
Message-ID: <1197640230.10347.15.camel@Graco>

If you download the GenBank file for AJ561198 from NCBI and load it with

require 'bio'

data = Bio::FlatFile.auto("AJ561198.gb")

data.each_entry do |entry|
  puts entry.entry_id
end

You get

AJ561198
nil

I think the parser sees the "\n" at the end of the GenBank file (after
"//\n") and assumes there is another entry, which is wrong.
Deleting the last line works around it.
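
Without editing the file, the empty trailing entry could also be
skipped in the loop. The following is a sketch rather than a fix for
the parser: each_real_entry is a made-up helper name, and it only
assumes that the object it is given responds to each_entry and that
the bogus entry reports a nil (or empty) entry_id, as in the output
above.

```ruby
# Hypothetical wrapper: yield only entries that carry an entry_id.
def each_real_entry(flatfile)
  flatfile.each_entry do |entry|
    id = entry.entry_id
    next if id.nil? || id.to_s.empty?   # skip the empty trailing record
    yield entry
  end
end

# Intended use (needs the bio gem and the downloaded file):
#   each_real_entry(Bio::FlatFile.auto("AJ561198.gb")) do |entry|
#     puts entry.entry_id
#   end
```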

--
Ra


From ktym at hgc.jp  Fri Dec 14 22:31:11 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sat, 15 Dec 2007 07:31:11 +0900
Subject: [BioRuby] Fwd:  BioRuby 1.2.0 is released
References: <1F16910BB8546C4DA5526FABB0C98D09AA9A53@ebre2ksrv1.ebrc.bbsrc.ac.uk>
Message-ID: <2D2BADE4-A31A-4356-9820-FC700AEE903C@hgc.jp>

Hi all,

Does anybody have the same problem on Linux/Windows?

Toshiaki

Begin forwarded message:

> From: "jan aerts (RI)" <jan.aerts at bbsrc.ac.uk>
> Date: 2007-12-15 5:50:42 JST
> To: "Toshiaki Katayama" <ktym at hgc.jp>
> Cc: <n at bioruby.org>
> Subject: RE: [BioRuby] BioRuby 1.2.0 is released
>
> Ubuntu 7.10 (Gutsy Gibbon).
> ruby 1.8.6
> soap4r 1.5.5-1 (apt-get package)
>
> j.
>
>
> -----Original Message-----
> From: Toshiaki Katayama [mailto:ktym at hgc.jp]
> Sent: Fri 14/12/2007 18:42
> To: jan aerts (RI)
> Cc: n at bioruby.org
> Subject: Re: [BioRuby] BioRuby 1.2.0 is released
>
> Jan,
>
> In my environment (OS X Leopard), I have no errors on all tests in BioRuby 1.2.0 with Ruby 1.8.6
> What kind of environment do you use?
>
> Regards,
> Toshiaki
>
> On 2007/12/15, at 3:28, jan aerts (RI) wrote:
>
>> Thanks T.
>>
>> Good to see a new release is out.
>>
>> I noticed that the test/functional/bio/io/test_soapwsdl.rb test returned errors. All 4 tests in that testfile give the following error:
>>
>> NoMethodError: undefined method `location=' for nil:NilClass
>>   /usr/lib/ruby/1.8/wsdl/xmlSchema/importer.rb:31:in `import'
>>   /usr/lib/ruby/1.8/wsdl/importer.rb:18:in `import'
>>   /usr/lib/ruby/1.8/soap/wsdlDriver.rb:124:in `import'
>>   /usr/lib/ruby/1.8/soap/wsdlDriver.rb:28:in `initialize'
>>   ../../../../lib/bio/io/soapwsdl.rb:63:in `new'
>>   ../../../../lib/bio/io/soapwsdl.rb:63:in `create_driver'
>>   ../../../../lib/bio/io/soapwsdl.rb:57:in `initialize'
>>   ./test_soapwsdl.rb:25:in `new'
>>   ./test_soapwsdl.rb:25:in `setup'
>>
>> jan.
>>
>>
>> -----Original Message-----
>> From: bioruby-bounces at lists.open-bio.org on behalf of Toshiaki Katayama
>> Sent: Fri 14/12/2007 17:20
>> To: BioRuby; bioruby-ja at lists.open-bio.org
>> Subject: [BioRuby] BioRuby 1.2.0 is released
>>
>> Hi all,
>>
>> I just released the BioRuby 1.2.0 at http://bioruby.org/archive/bioruby-1.2.0.tar.gz
>>
>> http://bioruby.org/
>> http://bioruby.org/rdoc/
>> http://rubyforge.org/projects/bioruby/
>> http://raa.ruby-lang.org/project/bioruby/
>>
>> I also put the RubyGems package on RubyForge as always.
>>
>> % sudo gem update bio
>>
>> Here is a brief summary of updates snipped from the ChangeLog file.
>>
>>       * BioRuby 1.2.0 released
>>
>>         * BioRuby shell is improved
>>           * file save functionality is fixed
>>           * deprecated require_gem is changed to gem to suppress warnings
>>           * deprecated end_form_tag is rewritten to suppress warnings
>>           * images for Rails shell are separated to the bioruby directory
>>           * spinner is shown during the evaluation
>>           * background image in the textarea is removed for the visibility
>>         * Bio::Blast is fixed to parse -m 8 formatted result correctly
>>         * Bio::PubMed is rewritten to enhance its functionality
>>           * e.g. 'rettype' => 'count' and 'retmode' => 'xml' are available
>>         * Bio::FlatFile is improved to accept recent MEDLINE format
>>         * Bio::KEGG::COMPOUND is enhanced to utilize REMARK field
>>         * Bio::KEGG::API is fixed to skip filter when the value is Fixnum
>>         * A number of minor bug fixes
>>
>> Hope you enjoy.
>>
>> Regards,
>> Toshiaki Katayama
>> --
>> Human Genome Center, Institute of Medical Science, University of Tokyo
>> 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
>> tel://+81-3-5449-5614
>> fax://+81-3-5449-5434
>> http://www.hgc.jp/ (Human Genome Center)
>> http://bioruby.org/ (BioRuby project)
>> http://das.hgc.jp/ (KEGG DAS)
>> http://www.genome.jp/kegg/soap/ (KEGG API)
>>
>>
>>
>>
>
>
>


From ngoto at gen-info.osaka-u.ac.jp  Tue Dec 18 13:55:57 2007
From: ngoto at gen-info.osaka-u.ac.jp (Naohisa GOTO)
Date: Tue, 18 Dec 2007 22:55:57 +0900
Subject: [BioRuby] Parse big PDB use up all memory
In-Reply-To: <20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>
References: <f514e4aa0712121754s84a1003u4c5eed8307c5fcc5@mail.gmail.com>
	<16683AAA-7D69-4D8A-9B3D-A878DA98E727@kuicr.kyoto-u.ac.jp>
	<f514e4aa0712122050p1262baebjdf7859a99954c81f@mail.gmail.com>
	<f514e4aa0712122111g1cb8e039oea0ac362f1cea6a@mail.gmail.com>
	<20495B39-57E6-46C4-87AF-24B041CBA54D@kuicr.kyoto-u.ac.jp>
Message-ID: <20071218135558.4880D1CBC43F@idnmail.gen-info.osaka-u.ac.jp>

Hi,

Objects inside a Bio::PDB object often refer to other objects
in the same Bio::PDB object, and this might cause
infinite recursion in Bio::PDB#inspect, which irb calls to
display each result.

Defining a customized Bio::PDB#inspect seems to prevent
the memory exhaustion problem.

  class Bio::PDB
    # returns a string containing human-readable representation
    # of this object.
    def inspect
      "#<#{self.class.to_s} entry_id=#{entry_id.inspect}>"
    end
  end

I also defined Bio::PDB::(Model|Chain|Residue)#inspect
in the same way, and committed them to CVS.
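
The effect can be seen on a toy pair of classes that point at each
other. This is an illustration only: Toy::Chain and Toy::Residue are
invented for the example and are much simpler than the real Bio::PDB
classes.

```ruby
module Toy
  class Residue
    attr_reader :name, :chain

    def initialize(name, chain)
      @name = name
      @chain = chain   # back-reference to the owning chain
    end

    # Compact representation that does not follow the back-reference.
    def inspect
      "#<#{self.class} name=#{name.inspect}>"
    end
  end

  class Chain
    attr_reader :id, :residues

    def initialize(id)
      @id = id
      @residues = []
    end

    def add(name)
      @residues << Residue.new(name, self)
      self
    end

    # Report only a count instead of inspecting every residue.
    def inspect
      "#<#{self.class} id=#{id.inspect} residues=#{residues.size}>"
    end
  end
end

chain = Toy::Chain.new('A').add('MET').add('GLY')
p chain
p chain.residues.first
```

Without the two inspect methods, printing either object would also
print the other through the back-reference, which is what makes the
output grow with the size of the structure.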

Naohisa Goto
ng at bioruby.org / ngoto at gen-info.osaka-u.ac.jp

On Thu, 13 Dec 2007 14:22:59 +0900
Alex Gutteridge <alexg at kuicr.kyoto-u.ac.jp> wrote:

> Yup, I see the same behavior on linux and osx. Bio::PDB.new kills irb  
> but runs fine in a script. Thanks for the bug report. I'll see if I  
> can identify what's going on.
> 
> AlexG
> 
> On 13 Dec 2007, at 14:11, Yen-Ju Chen wrote:
> 
> > I did a quick test and found the problem is that I ran it in irb.
> > If I run it in script, like 'ruby test.rb', then it works fine.
> >
> > Yen-Ju
> >
> > On Dec 12, 2007 8:50 PM, Yen-Ju Chen <yjchenx at gmail.com> wrote:
> >> Thank you for the hint for retrieve only header.
> >>
> >> I am using the default Ruby on Mac OS X 10.5.
> >> Here is the output of 'ruby -v'
> >>
> >> ruby 1.8.6 (2007-06-07 patchlevel 36) [universal-darwin9.0]
> >>
> >> And bioruby is 1.1.0 from gems.
> >>
> >> I will test it on Linux and see.
> >>
> >> Yen-Ju
> >>
> >>
> >> On Dec 12, 2007 7:49 PM, Alex Gutteridge <alexg at kuicr.kyoto- 
> >> u.ac.jp> wrote:
> >>> Hi,
> >>>
> >>> Could you give some more details on what system and ruby/bioruby
> >>> version you are running? The same script uses less than 20MB on my
> >>> machine (ruby 1.8.6 / bioruby 1.1.0 / ubuntu linux), which doesn't
> >>> seem so bad. Also 1w6k is biggish, but there are certainly bigger  
> >>> PDB
> >>> files out there so if you're having trouble with this one then  
> >>> others
> >>> will certainly be a problem.
> >>>
> >>> In answer to your second question, yes you should be able to just
> >>> extract the header (everything up to the ATOM records). But if  
> >>> you're
> >>> really running out of memory just parsing that file then I suspect  
> >>> you
> >>> have deeper issues. Anyway, the sample below works for me for  
> >>> parsing
> >>> the header from 1w6k:
> >>>
> >>> require 'bio'
> >>>
> >>> serv = Bio::Fetch.new
> >>> entry = serv.fetch('pdb','1w6k')
> >>>
> >>> header = ''
> >>> entry.each do |l|
> >>>   break if l.match(/^ATOM/)
> >>>   header << l
> >>> end
> >>>
> >>> pdb = Bio::PDB.new(header)
> >>> p pdb.accession
> >>>
> >>>
> >>> On 13 Dec 2007, at 10:54, Yen-Ju Chen wrote:
> >>>
> >>>> This is what I did:
> >>>>
> >>>> require 'bio'
> >>>> serv = Bio::Fetch.new()
> >>>> entry = serv.fetch('pdb', '1w6k')
> >>>> pdb = Bio::PDB.new(entry)
> >>>>
> >>>> The last step use up all memory and quit.
> >>>> The pdb file is quite big and I only need the information from  
> >>>> header.
> >>>> Is it possible to do something like this ?
> >>>>
> >>>> pdb = Bio::PDB.new(entry[0-40000])
> >>>>
> >>>> Thanx for the help
> >>>>
> >>>
> >>> Alex Gutteridge
> >>>
> >>> Bioinformatics Center
> >>> Kyoto University
> >>>
> >>>
> >>>
> >>
> >
> 
> Alex Gutteridge
> 
> Bioinformatics Center
> Kyoto University
> 
> 


From yjchen at reciprocallattice.com  Tue Dec 18 21:54:34 2007
From: yjchen at reciprocallattice.com (Yen-Ju Chen)
Date: Tue, 18 Dec 2007 13:54:34 -0800
Subject: [BioRuby] A Rails application with BioRuby
Message-ID: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>

Hi,
  I am working on a rails application using BioRuby to collect references
and database entries.
  You can find the application (not source code yet) at
journalclub.reciprocallattice.com
  It is still at an early stage. I use it personally and figure it would be
interesting to have more users.
  If you want to join, please write to me in private so that it does not
clutter the BioRuby mailing list.
  I don't know how many users the application can take. Please see the
website for more details.

  These are the things related to BioRuby:
  * The BibTeX output from Reference lacks the abstract.
  * It would be nice to be able to output RIS format for EndNote and
Reference Manager.
  * Is it possible to get the DOI from PubMed?
  * BioRuby can get entries from many databases through BioFetch,
    but cannot process some of them, like Pfam, Prosite, etc.
  * It is not clear what the BioFetch databases are, for example: rn, rp,
str, pr.
    I am in structural biology, and many of these abbreviations are not
obvious to me.

  If I get the chance to write code for these missing features, I will submit
it back to BioRuby.
  Have fun.

  Yen-Ju


From sgujja at broad.mit.edu  Wed Dec 19 16:03:24 2007
From: sgujja at broad.mit.edu (Sharvari Gujja)
Date: Wed, 19 Dec 2007 11:03:24 -0500
Subject: [BioRuby] how to retrieve a genbank record by GI
Message-ID: <476940CC.6000803@broad.mit.edu>

Hi all,

I am new to Ruby and BioRuby and am amazed at how simple and yet
powerful it is.

I am trying to access a GenBank record (NCBI) by GI number. I have tried
Bio::Fetch and Bio::Registry, but neither seems to work.

Any help is appreciated.

Thanks
-S


From robert.citek at gmail.com  Wed Dec 19 19:39:01 2007
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 19 Dec 2007 13:39:01 -0600
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <476940CC.6000803@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>
Message-ID: <4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>

On Dec 19, 2007 10:03 AM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
> I am new to Ruby and Bioruby and am amazed at how simple and yet
> powerful is it.
>
> I am trying to access a genbank record (NCBI) by GI number. I have tried
> Bio::Fetch, Bio::Registry but none seems to work.

Can you give an example of what you've tried?  Also, on what system
are you running bioruby, e.g. Windows XP, Cygwin in Windows, Ubuntu
Linux, Mac OS X, Solaris?  What version of bioruby?

Regards,
- Robert


From robert.citek at gmail.com  Wed Dec 19 20:46:07 2007
From: robert.citek at gmail.com (Robert Citek)
Date: Wed, 19 Dec 2007 14:46:07 -0600
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <4769756B.3080406@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>
	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>
	<4769756B.3080406@broad.mit.edu>
Message-ID: <4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>

On Dec 19, 2007 1:47 PM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
> Robert Citek wrote:
> > Can you give an example of what you've tried?  Also, on what system
> > are you running bioruby on, e.g. Windows XP, Cygwin in Windows, Ubuntu
> > Linux, Mac OS X, Solaris?  What version of bioruby?
>
> I have tried:
>
> reg = Bio::Registry.new
> serv = reg.get_database('genbank')
> puts  serv.get_by_id('J00231')
>
>
> puts Bio::Fetch.query('genbank','185041')
>
> server = Bio::Fetch.new()
> #server = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch')
> puts server.fetch('genbank','J00231','html')
>
> entry = Bio::DBGET.bget("AF139016")
>
> gb = Bio::GenBank.new(Bio::Fetch.query('gb', 'J00231'))
> puts gb.read
>
> And running on Windows XP. Ruby 1.8.6

I also get errors:

$ ruby -rbio -e 'reg = Bio::Registry.new'
/usr/lib/ruby/1.8/net/http.rb:560:in `initialize': No route to host -
connect(2) (Errno::EHOSTUNREACH)
        from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
        from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
        from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
        from /usr/lib/ruby/1.8/bio/io/registry.rb:190:in `read_remote'
        from /usr/lib/ruby/1.8/bio/io/registry.rb:126:in `initialize'
        from -e:1:in `new'
        from -e:1

$ ruby -v
ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]

$ lsb_release  -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 7.10
Release:        7.10
Codename:       gutsy

Unfortunately, I don't know how to display what version of bioruby I'm
using.  I guess I'm too new to ruby, let alone bioruby, to be of any
help.  Anyone have a working example?  Unfortunately, my connection to
bioruby.org doesn't work (I suspect our 'Net connection is snafu'ed).

Regards,
- Robert


From ktym at hgc.jp  Thu Dec 20 07:41:12 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 20 Dec 2007 16:41:12 +0900
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
Message-ID: <C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>

Hi Yen-Ju,

On 2007/12/19, at 6:54, Yen-Ju Chen wrote:

> Hi,
>  I am working on a rails application using BioRuby to collect references
> and database entries.
>  You can find the application (not source code yet) at
> journalclub.reciprocallattice.com

Cool.


>  It is still at early stage. I use it personally and figure it would be
> interesting to have more users.
>  If you want to join, please write to me in private so that it will not
> pollute BioRuby maillist.
>  I don't know how many users the application can take. Please see the
> website for more details.
>
>  These are things related to BioRuby,
>  * The output from Reference to BibTex format lacks abstract.
>  * It would be nice to be able to output to RIS format for EndNote and
> ReferenceManager.


If you could provide a patch for them, I'll include it in BioRuby.


>  * Is it possible to get DOI from PubMed ?

  entry = Bio::PubMed.query(16946072)
  doi = entry[/AID - (\S+) \[doi\]/, 1]


or you can extend the Bio::MEDLINE class to add the doi method


  class Bio::MEDLINE
    attr_reader :pubmed

    def doi
      @pubmed['AID'][/(\S+) \[doi\]/, 1]
    end
  end

  entry = Bio::PubMed.query(16946072)
  medline = Bio::MEDLINE.new(entry)
  doi = medline.doi


or utilize the XML format of the PubMed output


  entry_xml = Bio::PubMed.efetch(16946072, {"retmode" => "xml"})

           :
        <ArticleIdList>
            <ArticleId IdType="pii">313/5791/1295</ArticleId>
            <ArticleId IdType="doi">10.1126/science.1131542</ArticleId>
            <ArticleId IdType="pubmed">16946072</ArticleId>
        </ArticleIdList>
           :

then extract DOI ID

  require 'rexml/document'
  pubmed = REXML::Document.new(entry_xml)
  doi = pubmed.elements['//ArticleId[@IdType="doi"]'].get_text
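
The regular expression from the first variant can be checked offline on
a hand-written sample. This is a sketch: doi_from_medline is a made-up
name, and the sample lines are an assumed MEDLINE-style rendering of
the identifiers shown in the XML above.

```ruby
# Pull the DOI out of MEDLINE-formatted text; returns nil if there is none.
def doi_from_medline(text)
  text[/AID - (\S+) \[doi\]/, 1]
end

# Assumed MEDLINE-style lines built from the IDs in the XML above.
sample = "PMID- 16946072\n" \
         "AID - 313/5791/1295 [pii]\n" \
         "AID - 10.1126/science.1131542 [doi]\n"

puts doi_from_medline(sample)
```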


>  * BioRuby can get information from many databases through biofetch,
>    but not processing them, like Pfam, Prosite, etc.

You can process them with the corresponding classes. For example,

  cyclins = Bio::Fetch.query('prosite', 'PS00292')
  prosite = Bio::PROSITE.new(cyclins)

  prosite.entry_id
  # ==> "PS00292"

  prosite.definition
  # ==> "Cyclins signature."

  prosite.pattern
  # ==> "R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]."

  prosite.re
  # ==> /R.{2}[LIVMSA].{2}[FYWS][LIVM].{8}[LIVMFC].{4}[LIVMFYA].{2}[STAGC][LIVMFYQ].[LIVMFYC][LIVMFY]D[RKH][LIVMFYW]/i
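
To show how the pattern and the regular expression above correspond,
here is a stand-alone sketch of the conversion. It is not BioRuby's
implementation (Bio::PROSITE already provides it via 're');
prosite_to_regexp is a made-up name, and it covers only x, x(n),
x(n,m), [..], {..} and the < and > anchors.

```ruby
# Illustrative PROSITE-pattern-to-Regexp converter (simplified).
def prosite_to_regexp(pattern)
  body = pattern.sub(/\.\z/, '').split('-').map do |element|
    # {AB} means "any residue except A or B"
    element = element.gsub(/\{([A-Z]+)\}/) { "[^#{$1}]" }
    # x means "any residue"
    element = element.gsub('x', '.')
    # (n) and (n,m) are repetition counts
    element = element.gsub(/\((\d+)(?:,(\d+))?\)/) { $2 ? "{#{$1},#{$2}}" : "{#{$1}}" }
    # < and > anchor the pattern to the sequence ends
    element.sub(/\A</, '^').sub(/>\z/, '$')
  end.join
  Regexp.new(body, Regexp::IGNORECASE)
end

cyclin = 'R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-' \
         '[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].'
p prosite_to_regexp(cyclin)
```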

 
>  * it is not clear what's the database from biofetch, for example: rn, rp,
> str, pr.
>    I am in structural biology. Many of these abbreviation is not obvious.

In BioRuby, the default BioFetch server is implemented as a proxy for the DBGET system through the KEGG API.
So, please refer to the abbreviation field in the DBGET manual at

  http://www.genome.jp/dbget/

and also note that the DBGET service for the GenBank (gb) database is no longer available.


Regards,
Toshiaki Katayama


From ktym at hgc.jp  Thu Dec 20 08:29:48 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Thu, 20 Dec 2007 17:29:48 +0900
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
References: <476940CC.6000803@broad.mit.edu>
	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>
	<4769756B.3080406@broad.mit.edu>
	<4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
Message-ID: <B8176F86-1358-47F0-884B-1FA8095141C2@hgc.jp>

Hi Gujja,

On 2007/12/20, at 5:46, Robert Citek wrote:

> On Dec 19, 2007 1:47 PM, Sharvari Gujja <sgujja at broad.mit.edu> wrote:
>> Robert Citek wrote:
>>> Can you give an example of what you've tried?  Also, on what system
>>> are you running bioruby on, e.g. Windows XP, Cygwin in Windows, Ubuntu
>>> Linux, Mac OS X, Solaris?  What version of bioruby?
>>
>> I have tried:
>>
>> reg = Bio::Registry.new
>> serv = reg.get_database('genbank')
>> puts  serv.get_by_id('J00231')

Did you set up your "seqdatabase.ini" file as described in the README file?

  http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/README?rev=1.17&cvsroot=bioruby

Otherwise, the 'genbank' database is not supported by OBDA (Bio::Registry) by default.

However, there is another problem.

In BioRuby's default configuration file, 'genbank' refers to the BioFetch server at bioruby.org,
and as I wrote in a separate mail, the current BioFetch server no longer supports the GenBank database.

  [genbank]
  protocol=biofetch
  location=http://bioruby.org/cgi-bin/biofetch.rb
  dbname=genbank

Thus, the above configuration is no longer valid...


>> puts Bio::Fetch.query('genbank','185041')
>>
>> server = Bio::Fetch.new()
>> #server = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch')
>> puts server.fetch('genbank','J00231','html')

Besides, as you can see at the other BioFetch server, provided by EBI (Dbfetch),

  http://www.ebi.ac.uk/cgi-bin/dbfetch

they don't provide the GenBank database either (because they have EMBL instead).


In conclusion, if you need to fetch a GenBank entry from a remote server,
using NCBI with E-Utils is the best way for now.

Unfortunately, we don't have a Bio::NCBI::Eutils class yet, but
it seems that you can temporarily borrow the Bio::PubMed class to do that.

  Bio::PubMed.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"})
  Bio::PubMed.efetch("J00231", {"db"=>"nuccore", "rettype"=>"gb"})

ESOAP can be an alternative, but it takes quite a long time to read the
current version of the WSDL file, and the returned value is not easy to handle.
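
For reference, the efetch request behind those calls can also be put
together by hand. The sketch below only builds the URL; efetch_url is a
made-up helper, the parameter values mirror the calls above, and it is
an assumption that the plain-HTTP address is still answered directly
(NCBI may redirect to HTTPS).

```ruby
require 'uri'

# Base URL of the NCBI E-utilities efetch service.
EFETCH_BASE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# Hypothetical helper: build an efetch URL for a GI number or accession.
def efetch_url(id, db = 'nuccore', rettype = 'gb')
  query = URI.encode_www_form('db' => db, 'id' => id,
                              'rettype' => rettype, 'retmode' => 'text')
  "#{EFETCH_BASE}?#{query}"
end

puts efetch_url('J00231')

# Fetching the record itself needs network access, e.g.:
#   require 'net/http'
#   puts Net::HTTP.get(URI(efetch_url('J00231')))
```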


Regards,
Toshiaki Katayama


>> entry = Bio::DBGET.bget("AF139016")
>>
>> gb = Bio::GenBank.new(Bio::Fetch.query('gb', 'J00231'))
>> puts gb.read
>>
>> And running on Windows XP. Ruby 1.8.6
>
> I also get errors:
>
> $ ruby -rbio -e 'reg = Bio::Registry.new'
> /usr/lib/ruby/1.8/net/http.rb:560:in `initialize': No route to host -
> connect(2) (Errno::EHOSTUNREACH)
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
>        from /usr/lib/ruby/1.8/timeout.rb:48:in `timeout'
>        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
>        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
>        from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
>        from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
>        from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
>        from /usr/lib/ruby/1.8/bio/io/registry.rb:190:in `read_remote'
>        from /usr/lib/ruby/1.8/bio/io/registry.rb:126:in `initialize'
>        from -e:1:in `new'
>        from -e:1
>
> $ ruby -v
> ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
>
> $ lsb_release  -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 7.10
> Release:        7.10
> Codename:       gutsy
>
> Unfortunately, I don't know how to display what version of bioruby I'm
> using.  I guess I'm too new to ruby, let alone bioruby, to be of any
> help.  Anyone have a working example?  Unfortunately, my connection to
> bioruby.org doesn't work (I suspect our 'Net connection is snafu'ed).
>
> Regards,
> - Robert


From ktym at hgc.jp  Thu Dec 20 16:54:18 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 21 Dec 2007 01:54:18 +0900
Subject: [BioRuby] how to retrieve a genbank record by GI
In-Reply-To: <476A8817.9030108@broad.mit.edu>
References: <476940CC.6000803@broad.mit.edu>	<4145b6790712191139o2fa6c37er6331fa38def372d9@mail.gmail.com>	<4769756B.3080406@broad.mit.edu>	<4145b6790712191246i2abd5252q11f702f116a76115@mail.gmail.com>
	<B8176F86-1358-47F0-884B-1FA8095141C2@hgc.jp>
	<476A8817.9030108@broad.mit.edu>
Message-ID: <95F14218-A50E-4A18-9ECB-3FC68B4D8DAE@hgc.jp>

Hi Gujja,

On 2007/12/21, at 0:19, Sharvari Gujja wrote:
> On 2007/12/20, at 5:46, Robert Citek wrote:
>>> Unfortunately, I don't know how to display what version of bioruby I'm
>>> using.

You can check the version of BioRuby by

 % ruby -rubygems -rbio -e 'p Bio::BIORUBY_VERSION'
 [1, 2, 0]

or by running the bioruby command like

 % bioruby
 Loading config (/Users/ktym/.bioruby/shell/session/config) ... done
 Loading object (/Users/ktym/.bioruby/shell/session/object) ... done
 Loading history (/Users/ktym/.bioruby/shell/session/history) ... done

 . . . B i o R u b y   i n   t h e   s h e l l . . .

   Version : BioRuby 1.2.0 / Ruby 1.8.6

 bioruby> exit


> Hi all
>
> Thanks for all your input.
>
> However, can s'one explain how to set up seqdatabase.ini file. I did go thru the read me file but does not make much sense to me.

Ah, if you are using Windows, I have no idea as I have never tried.
Instead, you can also put the file on the net as described in:

 http://bioruby.org/rdoc/files/lib/bio/io/registry_rb.html

Anyway, OBDA is still available in BioRuby, but I feel
it is not actively used in other Bio* projects these days.

This reminds me of one more way to retrieve a GenBank entry.
If you have installed the EMBOSS suite, you can set up a ~/.embossrc file
to access NCBI like this:

DB genbank [
 type: N 
 format: genbank
 method: url
 url: "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=gb&retmode=text&id=%s"
]

and call the entret command with

 Bio::EMBOSS.entret('genbank:185041')
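
For reference, the %s in the url line above is the slot where the entry ID
ends up (185041 in this example). A minimal Ruby sketch of that substitution,
assuming plain format-style replacement; the helper name efetch_url is made up
for illustration and is not part of BioRuby or EMBOSS:

```ruby
# Template copied from the .embossrc entry above; %s is the entry ID slot.
EFETCH_TEMPLATE = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi' \
                  '?db=nucleotide&rettype=gb&retmode=text&id=%s'

# Hypothetical helper: accepts "185041" or "genbank:185041" and returns the
# URL that would be requested for that entry.
def efetch_url(query)
  id = query.to_s.split(':').last   # drop an optional "db:" prefix
  format(EFETCH_TEMPLATE, id)
end

puts efetch_url('genbank:185041')
```

Printing the URL first is a cheap way to check the template in a browser
before involving EMBOSS at all.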


> Also , I have tried
>
> Bio::PubMed.efetch("185041", {"db"=>"nuccore", "rettype"=>"gb"})
>
> but this gives me the pubmed entry. I need the genbank format.

If your BioRuby is older than 1.2.0, try updating it first.
In my environment, this call returns a GenBank entry correctly.
I expect this is the most feasible approach on Windows for now.
I'll prepare the Bio::NCBI::Eutils class for the next release.
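
If you want to test for "older than 1.2.0" in a script, the version array
shown earlier can be compared directly. This is only a sketch: older_than? is
a hypothetical helper, and the [1, 1, 0] fallback is a placeholder so the
example runs even when the bio library is not loaded.

```ruby
# Bio::BIORUBY_VERSION is an array such as [1, 2, 0]; arrays compare
# element by element with <=>, so no string parsing is needed.
MINIMUM = [1, 2, 0]

# Hypothetical helper, not part of BioRuby.
def older_than?(version, minimum = MINIMUM)
  (version <=> minimum) < 0
end

# Use the real constant when the bio library is loaded; otherwise fall
# back to a placeholder so the sketch still runs on its own.
installed = defined?(Bio::BIORUBY_VERSION) ? Bio::BIORUBY_VERSION : [1, 1, 0]

if older_than?(installed)
  puts "BioRuby #{installed.join('.')} is older than #{MINIMUM.join('.')}, please update"
else
  puts "BioRuby #{installed.join('.')} is recent enough"
end
```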


> Appreciate your help.
>
> Thanks
> S


Regards,
Toshiaki Katayama
--
Human Genome Center, Institute of Medical Science, University of Tokyo
4-6-1 Shirokanedai, Minato-ku, Tokyo 108-0071, Japan
tel://+81-3-5449-5614
fax://+81-3-5449-5434
http://www.hgc.jp/ (Human Genome Center)
http://bioruby.org/ (BioRuby project)
http://das.hgc.jp/ (KEGG DAS)
http://www.genome.jp/kegg/soap/ (KEGG API)


From yjchen at reciprocallattice.com  Thu Dec 20 19:11:39 2007
From: yjchen at reciprocallattice.com (Yen-Ju Chen)
Date: Thu, 20 Dec 2007 11:11:39 -0800
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
	<C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
Message-ID: <bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>

On 12/19/07, Toshiaki Katayama <ktym at hgc.jp> wrote:
>
> Hi Yen-Ju,
>
> On 2007/12/19, at 6:54, Yen-Ju Chen wrote:
>
> > Hi,
> >  I am working on a rails application using BioRuby to collect references
> > and database entries.
> >  You can find the application (not source code yet) at
> > journalclub.reciprocallattice.com
>
> Cool.
>
>
> >  It is still at early stage. I use it personally and figure it would be
> > interesting to have more users.
> >  If you want to join, please write to me in private so that it will not
> > pollute BioRuby maillist.
> >  I don't know how many users the application can take. Please see the
> > website for more details.
> >
> >  These are things related to BioRuby,
> >  * The output from Reference to BibTex format lacks abstract.
> >  * It would be nice to be able to output to RIS format for EndNote and
> > ReferenceManager.
>
>
> If you could provide a patch for them, I'll include it in BioRuby.


  I will look at the RIS format and supply a patch later.
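
  As a starting point for such a patch, here is a rough sketch of what RIS
output could look like. It assumes the usual RIS line layout (two-letter tag,
two spaces, hyphen, space) and uses a plain hash with made-up values rather
than Bio::Reference; to_ris is a hypothetical name, not an existing BioRuby
method.

```ruby
# Hypothetical reference data; the values are placeholders, not a real article.
REFERENCE = {
  authors:  ['Doe, J.', 'Roe, R.'],
  title:    'A placeholder title',
  journal:  'Journal of Examples',
  year:     2007,
  abstract: 'Placeholder abstract text.'
}

# Render one journal article as RIS: "TAG  - value" lines, opened by TY
# and closed by ER. The abstract line is emitted only when present.
def to_ris(ref)
  lines = ['TY  - JOUR']
  ref[:authors].each { |author| lines << "AU  - #{author}" }
  lines << "TI  - #{ref[:title]}"
  lines << "JO  - #{ref[:journal]}"
  lines << "PY  - #{ref[:year]}"
  lines << "AB  - #{ref[:abstract]}" if ref[:abstract]
  lines << 'ER  - '
  lines.join("\n") + "\n"
end

puts to_ris(REFERENCE)
```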

> >  * Is it possible to get DOI from PubMed ?
>
>   entry = Bio::PubMed.query(16946072)
>   doi = entry[/AID - (\S+) \[doi\]/, 1]
>
>
> or you can extend the Bio::MEDLINE class to add the doi method


  Is it possible to have this feature in BioRuby?
  I have found that DOIs have become more common recently; even PDB entries
have DOIs.
  A DOI also seems to be the only way to have a unique ID for an article.
  For example, PubMed and Google Scholar may return the same article with
their own IDs (PMID and Google Scholar ID).
  As far as I can tell, comparing DOIs is the only way to ensure that two
entries refer to the same article.
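
  To make the comparison concrete, this is roughly what I have in mind. It is
only a sketch: normalize_doi and same_article? are hypothetical helpers, the
DOIs are placeholders, and I am assuming DOIs should be compared without
regard to case and without a "doi:" or resolver-URL prefix.

```ruby
# Hypothetical helper: reduce a DOI to a canonical form for comparison.
# Strips a resolver-URL or "doi:" prefix and downcases the rest.
def normalize_doi(doi)
  doi.to_s.strip
     .sub(%r{\Ahttps?://(dx\.)?doi\.org/}i, '')
     .sub(/\Adoi:\s*/i, '')
     .downcase
end

# Two entries count as the same article only when both carry a DOI
# and the normalized DOIs are equal.
def same_article?(doi_a, doi_b)
  a = normalize_doi(doi_a)
  b = normalize_doi(doi_b)
  !a.empty? && a == b
end

# Placeholder DOIs (10.1000/... does not point to a real article).
puts same_article?('10.1000/Example.123', 'doi:10.1000/example.123')
puts same_article?('10.1000/example.123', '10.1000/example.456')
```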

  [snip]


> >  * BioRuby can get information from many databases through biofetch,
> >    but not processing them, like Pfam, Prosite, etc.
>
> You can process them by appropriate corresponding classes. For example,
>
>   cyclins = Bio::Fetch.query('prosite', 'PS00292')
>   prosite = Bio::PROSITE.new(cyclins)


  Thanks. I hadn't noticed PROSITE in the BioRuby API before.
  Pfam is still missing.
  I will see what I can do about it.


>
>
> >  * It is not clear what the databases from biofetch are, for example:
> > rn, rp, str, pr.
> >    I am in structural biology. Many of these abbreviations are not obvious.
>
> In BioRuby, the default BioFetch server is implemented as a proxy for the
> DBGET system through KEGG API.
> So, please refer to the abbreviation field in the DBGET manual at
>
>   http://www.genome.jp/dbget/


  That's a good tip.
  It would also be user-friendly to list these abbreviations from within BioRuby.

  Thanks for this information.

  Yen-Ju

> and also note that the DBGET service for GenBank (gb) database is no longer
> available.
>
>
> Regards,
> Toshiaki Katayama


From ktym at hgc.jp  Fri Dec 21 05:16:06 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Fri, 21 Dec 2007 14:16:06 +0900
Subject: [BioRuby] A Rails application with BioRuby
In-Reply-To: <bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>
References: <bd370ace0712181354i7317b9f3le3679e395b8d71de@mail.gmail.com>
	<C5897EFA-129F-46EA-9130-752A5DC3D11D@hgc.jp>
	<bd370ace0712201111s30de42a1nc28f8188b573bb1f@mail.gmail.com>
Message-ID: <8FADCE93-34C9-468F-99B5-96CCE49D6ECF@hgc.jp>

Hi Yen-Ju,

On 2007/12/21, at 4:11, Yen-Ju Chen wrote:

> >  * Is it possible to get DOI from PubMed ?
>
>   entry = Bio::PubMed.query(16946072)
>   doi = entry[/AID - (\S+) \[doi\]/, 1]
>
>
> or you can extend the Bio::MEDLINE class to add the doi method
>
>
>   Is it possible to have this feature in BioRuby ?
>


I just committed the following changes to CVS.

  def doi
    @pubmed['AID'][/(\S+) \[doi\]/, 1]
  end

  def pii
    @pubmed['AID'][/(\S+) \[pii\]/, 1]
  end

so that you can use them as

 entry = Bio::PubMed.query(16946072)
 medline = Bio::MEDLINE.new(entry)
 doi = medline.doi
 pii = medline.pii
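
If you want to try the pattern without a network connection, it also works on
a literal string. The following is only a sketch with made-up AID values,
assuming the AID field holds one "value [type]" pair per line as the regular
expressions above imply; article_id is a hypothetical helper, not a
Bio::MEDLINE method.

```ruby
# Stand-in for the AID field of a MEDLINE record; values are placeholders.
AID_FIELD = "S0000-0000(00)00000-0 [pii]\n10.1000/example.123 [doi]\n"

# Hypothetical helper: return the identifier tagged with the given type,
# or nil when the field carries no such identifier.
def article_id(aid_field, type)
  aid_field[/(\S+) \[#{Regexp.escape(type)}\]/, 1]
end

puts article_id(AID_FIELD, 'doi')
puts article_id(AID_FIELD, 'pii')
p article_id(AID_FIELD, 'isbn')
```

Returning nil for a missing identifier lets the caller decide how to handle
entries that have no DOI.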

Regards,
Toshiaki


From ktym at hgc.jp  Sat Dec 29 20:12:11 2007
From: ktym at hgc.jp (Toshiaki Katayama)
Date: Sun, 30 Dec 2007 05:12:11 +0900
Subject: [BioRuby] BioRuby 1.2.1 is released
Message-ID: <A231F333-D068-452F-9F8D-1D789A5A4A66@hgc.jp>

Hi all,

I just released BioRuby 1.2.1, which includes a fix for BLAST 2.2.17 output.
Note that this version is not yet Ruby 1.9 compliant.

 http://bioruby.org/archive/bioruby-1.2.1.tar.gz
 http://rubyforge.org/projects/bioruby/

You can see changes at

  http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioruby/ChangeLog?rev=1.79&cvsroot=bioruby


P.S.
Unfortunately, I removed the RAA entry for BioRuby by mistake (I need to sleep now :).
I immediately re-added it as a new project, but our history was lost.

  http://raa.ruby-lang.org/project/bioruby/

I took a screenshot of the old admin screen for the record.

  http://bioruby.org/tmp/bioruby-deleted-raa.png

Happy holidays!

Regards,
Toshiaki Katayama