[BioRuby] RFC Caching (was BioRuby standards)

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Tue Sep 9 21:48:20 EDT 2008


Hi,

I think the most important issue for a cache is data integrity:
for example, deciding when to detect updates to the original data,
controlling access, and resolving race conditions
(two or more processes or threads simultaneously trying to
use, update, create, and/or remove the same cache data).
However, your code only handles determining the directory name.
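
To make the integrity concerns concrete, here is a minimal sketch of one possible approach. It is not taken from your code or from BioRuby. The module name CacheSketch and its methods are hypothetical. It assumes that staleness can be judged by comparing modification times and that all writers cooperate by taking the same lock file.

```ruby
require 'fileutils'

# Hypothetical helper, not part of BioRuby.
module CacheSketch
  module_function

  # Writes data to a temporary file in the same directory, then renames
  # it over the final name, so readers never see a half-written file.
  # (The rename is atomic on POSIX when both names are on one filesystem.)
  def store(path, data)
    FileUtils.mkdir_p(File.dirname(path))
    tmp = "#{path}.#{Process.pid}.#{Thread.current.object_id}.tmp"
    File.open(tmp, 'wb') { |f| f.write(data) }
    File.rename(tmp, path)
  end

  # Returns the cached data. The block is called to regenerate the data
  # when the cache file is missing or older than source_mtime.
  # An exclusive lock on a separate lock file serializes cooperating
  # processes and threads; it is released when the lock file is closed.
  def fetch(path, source_mtime)
    FileUtils.mkdir_p(File.dirname(path))
    File.open(path + '.lock', File::RDWR | File::CREAT, 0644) do |lock|
      lock.flock(File::LOCK_EX)
      if !File.exist?(path) || File.mtime(path) < source_mtime
        store(path, yield)
      end
      File.binread(path)
    end
  end
end
```

A caller would use something like CacheSketch.fetch(path, mtime_of_original) { download_data }, where download_data stands for whatever produces the original data. Note that flock is advisory, so it only protects against other code that takes the same lock.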

line 24:
>      def set directory, subdir = nil

In def lines, please use parentheses explicitly,
e.g.   def set(directory, subdir = nil),
because most of the existing code in BioRuby does so.

line 28:
>         dir = dir + '/' + subdir

File.join(dir, subdir) should be used, possibly to support
non-UNIX systems like Windows.
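
As a small illustration of the two points above (parentheses in the def line and File.join), something like the following would do. The method name cache_dir is a hypothetical stand-in, not the method in your file.

```ruby
# Hypothetical stand-in for the method under review; only the def style
# and the path joining are the point here.
def cache_dir(directory, subdir = nil)
  return directory if subdir.nil? || subdir.empty?
  File.join(directory, subdir)
end

cache_dir('/var/cache/bioruby', 'GEO')    # => "/var/cache/bioruby/GEO"
cache_dir('/var/cache/bioruby/', 'GEO')   # => "/var/cache/bioruby/GEO"
cache_dir('/var/cache/bioruby')           # => "/var/cache/bioruby"
```

File.join also avoids a doubled separator when the directory already ends with one, which plain string concatenation does not.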

lines 41 to 45:
>          if cache==nil or cache==''
>            cache = ENV['TMPDIR']
>          end
>          cache = '/tmp' if cache==nil or cache==''
>          set cache, subdir

Using Dir.tmpdir, defined in tmpdir.rb, is better.
http://www.ruby-doc.org/stdlib/libdoc/tmpdir/rdoc/index.html
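
For example, the fallback could be written roughly as below. The method name default_cache_dir is hypothetical. Dir.tmpdir itself consults the usual environment variables and falls back to a system default, so the hand-written checks for TMPDIR and '/tmp' become unnecessary.

```ruby
require 'tmpdir'

# Hypothetical helper: uses the given cache directory, or the system
# temporary directory when none is given.
def default_cache_dir(cache = nil, subdir = nil)
  cache = Dir.tmpdir if cache.nil? || cache.empty?
  subdir ? File.join(cache, subdir) : cache
end

default_cache_dir(nil, 'GEO')   # the 'GEO' subdirectory under Dir.tmpdir
```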

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org

On Tue, 9 Sep 2008 13:38:16 +0200
pjotr2008 at thebird.nl (Pjotr Prins) wrote:

> I wrote a simple file Cache Singleton. See:
> 
>   http://github.com/pjotrp/bioruby/tree/462614487767568f41db03d894875a3d78ced08e/lib/bio/db/microarray/cache.rb
> 
> The Cache can be read and set with:
> 
>   dir = Bio::Microarray::Cache.instance.directory('GEO')
>   # override cache dir
>   dir = Bio::Microarray::Cache.instance.set(newcachedir,'GEO')
> 
> Everyone OK with this?
> 
> Pj.
> 
> On Tue, Sep 02, 2008 at 11:19:58AM +0200, Pjotr Prins wrote:
> > > Note that some classes use Tempfile class, a standard bundled
> > > class with Ruby by default, and the Tempfile class depends
> > > on environment variables (TMPDIR, TMP, etc.).
> > 
> > I noticed. Caching is a bit different in nature - as caches may be
> > there for a long time. TMPDIRs get emptied on reboot, for one.
> > 
> > > I think cache isn't suitable for standard, because its purpose
> > > may differ from program (or class, module, etc.) to program.
> > 
> > > For example, if I want to put class A's cache on a fast hard disk
> > > with very large size, and program B's cache on a slower hard disk
> > > with small size, what should I do?
> > 
> > That is true. OK, leave caching for the modules to resolve. I'll use
> > my own caching of GEO XML objects.



