[Bioperl-l] storing/retrieving a large hash on file system?

Brian Osborne bosborne11 at verizon.net
Tue May 18 16:00:06 UTC 2010


Ben,

I've used Storable to do things like this, for example:

use Getopt::Long;
use Storable;
use Bio::DB::Fasta;

my %species = ( "Sc" => 4932,   # Saccharomyces cerevisiae
                "Ec" => 83333,  # Escherichia coli K12
                "Hs" => 9606    # H. sapiens
              );

my ($help,$id,$name);

GetOptions( "s=s"  =>  \$name,
            "i=i"  =>  \$id,
            "h"    =>  \$help );

usage() if ($help || !$id || !$name);

my $storedHash = $name . ".dump";

# create index for a directory of fasta files
my $db = Bio::DB::Fasta->new($name, -makeid => \&make_my_id);

# build the species-specific hash only if it hasn't been stored already
unless (-e $storedHash) {
	my $ref = {};
	# extract species-specific information from gene2accession
	open my $in, "<", "gene2accession" or die "Cannot open gene2accession: $!\n";
	while (my $line = <$in>) {
		next if $line =~ /^#/;   # skip header/comment lines
		chomp $line;
		my @arr = split /\t/, $line;
		next unless @arr > 11;   # skip short or malformed lines
		if ($arr[0] == $species{$name} && $arr[9] =~ /\d+/ && $arr[10] =~ /\d+/) {
			# key on column 1; keep columns 9, 10, 11 and 7
			($ref->{$arr[1]}->{"start"}, $ref->{$arr[1]}->{"end"},
			 $ref->{$arr[1]}->{"strand"}, $ref->{$arr[1]}->{"id"}) =
				($arr[9], $arr[10], $arr[11], $arr[7]);
		}
	}
	close $in;
	# save species-specific information using Storable
	store $ref, $storedHash;
}

# retrieve the species-specific data from a stored hash
my $ref = retrieve($storedHash);

Take away all the parsing details and you can see that it's simple: Storable exports store() and retrieve(). Make up a file name, store() the hash reference to it once, and retrieve() it on later runs instead of rebuilding the hash.
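
Stripped of the parsing, the round trip might look something like the sketch below. The %mutations hash and the "mutations.dump" file name are made up for illustration; your real hash would come from your slow first step.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical nested hash standing in for the slow-to-build data
my %mutations = (
    geneA => { start => 100, end => 250, strand => '+' },
    geneB => { start => 900, end => 400, strand => '-' },
);

my $file = "mutations.dump";    # made-up file name

# Step 1: run once, write the whole structure to disk
store(\%mutations, $file);

# Step 2: in the downstream code, load it back as a hash reference
my $loaded = retrieve($file);
print "geneA starts at $loaded->{geneA}{start}\n";
```

The nesting comes back intact, so the downstream code can use $loaded exactly as it would have used a reference to the original hash. The stored file is binary rather than human-readable.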

Brian O.

On May 18, 2010, at 11:28 AM, Ben Bimber wrote:

> this question is more of a general perl one than bioperl specific, so
> I hope it is appropriate for this list:
> 
> I am writing code that has two steps.  the first generates a large,
> complex hash describing mutations.  it takes a fair amount of time to
> run this step.  the second step uses this data to perform downstream
> calculations.  for the purposes of writing/debugging this downstream
> code, it would save me a lot of time if i could run the first step
> once, then store this hash in something like the file system.  this
> way I could quickly load it, when debugging the downstream code
> without waiting for the hash to be recreated.
> 
> is there a 'best practice' way to do something like this?  I could
> save a tab-delimited file, which is human readable, but does not
> represent the structure of the hash, so I would need code to re-parse
> it.  I assume I could probably do something along the lines of dumping
> a JSON string, then read/decode it.  this is easy, but not so
> human-readable.  is there another option i'm not thinking of?  what do
> others do in this sort of situation?
> 
> thanks in advance.
> 
> -Ben




