Bioperl: Re: Bioperl-guts: Any non-redundant database tools out there ???

Staffan Bergh staffan@sgibest.sto.se.pnu.com
Fri, 28 Aug 1998 08:39:23 +0200


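One thing I've done is to key on the MD5 checksum of the sequence
rather than on the sequence itself, so every key has the same short,
fixed length however long the sequence is.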
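
A minimal sketch of that idea, written in Python rather than Perl purely
for illustration. The function names, the whitespace/case normalization,
and the sample records are all assumptions, not anything from the original
setup. It maps each sequence's MD5 hex digest (always 32 characters) to the
list of IDs that share that sequence:

```python
import hashlib
from collections import defaultdict


def sequence_key(seq):
    """Return the MD5 hex digest of a sequence (a fixed 32-character key).

    Assumption: sequences differing only in whitespace or letter case
    should count as identical, so both are normalized away first.
    """
    normalized = "".join(seq.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nonredundant_index(records):
    """Map each distinct sequence's checksum to the list of IDs carrying it.

    `records` is any iterable of (sequence_id, sequence) pairs.
    """
    index = defaultdict(list)
    for seq_id, seq in records:
        index[sequence_key(seq)].append(seq_id)
    return index


# Hypothetical sample data: id1 and id2 are the same sequence.
records = [
    ("id1", "MKVLAAGIVG"),
    ("id2", "mkvla agivg"),
    ("id3", "MKVLAAGIVA"),
]
index = build_nonredundant_index(records)
for key, ids in index.items():
    print(key, ids)
```

The same fixed-length keys could equally go into an on-disk hash instead of
an in-memory one. Two different sequences could in principle share a
checksum, so a cautious builder might compare the actual sequences before
merging entries whose keys match.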

My $0.02

/staffan

-----------------------------------------------------------------------

Staffan Bergh
Pharmacia & Upjohn, P14:5, SE-112 87 Stockholm, Sweden

email:  Staffan.Bergh@eu.pnu.com + Don't let that horse eat that violin
phone:  (int+46)  08 695 9884    +               cried Chagall's mother
fax:    (int+46)  08 695 4084    + but he kept right on painting
mobile: (int+46) 070 698 8829    +             -- Lawrence Ferlinghetti

> Ewan Birney said:
>
> Gordon posted this to 'guts' but it seems much more
> appropriate to post to the main mailing list, hence I am
> forwarding it.
> 
> 
> 
> Ewan Birney
> <birney@sanger.ac.uk>
> http://www.sanger.ac.uk/Users/birney/
> 
> ---------- Forwarded message ----------
> Date: Thu, 27 Aug 1998 11:11:50 -0500
> From: Gordon D. Pusch <pusch@mcs.anl.gov>
> To: vsns-bcd-perl-guts@lists.uni-bielefeld.de
> Subject: Bioperl-guts: Any non-redundant database tools out there ???
> 
> 
> Hi --- I am trying to construct a ``non-redundant'' version of WIT's
> sequence database. An obvious stupid-but-simple way to do this would
> be to use the sequence itself as the key to a hash of ID lists.
> 
> However, since there are a LOT of sequences, the whole thing obviously
> won't fit into memory and we will have to store the hash as a Berkeley-DB;
> and of course, some of the sequences are quite long.  I worry about such
> enormously long keys ``breaking'' something in either perl5 or Berkeley-DB's
> hash routines ---I gather they are stored internally as B-trees, so I
> could easily imagine very long keys producing stack-overflows during a
> tree traversal if the trees got too deep... :-(
> 
> Has anyone on this list implemented a non-redundant database-builder 
> in perl ???  
> 
> Does anyone know if there =IS= a limit as to how long a hash-key
> can be for either perl5 or Berkeley-DB ???  If so, what are the usual
> failure-modes ???
> 
> Can anyone suggest a more elegant algorithm than the ``stupid-but-simple'' 
> method outlined above ???
> 
> 
> Thanks in advance,
> 
> --  Gordon D. Pusch   <pusch@mcs.anl.gov>
> 
> Disclaimer:  I'm a consultant collaborating with Argonne researchers;
> I don't speak for ANL or the DOE --- and they *certainly* don't speak
> for =ME= !!!
> 
> Claimer:  I report =ALL= SPAMvertisers to their ISP --- =NO= exceptions !!!
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl-guts.html
> ====================================================================
> 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================



