[Biojava-l] New Wiki page
Paul Edlefsen
pedlefsen@systemsbiology.org
Wed, 07 Feb 2001 11:01:04 -0800
Speaking of who's doing what, I was considering writing an implementation of
SymbolList that takes a nibble (or maybe a byte) per DNA base instead of a
word. I've got this code in C++ and thought I'd port it over, though I
haven't yet begun.
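
As a rough illustration of the nibble-per-base idea (this is neither the C++ code mentioned above nor anything in biojava), a packed store might look something like the sketch below. The class name PackedDnaStore, the 0-based baseAt method, and the 4-bit codes for a, c, g, t and n are all assumptions made for the example; a real version would implement SymbolList and map codes to biojava Symbol objects.

```java
/** Hypothetical nibble-per-base store; an illustration only, not part of biojava. */
final class PackedDnaStore {
    // Assumed 4-bit codes: the position in this string is the code.
    // A nibble holds 16 values, so there is room for ambiguity symbols too.
    private static final String ALPHABET = "acgtn";

    private final byte[] packed; // two bases per byte
    private final int length;    // number of bases stored

    PackedDnaStore(String bases) {
        length = bases.length();
        packed = new byte[(length + 1) / 2];
        for (int i = 0; i < length; i++) {
            char c = Character.toLowerCase(bases.charAt(i));
            int code = ALPHABET.indexOf(c);
            if (code < 0) {
                throw new IllegalArgumentException(
                    "Unsupported base '" + bases.charAt(i) + "' at index " + i);
            }
            // Even indices go in the high nibble, odd indices in the low nibble.
            int shift = (i % 2 == 0) ? 4 : 0;
            packed[i / 2] |= (byte) (code << shift);
        }
    }

    /** Number of bases stored. */
    int length() {
        return length;
    }

    /** Returns the base at a 0-based index as a lower-case character. */
    char baseAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int shift = (index % 2 == 0) ? 4 : 0;
        int code = (packed[index / 2] >> shift) & 0x0F;
        return ALPHABET.charAt(code);
    }

    /** Size of the backing array in bytes (excludes object overhead). */
    int bytesUsed() {
        return packed.length;
    }
}
```

With this layout, new PackedDnaStore("ACGTNAC") holds seven bases in a four-byte array. If only a, c, g and t had to be represented, two bits per base would do, at the cost of losing ambiguity codes.
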
Is anybody else working along similar lines? I need to read in multi-megabase
sequences, and the 35-megabase human chr. 22 alone is too much for the current
implementation, even after increasing the heap to 128 megs. (This makes sense:
35 M bases * 4 bytes/base = 140 M bytes > 128 M bytes.)
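
The back-of-the-envelope figure above can be checked with a few lines of Java. This is only a sketch under stated assumptions: "35 Megabase" is taken as 35,000,000 bases, "128 megs" as 128 * 1024 * 1024 bytes, and per-object overhead is ignored; the class and method names are made up for the example.

```java
/** Hypothetical helper for rough sequence-memory estimates. */
final class SequenceMemoryEstimate {
    /** Bytes needed for the given number of bases at a fixed bit width, rounded up. */
    static long bytesNeeded(long bases, int bitsPerBase) {
        return (bases * bitsPerBase + 7) / 8;
    }

    /** True if the raw sequence data alone would fit in a heap of the given size. */
    static boolean fitsInHeap(long bases, int bitsPerBase, long heapBytes) {
        return bytesNeeded(bases, bitsPerBase) <= heapBytes;
    }
}
```

Under those assumptions, bytesNeeded(35_000_000L, 32) is 140,000,000 bytes, which exceeds a 134,217,728-byte heap, while a byte per base needs 35,000,000 bytes and a nibble per base needs 17,500,000.
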
Our goal is to make some open tools for whole-genome analysis and
cross-species comparison. 35 Megabases is just the tip of the iceberg: to
defend biojava to my peers I need to demonstrate that it can handle big
sequences.
:Paul
PS This is my first correspondence with the list, so I'll introduce myself.
The Institute for Systems Biology (http://www.systemsbiology.org) is a
Seattle-based nonprofit academic institution for interdisciplinary molecular
biology and biotechnology. I am a computer programmer (not a biologist
(yet?)) in the Computational Biology group. If anyone on this list is in the
US Northwest, let's have lunch.
I've been working on a C++ bio-toolkit a la biojava, etc. that can use
Paracel's Genematcher or just a local search, though it is nowhere near ready
for public scrutiny. I've been asked to make some quick-and-dirty
visualization tools in Java, which has brought me to biojava. Thanks to
biojava (and Jazz -- check out http://www.cs.umd.edu/hcil/jazz/), I made the
prototype in 3 days! Y'all have done a fine job, and I look forward to
contributing to the effort.
--
Paul T. Edlefsen Software Engineer
<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>
Computational Biology Group
The Institute for Systems Biology
4225 Roosevelt Way NE, Suite 200
Seattle, WA 98105
pedlefsen@systemsbiology.org
<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>>>
Phone: (206)732-1336
Fax: (206)732-1299