[Bioperl-l] Bio::Index::Fastq - Interface for indexing (multiple) fastq files failure

Wed Apr 7 13:08:16 EDT 2010

On Wed, Apr 7, 2010 at 5:56 PM, Chris Fields <cjfields at illinois.edu> wrote:
>
> I think we're going with the AnyDBM option, which allows SQLite if
> requested (via Mark's SQLite_DBM).
>
> chris

Hi Chris,

Does Mark's SQLite_DBM already have an SQLite schema defined? I'd
ideally like us to agree on something shared with the other Bio* libraries
(a new OBDA standard using SQLite instead of BDB). I was thinking of
something along these lines if we want to support an index for multiple files:

* meta - table with string key/values (in particular to hold a schema version
number, plus perhaps the tool which built the index)

* offsets - table with entry accessions, file number, file offset

* files - table with filenames, file type (e.g. FASTA), datestamp
(so we can spot if the index is older than the file and needs to be
updated), perhaps other things such as whether the file is compressed
(gzip, bz2, ...).
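
To make the three tables above concrete, here is a rough sketch using Python's standard sqlite3 module. Nothing here is agreed: the column names, the types, the 'schema_version' key, and the create_index/lookup helpers are all assumptions made up for illustration, not anything defined by SQLite_DBM or OBDA.

```python
import sqlite3

# Hypothetical schema: every table and column name below is an assumption
# chosen to mirror the meta / offsets / files outline, not an agreed standard.
SCHEMA = """
CREATE TABLE meta (
    key   TEXT PRIMARY KEY,   -- e.g. 'schema_version', 'built_by'
    value TEXT
);
CREATE TABLE files (
    file_number INTEGER PRIMARY KEY,
    filename    TEXT NOT NULL,
    file_type   TEXT,          -- e.g. 'FASTA', 'FASTQ'
    datestamp   REAL,          -- file modification time when indexed
    compression TEXT           -- NULL, 'gzip', 'bz2', ...
);
CREATE TABLE offsets (
    accession   TEXT PRIMARY KEY,
    file_number INTEGER NOT NULL REFERENCES files(file_number),
    file_offset INTEGER NOT NULL
);
"""


def create_index(path=":memory:"):
    """Create an empty index database and record a schema version."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO meta VALUES ('schema_version', '1')")
    conn.commit()
    return conn


def lookup(conn, accession):
    """Return (filename, file_offset) for an accession, or None if absent."""
    return conn.execute(
        "SELECT f.filename, o.file_offset "
        "FROM offsets AS o JOIN files AS f ON f.file_number = o.file_number "
        "WHERE o.accession = ?",
        (accession,),
    ).fetchone()
```

A reader in any language would then open the named file, seek to the stored offset, and parse one record. Comparing files.datestamp with the file's current modification time would tell it whether the index is stale.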

If some kind of shared SQLite index schema (whatever it looks like)
does seem like a good idea to you guys (BioPerl), should we move
this discussion over to open-bio-l at lists.open-bio.org?

Regards,

Peter