[Open-bio-l] Status of OBDA and indexed flatfiles?

Mon Aug 31 08:07:45 EDT 2009

Hi all,

I'm looking at indexing next-generation sequencing files for Biopython
(e.g. FASTQ short read files with tens of millions of entries), where
even just holding the record names and their file offsets in memory
is beginning to be a bottleneck.
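
For concreteness, the in-memory approach I mean is roughly the sketch
below. This is only an illustration, not Biopython's actual code: the
function names index_fastq and fetch are made up for this example, and
it assumes strict four-line FASTQ records (no line wrapping) with
non-empty record names.

```python
import io


def index_fastq(handle):
    """Map each record name to the byte offset of its '@' title line.

    Assumes strict four-line FASTQ records (no line wrapping); the
    handle must be opened in binary mode so offsets are byte offsets.
    """
    index = {}
    while True:
        offset = handle.tell()
        title = handle.readline()
        if not title:
            break  # end of file
        if not title.startswith(b"@"):
            raise ValueError("Expected '@' at offset %d" % offset)
        # The record name is the first word after the '@'.
        name = title[1:].split(None, 1)[0].decode("ascii")
        index[name] = offset
        # Skip the sequence, '+' and quality lines.
        for _ in range(3):
            handle.readline()
    return index


def fetch(handle, index, name):
    """Seek to a record by name and return its four lines as text."""
    handle.seek(index[name])
    return b"".join(handle.readline() for _ in range(4)).decode("ascii")


# Tiny in-memory example standing in for a real FASTQ file.
demo = io.BytesIO(b"@r1 desc\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n")
idx = index_fastq(demo)
record = fetch(demo, idx, "r2")
```

With tens of millions of reads, the dictionary itself (one string key
plus one integer per record) is what gets too large, hence my interest
in an on-disk index format.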

What is the current status of Open Biological Database Access (OBDA),
and in particular the index files for sequence "flat files" like FASTA or
GenBank (or FASTQ)?

http://www.bioperl.org/wiki/HOWTO:Flat_databases
http://www.bioperl.org/wiki/HOWTO:OBDA
http://obda.open-bio.org/

The spec files are still in CVS (and ViewCVS has been broken since
the recent server move) rather than having been migrated to SVN,
which may suggest things are obsolete (or, on the bright side, stable).

Presumably BioPerl still uses these index files? What about the
other projects? I know EMBOSS, for example, has its own indexing
system, but I have no idea how it works internally.

Thanks,

Peter