[Open-bio-l] Status of OBDA and indexed flatfiles?
Chris Fields
cjfields at illinois.edu
Mon Aug 31 15:33:02 UTC 2009
On Aug 31, 2009, at 7:07 AM, Peter wrote:
> Hi all,
>
> I'm looking at indexing next generation sequence files for Biopython
> (e.g. FASTQ short read files with 10s of millions of entries), where
> even just holding the record names and their file offsets in memory
> is beginning to be a bottleneck.
>
> What is the current status of Open Biological Database Access (OBDA),
> and in particular the index files for sequence "flat files" like
> FASTA or GenBank (or FASTQ)?
>
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> http://www.bioperl.org/wiki/HOWTO:OBDA
> http://obda.open-bio.org/
>
> The spec files are still in CVS (and ViewCVS is still broken since
> the recent server move), rather than having been migrated to SVN,
> which may suggest things are obsolete (or, on the bright side, stable).
>
> Presumably BioPerl still uses these index files? What about the
> other projects? I know EMBOSS, for example, has some indexing
> system, but I have no idea how it works internally.
>
> Thanks,
>
> Peter

I don't use OBDA personally, but I can check on its status with Brian
Osborne (he was heading it up last I checked). However, I don't think
BioPerl has an OBDA FASTQ parser.

You may be thinking of Bio::Index::FASTQ? That one is not OBDA, just
a simple flat file indexer. We could probably set up an OBDA parser
fairly easily if needed.
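
To illustrate the general idea of a simple flat file index kept on disk
rather than in memory, here is a rough Python sketch. It is not the OBDA
format and not what Bio::Index::FASTQ does internally; the SQLite table
layout and the function names build_index and fetch_record are made up
for illustration, and it assumes each FASTQ record is exactly four lines
(no wrapped sequence or quality lines).

```python
import sqlite3


def build_index(fastq_path, index_path):
    """Write a record name -> byte offset table to an SQLite file.

    Hypothetical layout, for illustration only. Assumes strict
    four-line FASTQ records.
    """
    con = sqlite3.connect(index_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets "
        "(name TEXT PRIMARY KEY, offset INTEGER)"
    )
    # Binary mode so tell() returns true byte offsets.
    with open(fastq_path, "rb") as handle:
        while True:
            offset = handle.tell()
            header = handle.readline()
            if not header:
                break  # end of file
            fields = header[1:].split()
            if not header.startswith(b"@") or not fields:
                raise ValueError("Bad FASTQ header at byte %d" % offset)
            # The record name is the first word after the "@".
            name = fields[0].decode("ascii")
            con.execute(
                "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                (name, offset),
            )
            # Skip the sequence, "+" and quality lines.
            for _ in range(3):
                handle.readline()
    con.commit()
    con.close()


def fetch_record(fastq_path, index_path, name):
    """Look up one record by name and return its four lines as text."""
    con = sqlite3.connect(index_path)
    row = con.execute(
        "SELECT offset FROM offsets WHERE name = ?", (name,)
    ).fetchone()
    con.close()
    if row is None:
        raise KeyError(name)
    with open(fastq_path, "rb") as handle:
        handle.seek(row[0])
        return b"".join(handle.readline() for _ in range(4)).decode("ascii")
```

Only the name being looked up and its offset are held in memory at any
time, which is the point for files with tens of millions of entries.
Building the index is still a full pass over the file.
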
chris