[Open-bio-l] Status of OBDA and indexed flatfiles?

Mon Aug 31 15:33:02 UTC 2009

On Aug 31, 2009, at 7:07 AM, Peter wrote:

> Hi all,
>
> I'm looking at indexing next generation sequence files for Biopython
> (e.g. FASTQ short read files with 10s of millions of entries), where
> even just holding the record names and their file offsets in memory
> is beginning to be a bottleneck.
>
> What is the current status of Open Biological Database Access (OBDA),
> and in particular the index files for sequence "flat files" like
> FASTA or GenBank (or FASTQ)?
>
> http://www.bioperl.org/wiki/HOWTO:Flat_databases
> http://www.bioperl.org/wiki/HOWTO:OBDA
> http://obda.open-bio.org/
>
> The spec files are still in CVS (and ViewCVS is still broken since
> the recent server move) rather than having been migrated to SVN,
> which may suggest things are obsolete (or, on the bright side, stable).
>
> Presumably BioPerl still uses these index files? What about the
> other projects? I know EMBOSS has some indexing system for
> example but I have no idea how it works internally.
>
> Thanks,
>
> Peter

I don't use OBDA, personally, but I can check on the status with Brian  
Osborne (he was heading it up last I checked).  However, I don't think  
BioPerl has an OBDA FASTQ parser.

You may be thinking about Bio::Index::FASTQ?  That one is not OBDA,  
but just a simple flat file indexer.  We could probably set an OBDA  
parser up fairly easily if needed.

chris
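The thread is about keeping a name-to-offset index for a large flat file without holding every read name and offset in memory. Here is a minimal sketch of one way to do that in Python. It stores the index on disk in SQLite using the standard `sqlite3` module. This is NOT the OBDA index format and NOT how BioPerl's Bio::Index::FASTQ works internally. The helper names `build_fastq_index` and `get_record` are hypothetical. The sketch also assumes simple four-line FASTQ records, with no wrapped sequence or quality lines.

```python
import sqlite3


def build_fastq_index(fastq_path, index_path):
    """Store each read's name and starting byte offset in an SQLite file.

    Hypothetical helper; assumes plain four-line FASTQ records.
    Duplicate read names raise sqlite3.IntegrityError.
    """
    con = sqlite3.connect(index_path)
    try:
        # Rebuild from scratch so re-indexing does not collide with old rows.
        con.execute("DROP TABLE IF EXISTS offsets")
        con.execute(
            "CREATE TABLE offsets (name TEXT PRIMARY KEY, byte_offset INTEGER)"
        )
        with open(fastq_path, "rb") as handle:

            def records():
                while True:
                    offset = handle.tell()  # start of the '@' header line
                    header = handle.readline()
                    if not header:
                        return  # end of file
                    if not header.startswith(b"@"):
                        raise ValueError("expected '@' header at byte %d" % offset)
                    # Record name is the first whitespace-separated word.
                    name = header[1:].split(None, 1)[0].decode("ascii")
                    # Skip the sequence, '+' and quality lines.
                    for _ in range(3):
                        handle.readline()
                    yield name, offset

            # Rows are streamed into SQLite rather than collected in a dict.
            con.executemany("INSERT INTO offsets VALUES (?, ?)", records())
        con.commit()
    finally:
        con.close()


def get_record(fastq_path, index_path, name):
    """Return the raw four-line FASTQ record for `name` (hypothetical helper)."""
    con = sqlite3.connect(index_path)
    try:
        row = con.execute(
            "SELECT byte_offset FROM offsets WHERE name = ?", (name,)
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError(name)
    with open(fastq_path, "rb") as handle:
        handle.seek(row[0])
        return b"".join(handle.readline() for _ in range(4)).decode("ascii")
```

The lookup cost is one indexed SQLite query plus one seek. Memory use no longer grows with the number of reads, which is the bottleneck described for files with tens of millions of entries.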