[Biopython] Bio.SeqIO.index() - gzip support and/or index stored on disk?
Chris Fields
cjfields at illinois.edu
Sat Jun 5 09:31:37 EDT 2010
On Jun 5, 2010, at 7:51 AM, Brad Chapman wrote:
> Laurent and Peter;
>
>> I do believe that building on HDF5 is a better approach:
>> - better use of resources (do not reinvent completely what is
>> already existing unless better)
>> - HDF5 is designed as a rather general storage architecture, and
>> will let one build tailored solutions when needed.
>
> HDF5 does have lots of good technical points, although as Peter mentions
> the lack of community uptake is a concern. To potentially explain this,
> here is my personal HDF5 usage story: I took an in-depth look at PyTables
> for some large data sets that were overwhelming SQLite:
>
> http://www.pytables.org/moin
>
> The data loaded quickly without any issues, but the most basic thing
> I needed was indexes to retrieve a subset of the data by chromosome
> and position. Unfortunately, you can't create indexes without
> buying the Pro edition:
>
> http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex
>
> That immediately killed my ability to share the script so I ended
> my HDF5 experiment and reworked my SQLite approach.
>
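[For readers of the archive: a minimal sketch of the kind of SQLite lookup
described above, assuming a simple table of per-position values. This is not
Brad's actual script; the table name, column names, and helper functions are
hypothetical and only illustrate indexing by chromosome and position with the
standard library sqlite3 module.]

```python
import sqlite3


def build_position_db(rows):
    """Load (chrom, pos, value) rows into an in-memory SQLite table and index it.

    The schema here is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (chrom TEXT, pos INTEGER, value REAL)")
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", rows)
    # A composite index lets SQLite answer chromosome + position range
    # queries without scanning the whole table.
    conn.execute("CREATE INDEX idx_chrom_pos ON features (chrom, pos)")
    conn.commit()
    return conn


def fetch_region(conn, chrom, start, end):
    """Return rows on `chrom` with start <= pos <= end, ordered by position."""
    cur = conn.execute(
        "SELECT chrom, pos, value FROM features "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cur.fetchall()


rows = [
    ("chr1", 100, 0.5),
    ("chr1", 250, 1.5),
    ("chr2", 120, 2.0),
    ("chr1", 900, 3.0),
]
conn = build_position_db(rows)
region = fetch_region(conn, "chr1", 50, 300)
```

[For a real data set the database would live in a file rather than ":memory:",
and the index would normally be created after the bulk load, as it is here.]
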
> Also, echoing Peter, the BioHDF download warns you that the code is
> not stable, tested, or supported:
>
> http://www.hdfgroup.org/projects/biohdf/biohdf_downloads.html
>
> BAM is widely used and has tools that are meant to work on it
> in production environments now, while HDF tool support still feels
> experimental. Sometimes it is best to be practical and keep an eye
> on other technical solutions as they evolve.
>
> Brad
Yes, will be interesting to see how far along it is at BOSC.
chris