[Biopython] Bio.SeqIO.index() - gzip support and/or index stored on disk?
Laurent
lgautier at gmail.com
Sat Jun 5 09:05:43 EDT 2010
On 05/06/10 14:25, Peter wrote:
> On Sat, Jun 5, 2010 at 1:06 PM, Laurent <lgautier at gmail.com> wrote:
>>>
>>> There is some talk on the samtools mailing list about general improvements
>>> to the chunking in BAM, relocating the header information (and other very
>>> read specific things about representing error models, indels, etc). You
>>> may be right that HDF5 has technical advantages over BAM version 1,
>>> but currently my impression is that SAM/BAM is making good headway
>>> toward becoming a de facto standard for next generation data, while HDF5 is
>>> not. Maybe someone should suggest they move to HDF5 internally for BAM
>>> version 2?
>>
>> De facto standards become so because more people use them at some
>> point (which may involve a step during which a lot of people /believe/
>> that most people are using one format over another ;-) ), but this
>> does not necessarily make them the best technical solutions.
>
> Absolutely.
>
>> I do believe that building on HDF5 is a better approach:
>> - better use of resources (do not completely reinvent what already
>> exists unless the result is better)
>> - HDF5 is designed as a rather general storage architecture, and will let
>> one build tailored solutions when needed.
>>
>> I'd be surprised if the BAM/SAM people did not know about the HDF formats,
>> but I do not know for sure. Is there any BAM/SAM person reading?
>
> I've been subscribed to the samtools mailing list for a few weeks now. I think
> we (or better yet the BioHDF team) should put this idea forward on their
> mailing list. As I said, they appear to be discussing some fairly dramatic
> changes to the internals of the BAM format (while intending to keep their
> API as close as possible), so now would be a good time to consider a
> switch from their blocked gzip system to something else like HDF instead.
>
> Chris has pointed out some BioHDF people will be at BOSC 2010. There
> is also a "HiTSeq: High Throughput Sequencing" ISMB 2010 SIG meeting
> at the same time as BOSC 2010, so there could be some SAM/BAM
> folk about in Boston to have some in person discussions with. Will you
> be there this year, Laurent (or at EuroSciPy or something else instead)?
I'll be at BOSC / ISMB. Hopefully we will all stumble upon each other.
Best,
Laurent
> Regards,
>
> Peter