[Biopython] Bio.SeqIO.index() - gzip support and/or index stored on disk?

Brad Chapman chapmanb at 50mail.com
Sat Jun 5 20:42:23 UTC 2010


Aaron and Laurent;

Aaron:
> I am facing a similar situation.  Brad, out of curiosity did you also try h5py?
>
> http://code.google.com/p/h5py/

Yes, and I think that's the right way to go. After I found out about
the indexing I re-aligned my thinking around h5py, which is more
hierarchical than table-based. Ironically, this led me to a more
compact binned solution which would work fine within SQLite, which is
why I never got very far with h5py. I would start with h5py the next
time a need arises.
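
For illustration, here is a minimal sketch of what a binned layout in
SQLite could look like. The actual scheme is not described in this
thread, so the fixed bin width, the table and column names, and the
helper functions below are all assumptions made for the example. The
idea shown: each feature is registered in every fixed-width bin it
touches, so a region query only scans the bins it overlaps.

```python
import sqlite3

BIN_SIZE = 10_000  # hypothetical bin width in bases


def bins_for(start, end, bin_size=BIN_SIZE):
    """Return every bin number touched by the half-open interval [start, end)."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def build_db(features):
    """Load (chrom, start, end, name) tuples into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (id INTEGER PRIMARY KEY, chrom TEXT, "
        "start_pos INTEGER, end_pos INTEGER, name TEXT)"
    )
    # One row per (feature, bin) pair; the index makes bin lookups cheap.
    conn.execute("CREATE TABLE feature_bin (chrom TEXT, bin INTEGER, feature_id INTEGER)")
    conn.execute("CREATE INDEX idx_feature_bin ON feature_bin (chrom, bin)")
    for chrom, start, end, name in features:
        cur = conn.execute(
            "INSERT INTO feature (chrom, start_pos, end_pos, name) VALUES (?, ?, ?, ?)",
            (chrom, start, end, name),
        )
        conn.executemany(
            "INSERT INTO feature_bin VALUES (?, ?, ?)",
            [(chrom, b, cur.lastrowid) for b in bins_for(start, end)],
        )
    conn.commit()
    return conn


def query_region(conn, chrom, start, end):
    """Return (name, start, end) for features overlapping [start, end) on chrom."""
    first_bin = start // BIN_SIZE
    last_bin = (end - 1) // BIN_SIZE
    rows = conn.execute(
        "SELECT DISTINCT f.name, f.start_pos, f.end_pos "
        "FROM feature f JOIN feature_bin b ON b.feature_id = f.id "
        "WHERE b.chrom = ? AND b.bin BETWEEN ? AND ? "
        "AND f.start_pos < ? AND f.end_pos > ? "  # exact overlap check
        "ORDER BY f.start_pos",
        (chrom, first_bin, last_bin, end, start),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_db([
        ("chr1", 100, 500, "a"),
        ("chr1", 9_900, 20_100, "b"),
        ("chr2", 100, 500, "c"),
    ])
    print(query_region(conn, "chr1", 15_000, 16_000))
```

The DISTINCT is needed because a feature spanning several bins can
match more than once when the query itself covers several bins.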

Laurent:
> Not tested is not good, but that's mostly a matter of having unit tests.

I'm not knocking the code, only reading the warnings on the download
page. Hopefully this will shape up to be something usable, and like
others I am looking forward to the BOSC presentation.

> Also I am referring to using HDF5 (mature, tested), not necessarily
> BioHDF as an higher layer (which I have no experience at all with).
> Should BioHDF not have tests and release cycles, it will probably
> not be the answer for me either.
> 
> Along those lines, a very recent post advertising for a position at
> FHRC (bioconductor's group) suggests that HDF5 (and netCDF) are
> directions considered over there as well.

That's good news. Essentially what I wanted was to build a data
structure from which I could sub-select into an R data.frame, a la
sqldf:

http://code.google.com/p/sqldf/

> I am reading otherwise that not everyone using BAM/SAM is happy with
> it (and some threatening to fork).
> I might well be wrong, but I don't think that BAM/SAM has (yet) a
> place so prominent that efforts should first go into converting to
> it.

Oh, please don't ruin my day by bringing up that possibility;
BAM features pretty prominently in my daily work. Broad's
Picard and GATK pipelines are based solely on BAM, so I might be
biased due to my interactions with them. Hopefully, if the community
moves to something else for alignment representation, a smooth
transition will be planned.

Famous last words,
Brad


