[Bioperl-l] Announcing Bio::SFF

Mon Dec 19 14:19:14 UTC 2011

On Wed, Dec 14, 2011 at 5:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> There are two widely used indexes, both from Roche (one with and
> one without an XML manifest, magic bytes .mft and .srt). They are
> both just a simple table of the read names and offsets, sorted
> alphabetically.

Yeah, that's what I got from the Biopython code. I didn't know it was
sorted, though (that only makes sense if they wanted to allow a binary
search or something).
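Because the table is sorted by read name, a lookup can be a binary search rather than a linear scan or a hash. A minimal Python sketch, assuming the .srt/.mft index has already been parsed into (name, offset) pairs (the binary layout is not shown here) and that the stored order matches Python string ordering (code-point order, which is byte order for ASCII read names). `SortedReadIndex` and `offset_of` are hypothetical names, not part of Bio::SFF or Biopython.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


class SortedReadIndex:
    """Hypothetical read-name -> file-offset lookup over a name-sorted table."""

    def __init__(self, entries: Sequence[Tuple[str, int]]) -> None:
        # Parallel lists: each lookup only binary-searches the names.
        self._names: List[str] = [name for name, _ in entries]
        self._offsets: List[int] = [offset for _, offset in entries]
        # Assumption: the index is already sorted; refuse it otherwise.
        if any(a > b for a, b in zip(self._names, self._names[1:])):
            raise ValueError("index entries must be sorted by read name")

    def offset_of(self, name: str) -> Optional[int]:
        """Return the offset of `name`, or None if it is not indexed."""
        i = bisect_left(self._names, name)  # O(log n) comparisons
        if i < len(self._names) and self._names[i] == name:
            return self._offsets[i]
        return None
```

With n reads this costs about log2(n) string comparisons per lookup and no extra memory beyond the table itself, which is presumably why a sorted table is good enough here.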

> This works pretty well for rapid lookup for SFF files
> (because the read count is not so high), and is pretty easy.
>

It's implemented in Bio::SFF 0.003. I did restructure my code into two
readers, though, since doing both sequential and random access in the
same class didn't make much sense code-wise.
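A rough Python analogue of that split, for illustration only: one class streams reads front to back, the other seeks straight to a read through a name-to-offset index, and they share nothing but the record parser. All names are hypothetical (this is not Bio::SFF's Perl API), and records use a stand-in `name<TAB>sequence` line format instead of real binary SFF records.

```python
import io
from typing import BinaryIO, Dict, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str


def _parse_record(line: bytes) -> Read:
    # Stand-in record format b"name\tsequence\n" (NOT the real SFF layout).
    name, seq = line.rstrip(b"\n").split(b"\t")
    return Read(name.decode("ascii"), seq.decode("ascii"))


class SequentialReader:
    """Streams every read from the current position to end of file."""

    def __init__(self, handle: BinaryIO) -> None:
        self._handle = handle

    def __iter__(self) -> Iterator[Read]:
        for line in self._handle:
            yield _parse_record(line)


class RandomAccessReader:
    """Fetches single reads by name via a name -> byte-offset index."""

    def __init__(self, handle: BinaryIO, index: Dict[str, int]) -> None:
        self._handle = handle
        self._index = index

    def get(self, name: str) -> Read:
        self._handle.seek(self._index[name])  # KeyError if not indexed
        return _parse_record(self._handle.readline())


if __name__ == "__main__":
    data = b"r1\tACGT\nr2\tGGCA\n"  # "r2" starts at byte offset 8
    print([r.name for r in SequentialReader(io.BytesIO(data))])
    print(RandomAccessReader(io.BytesIO(data), {"r1": 0, "r2": 8}).get("r2"))
```

Keeping the two apart means the streaming reader never needs an index and the random-access reader never has to track a streaming position.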

> I don't think anyone used the hash table style indexes (.hsh), which
> I assume were a proof of principle or trial in the early days of SFF.
>

I see, too bad.

> One thing to check is what Ion Torrent's SFF files use. I would
> guess they've followed Roche, but I don't know. After all, the
> index structure is not defined in the SFF specification - it was
> left extensible on purpose.
>

Yeah, we should check that too.

> Yes, please do.
>

It's added to 0.003. The lack of tests was bothering me, but the SFF files
I had at hand weren't suitable as test data.

Leon