[Biopython-dev] Merging Bio.SeqIO SFF support?

Mon Mar 1 23:22:42 UTC 2010

On Thu, Feb 11, 2010 at 12:29 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Mon, Jan 11, 2010 at 5:11 PM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
> > I didn't want to rush the SFF support into Biopython 1.53, but it's been
> > waiting "ready" for a while now. Any objections or comments about
> > me merging this now?
>
> There were no objections, and I ran this by Brad and Michiel and
> have just merged this into the master branch. Time for some more
> testing!
>
>
I've tried out the recently landed SFF SeqIO code and am pleased to report
that it works very well. I am parsing gsMapper 454PairAlign.txt output and
converting it to SAM/BAM format to view in IGV (among other things), and
wanted to include per-base quality score information from the SFF files.
The only glitch so far is that the indexed access mode yields sequences
with no alphabet assigned. The fix is to add the following to the
beginning of SffDict.__init__:

        if alphabet is None:
            alphabet = Alphabet.generic_dna
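
For context, here is a minimal sketch of where that check sits. The class
below is a hypothetical stand-in, not the real SffDict: its constructor
signature and the string placeholder used in place of Alphabet.generic_dna
are assumptions made only so the example runs on its own.

```python
# Placeholder for Bio.Alphabet.generic_dna (assumption: a plain sentinel
# is enough to show the defaulting logic).
GENERIC_DNA = "generic_dna"


class IndexedReadsSketch:
    """Hypothetical stand-in for an indexed SFF dictionary (not SffDict)."""

    def __init__(self, filename, alphabet=None):
        # Fall back to generic DNA when the caller passes no alphabet, so
        # records fetched by key never end up with an unset alphabet.
        if alphabet is None:
            alphabet = GENERIC_DNA
        self.filename = filename
        self.alphabet = alphabet


reads = IndexedReadsSketch("example.sff")
explicit = IndexedReadsSketch("example.sff", alphabet="custom")
```

An explicitly passed alphabet is left untouched; only the None case is
defaulted.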

My only other comment is that several file reads and struct.unpack calls in
_sff_read_seq_record can be merged. Given the number of records in most 454
SFF files, I suspect the micro-optimization will be worth the slight cost
in code clarity.
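
To illustrate the kind of merge I mean, here is a rough sketch. It is not
the actual _sff_read_seq_record code: the function names and the 16-byte
big-endian field layout are assumptions chosen for the example.

```python
import io
import struct

# Illustrative fixed-size, big-endian header: two 16-bit fields, one 32-bit
# field, then four 16-bit fields. The layout is an assumption for this sketch.
HEADER_FMT = ">HHIHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def read_header_piecewise(handle):
    """One read() and one unpack() per group of fields."""
    a, b = struct.unpack(">HH", handle.read(4))
    (c,) = struct.unpack(">I", handle.read(4))
    d, e, f, g = struct.unpack(">HHHH", handle.read(8))
    return a, b, c, d, e, f, g


def read_header_merged(handle):
    """A single read() and a single unpack() for the whole header."""
    return struct.unpack(HEADER_FMT, handle.read(HEADER_SIZE))


# Build one sample header and parse it both ways.
sample = struct.pack(HEADER_FMT, 32, 14, 100, 5, 95, 0, 0)
piecewise = read_header_piecewise(io.BytesIO(sample))
merged = read_header_merged(io.BytesIO(sample))
```

Both functions return the same tuple; the merged version simply makes one
read and one unpack call per record instead of three. A precompiled
struct.Struct object would also avoid re-parsing the format string on every
record.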

Thanks to Peter and Jose for all of their hard work!

Best regards,
-Kevin