[Biopython-dev] Sequential SFF IO

Wed Jan 26 18:30:44 UTC 2011

On Wed, Jan 26, 2011 at 12:19 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> I don't think the above will work without some "magic" to record the
> SFF header (which currently would require using private attributes
> of the SffWriter objects) as done via its write_file method.
>
> Also you can't read in SFF files with "sff-trim" if you want to output
> them, since this discards all the flow space information. You have
> to use format "sff" instead.
>
>
Agreed, I was shooting from the hip again.

> > I could use a simple generator if I was merely filtering records, but the
> > write_file interface would require more co-routine functionality than
> > generators provide.
>
> How many output files do you have? Assuming it is small I'd go for
> the simple solution of one loop over the input SFF file for each output
> file.
>
>
We're routinely multiplexing hundreds or thousands of samples per SFF file
and using sequence barcodes to identify them.  The number of outputs makes a
one-pass solution much preferable.  Anyhow, it seems that this has gone
beyond the scope of generic Biopython, so I'm happy to make my modifications
locally (and share the results if anyone is interested).  We're currently
using the Roche/454 sff tools, but they have known bugs and we have 5' and
3' adapters to consider.
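
To illustrate the one-pass idea, here is a minimal sketch only, not what we
actually run.  It assumes fixed-length barcodes at the 5' end of each read,
that all reads fit in memory, and that reads arrive as plain (read_id,
sequence) pairs.  The names demultiplex, barcodes and barcode_len are
hypothetical and not part of Biopython.

```python
from collections import defaultdict


def demultiplex(records, barcodes, barcode_len):
    """Group (read_id, sequence) pairs by leading barcode in a single pass.

    records     -- iterable of (read_id, sequence) tuples (assumed input shape)
    barcodes    -- dict mapping barcode string -> sample name (hypothetical)
    barcode_len -- number of leading bases that make up the barcode
    """
    bins = defaultdict(list)
    for read_id, seq in records:
        # Look up the leading bases; reads with no known barcode are kept
        # together under "unassigned" rather than dropped.
        sample = barcodes.get(seq[:barcode_len].upper(), "unassigned")
        bins[sample].append((read_id, seq))
    return dict(bins)


reads = [
    ("r1", "ACGTTTTT"),
    ("r2", "TGCAGGGG"),
    ("r3", "acgtCCCC"),
    ("r4", "NNNNAAAA"),
]
by_sample = demultiplex(reads, {"ACGT": "sampleA", "TGCA": "sampleB"}, 4)
```

Each bin would then be written out once the input is exhausted, which sidesteps
the SFF header problem because the read count per output is known by then.
With real data the reads would have to be parsed as format "sff" rather than
"sff-trim", as Peter notes, so that the flow space information is kept.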

Thanks,
-Kevin