[Biopython-dev] Sequential SFF IO

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Wed Jan 26 16:44:53 UTC 2011


On Wed, Jan 26, 2011 at 10:45 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Jan 26, 2011 at 3:14 PM, Kevin Jacobs wrote:
> > Any objections/worries about converting the SFF writer to use the
> > sequential/incremental writer object interface?  I know it looks
> > specialized for text formats, but
>
> It already uses Bio.SeqIO.Interfaces.SequenceWriter
>
>
Sorry-- was shooting from the hip.  I meant a SequentialSequenceWriter.


> > ... I need to split large SFF files into many smaller ones
> > and would rather not materialize the whole thing.  The SFF writer
> > code already allows for deferred writing of read counts and index
> > creation, so it looks to be only minor surgery.
>
> I don't understand what problem you are having with the SeqIO API.
> It should be quite happy to take a generator function, iterator, etc
> (as opposed to a list of SeqRecord objects which I assume is what
> you mean by "materialize the whole thing").



The goal is to demultiplex a larger file, so I need a "push" interface,
e.g.:

from Bio import SeqIO

out = dict(...)  # of SffWriters

for rec in SeqIO.parse(filename, 'sff-trim'):
  out[id(rec)].write_record(rec)

for writer in out.values():
  writer.write_footer()

I could use a simple generator if I were merely filtering records, but the
write_file interface would require more co-routine functionality than
generators provide.
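
To make the push pattern above concrete, here is a minimal, self-contained
sketch. It does not use Biopython: CountingWriter, demultiplex and the toy
byte layout are hypothetical stand-ins invented for illustration, not the
SffWriter API or the SFF format. The only idea borrowed from the thread is
that the record count is written as a placeholder first and patched in when
the footer is written.

```python
import io
import struct


class CountingWriter:
    """Hypothetical push-style writer (not Bio.SeqIO.SffIO.SffWriter).

    write_header() stores a placeholder record count, write_record() is
    called once per record, and write_footer() seeks back to patch the
    real count into the header.
    """

    def __init__(self, handle):
        self.handle = handle  # must be seekable and opened in binary mode
        self.count = 0

    def write_header(self):
        self.handle.write(struct.pack(">I", 0))  # placeholder count

    def write_record(self, record):
        data = record.encode("ascii")
        # Toy layout: 4-byte big-endian length, then the record bytes.
        self.handle.write(struct.pack(">I", len(data)) + data)
        self.count += 1

    def write_footer(self):
        end = self.handle.tell()
        self.handle.seek(0)
        self.handle.write(struct.pack(">I", self.count))  # patch the count
        self.handle.seek(end)


def demultiplex(records, key, open_handle):
    """Push each record to the writer chosen by key(record).

    Only one record is held at a time; writers are created lazily.
    """
    writers = {}
    for rec in records:
        k = key(rec)
        if k not in writers:
            writers[k] = CountingWriter(open_handle(k))
            writers[k].write_header()
        writers[k].write_record(rec)
    for writer in writers.values():
        writer.write_footer()
    return writers


# Usage with in-memory handles; the "A:" / "B:" prefixes play the role of
# a barcode or sample tag.
handles = {}


def open_handle(k):
    handles[k] = io.BytesIO()
    return handles[k]


reads = ["A:read1", "B:read2", "A:read3"]
writers = demultiplex(iter(reads), lambda r: r.split(":")[0], open_handle)
```

A single generator handed to one write_file call covers the filtering case,
because one consumer pulls from one producer. With several output files, each
record has to be routed to a different consumer, which is why the loop above
drives the writers rather than the other way round.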

> > There doesn't seem to be an obvious API for obtaining such a writer
> > using the SeqIO interface.
>
> You can do that with:
>
> from Bio.SeqIO.SffIO import SffWriter
>
>
For my immediate need, this is fine.  However, the more general API doesn't
have a SeqIO.writer to get SequentialSequenceWriter objects.

-Kevin


