[Biopython-dev] Sequential SFF IO
Kevin Jacobs <jacobs@bioinformed.com>
bioinformed at gmail.com
Wed Jan 26 16:44:53 UTC 2011
On Wed, Jan 26, 2011 at 10:45 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Jan 26, 2011 at 3:14 PM, Kevin Jacobs wrote:
> > Any objections/worries about converting the SFF writer to use the
> > sequential/incremental writer object interface? I know it looks
> > specialized for text formats, but
>
> It already uses Bio.SeqIO.Interfaces.SequenceWriter
>
>
Sorry-- was shooting from the hip. I meant a SequentialSequenceWriter.
> > ... I need to split large SFF files into many smaller ones
> > and would rather not materialize the whole thing. The SFF writer
> > code already allows for deferred writing of read counts and index
> > creation, so it looks to be only minor surgery.
>
> I don't understand what problem you are having with the SeqIO API.
> It should be quite happy to take a generator function, iterator, etc
> (as opposed to a list of SeqRecord objects which I assume is what
> you mean by "materialize the whole thing").
The goal is to demultiplex a larger file, so I need a "push" interface,
e.g.:

    out = dict(...)  # maps each demultiplexing key to an SffWriter
    for rec in SeqIO.parse(filename, 'sff-trim'):
        out[key(rec)].write_record(rec)  # key() picks the output bin for a read
    for writer in out.itervalues():
        writer.write_footer()

I could use a simple generator if I were merely filtering records, but the
write_file interface would require more co-routine functionality than
generators provide.
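To make the control-flow point concrete, here is a minimal pure-Python sketch of the push pattern being described: one writer per output bin, records pushed one at a time, and a deferred footer step that back-fills the record count (analogous to the SFF writer's deferred read counts). The names `PushWriter`, `demux`, and `key` are hypothetical illustrations, not part of Biopython.

```python
class PushWriter(object):
    """Toy writer following the write_record/write_footer protocol."""

    def __init__(self):
        self.records = []
        self.count = None  # unknown until the footer is written

    def write_record(self, rec):
        # Records arrive one at a time, pushed by the caller.
        self.records.append(rec)

    def write_footer(self):
        # Deferred bookkeeping, analogous to SffWriter fixing up the
        # read count once all records have been written.
        self.count = len(self.records)


def demux(records, key):
    """Route each record to a per-key writer, then finalize every writer."""
    out = {}
    for rec in records:
        out.setdefault(key(rec), PushWriter()).write_record(rec)
    for writer in out.values():
        writer.write_footer()
    return out
```

The point is that every writer must stay open until the whole input has been consumed, which is exactly the control flow a single generator handed to write_file cannot express without coroutine-style communication.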
> There doesn't seem to be an obvious API for obtaining such a writer
> > using the SeqIO interface.
>
> You can do that with:
>
> from Bio.SeqIO.SffIO import SffWriter
>
>
For my immediate need, this is fine. However, the more general SeqIO API
doesn't provide a writer function for obtaining SequentialSequenceWriter
objects.
-Kevin