[Biopython] replace header
Peter Cock
p.j.a.cock at googlemail.com
Thu May 31 18:10:11 UTC 2012
On Thu, May 31, 2012 at 6:55 PM, Lenna Peterson <arklenna at gmail.com> wrote:
>> The key point about calling SeqIO.write(...) once for a whole
>> file is that it requires an iterator-based approach. For example,
>> using a generator expression and a function acting on a single
>> record:
>>
>> from Bio import SeqIO
>>
>> def modify_record(record):
>>     # Do something sensible to the headers here:
>>     record.id = "modified"
>>     return record
>>
>> # This is a generator expression:
>> modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
>> count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
>> print("Modified %i records" % count)
>>
>> Equivalently using a generator function which does the
>> looping itself:
>>
>> def modify_records(records):
>>     for record in records:
>>         # Do something sensible to the headers here:
>>         record.id = "modified"
>>         yield record
>>
>> records = SeqIO.parse("solid_1.fastq", "fastq")
>> count = SeqIO.write(modify_records(records), "newsolid_1.fastq", "fastq")
>> print("Modified %i records" % count)
>
>
> The generator function is nice, too. I presume this only works because
> SeqIO.write knows how to write from an iterator?
>
> Lenna
Bio.SeqIO.write is *designed* to take any Python iterable of SeqRecord
objects. That can be a generator function, generator expression, a
custom class which supports iteration, or even a simple list or tuple
of SeqRecord objects all in memory.
As a special-case convenience, it also accepts a single SeqRecord.
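
To make that concrete, here is a small sketch that passes each kind of
input to SeqIO.write and checks the returned record count. The record
data, the make_records and count_written helpers and the RecordSource
class are all made up for illustration, and it writes FASTA to an
in-memory handle rather than FASTQ to a file just to keep it short:

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_records():
    """Generator function yielding three made-up records."""
    for i in range(3):
        yield SeqRecord(Seq("ACGT"), id="read%i" % i, description="")


class RecordSource(object):
    """Hypothetical custom class; it only needs to support iteration."""

    def __iter__(self):
        return make_records()


def count_written(records):
    """Write records as FASTA to an in-memory handle, return the count."""
    handle = StringIO()
    return SeqIO.write(records, handle, "fasta")


counts = {
    "generator function": count_written(make_records()),
    "generator expression": count_written(r for r in make_records()),
    "custom iterable": count_written(RecordSource()),
    "list": count_written(list(make_records())),
    "tuple": count_written(tuple(make_records())),
    # Special case: a single SeqRecord rather than a collection.
    "single record": count_written(next(make_records())),
}
for kind in sorted(counts):
    print("%s: %i" % (kind, counts[kind]))
```

The list and tuple cases hold every record in memory at once, while the
generator-based ones only need one record at a time, which is what
matters for large FASTQ files.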
Peter