[Biopython] replace header

Lenna Peterson arklenna at gmail.com
Thu May 31 17:55:40 UTC 2012


On Wed, May 30, 2012 at 4:56 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Hi Dilara & Lenna,
>
> I would use append mode with caution: it has side effects. For
> example, if you run this script twice, the output file will double
> in size (the first run's output plus the second's). Wouldn't opening
> in write mode work just as well here?
> i.e. open the handle, do the loop, close the handle.
>


Hi Peter,

Thanks for the warning. Python makes me adjust my thought patterns
every day. I'm used to the shell, where > overwrites and >> appends
(with cat etc.). I had never tried multiple writes to a single open
file handle, but the behavior is logical.
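
To convince myself, here is a minimal sketch of the difference, using
plain text files rather than FASTQ. The helper names (write_lines,
count_lines) and the file names are made up for illustration; only the
built-in open() modes "w" and "a" matter.

```python
import os
import tempfile

def write_lines(path, lines, mode):
    """Open path once in the given mode, write every line, then close."""
    with open(path, mode) as handle:
        for line in lines:
            # Each write continues where the previous one ended,
            # regardless of whether the mode is "w" or "a".
            handle.write(line + "\n")

def count_lines(path):
    """Count the lines currently in the file."""
    with open(path) as handle:
        return sum(1 for _ in handle)

tmp_dir = tempfile.mkdtemp()
out_w = os.path.join(tmp_dir, "write_mode.txt")   # hypothetical file name
out_a = os.path.join(tmp_dir, "append_mode.txt")  # hypothetical file name
headers = ["@read1", "@read2", "@read3"]

# Simulate running the same script twice.
for _ in range(2):
    write_lines(out_w, headers, "w")  # truncated on each open
    write_lines(out_a, headers, "a")  # keeps what the first run wrote

w_count = count_lines(out_w)
a_count = count_lines(out_a)
print(w_count, a_count)
```

So write mode truncates only when the file is opened, not on every
write, which is why "open the handle, do the loop, close the handle"
is safe to re-run.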


>
> The key point about using SeqIO.write(...) once to do a whole
> file is this requires an iterator based approach. For example,
> using a generator expression and a function acting on a single
> record:
>
> from Bio import SeqIO
>
> def modify_record(record):
>     # Do something sensible to the headers here:
>     record.id = "modified"
>     return record
> # This is a generator expression:
> modified = (modify_record(r) for r in SeqIO.parse("solid_1.fastq", "fastq"))
> count = SeqIO.write(modified, "newsolid_1.fastq", "fastq")
> print("Modified %i records" % count)
>
> Equivalently using a generator function which does the
> looping itself:
>
> def modify_records(records):
>     for record in records:
>         # Do something sensible to the headers here:
>         record.id = "modified"
>         yield record
> records = SeqIO.parse("solid_1.fastq", "fastq")
> count = SeqIO.write(modify_records(records), "newsolid_1.fastq", "fastq")
> print("Modified %i records" % count)


The generator function is nice, too. I presume this only works because
SeqIO.write knows how to write from an iterator?
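
As far as I can tell, SeqIO.write accepts either a list or an iterator
of records and returns how many it wrote. Below is a rough sketch of
why a generator works with that kind of writer. Note that write_records
is a hypothetical stand-in I wrote for illustration, not Biopython's
actual implementation, and the records are plain strings rather than
SeqRecord objects.

```python
import io

def write_records(records, handle):
    """Hypothetical stand-in for a writer: consume any iterable of
    records one at a time and return how many were written."""
    count = 0
    for record in records:  # works for lists and generators alike
        handle.write(">%s\n" % record)
        count += 1
    return count

def modify_records(records):
    """Generator function: yields each modified record lazily."""
    for record in records:
        yield record + "_modified"

names = ["read1", "read2", "read3"]

# Passing a list: everything is already in memory.
list_out = io.StringIO()
n_list = write_records(names, list_out)

# Passing a generator: records are produced only as the writer asks.
gen_out = io.StringIO()
n_gen = write_records(modify_records(iter(names)), gen_out)

print("Wrote %i and %i records" % (n_list, n_gen))
```

Because the writer only ever loops over its input, it never needs the
whole set of records in memory at once, which is the point of the
iterator-based approach for large FASTQ files.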

Lenna



