[Biopython] changing record attributes while iterating

Peter Cock p.j.a.cock at googlemail.com
Tue Oct 4 08:24:08 UTC 2011


On Tue, Oct 4, 2011 at 9:05 AM, Bala subramanian
<bala.biophysics at gmail.com> wrote:
> Friends,
> I have a FASTA file. I need to modify the record id by adding a suffix to
> it, so I used SeqRecord (the code attached below). It works fine, but I
> would like to know if there is a simpler way to do it, i.e. whether I can
> change the record attributes while iterating through the FASTA file with
> SeqIO.parse itself. I tried something like the following but couldn't get
> what I wanted.
>
> new_list=[]
> for record in SeqIO.parse(open(argv[1], "rU"), "fasta"):
>                    record.id=record.id + '_suffix'
>                    new_list.append(record)

The above looks fine, although depending on the rest of your script
a big list might be a bad idea (it holds every record in memory at
once), and an iterator-based approach may be preferable. If, as in the
rest of your example, you just need to do this for output, perhaps:

#!/usr/bin/env python
from Bio import SeqIO
from sys import argv

def rename(record):
    """Modify the record in place AND return it."""
    record.id += '_suffix'
    return record

# This is a generator expression (records are renamed one at a time, on demand):
records = (rename(r) for r in SeqIO.parse(argv[1], "fasta"))

# input() is the Python 3 spelling; on Python 2 use raw_input() instead.
output_filename = input('Enter the output file: ')
SeqIO.write(records, output_filename, "fasta")
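
To illustrate why the generator expression keeps memory use low, here is
a minimal sketch that does not use Biopython at all. The Record class
and the parse function are hypothetical stand-ins for SeqRecord and
SeqIO.parse, and the events list exists only to log the order in which
things happen:

```python
class Record:
    """Hypothetical stand-in for a SeqRecord; only .id matters here."""
    def __init__(self, id):
        self.id = id

events = []  # log of what happened, in order

def parse(ids):
    """Hypothetical stand-in for SeqIO.parse: yields one record at a time."""
    for i in ids:
        events.append("parsed " + i)
        yield Record(i)

def rename(record):
    """Modify the record in place AND return it."""
    record.id += "_suffix"
    events.append("renamed " + record.id)
    return record

# Building the generator expression does no work yet.
records = (rename(r) for r in parse(["a", "b"]))
events_before = list(events)

# Consuming it (as SeqIO.write would) pulls records through one by one.
for rec in records:
    events.append("wrote " + rec.id)
```

Each record is parsed, renamed and written before the next one is
touched, so only one record needs to be alive at any time. With a list,
every record would be parsed and stored before the first one is written.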

The alternative you showed (building new SeqRecord objects) was
wasteful, creating lots of new objects to no benefit.
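
One thing to watch when editing records in place: as far as I know, the
FASTA parser keeps the whole title line (id included) in
record.description, so after changing record.id the old id can show up
again in the output header. Below is a self-contained sketch of one way
to handle that. It uses in-memory StringIO handles and made-up example
sequences instead of real files, and the suffix argument and the
description trimming are my additions for illustration, not part of the
script above:

```python
from io import StringIO
from Bio import SeqIO

def rename(record, suffix="_suffix"):
    """Modify the record in place AND return it."""
    old_id = record.id
    record.id = old_id + suffix
    # Assumption: the description starts with the old id (as it does for
    # records read from FASTA), so strip it to avoid repeating it on output.
    if record.description.startswith(old_id):
        record.description = record.description[len(old_id):].strip()
    return record

# Made-up input, standing in for the FASTA file given on the command line.
fasta_in = StringIO(">seq1 first example\nACGT\n>seq2\nGGCC\n")
fasta_out = StringIO()

records = (rename(r) for r in SeqIO.parse(fasta_in, "fasta"))
count = SeqIO.write(records, fasta_out, "fasta")
result = fasta_out.getvalue()
```

No new SeqRecord objects are created here; each record coming out of the
parser is simply updated and passed on to SeqIO.write.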

Peter



