[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp

Peter biopython at maubp.freeserve.co.uk
Wed Sep 16 10:44:09 UTC 2009


On Tue, Sep 15, 2009 at 5:43 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Rather than using itertools, you could also write a simple generator
> function to do the pairing explicitly. Assuming you are dealing with
> paired end reads, it would make sense to explicitly check the IDs
> match up as expected.

I confess I didn't actually test that example (I don't have Python 2.6
on this machine), and I had misread the itertools.izip_longest
documentation - that won't actually work as is. Sorry :(

Instead, here is a simple interleaving using a generator function,
which I *have* tested on Python 2.5:

from Bio import SeqIO
#Setup variables (could parse command line args here)
fileA = "SRR001666_1.fastq"
fileB = "SRR001666_2.fastq"
fileOut = "SRR001666_interleaved.fastq"
format = "fastq"
#Setup the input
def interleave(iter1, iter2) :
    while True :
        yield iter1.next()
        yield iter2.next()
recordsA = SeqIO.parse(open(fileA,"rU"), format)
recordsB = SeqIO.parse(open(fileB,"rU"), format)
records = interleave(recordsA, recordsB)
#Now the output
handle = open(fileOut, "w")
count = SeqIO.write(records, handle, format)
handle.close()
print "%i records written to %s" % (count, fileOut)

Note that this does not check that the number of records in the two
files matches, nor does it do any explicit test on the record ids.
If the files differ in length, the output simply stops once one of
them runs out.
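
As a rough sketch of how those two checks could be added: the version
below raises an error if one input ends before the other or if the
IDs of a pair disagree. It has not been run against real FASTQ files.
The "/1" and "/2" ID suffix convention, the pair_key helper and the
Record stand-in are all assumptions made for illustration, and next()
with a default value needs Python 2.6 or later.

```python
from collections import namedtuple


def pair_key(read_id):
    """Strip a trailing "/1" or "/2" (assumed pairing convention)."""
    if read_id.endswith("/1") or read_id.endswith("/2"):
        return read_id[:-2]
    return read_id


def interleave_checked(iter1, iter2):
    """Yield records alternately from two iterators, checking each pair."""
    iter1, iter2 = iter(iter1), iter(iter2)
    done = object()  # sentinel marking an exhausted iterator
    while True:
        rec1 = next(iter1, done)
        rec2 = next(iter2, done)
        if rec1 is done and rec2 is done:
            return  # both inputs ended together
        if rec1 is done or rec2 is done:
            raise ValueError("Inputs have different numbers of records")
        if pair_key(rec1.id) != pair_key(rec2.id):
            raise ValueError("ID mismatch: %s vs %s" % (rec1.id, rec2.id))
        yield rec1
        yield rec2


# Hypothetical stand-in for SeqRecord, so the sketch runs without Biopython
Record = namedtuple("Record", ["id"])
forward = [Record("r1/1"), Record("r2/1")]
reverse = [Record("r1/2"), Record("r2/2")]
interleaved_ids = [rec.id for rec in interleave_checked(forward, reverse)]
```

In the script above, interleave_checked(recordsA, recordsB) should
work in place of interleave(recordsA, recordsB), since SeqRecord
objects carry an .id attribute. Because it is a generator, the error
would only surface part way through writing the output file.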

Peter


