[Biopython] reading two fastq files at the same time

Peter biopython at maubp.freeserve.co.uk
Mon Sep 6 13:51:13 UTC 2010


On Mon, Sep 6, 2010 at 2:24 PM, xyz <mitlox at op.pl> wrote:
> Hi,
> How is it possible to read two fastq files at the same time in BioPython? I
> have the following BioRuby example:
>
> require 'bio'
>
> begin
>  fq1 = Bio::FlatFile.open(Bio::Fastq, ARGV[2])
>  fq2 = Bio::FlatFile.open(Bio::Fastq, ARGV[3])
>
>  while (entry1 = fq1.next_entry) and (entry2 = fq2.next_entry)
>
>    fastq_A1 = entry1.entry_id
>    fastq_A2 = entry1.seq
>
>    fastq_B1 = entry2.entry_id
>    fastq_B2 = entry2.seq
>  end
>
> rescue => err
>  raise "Exception: #{err}"
> end
>
> Thank you in advance.

Hi,

If you are using Python 2.6+ then probably itertools.izip_longest
would do what you want. You could use itertools.izip but this
won't catch the error condition when one file has more records
than the other.
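
As a rough sketch of the izip_longest idea (untested against real
FASTQ files): pad the shorter iterator with a sentinel and treat any
padding as an error. This is written for Python 3, where the function
is called itertools.zip_longest; on Python 2.6/2.7 import izip_longest
instead. The names paired_records, paired_fastq and _MISSING are made
up for this illustration and are not part of Biopython.

```python
from itertools import zip_longest  # named izip_longest on Python 2.6/2.7

_MISSING = object()  # sentinel used to pad whichever iterator ends first


def paired_records(iter1, iter2):
    """Yield (rec1, rec2) pairs; raise ValueError if the lengths differ."""
    for rec1, rec2 in zip_longest(iter1, iter2, fillvalue=_MISSING):
        if rec1 is _MISSING or rec2 is _MISSING:
            # One input ran out before the other
            raise ValueError("Diff record count")
        yield rec1, rec2


def paired_fastq(filename1, filename2):
    """Hypothetical wrapper pairing the records of two FASTQ files."""
    # Imported here so paired_records itself does not need Biopython
    from Bio import SeqIO
    return paired_records(SeqIO.parse(filename1, "fastq"),
                          SeqIO.parse(filename2, "fastq"))
```

You would then loop with "for rec1, rec2 in paired_fastq(f1, f2):" and
use rec1.id, rec1.seq and so on inside the loop. A dedicated sentinel is
used rather than None so that padding can never be confused with a real
value coming from either iterator.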

Alternatively you could use something like this,

from Bio import SeqIO
iter1 = SeqIO.parse(filename1, "fastq")
iter2 = SeqIO.parse(filename2, "fastq")
while True:
    try:
        rec1 = iter1.next()
    except StopIteration:
        rec1 = None
    try:
        rec2 = iter2.next()
    except StopIteration:
        rec2 = None
    if rec1 is None and rec2 is None:
        break #end of both files
    elif rec1 is None or rec2 is None:
        raise ValueError("Diff record count")
    else:
        print rec1.seq, rec1.id
        print rec2.seq, rec2.id

I haven't tested that, but it is based on a similar example in
Bio.SeqIO.QualityIO.PairedFastaQualIterator, which handles a paired
FASTA and QUAL file.

Peter



