[Bioperl-l] randomizing fastq sequences

simon andrews (BI) simon.andrews at bbsrc.ac.uk
Tue Feb 8 08:41:10 UTC 2011


On 7 Feb 2011, at 22:07, shalu sharma wrote:

> Hi,
>   I am trying to test a program for which I need to change the order of
> the sequences in a FASTQ file.
> My FASTQ file contains about 50,000 sequences.
> Is there any script that can do this task?

Since FASTQ is supported by Bio::SeqIO, you could do something like this (untested):

#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'shuffle';   # returns its arguments in random order
use Bio::SeqIO;

my @seqs;

# Read every record from the input FASTQ file into memory
my $in = Bio::SeqIO->new(-file   => 'your_input.fastq',
                         -format => 'Fastq');

while (my $seq = $in->next_seq()) {
    push @seqs, $seq;
}

# Randomize the order of the records
@seqs = shuffle(@seqs);

# Write the records back out in their new, shuffled order
my $out = Bio::SeqIO->new(-file   => '>your_output.fastq',
                          -format => 'Fastq');

foreach my $seq (@seqs) {
    $out->write_seq($seq);
}

## End
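If BioPerl isn't installed, or you'd rather not build a sequence object per read, a plain-Perl sketch that shuffles the raw record text may be enough. It assumes every record is exactly four lines (@header, sequence, +separator, quality) with no wrapped lines; read_fastq_records is a hypothetical helper, not part of any library. Like the BioPerl version, it still keeps every record in memory.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'shuffle';

# Hypothetical helper: read raw FASTQ records from a filehandle.
# ASSUMES every record is exactly four lines (@header, sequence,
# +separator, quality) with no wrapped sequence or quality lines.
sub read_fastq_records {
    my ($fh) = @_;
    my @records;
    while (defined(my $header = <$fh>)) {
        next if $header =~ /^\s*$/;    # tolerate blank lines between records
        die "Expected '\@' header, got: $header" unless $header =~ /^\@/;
        my @rest;
        for (1 .. 3) {
            my $line = <$fh>;
            die "Truncated record starting at: $header" unless defined $line;
            push @rest, $line;
        }
        die "Missing '+' separator in record: $header" unless $rest[1] =~ /^\+/;
        # The file's final quality line may lack a newline; add one so the
        # record stays intact wherever it ends up after shuffling.
        $rest[2] .= "\n" unless $rest[2] =~ /\n\z/;
        push @records, join('', $header, @rest);
    }
    return @records;
}

# Usage: perl shuffle_fastq.pl your_input.fastq your_output.fastq
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Can't read $in_file: $!";
    open my $out, '>', $out_file or die "Can't write $out_file: $!";
    print {$out} $_ for shuffle(read_fastq_records($in));
    close $out or die "Can't close $out_file: $!";
}
```

Reading a fixed four lines per record (rather than looking for the next line starting with '@') matters because a quality string can itself begin with '@'.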

This has the disadvantage that it will hold all of the sequences in memory whilst shuffling, but I don't think there's an easy way around that.
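One way to reduce (though not eliminate) the memory cost, assuming the input is a seekable file with four-line records, is to shuffle the byte offsets at which records start rather than the records themselves, then seek back to each offset when writing. Only one integer per record is held in memory. This is a sketch of that general technique, not something from BioPerl; record_offsets and write_records_in_order are hypothetical helpers.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'shuffle';
use Fcntl qw(SEEK_SET);

# Hypothetical helper: return the byte offset at which each record starts.
# ASSUMES four-line records; only these integers are kept in memory.
sub record_offsets {
    my ($fh) = @_;
    my @offsets;
    while (1) {
        my $pos    = tell($fh);
        my $header = <$fh>;
        last unless defined $header;
        next if $header =~ /^\s*$/;    # skip blank lines between records
        die "Expected '\@' header at byte $pos\n" unless $header =~ /^\@/;
        for (1 .. 3) {
            my $line = <$fh>;
            die "Truncated record at byte $pos\n" unless defined $line;
        }
        push @offsets, $pos;
    }
    return @offsets;
}

# Hypothetical helper: copy records to $out_fh in the order of @offsets.
sub write_records_in_order {
    my ($in_fh, $out_fh, @offsets) = @_;
    for my $pos (@offsets) {
        seek($in_fh, $pos, SEEK_SET) or die "seek to $pos failed: $!";
        for (1 .. 4) {
            my $line = <$in_fh>;
            die "Unexpected end of file at byte $pos\n" unless defined $line;
            $line .= "\n" unless $line =~ /\n\z/;   # last line may lack one
            print {$out_fh} $line;
        }
    }
}

# Usage: perl shuffle_fastq_offsets.pl your_input.fastq your_output.fastq
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Can't read $in_file: $!";
    open my $out, '>', $out_file or die "Can't write $out_file: $!";
    write_records_in_order($in, $out, shuffle(record_offsets($in)));
    close $out or die "Can't close $out_file: $!";
}
```

The trade-off is random disk access on the second pass, which may be slower than the in-memory version for large files on spinning disks, and it won't work on a pipe or other non-seekable input.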

Simon.


