[Bioperl-l] randomizing fastq sequences

Roy Chaudhuri roy.chaudhuri at gmail.com
Tue Feb 8 06:31:15 EST 2011


TMTOWTDI, maybe also use the Tie::File module?

Something like:

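#!/usr/bin/perl
use warnings FATAL=>qw(all);
use Modern::Perl;
use Tie::File;
use Fcntl qw(O_RDONLY);
use List::Util qw(shuffle);
# Usage: pass the FASTQ file as the first argument; output goes to STDOUT.
# Assumes strict 4-line records (no wrapped sequence or quality lines).
my @fastq;
# Tie the file to an array of lines; Tie::File fetches lines on demand.
tie @fastq, 'Tie::File', $ARGV[0], mode=>O_RDONLY or die $!;
# The range operator truncates $#fastq/4 to an integer record index.
say join "\n", @fastq[4*$_..4*$_+3] for shuffle 0..$#fastq/4;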
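
If Tie::File is not wanted, the same low-memory idea (close to what Frank
suggests below: shuffle something small, then pull the full entries back
out of the file) can be sketched with plain seek/tell. This is only a
rough, lightly tested sketch: shuffle_fastq is a name I made up, and it
assumes strict 4-line records just like the one-liner above. It keeps one
byte offset per record in memory rather than the records themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: print the 4-line records of $path to the
# filehandle $out in random order. Returns the number of records.
sub shuffle_fastq {
    my ($path, $out) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";

    # Pass 1: remember the byte offset at which each record starts.
    my @offsets;
    my $pos = tell $in;
    while (defined(my $line = <$in>)) {
        push @offsets, $pos if ($. - 1) % 4 == 0;
        $pos = tell $in;
    }

    # Pass 2: visit the offsets in random order, copying four lines each.
    for my $offset (shuffle @offsets) {
        seek $in, $offset, 0 or die "Cannot seek in $path: $!";
        for (1 .. 4) {
            my $line = <$in>;
            last unless defined $line;
            # Guard against a final line with no trailing newline.
            $line .= "\n" unless $line =~ /\n\z/;
            print {$out} $line;
        }
    }
    close $in;
    return scalar @offsets;
}

# Run as a script: first argument is the FASTQ file, output to STDOUT.
shuffle_fastq($ARGV[0], \*STDOUT) if @ARGV;
```

For 50,000 sequences the offset list is tiny, so the cost is mostly the
random seeks in the second pass.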

Cheers,
Roy.

On 08/02/2011 09:08, Frank Schwach wrote:
> If memory is an issue then I guess you could create a file of just the
> sequence IDs (one per line), then shuffle those (using List::Util like
> Simon demonstrated). In the end you would substitute the IDs for the
> whole fastq entry again, which you can do without reading an entire file
> into memory (might be a bit slow, but that probably doesn't
> matter).
> Frank
>
>
> simon andrews (BI) wrote:
>> On 7 Feb 2011, at 22:07, shalu sharma wrote:
>>
>>
>>> Hi,
>>>    I am trying to test a program, for which I need to change the order of
>>> sequences in a fastq file.
>>> My fastq file contains about 50,000 sequences.
>>> Is there any script that can do this task?
>>>
>>
>> Since FastQ is supported in SeqIO you could do something like (untested):
>>
>> #!/usr/bin/perl
>> use warnings;
>> use strict;
>> use List::Util 'shuffle';
>> use Bio::SeqIO;
>>
>> my @seqs;
>>
>> my $in = Bio::SeqIO->new(-file   => 'your_input.fastq',
>>                          -format => 'Fastq');
>>
>> while (my $seq = $in->next_seq()) {
>>      push @seqs,$seq;
>> }
>>
>> @seqs = shuffle(@seqs);
>>
>> my $out = Bio::SeqIO->new(-file   => '>your_output.fastq',
>>                           -format => 'Fastq');
>>
>> foreach my $seq (@seqs) {
>>      $out->write_seq($seq);
>> }
>>
>> ## End
>>
>> This has the disadvantage that it will hold all of the sequences in memory whilst shuffling, but I don't think there's an easy way around that.
>>
>> Simon.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>


