[Bioperl-l] randomizing fastq sequences
Frank Schwach
fs5 at sanger.ac.uk
Tue Feb 8 04:08:37 EST 2011
If memory is an issue then I guess you could create a file of just the
sequence IDs (one per line), then shuffle those (using List::Util as
Simon demonstrated). In the end you would replace each ID with its
whole fastq entry again, which you can do without reading the entire file
into memory (might be a bit slow but that probably doesn't
matter)
Frank
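
A minimal sketch of the low-memory idea, under stated assumptions. It is a
variation on the suggestion above, not Frank's exact method: instead of
writing the IDs to a separate file, it records the byte offset of each
record, shuffles the offsets with List::Util, and then seeks back to copy
each record in turn. Only one number per record is held in memory. The
subroutine name shuffle_fastq is hypothetical, and the sketch assumes every
record is exactly four lines (no wrapped sequence or quality lines).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util 'shuffle';

# Shuffle the records of a FASTQ file, keeping only one byte offset per
# record in memory. ASSUMPTION: each record is exactly four lines.
# Returns the number of records written.
sub shuffle_fastq {
    my ($in_file, $out_file) = @_;

    open(my $in,  '<', $in_file)  or die "Cannot read $in_file: $!";
    open(my $out, '>', $out_file) or die "Cannot write $out_file: $!";

    # Pass 1: remember the byte offset at which each record starts.
    my @offsets;
    my $pos        = tell($in);
    my $line_count = 0;
    while (defined(my $line = <$in>)) {
        push @offsets, $pos if $line_count % 4 == 0;
        $line_count++;
        $pos = tell($in);
    }

    # Pass 2: visit the records in random order and copy them out.
    for my $offset (shuffle(@offsets)) {
        seek($in, $offset, 0) or die "Cannot seek in $in_file: $!";
        for (1 .. 4) {
            my $line = <$in>;
            last unless defined $line;
            chomp $line;                 # normalise a missing final newline
            print {$out} "$line\n";
        }
    }

    close($in);
    close($out) or die "Cannot close $out_file: $!";
    return scalar @offsets;
}

# Run only when an input and an output file name are given.
shuffle_fastq(@ARGV[0, 1]) if @ARGV == 2;
```

Saved as, say, shuffle_fastq.pl, it could be run as
"perl shuffle_fastq.pl your_input.fastq your_output.fastq". The random seeks
make the second pass slower than a straight read, which fits the "a bit slow"
caveat above.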
simon andrews (BI) wrote:
> On 7 Feb 2011, at 22:07, shalu sharma wrote:
>
>
>> Hi,
>> i am trying to test one program for which i need to change order of
>> sequences in a fastq file.
>> My fastq file contains about 50,000 sequences.
>> Is there any script that can do this task?
>>
>
> Since FastQ is supported in SeqIO you could do something like (untested):
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use List::Util 'shuffle';
> use Bio::SeqIO;
>
> my @seqs;
>
> my $in = Bio::SeqIO->new(-file => 'your_input.fastq',
> -format => 'Fastq');
>
> while (my $seq = $in -> next_seq()) {
> push @seqs,$seq;
> }
>
> @seqs = shuffle(@seqs);
>
> my $out = Bio::SeqIO->new(-file => '>your_output.fastq',
> -format => 'Fastq');
>
> foreach my $seq (@seqs) {
> $out->write_seq($seq);
> }
>
> ## End
>
> This has the disadvantage that it will hold all of the sequences in memory whilst shuffling, but I don't think there's an easy way around that.
>
> Simon.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.