[Bioperl-l] randomizing fastq sequences

Roy Chaudhuri roy.chaudhuri at gmail.com
Tue Feb 8 07:09:39 EST 2011


Sorry, I should have included that caveat.
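
For anyone who wants to run the check Frank suggests before relying on the four-lines-per-record assumption, here is a minimal sketch. The subroutine name count_four_line_records is made up for illustration. The checks themselves are assumptions about what "well-formed" means here: header starts with '@', third line starts with '+', and sequence and quality strings have equal length. It reads by position rather than scanning for '@', since a quality line can itself begin with '@'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the number of records if every record
# occupies exactly four lines, and dies at the first record that does not.
sub count_four_line_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $records = 0;
    while (defined(my $header = <$fh>)) {
        my $start = $.;              # line number of this record's header
        my $seq   = <$fh>;
        my $plus  = <$fh>;
        my $qual  = <$fh>;
        die "Record at line $start is truncated\n" unless defined $qual;
        chomp($header, $seq, $plus, $qual);
        my $plus_line = $start + 2;
        die "Line $start: header does not start with '\@'\n"
            unless $header =~ /^\@/;
        die "Line $plus_line: separator does not start with '+'\n"
            unless $plus =~ /^\+/;
        die "Record at line $start: sequence and quality lengths differ\n"
            unless length($seq) == length($qual);
        $records++;
    }
    close $fh;
    return $records;
}

# Only runs when called as a script with a file argument.
if (!caller && @ARGV) {
    my $n = count_four_line_records($ARGV[0]);
    print "$ARGV[0]: $n records, four lines each\n";
}
```

If this dies (for example on a file with wrapped sequence lines), the Tie::File one-liner below would mix up records, and the SeqIO approach is the safer choice.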

On 08/02/2011 11:57, Frank Schwach wrote:
> nice one - but if I understand it correctly it relies on there being
> exactly 4 lines for each record. This is probably the case but it would
> be a good idea to double-check the fastq file in question, just to make
> sure.
>
> Frank
>
>
> Roy Chaudhuri wrote:
>> TMTOWTDI, maybe also use the Tie::File module?
>>
>> Something like:
>>
>> #!/usr/bin/perl
>> use warnings FATAL=>qw(all);
>> use Modern::Perl;
>> use Tie::File;
>> use Fcntl qw(O_RDONLY);
>> use List::Util qw(shuffle);
>> my @fastq;
>> tie @fastq, 'Tie::File', $ARGV[0], mode=>O_RDONLY or die $!;
>> say join "\n", @fastq[4*$_..4*$_+3] for shuffle 0..$#fastq/4;
>>
>> Cheers,
>> Roy.
>>
>> On 08/02/2011 09:08, Frank Schwach wrote:
>>> If memory is an issue then I guess you could create a file of just the
>>> sequence IDs (one per line), then shuffle those (using List::Util like
>>> Simon demonstrated). In the end you would substitute the IDs for the
>>> whole fastq entry again, which you can do without reading an entire file
>>> into memory (might be a bit slow, but that probably doesn't
>>> matter).
>>> Frank
>>>
>>>
>>> simon andrews (BI) wrote:
>>>> On 7 Feb 2011, at 22:07, shalu sharma wrote:
>>>>
>>>>
>>>>> Hi,
>>>>>     I am trying to test a program, for which I need to change the order of
>>>>> sequences in a fastq file.
>>>>> My fastq file contains about 50,000 sequences.
>>>>> Is there any script that can do this task?
>>>>>
>>>>
>>>> Since FastQ is supported in SeqIO you could do something like
>>>> (untested):
>>>>
>>>> #!/usr/bin/perl
>>>> use warnings;
>>>> use strict;
>>>> use List::Util 'shuffle';
>>>> use Bio::SeqIO;
>>>>
>>>> my @seqs;
>>>>
>>>> my $in = Bio::SeqIO->new(-file   => 'your_input.fastq',
>>>>                          -format => 'Fastq');
>>>>
>>>> while (my $seq = $in->next_seq()) {
>>>>       push @seqs, $seq;
>>>> }
>>>>
>>>> @seqs = shuffle(@seqs);
>>>>
>>>> my $out = Bio::SeqIO->new(-file   => '>your_output.fastq',
>>>>                           -format => 'Fastq');
>>>>
>>>> foreach my $seq (@seqs) {
>>>>       $out->write_seq($seq);
>>>> }
>>>>
>>>> ## End
>>>>
>>>> This has the disadvantage that it will hold all of the sequences in
>>>> memory whilst shuffling, but I don't think there's an easy way
>>>> around that.
>>>>
>>>> Simon.
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>
>



