[Bioperl-l] Select random sequences from a fasta file

Laurent MANCHON lmanchon at univ-montp2.fr
Thu Mar 22 09:03:39 UTC 2012


On 21/03/2012 20:42, shalabh sharma wrote:
> Hi All,
> Is there a way to select random sequences from a multi-FASTA file?
> I am currently using a method that is not very sophisticated.
> Is there a module in BioPerl that can do that?
>
> I have a FASTA file containing around 10 million reads, and I want to get
> a few thousand sequences out of it (randomly selected).
>
> Thanks
> Shalabh
>

Hello,

I have a piece of code that picks random lines from a file.
It works on single lines, not on whole FASTA records, but maybe you can
adapt it to your problem:

#!/usr/bin/perl
# pick random lines from a file

use strict;
use warnings;

use List::Util qw(shuffle);

my $GET_LINES = 10000;

my @line_starts;
open( my $fh, '<', 'big_text_file.txt' )
     or die "Oh, fudge: $!\n";

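# Record the byte offset at which each line starts. The offset is taken
# before each read, so the end-of-file position is never stored as a line.
my $pos = tell $fh;
while ( <$fh> ) {
     push @line_starts, $pos;
     $pos = tell $fh;
}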

my $count = @line_starts;
print "Got $count lines\n";

# Never ask for more lines than the file has.
$GET_LINES = $count if $GET_LINES > $count;

my @shuffled_starts = (shuffle @line_starts)[0..$GET_LINES-1];

for my $start ( @shuffled_starts ) {

     seek $fh, $start, 0
         or die "Unable to seek to line - $!\n";

     print scalar <$fh>;
}
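
To pick whole FASTA records instead of single lines, the same offset idea
could be adapted roughly as follows. This is only a sketch: the subroutine
name random_fasta_records and the command-line arguments are made up for
illustration, and it assumes every record starts with a '>' header line.

```perl
#!/usr/bin/perl
# Sketch: pick random FASTA records (header plus sequence lines) by byte offset.
use strict;
use warnings;

use List::Util qw(shuffle);

# Return up to $n randomly chosen records from the FASTA file at $path.
# Each returned element is the full text of one record.
# (Hypothetical helper, not part of BioPerl.)
sub random_fasta_records {
    my ( $path, $n ) = @_;

    open( my $fh, '<', $path )
        or die "Cannot open $path: $!\n";

    # First pass: remember where every header line ('>') starts.
    my @record_starts;
    my $pos = tell $fh;
    while ( my $line = <$fh> ) {
        push @record_starts, $pos if $line =~ /^>/;
        $pos = tell $fh;
    }

    # Never ask for more records than the file holds.
    $n = @record_starts if $n > @record_starts;
    return () if $n < 1;
    my @chosen = ( shuffle @record_starts )[ 0 .. $n - 1 ];

    # Second pass: jump to each chosen header and read up to the next one.
    my @records;
    for my $start (@chosen) {
        seek $fh, $start, 0
            or die "Unable to seek to record: $!\n";
        my $record = <$fh>;    # header line
        while ( my $line = <$fh> ) {
            last if $line =~ /^>/;
            $record .= $line;
        }
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Usage (file names are placeholders):
#   perl pick_fasta.pl reads.fasta 5000 > subset.fasta
if ( @ARGV == 2 ) {
    print random_fasta_records(@ARGV);
}
```

The array of offsets grows with the number of records, so for a very large
file it may be worth checking memory use before relying on this approach.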

Regards,
Laurent



