[Bioperl-l] sub sampling

Mon Jun 15 20:07:21 EDT 2009

Shalabh,
If you want to do sampling with replacement,
this is not bad (if you trust rand()):

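 use strict;
 use warnings;

 # open your file into $my_infile, then
 my @lines = <$my_infile>;

 my $num_samps      = 10;    # number of samples to draw
 my $sample_size_pc = 0.25;  # each sample holds 25% of the lines
 my @samples;                # each element: arrayref of line indices

 for (1..$num_samps) {
    # draw int(25% of N) indices from 0..$#lines, with replacement
    push @samples, [ map { int(rand(@lines)) } 1 .. int($sample_size_pc * @lines) ];
 }

 # now, do something, fr'instance average the 3rd column of each sample
 my @sample_pc;
 foreach my $samp (@samples) {
    my $pct = 0;
    foreach my $line (@lines[ @$samp ]) {
        my @fields = split ' ', $line;   # ' ' also skips leading whitespace
        $pct += $fields[2];
    }
    $pct /= @$samp;   # array in numeric context = number of elements
    push @sample_pc, $pct;
 }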
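If what you actually want is one 25% subsample in which no line is picked twice (sampling without replacement), a minimal sketch is to shuffle the line indices with List::Util's shuffle() and keep the first quarter. The subroutine name subsample_without_replacement is hypothetical, just for illustration, and it assumes the whole file fits in memory, which should be fine for ~100,000 lines:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not part of BioPerl): return int($fraction * N)
# distinct lines from @$lines, drawn WITHOUT replacement, in file order.
sub subsample_without_replacement {
    my ($lines, $fraction) = @_;
    my $n = int($fraction * @$lines);
    return () if $n < 1;

    # Shuffle all indices, keep the first $n, then sort them so the
    # sampled lines come back in their original file order.
    my @shuffled = shuffle(0 .. $#$lines);
    my @keep     = sort { $a <=> $b } @shuffled[0 .. $n - 1];
    return @{$lines}[@keep];
}

# Example usage (the file name is only a placeholder):
# open my $in, '<', 'data.txt' or die "Cannot open data.txt: $!";
# my @lines  = <$in>;
# close $in;
# my @sample = subsample_without_replacement(\@lines, 0.25);
# print @sample;
```

The only real change from the rand() indexing above is that shuffle() hands out each index at most once; the sort is optional and just keeps the output in the same order as the input file.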

R's just better for some things, ain't it?
MAJ

----- Original Message ----- 
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling

> Hi All,
> I was just wondering whether there is any module in BioPerl
> that does subsampling?
> I have a file like this:
>
> 369859  0477    93
> 163417  1348    92
> 228122  0176    88
> 232792  0050    93
> 239636  1850    95
> 300069  0048    96
> 244108  0046    91
> 199087  0055    93
> 206209  0048    96
> -              -         -
> -              -         -
>
> which contains around 100,000 lines, and I want to take out a 25% sample
> from this file. Is there any way I can do this in BioPerl?
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l