[Bioperl-l] sub sampling
Mark A. Jensen
maj at fortinbras.us
Mon Jun 15 20:07:21 EDT 2009
Shalabh,
If you want to do sampling with replacement (the same line can turn up
more than once in a sample), this is not bad (if you trust rand()):
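# open your file for reading on the filehandle $my_infile, then
my @lines = <$my_infile>;
my $num_samps = 10;          # number of independent samples to draw
my $sample_size_pc = 0.25;   # fraction of the lines in each sample
my @samples;
for (1..$num_samps) {
    # each sample is an array ref of random line indices (repeats allowed)
    push @samples, [ map { int( @lines * rand ) } ( 1..int($sample_size_pc * @lines) ) ];
}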
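# now, do something, fr'instance average the third column of each sample
my @sample_pc;
foreach (@samples) {
    my $pct = 0;
    foreach my $line (@lines[ @$_ ]) {
        my @a = split(' ', $line);   # split on whitespace, skipping any leading blanks
        $pct += $a[2];
    }
    $pct /= @$_;                     # mean over the lines in this sample
    push @sample_pc, $pct;
}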
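If what you are after is a single 25% sample without replacement (each line used at most once), a minimal sketch, assuming the whole file fits in memory, is to shuffle the lines and keep the first quarter. The helper name sample_without_replacement and the stand-in data below are made up for illustration and are not part of BioPerl; shuffle comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not a BioPerl function): return an array ref holding
# a random $fraction of the items in @$items, each item at most once.
sub sample_without_replacement {
    my ($items, $fraction) = @_;
    my $n = int( $fraction * @$items );   # sample size, rounded down
    return [] if $n < 1;
    my @shuffled = shuffle @$items;       # random permutation of the items
    return [ @shuffled[ 0 .. $n - 1 ] ];  # keep the first $n of them
}

# Stand-in data in the same three-column layout as the file in question;
# in practice @lines would be read from the input filehandle.
my @lines  = map { "$_ 0000 90\n" } 1 .. 20;
my $subset = sample_without_replacement( \@lines, 0.25 );
print scalar(@$subset), " of ", scalar(@lines), " lines sampled\n";
print @$subset;
```

The sampled lines come back in shuffled order; if the original file order matters, shuffle the indices instead and sort the ones you keep.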
R's just better for some things, ain't it?
MAJ
----- Original Message -----
From: "shalabh sharma" <shalabh.sharma7 at gmail.com>
To: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Monday, June 15, 2009 4:06 PM
Subject: [Bioperl-l] sub sampling
> Hi All, I was just wondering whether there is any module in BioPerl
> that does subsampling?
> I have a file like this:
>
> 369859 0477 93
> 163417 1348 92
> 228122 0176 88
> 232792 0050 93
> 239636 1850 95
> 300069 0048 96
> 244108 0046 91
> 199087 0055 93
> 206209 0048 96
> - - -
> - - -
>
> which contains around 100,000 lines, and I want to take a 25% sample
> from this file. Is there any way I can do this in BioPerl?
>
> Thanks
> Shalabh
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>