[Bioperl-l] Some questions about the Bio::PopGen

Mon Apr 11 23:07:27 EDT 2011

Hi
Recently, I am learning how to caculate pi, Fst, Tajima D using Bio::PopGen.
I am not familiar with Perl and I am really confused with the following
problems.
(1) I use the Bio::PopGen::Statistics to caculate pi. The sequences I used
to caculate is this:
    __DATA__
01 A01 A
01 A02 A
01 A03 A
01 A04 A
01 A05 A
02 A01 A
02 A02 T
02 A03 T
02 A04 T
02 A05 T
03 A01 G
03 A02 G
03 A03 G
03 A04 G
03 A05 G
04 A01 G
04 A02 G
04 A03 C
04 A04 C
04 A05 G
05 A01 T
05 A02 C
05 A03 T
05 A04 T
05 A05 T
And I am not sure if I can use these sequences below to demostrate the
prettybase format above:
>A01
AAGGT
>A02
ATGGC
>A03
ATGCT
>A04
ATGCT
>A05
ATGGT
The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
use DnaSP. I find that if the 1.4/5=0.28, which means that if the number
from Bio::PopGen::Statistics is divided by the individula number, the result
would be exactly the same. Is there something wrong in my perl script? The
code I used was below:
#/usr/bin/perl -w
use warnings;
use strict;
use Bio::PopGen::Genotype;
 my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
                                           -individual_id => '001',
                                           -alleles       => ['1','5'] );
use Bio::PopGen::Individual;
 my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
                                        -genotypes  => [$genotype] );
$ind->add_Genotype(
   Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
                              -marker_name => 'gene_1')
 );
 $ind->add_Genotype(
   Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
                              -marker_name => 'gene_1')
 );
 $ind->add_Genotype(
   Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
                              -marker_name => 'gene_1')
 );
 $ind->add_Genotype(
   Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
                              -marker_name => 'gene_1')
 );
 use Bio::PopGen::Population;
 my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
                                        -description => 'description',
                                        -individuals => [$ind] );
use Bio::PopGen::IO;
use Bio::PopGen::Statistics;
my $nummarkers = $pop->get_marker_names;
my $stats = Bio::PopGen::Statistics->new();
my $io = Bio::PopGen::IO->new (-format => 'prettybase',
                               -file => '1.txt');
if( my $pop = $io->next_population ) {
    my $pi = $stats->pi($pop, $nummarkers);
    print "pi is $pi\n";
my @inds;
    for my $ind ( $pop->get_Individuals ) {
        if( $ind->unique_id =~ /A0[1-3]/ ) {
            push @inds, $ind;
        }
    }
    print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
}

(2) I want to use Bio::PopGen::Utilities to translate the alignment file to
the population file. However, I can not find the result file after the
program. I use the following code:
use Bio::PopGen::Utilities;
  use Bio::AlignIO;

  my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
                            -format => 'clustalw');
my $aln = $in->next_aln;
my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model => 'cod',
                                                         -alignment  =>
$aln);
I am not sure where I should add my result file' name in the code.
(3) If my file contains a lot of individual sequences and one individual has
one genotype. I'd like to know how can I use the  Bio::PopGen::Individual,
Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which
can used in Bio::PopGen::Statistics ?

I will be great appreciated if I can get the answers. Thanks for your time
and Best Wishes!
                                                   Qian