[Bioperl-l] Some questions about the Bio::PopGen

Tue Nov 8 17:29:54 EST 2011

I think the pi calculated in the function isn't really the pi as defined. You
need to divide the value by total number of sites (in your case, it's 5,
which is not your individual number but sequence length). I think the reason
they implemented this way is that sometimes it's easier to work only with
variable sites. 

The aln to population function converts an aln object to a population
object. You can't really see the object unless you write additional codes to
write it out or do some calculations on it. 

The third question depends on your specific needs. For population level
analyses of molecular evolution, I usually create a multiple sequence
alignment with other applications (clustalw etc), then manually adjust the
alignments to make sure they represent homology. I wouldn't touch the
alignment once this is done but only make an aln (or whatever format you
want) for inputting to analyses applications, like Bio::PopGen (usually use
the aln_to_population function you're using now).

Qian Zhao wrote:
> 
> Hi
> Recently, I am learning how to caculate pi, Fst, Tajima D using
> Bio::PopGen.
> I am not familiar with Perl and I am really confused with the following
> problems.
> (1) I use the Bio::PopGen::Statistics to caculate pi. The sequences I used
> to caculate is this:
>     __DATA__
> 01 A01 A
> 01 A02 A
> 01 A03 A
> 01 A04 A
> 01 A05 A
> 02 A01 A
> 02 A02 T
> 02 A03 T
> 02 A04 T
> 02 A05 T
> 03 A01 G
> 03 A02 G
> 03 A03 G
> 03 A04 G
> 03 A05 G
> 04 A01 G
> 04 A02 G
> 04 A03 C
> 04 A04 C
> 04 A05 G
> 05 A01 T
> 05 A02 C
> 05 A03 T
> 05 A04 T
> 05 A05 T
> And I am not sure if I can use these sequences below to demostrate the
> prettybase format above:
>>A01
> AAGGT
>>A02
> ATGGC
>>A03
> ATGCT
>>A04
> ATGCT
>>A05
> ATGGT
> The pi is 1.4 using Bio::PopGen::Statistics. However, the pi is 0.28 if I
> use DnaSP. I find that if the 1.4/5=0.28, which means that if the number
> from Bio::PopGen::Statistics is divided by the individula number, the
> result
> would be exactly the same. Is there something wrong in my perl script? The
> code I used was below:
> #/usr/bin/perl -w
> use warnings;
> use strict;
> use Bio::PopGen::Genotype;
>  my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'gene_1',
>                                            -individual_id => '001',
>                                            -alleles       => ['1','5'] );
> use Bio::PopGen::Individual;
>  my $ind = Bio::PopGen::Individual->new(-unique_id  => '001',
>                                         -genotypes  => [$genotype] );
> $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  $ind->add_Genotype(
>    Bio::PopGen::Genotype->new(-alleles     => ['1', '5'],
>                               -marker_name => 'gene_1')
>  );
>  use Bio::PopGen::Population;
>  my $pop = Bio::PopGen::Population->new(-name        => 'Bm',
>                                         -description => 'description',
>                                         -individuals => [$ind] );
> use Bio::PopGen::IO;
> use Bio::PopGen::Statistics;
> my $nummarkers = $pop->get_marker_names;
> my $stats = Bio::PopGen::Statistics->new();
> my $io = Bio::PopGen::IO->new (-format => 'prettybase',
>                                -file => '1.txt');
> if( my $pop = $io->next_population ) {
>     my $pi = $stats->pi($pop, $nummarkers);
>     print "pi is $pi\n";
> my @inds;
>     for my $ind ( $pop->get_Individuals ) {
>         if( $ind->unique_id =~ /A0[1-3]/ ) {
>             push @inds, $ind;
>         }
>     }
>     print "pi for inds 1,2,3 is ", $stats->pi(\@inds),"\n";
> }
> 
> (2) I want to use Bio::PopGen::Utilities to translate the alignment file
> to
> the population file. However, I can not find the result file after the
> program. I use the following code:
> use Bio::PopGen::Utilities;
>   use Bio::AlignIO;
> 
>   my $in = Bio::AlignIO->new(-file   => 't/data/t7.aln',
>                             -format => 'clustalw');
> my $aln = $in->next_aln;
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment => $aln);
> my $synpop = Bio::PopGen::Utilities->aln_to_population(-site_model =>
> 'cod',
>                                                          -alignment  =>
> $aln);
> I am not sure where I should add my result file' name in the code.
> (3) If my file contains a lot of individual sequences and one individual
> has
> one genotype. I'd like to know how can I use the  Bio::PopGen::Individual,
> Bio::PopGen::Population and Bio::PopGen::Genotype to create the file which
> can used in Bio::PopGen::Statistics ?
> 
> I will be great appreciated if I can get the answers. Thanks for your time
> and Best Wishes!
>                                                    Qian
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://old.nabble.com/Some-questions-about-the-Bio%3A%3APopGen-tp31378987p32805996.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.