[Bioperl-l] PopGen

Wed Jan 3 10:24:34 UTC 2007

To add a bit more info: using the example.hap file in the t/data dir
of bioperl, you can see that the alleles correspond to the
nucleotides, and the marker name corresponds to the dbSNP rs id (I
guess in your case it could be something that relates to the coords of
the sequence):

#!/usr/local/bin/perl

use strict;
use warnings;
use Bio::PopGen::IO;

my $io = Bio::PopGen::IO->new(-format => 'hapmap',
                              -file   => '../../t/data/example.hap');

# Some IO might support reading in a population at a time

my @population;
while ( my $ind = $io->next_individual ) {
    push @population, $ind;
}

foreach my $individual (@population) {
    my @genotypes = $individual->get_Genotypes;
    foreach my $genotype (@genotypes) {
        print "individual_id ", $genotype->individual_id ,"\n";
        # get_Alleles returns a list, so join it to keep the alleles separated
        print "alleles ", join(' ', $genotype->get_Alleles), "\n";
        print "marker_name ", $genotype->marker_name ,"\n";
    }
}

1;
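
For the original question (getting a FASTA sequence per individual from a
reference plus stored variations), here is a minimal sketch in plain Perl.
It does not use any BioPerl API. The data layout (%variants keyed by
individual id, then by 1-based position) and the helper names
apply_substitutions and fasta_record are made up for illustration, and it
only handles single-base substitutions, since indels would shift the
coordinates:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: a reference sequence and, per individual, the
# substitutions observed relative to it (1-based position => base).
my $reference = 'atgcgtatatg';
my %variants  = (
    ind1 => { 6 => 'a' },
    ind2 => { 2 => 'c', 6 => 'a' },
    ind3 => {},                      # identical to the reference
);

# Return a copy of $ref with each single-base substitution applied.
sub apply_substitutions {
    my ($ref, $subs) = @_;
    my $seq = $ref;
    for my $pos (sort { $a <=> $b } keys %$subs) {
        die "position $pos is outside the reference\n"
            if $pos < 1 || $pos > length $ref;
        substr($seq, $pos - 1, 1, $subs->{$pos});
    }
    return $seq;
}

# Format one FASTA record (no line wrapping, for brevity).
sub fasta_record {
    my ($id, $seq) = @_;
    return ">$id\n$seq\n";
}

for my $id (sort keys %variants) {
    print fasta_record($id, apply_substitutions($reference, $variants{$id}));
}
```

The position/base pairs could equally be pulled out of genotype objects
(marker name as the coordinate, allele as the base), if you decide to
store them that way.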

On 1/3/07, Albert Vilella <avilella at gmail.com> wrote:
> Well, in that case the alleles are numerical ids instead of
> nucleotides... but in your case you will have the nucleotide
> corresponding to the coordinate with polymorphism...
>
> On 1/3/07, Marian Thieme <marian.thieme at klinik.uni-regensburg.de> wrote:
> > Albert,
> >
> > thank you very much for this hint. I had completely overlooked the PopGen
> > package. But at least one question remains, because I didn't fully
> > understand the allele attribute of the Bio::PopGen::Genotype object;
> > perhaps you can help me:
> >
> > in the HOWTO (http://www.bioperl.org/wiki/HOWTO:PopGen) there is a
> > Genotype created by:
> >
> > my $genotype = Bio::PopGen::Genotype->new(-marker_name   => 'D7S123',
> >                                            -individual_id => '1001',
> >                                            -alleles       => ['104','107'] );
> >
> > Can you explain to me what the numbers in -alleles => ['104','107'] mean?
> > I would expect an allele to be specified by a position AND the bases
> > that differ from the bases in the original (reference) sequence.
> >
> > Regards,
> > Marian
> >
> > Albert Vilella wrote:
> >
> > > The Bio::PopGen modules contain Individual, Population and Genotype
> > > objects, among other utilities. There are some input/output formats in
> > > Bio::PopGen::IO and also some methods to go from an alignment to a
> > > population.
> > >
> > > That said, I am not entirely sure about how much of that overlaps with
> > > Bio::Variation.
> > >
> > > If you think anything is missing that you would like to have implemented
> > > in bioperl, we would greatly appreciate your feedback.
> > >
> > > Cheers,
> > >
> > >    Albert.
> > >
> > > On 1/2/07, Marian Thieme <marian.thieme at klinik.uni-regensburg.de> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I am quite new to bioperl and I have a question about sequence data: I
> > >> am working on a resequencing project. Here we have resequenced 1000
> > >> genes of a certain gene. My question: What is the easiest way to store
> > >> each discovered variation of each individual and get a FASTA sequence
> > >> for an arbitrary individual?
> > >>
> > >> I would expect that there is some way to set up a reference sequence and
> > >> store all variations relative to this reference sequence. Afterwards it
> > >> should be possible to generate sequences for each individual in
> > >> question, right?
> > >>
> > >> My approach was the following:
> > >>
> > >> I have created a SeqDiff object:
> > >>
> > >> $seqDiff = Bio::Variation::SeqDiff->new (...)
> > >>
> > >>
> > >> and I have assigned the reference sequence to that object via:
> > >>
> > >> $seqDiff->dna_ori('atgcgtatatg');
> > >>
> > >>
> > >> Now I thought I could create some variations via a DNAMutation object:
> > >>
> > >> $dnamut = Bio::Variation::DNAMutation->new (
> > >>   -start => 6,
> > >>   -end => 6,
> > >>   -length => 1,
> > >>   -isMutation => 1,
> > >>   -upStreamSeq => 'atgcg',
> > >>   -dnStreamSeq => 'atatg'
> > >> );
> > >>
> > >> $a1 = Bio::Variation::Allele->new;
> > >> $a1->seq('t');
> > >> $dnamut->allele_ori($a1);
> > >>
> > >> my $a2 = Bio::Variation::Allele->new;
> > >> $a2->seq('a');
> > >> $dnamut->add_Allele($a2);
> > >>
> > >>
> > >>
> > >> Is that the correct way to describe the reference sequence, describe a
> > >> variation and attach it to the SeqDiff object?
> > >> Probably I didn't understand the API correctly. (I assumed start/end means
> > >> the start/end position of the mutation.) Is it possible to get a complete
> > >> sequence printout (FASTA format) of each variation/individual?
> > >>
> > >> Regards,
> > >> Marian
> > >>
> > >> --
> > >> Marian Thieme
> > >> University Regensburg
> > >> Institute of Functional Genomics
> > >> Josef-Engert-Str. 9
> > >> 93053
> > >> Regensburg
> > >> Germany
> > >> P: 0049 (0)941 943 5055
> > >> F: 0049 (0)941 943 5020
> > >> E: marian.thieme at klinik.uni-regensburg.de
> > >> W: http://www-cgi.uni-regensburg.de/Klinik/FunktionelleGenomik
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>