[Bioperl-l] Classifying SNPs

Abhishek Pratap abhishek.vit at gmail.com
Mon Jul 13 15:10:04 UTC 2009


Dear Mark,
Sorry I was not able to reply earlier. Many thanks for your detailed
explanation. However, this is not exactly what I am looking for. Maybe my
initial mail was not well articulated, or I have not fully understood your
reply. My bad.

As input, all we have is the genomic coordinates of the SNPs predicted by
Illumina's proprietary software, CASAVA. What we would like to do is further
classify these predicted SNPs: if they fall in a coding region, are they
synonymous or non-synonymous?
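Once a SNP is known to sit inside a coding sequence, the synonymous/non-synonymous call comes down to translating the reference codon and the codon with the alternate base substituted, then comparing the two amino acids. The sketch below assumes the standard genetic code (NCBI table 1), a CDS string that starts in frame at the first base of the start codon, a 0-based offset into that CDS, and an alternate base already on the coding strand. `classify_snp` is a hypothetical name, not a BioPerl method. In BioPerl the translation itself would normally come from `Bio::Tools::CodonTable` (its `translate` method); a hardcoded table is used here only to keep the example self-contained.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), one amino acid per codon,
# with codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
my $AA    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @BASES = qw(T C A G);
my %CODON;    # codon => one-letter amino acid ('*' = stop)
{
    my $i = 0;
    for my $b1 (@BASES) {
        for my $b2 (@BASES) {
            for my $b3 (@BASES) {
                $CODON{"$b1$b2$b3"} = substr($AA, $i++, 1);
            }
        }
    }
}

# Hypothetical helper (not a BioPerl API): classify one base substitution.
#   $cds     coding sequence, in frame from the first base of the start codon
#   $offset  0-based position of the SNP within $cds
#   $alt     alternate base, already on the coding strand
# Returns (class, change); change is ref aa + 1-based codon number + alt aa.
sub classify_snp {
    my ($cds, $offset, $alt) = @_;
    my $codon_idx = int($offset / 3);    # 0-based codon number
    my $pos_in    = $offset % 3;         # 0, 1 or 2 within that codon
    die "offset $offset is not inside a complete codon\n"
        if $offset < 0 || 3 * $codon_idx + 3 > length $cds;
    my $ref_codon = uc substr($cds, 3 * $codon_idx, 3);
    my $alt_codon = $ref_codon;
    substr($alt_codon, $pos_in, 1) = uc $alt;
    my $ref_aa = $CODON{$ref_codon} // 'X';    # 'X' if the codon contains N etc.
    my $alt_aa = $CODON{$alt_codon} // 'X';
    my $class  = ($ref_aa eq 'X' || $alt_aa eq 'X') ? 'unknown'
               : $ref_aa eq $alt_aa                 ? 'synonymous'
               : $alt_aa eq '*'                     ? 'nonsense'    # gained stop
               :                                      'non-synonymous';
    return ($class, $ref_aa . ($codon_idx + 1) . $alt_aa);
}

my ($class, $change) = classify_snp('ATGCGTAAA', 4, 'A');
print "$class $change\n";    # prints "non-synonymous R2H"
```

The returned change string (here `R2H`, codon 2 going from Arg to His) also answers the "where is the amino acid changed and to what" part of the original question.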

So I guess I need something that:
1. translates each SNP's genomic coordinate into an mRNA offset;
2. then identifies the ORF and the target codon, and checks whether the SNP
substitution is synonymous or non-synonymous.
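Step 1 above can be sketched in plain Perl, assuming the coding parts of each exon are already available as 1-based, inclusive genomic intervals (for example from a gene annotation file) together with the gene's strand. The helper names are hypothetical. BioPerl's `Bio::Coordinate::GeneMapper` module is aimed at this kind of chromosome-to-CDS mapping and may be worth a look before rolling your own. The resulting 0-based offset gives the target codon for step 2 as `int(offset / 3) + 1` and the base within it as `offset % 3 + 1`.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based genomic position to a 0-based offset
# in the spliced coding sequence (CDS).
#   $pos       1-based genomic coordinate of the SNP
#   $strand    '+' or '-'
#   $segments  array ref of [start, end] pairs (1-based, inclusive) covering
#              only the coding part of each exon, in any order
# Returns undef (via a bare return) if $pos is not in any coding segment,
# i.e. it is intronic, in a UTR or intergenic.
sub genomic_to_cds_offset {
    my ($pos, $strand, $segments) = @_;
    # Visit segments in transcription order: ascending on '+', descending on '-'.
    my @ordered = sort { $a->[0] <=> $b->[0] } @$segments;
    @ordered = reverse @ordered if $strand eq '-';
    my $done = 0;    # coding bases in the segments already passed
    for my $seg (@ordered) {
        my ($start, $end) = @$seg;
        if ($pos >= $start && $pos <= $end) {
            return $done + ($strand eq '+' ? $pos - $start : $end - $pos);
        }
        $done += $end - $start + 1;
    }
    return;
}

# Assuming the SNP caller reports alleles on the genomic '+' strand, they
# must be complemented for minus-strand genes before going into a codon.
sub to_coding_strand {
    my ($base, $strand) = @_;
    $base = uc $base;
    $base =~ tr/ACGT/TGCA/ if $strand eq '-';
    return $base;
}

# Example: a '+' strand gene whose CDS spans 100..120 and 200..230.
my $off = genomic_to_cds_offset(205, '+', [[100, 120], [200, 230]]);
# 21 coding bases in the first segment + (205 - 200) = offset 26
printf "CDS offset %d, codon %d, base %d of the codon\n",
    $off, int($off / 3) + 1, $off % 3 + 1;
# prints "CDS offset 26, codon 9, base 3 of the codon"
```

Working from CDS segments rather than whole exons skips the 5' UTR, so the offset is directly in frame; an mRNA offset would need the UTR length subtracted first.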

Thanks,
-Abhi

On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Hey Abhishek-
> You might root around in Bio::PopGen. Here's a script to get stuff from
> raw fasta data--see comments within.
> cheers
> Mark
>
> use strict;
> use warnings;
> use Bio::AlignIO;
> use Bio::PopGen::Utilities;
>
> my $file = "your_raw_file.fas";
>
> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
> # get the alignment into a Bio::PopGen::Population format, with codons
> # as the marker sites
> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
>                                                    -site_model=>'cod');
> # here are your variable codons, which have names like
> # "Codon-3-9", "Codon-4-12", etc
> my @cdnpos = $pop->get_marker_names;
> # here are your individuals represented in the alignment
> my @inds = $pop->get_Individuals;
> foreach my $cdn (@cdnpos) {
>   # calculate the unique codons represented at this codon position;
>   # join() forces list context, so the hash key is the codon string itself
>   # rather than whatever get_Alleles returns in scalar context
>   my %ucdns;
>   my @genos = $pop->get_Genotypes(-marker=>$cdn);
>   $ucdns{ join('', $_->get_Alleles) }++ for @genos;
>   my @ucdns = sort keys %ucdns;
>   #
>   # here, use translate or something faster to identify syn/non-syn
>   # check out code in Bio::Align::DNAStatistics for various methods
> }
> # relate back to individuals with this
> foreach my $ind (@inds) {
>   print "Individual ".$ind->unique_id."\n";
>   print "Site\tAllele\n";
>   foreach my $cdn (@cdnpos) {
>     print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>   }
> }
>
>
> 1;
>
> ----- Original Message -----
> From: "Abhishek Pratap" <abhishek.vit at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, July 08, 2009 10:24 AM
> Subject: [Bioperl-l] Classifying SNPs
>
>
>
> Hi All
>
> This might be an old question. However, I was not able to find a good
> answer in the many different mailing list archives.
>
> For all our SNP predictions we would like to know whether they are
> synonymous or non-synonymous. If a SNP is exonic and non-synonymous, we
> would also like to find the position in the gene where the amino acid is
> changed, and what it changes to... Information about indels would also help.
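For the indel part of the question, a common first-pass classification of an indel inside a coding region is whether its net length change is a multiple of three (in-frame) or not (frameshift). The sketch below assumes the indel lies entirely within one coding segment and is given as reference and alternate allele strings; `classify_coding_indel` is a hypothetical name. Indels that overlap splice sites or the start/stop codon need the exon coordinates and sequence context and are not handled here.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an indel lying entirely inside a CDS from its
# reference and alternate allele strings. Only the net length change is used.
sub classify_coding_indel {
    my ($ref, $alt) = @_;
    my $delta = length($alt) - length($ref);    # > 0 insertion, < 0 deletion
    return 'substitution (no length change)' if $delta == 0;
    my $kind = $delta > 0 ? 'insertion' : 'deletion';
    # A net change that is a multiple of 3 leaves the downstream frame intact.
    return ($delta % 3 == 0 ? 'in-frame ' : 'frameshift ') . $kind;
}

print classify_coding_indel('A', 'ATTG'), "\n";    # prints "in-frame insertion"
print classify_coding_indel('AT', 'A'), "\n";      # prints "frameshift deletion"
```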
>
> I am not sure whether something like this already exists. If not, even some
> pointers on how to move forward would help.
>
> Thanks,
> -Abhi
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>


