[Bioperl-l] Classifying SNPs

Abhishek Pratap abhishek.vit at gmail.com
Mon Jul 13 17:13:08 UTC 2009


Hi Jason

Thanks for a detailed insight. I would definitely go the ensembl way first
and try to see if it can do exactly what we want.

 In case it does/'nt I will report back on this same thread. I think having
something like this in the Bioperl will help the NGS community. Lot of
people are predicting SNPs from NGS.oops(next generation sequencing ) data
and looking for ways to better annotate/classify their predictions.

Thanks guys .. It is a pleasure to interact with you all. Just overwhelmed
to see the responses.

best,
-Abhi



On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason at bioperl.org> wrote:

> Ensembl would be best place to go if you are working with human SNPs but
> for those who aren't so data lucky...
>
> Aspects of this also relates to the dn/dS code in the
> Bio::Align::DNAStatistics -- thought it does the classification and
> comparison all at once so you'd have to dig code out.
>
> And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a
> synonymous or nonsynonymous via lookup table that is stored in
> Bio::MolEvol::CodonModel which compares the edit path which is encoded as
> the two codons concatenated together -- i.ee
>
> use Bio::MolEvol::CodonModel;
>  my $codon_path = Bio::MolEvol::CodonModel->codon_path;
>  my ($ns, $syn) = $codon_path->{'AATAAC'};
>  print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
>
>
> It all kind of depends on how you have the data organized, if it is just
> SNPs and you are trying to figure out if they are syn or non-syn then you
> kind of need a good database to do this since you'll have to know what gene
> they are in, CDS of the gene, etc.  It is possible to do with something as
> basic as GFF3 for your genome and the SNP locations and
> Bio::DB::SeqFeature::Store.  While I can think of a way to code it up from
> those bare-bones - maybe you should report back if you can just use the
> Ensembl classification of the SNPs?
>
> -jason
>
>
> On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
>
>  Bio::Coordinate might help with coordinate conversion.  However, much of
>> this sounds very Ensembl-like.  Have you looked at the Ensembl perl API?  It
>> can do #1 (coordinate conversion), and I'm sure something could be written
>> up to do the second.
>>
>> chris
>>
>> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>>
>>  Thanks Abhi-- I had a feeling there was more (or "less") to it-- this
>>> would be a nice feature to have, don't think it exists. Will think about
>>> it-- cheers
>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>> abhishek.vit at gmail.com>
>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>> Cc: <bioperl-l at lists.open-bio.org>
>>> Sent: Monday, July 13, 2009 11:10 AM
>>> Subject: Re: [Bioperl-l] Classifying SNPs
>>>
>>>
>>>  Dear Mark
>>>> Sorry I was not able to reply earlier. Many Thanks for your detailed
>>>> explanation. However this is not exactly what I am looking for. May be
>>>> my
>>>> initial mail was not well articulated or I am not able to infer your
>>>> reply
>>>> fully. My bad.
>>>>
>>>> Well as an input what we have is the just the genomic coordinates for
>>>> SNP's
>>>> predicted by Illumina propriety software CASAVA. What we would like to
>>>> do is
>>>> to further classify these predicted SNP's  . If they fall into Coding
>>>> region
>>>> then whether they are synonymous/non-syn SNPs.
>>>>
>>>> So I guess something which translates
>>>> 1. SNP genomic coordinate into mRNA offset
>>>> 2. Then identify the ORF and target codon and check whether the SNP
>>>> substitution will be syn/non-syn.
>>>>
>>>> Thanks,
>>>> -Abhi
>>>>
>>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj at fortinbras.us>
>>>> wrote:
>>>>
>>>>  Hey Abhishek-
>>>>> You might root around in Bio::PopGen. Here's a script to get stuff from
>>>>> raw fasta data--see comments within.
>>>>> cheers
>>>>> Mark
>>>>>
>>>>> use Bio::AlignIO;
>>>>> use Bio::PopGen::Utilities;
>>>>>
>>>>> $file = "your_raw_file.fas";
>>>>>
>>>>>
>>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
>>>>> # get the alignment into a Bio::PopGen::Population format, with codons
>>>>> # as the marker sites
>>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
>>>>> -site_model=>'cod');
>>>>> # here are your variable codons...
>>>>> my @cdnpos = $pop->get_marker_names;
>>>>> # here are your individuals represented in the alignment
>>>>> my @inds = $pop->get_Individuals;
>>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>>> foreach my $cdn (@cdnpos) {
>>>>> # calculate the unique codons represented at this codon position
>>>>> my (%ucdns, @ucdns);
>>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>>> @ucdns = sort keys %ucdns;
>>>>> #
>>>>> # here, use translate or something faster to identify syn/non-syn
>>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>>
>>>>> }
>>>>> # relate back to individuals with this
>>>>> foreach my $ind (@inds) {
>>>>> print "Individual ".$ind->unique_id."\n";
>>>>> print "Site\tAllele\n";
>>>>> foreach my $cdn (@cdnpos) {
>>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>> 1;
>>>>>
>>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>>> abhishek.vit at gmail.com>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>>
>>>>>
>>>>>
>>>>> Hi All
>>>>>
>>>>> This might seem to be an old track question. However I was not able to
>>>>> find a good answer in the many diff mailing list archives.
>>>>>
>>>>> For all our SNP predictions we would like to know whether they are
>>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>>>>> position on the gene where amino acid is getting changed and to what
>>>>> ...Also info about indels will help.
>>>>>
>>>>> I am not sure if something like this already exists. If not even some
>>>>> pointers on how to move forward will help.
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> Jason Stajich
> jason at bioperl.org
>
>



More information about the Bioperl-l mailing list