[Bioperl-l] Classifying SNPs
Pablo Marin-Garcia
pg4 at sanger.ac.uk
Tue Jul 14 22:59:51 UTC 2009
Hello Abhishek
Ensembl has a module for calculate SNP consequences in a transcript.
The script that they use to create their consequences is located in:
ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl
The important bit is to convert your snp
coordenates and the variation_allele into a ConsequenceType object
$consequence_type =
Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);
and pass this and a transcript to the type_variation
Bio::EnsEMBL::Utils::TranscriptAlleles exported method
$consequences = type_variation($tr, "", $consequence_type);
in the module
ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm
The other important bit in this script is that now the functional_genomics
consequences are calculated in this script instead in the type_variation()
The only drawback is that it return only the ensembl classes of
consequences , but you can extend that later if you need more specific
consequences (I have done that in the past for different projects).
This ensembl aproach will save you a lot of problems with the mapping from
gene to protein and with multiple snps in a codon.
If you have experience with ensembl then is easy to follow the code. If
not you can always ask for help in the ensembl-dev mailing list
(ensembl-dev at ebi.ac.uk)
If you want to read the code without checking out the whole api:
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log
hope this helps
- Pablo
Abhishek Pratap <abhishek.vit <at> gmail.com> writes:
>
> Hi Jason
>
> Thanks for a detailed insight. I would definitely go the ensembl way
first
> and try to see if it can do exactly what we want.
>
> In case it does/'nt I will report back on this same thread. I think
having
> something like this in the Bioperl will help the NGS community. Lot of
> people are predicting SNPs from NGS.oops(next generation sequencing )
data
> and looking for ways to better annotate/classify their predictions.
>
> Thanks guys .. It is a pleasure to interact with you all. Just
overwhelmed
> to see the responses.
>
> best,
> -Abhi
>
> On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason <at> bioperl.org>
wrote:
>
> > Ensembl would be best place to go if you are working with human SNPs
but
> > for those who aren't so data lucky...
> >
> > Aspects of this also relates to the dn/dS code in the
> > Bio::Align::DNAStatistics -- thought it does the classification and
> > comparison all at once so you'd have to dig code out.
> >
> > And the mcdonald_kreitman code in Bio::PopGen::Statistics which
computes a
> > synonymous or nonsynonymous via lookup table that is stored in
> > Bio::MolEvol::CodonModel which compares the edit path which is encoded
as
> > the two codons concatenated together -- i.ee
> >
> > use Bio::MolEvol::CodonModel;
> > my $codon_path = Bio::MolEvol::CodonModel->codon_path;
> > my ($ns, $syn) = $codon_path->{'AATAAC'};
> > print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
> >
> >
> > It all kind of depends on how you have the data organized, if it is
just
> > SNPs and you are trying to figure out if they are syn or non-syn then
you
> > kind of need a good database to do this since you'll have to know what
gene
> > they are in, CDS of the gene, etc. It is possible to do with
something as
> > basic as GFF3 for your genome and the SNP locations and
> > Bio::DB::SeqFeature::Store. While I can think of a way to code it up
from
> > those bare-bones - maybe you should report back if you can just use
the
> > Ensembl classification of the SNPs?
> >
> > -jason
> >
> >
> > On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
> >
> > Bio::Coordinate might help with coordinate conversion. However, much
of
> >> this sounds very Ensembl-like. Have you looked at the Ensembl perl
API? It
> >> can do #1 (coordinate conversion), and I'm sure something could be
written
> >> up to do the second.
> >>
> >> chris
> >>
> >> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
> >>
> >> Thanks Abhi-- I had a feeling there was more (or "less") to it--
this
> >>> would be a nice feature to have, don't think it exists. Will think
about
> >>> it-- cheers
> >>> ----- Original Message ----- From: "Abhishek Pratap" <
> >>> abhishek.vit <at> gmail.com>
> >>> To: "Mark A. Jensen" <maj <at> fortinbras.us>
> >>> Cc: <bioperl-l <at> lists.open-bio.org>
> >>> Sent: Monday, July 13, 2009 11:10 AM
> >>> Subject: Re: [Bioperl-l] Classifying SNPs
> >>>
> >>>
> >>> Dear Mark
> >>>> Sorry I was not able to reply earlier. Many Thanks for your
detailed
> >>>> explanation. However this is not exactly what I am looking for. May
be
> >>>> my
> >>>> initial mail was not well articulated or I am not able to infer
your
> >>>> reply
> >>>> fully. My bad.
> >>>>
> >>>> Well as an input what we have is the just the genomic coordinates
for
> >>>> SNP's
> >>>> predicted by Illumina propriety software CASAVA. What we would like
to
> >>>> do is
> >>>> to further classify these predicted SNP's . If they fall into
Coding
> >>>> region
> >>>> then whether they are synonymous/non-syn SNPs.
> >>>>
> >>>> So I guess something which translates
> >>>> 1. SNP genomic coordinate into mRNA offset
> >>>> 2. Then identify the ORF and target codon and check whether the SNP
> >>>> substitution will be syn/non-syn.
> >>>>
> >>>> Thanks,
> >>>> -Abhi
> >>>>
> >>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj <at>
fortinbras.us>
> >>>> wrote:
> >>>>
> >>>> Hey Abhishek-
> >>>>> You might root around in Bio::PopGen. Here's a script to get stuff
from
> >>>>> raw fasta data--see comments within.
> >>>>> cheers
> >>>>> Mark
> >>>>>
> >>>>> use Bio::AlignIO;
> >>>>> use Bio::PopGen::Utilities;
> >>>>>
> >>>>> $file = "your_raw_file.fas";
> >>>>>
> >>>>>
> >>>>> my $aln = Bio::AlignIO->new(-format=>'fasta',
-file=>$file)->next_aln;
> >>>>> # get the alignment into a Bio::PopGen::Population format, with
codons
> >>>>> # as the marker sites
> >>>>> my $pop =
Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
> >>>>> -site_model=>'cod');
> >>>>> # here are your variable codons...
> >>>>> my @cdnpos = $pop->get_marker_names;
> >>>>> # here are your individuals represented in the alignment
> >>>>> my @inds = $pop->get_Individuals;
> >>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
> >>>>> foreach my $cdn (@cdnpos) {
> >>>>> # calculate the unique codons represented at this codon position
> >>>>> my (%ucdns, @ucdns);
> >>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
> >>>>> $ucdns{$_->get_Alleles}++ for @genos;
> >>>>> @ucdns = sort keys %ucdns;
> >>>>> #
> >>>>> # here, use translate or something faster to identify syn/non-syn
> >>>>> # check out code in Bio::Align::DNAStatistics for various methods
> >>>>>
> >>>>> }
> >>>>> # relate back to individuals with this
> >>>>> foreach my $ind (@inds) {
> >>>>> print "Individual ".$ind->unique_id."\n";
> >>>>> print "Site\tAllele\n";
> >>>>> foreach my $cdn (@cdnpos) {
> >>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
> >>>>> }
> >>>>> }
> >>>>>
> >>>>>
> >>>>> 1;
> >>>>>
> >>>>> ----- Original Message ----- From: "Abhishek Pratap" <
> >>>>> abhishek.vit <at> gmail.com>
> >>>>> To: <bioperl-l <at> lists.open-bio.org>
> >>>>> Sent: Wednesday, July 08, 2009 10:24 AM
> >>>>> Subject: [Bioperl-l] Classifying SNPs
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi All
> >>>>>
> >>>>> This might seem to be an old track question. However I was not
able to
> >>>>> find a good answer in the many diff mailing list archives.
> >>>>>
> >>>>> For all our SNP predictions we would like to know whether they are
> >>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find
the
> >>>>> position on the gene where amino acid is getting changed and to
what
> >>>>> ...Also info about indels will help.
> >>>>>
> >>>>> I am not sure if something like this already exists. If not even
some
> >>>>> pointers on how to move forward will help.
> >>>>>
> >>>>> Thanks,
> >>>>> -Abhi
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l <at> lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>> Bioperl-l mailing list
> >>>> Bioperl-l <at> lists.open-bio.org
> >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l <at> lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l <at> lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> > --
> > Jason Stajich
> > jason <at> bioperl.org
> >
> >
=====================================================================
Pablo Marin-Garcia, PhD
\\// (Argiope bruennichi
\/\/`(||>O:'\/\/ with stabilimentum)
//\\
Sanger Institute | PostDoc / Computer Biologist
Wellcome Trust Genome Campus | team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH | room : N333
United Kingdom | email: pablo.marin at sanger.ac.uk
====================================================================
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
More information about the Bioperl-l
mailing list