[Bioperl-l] Classifying SNPs

Tue Jul 14 23:57:52 UTC 2009

fixing a typo and explaining a gotcha

On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote:

>
> Hello Abhishek
>
> Ensembl has a module for calculating SNP consequences in a transcript.
>
> The script that they use to create their consequences is located in:
>
> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl
>
> The important bit is to convert your SNP coordinates and the
> variation alleles into a ConsequenceType object
>
> $consequence_type = 
> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);
>

Fixing a typo: the second argument should be a variation ID, not $chr:

Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles);

warning:

The transcript_id and the variation_id are not important if you are not 
building an Ensembl database.

BUT the gotcha is that the start and end of the variation must be 
relative to the same slice start as the transcript used in the next step 
(type_variation). Be careful: depending on how you select the gene or 
slice to retrieve your transcripts, the transcript start and end will be 
either chromosome coordinates or positions relative to the slice start.

To avoid problems, work with chromosome positions for both the variations 
and the transcripts (where start/end == seq_region_start/seq_region_end).
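
If your variation positions are relative to a slice, the conversion to 
chromosome coordinates can be sketched in plain Perl as below. This is 
only an illustration, not Ensembl code: slice_to_chr is a hypothetical 
helper, the numbers are made up, and it assumes 1-based inclusive 
coordinates on a forward-strand slice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): convert a 1-based,
# inclusive, slice-relative interval to chromosome coordinates.
# Assumes the slice is on the forward strand.
sub slice_to_chr {
    my ($slice_start, $rel_start, $rel_end) = @_;
    return ($slice_start + $rel_start - 1, $slice_start + $rel_end - 1);
}

# Made-up example: a slice starting at chromosome position 1,000,000
# and a SNP at position 501 relative to that slice.
my ($chr_start, $chr_end) = slice_to_chr(1_000_000, 501, 501);
print "SNP chromosome coordinates: $chr_start-$chr_end\n";
```

Whatever conversion you use, apply the same coordinate system to the 
transcript before calling type_variation.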

> and pass this and a transcript to the type_variation method exported by
> Bio::EnsEMBL::Utils::TranscriptAlleles
>
>  $consequences = type_variation($tr, $gene, $consequence_type);
>

The $gene argument is optional.

> in the module
>
> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm
>
> The other important bit in this script is that the functional_genomics 
> consequences are now calculated in this script instead of in type_variation()
>
> The only drawback is that it returns only the Ensembl classes of 
> consequences, but you can extend that later if you need more specific 
> consequences (I have done that in the past for different projects).
>
> This Ensembl approach will save you a lot of problems with the mapping from 
> gene to protein and with multiple SNPs in a codon.
>
> If you have experience with Ensembl then it is easy to follow the code. If 
> not, you can always ask for help on the ensembl-dev mailing list 
> (ensembl-dev at ebi.ac.uk)
>
>
> If you want to read the code without checking out the whole api:
>
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log
>
>
> hope this helps
>
>
>  - Pablo
>
>
>
>

=====================================================================
                      Pablo Marin-Garcia, PhD

                     \\//          (Argiope bruennichi
                \/\/`(||>O:'\/\/   with stabilimentum)
                     //\\

Sanger Institute                |  PostDoc / Computer Biologist
Wellcome Trust Genome Campus    |  team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH     |  room : N333
United Kingdom                  |  email: pablo.marin at sanger.ac.uk
====================================================================
