[Biopython] Samtools Pileup format - NGS data
Sean Davis
sdavis2 at mail.nih.gov
Wed Oct 20 16:44:38 UTC 2010
On Wed, Oct 20, 2010 at 12:16 PM, Adrian Johnson
<oriolebaltimore at gmail.com> wrote:
> Dear group,
>
> I am wondering whether Biopython has any functionality for annotating
> SNPs identified through NGS pipelines.
>
> For instance, given input in pileup format:
>
> chr1 799195 * */+G 115 115 33 37 * +G
> chr1 811750 a G 36 36 60 3 Ggg AB?
> chr1 815761 C A 2 33 46 3 A.a CCC
> chr1 815777 C T 2 33 46 3 T.t CCC
>
>
> It would be very useful to have a module that connects to NCBI or
> UCSC servers and answers the following questions:
>
> 1. Identify the mutation type at a given position on a chromosome
> (e.g. chr1:815777). The mutation could be synonymous, frame-shift, etc.
>
> 2. Get gene name, accession and protein accession.
>
> 3. Get the type of amino-acid change such as Gly -> Ser
>
> 4. Check whether this SNP is observed in dbSNP, 1000 Genomes data, and
> other mutation databases.
>
> 5. Get the allele frequencies from dbSNP for this SNP, if it is found there.
>
> 6. Location of the SNP - viz. intron, 5'UTR, 3'UTR or splice site.
>
>
> A web service from the Shendure lab is available for this type of
> question. Given MAQ or pileup format, the website reports answers to
> all the questions above. However, the website is slow and cannot be
> used in a pipeline.
>
> Is any Biopython user or developer working on this kind of functionality?
>
>
Hi, Adrian. You might look at the SIFT application. It can be downloaded
and includes precomputed results for 1, 2, 3, and the dbSNP part of 4 as
several SQLite database files. We dump those databases out and use the
text files directly. With BEDtools (there are also Python libraries such
as bx-python with similar functionality), number 6 is also quite
straightforward (basically a single command line). If you have other
tab-delimited text files with genomic features of interest, consider using
tabix (from the samtools site) to index the compressed, sorted files. tabix
includes a Python wrapper that allows nearly instantaneous overlap queries
and returns the matching rows from the text file.
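
To make the overlap idea for number 6 concrete, here is a minimal
pure-Python sketch. It is not BEDtools, tabix, or SIFT; it only shows the
shape of the lookup those tools do far faster on indexed files. The column
names assume the 10-column consensus output of `samtools pileup -c` as I
understand it, the helper names are made up for illustration, and the
feature intervals are hypothetical placeholders, not real annotation.

```python
from collections import namedtuple

# Assumed column meanings for consensus pileup (samtools pileup -c).
# On indel lines (ref == "*") the last two columns hold the two alleles
# rather than read bases and base qualities, so they are named neutrally.
PileupRow = namedtuple(
    "PileupRow",
    "chrom pos ref consensus cons_qual snp_qual map_qual depth col9 col10",
)


def parse_pileup_line(line):
    """Split one whitespace-delimited 10-column pileup line into a PileupRow."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 columns, got %d" % len(fields))
    return PileupRow(
        fields[0], int(fields[1]), fields[2], fields[3],
        int(fields[4]), int(fields[5]), int(fields[6]), int(fields[7]),
        fields[8], fields[9],
    )


def is_indel(row):
    """Indel lines carry '*' in the reference-base column."""
    return row.ref == "*"


def features_at(row, features):
    """Return names of features whose 1-based inclusive [start, end] covers row.pos.

    This is a linear scan; tabix or BEDtools would use an index instead.
    """
    return [
        name
        for chrom, start, end, name in features
        if chrom == row.chrom and start <= row.pos <= end
    ]


# The four pileup lines quoted in the original question.
PILEUP_LINES = [
    "chr1 799195 * */+G 115 115 33 37 * +G",
    "chr1 811750 a G 36 36 60 3 Ggg AB?",
    "chr1 815761 C A 2 33 46 3 A.a CCC",
    "chr1 815777 C T 2 33 46 3 T.t CCC",
]

# HYPOTHETICAL intervals for illustration only; real ones would come from
# an annotation file (for example a BED or GFF file of exons and UTRs).
FEATURES = [
    ("chr1", 799000, 799500, "hypothetical_intron"),
    ("chr1", 815700, 815800, "hypothetical_exon"),
]

if __name__ == "__main__":
    for line in PILEUP_LINES:
        row = parse_pileup_line(line)
        kind = "indel" if is_indel(row) else "SNP"
        print(row.chrom, row.pos, kind, features_at(row, FEATURES))
```

For real data you would swap the in-memory FEATURES list for a sorted,
bgzip-compressed, tabix-indexed annotation file and query it per position,
which avoids scanning every interval for every variant.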
Sean