[Biopython] Samtools Pileup format - NGS data

Adrian Johnson oriolebaltimore at gmail.com
Wed Oct 20 16:16:43 UTC 2010


Dear group,

I am wondering whether Biopython has any functionality for
annotating SNPs identified through NGS pipelines.

For instance, given input in Pileup format:

chr1    799195  *       */+G    115     115     33      37      *       +G
chr1    811750  a       G       36      36      60      3       Ggg     AB?
chr1    815761  C       A       2       33      46      3       A.a     CCC
chr1    815777  C       T       2       33      46      3       T.t     CCC
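
As a starting point, here is a minimal sketch of reading such lines in
plain Python. The field names are my own labels, assumed from the
legacy 10-column consensus pileup layout (chromosome, position,
reference base, consensus call, consensus quality, SNP quality,
mapping quality, depth, read bases, base qualities). PileupRecord,
parse_pileup_line and is_indel are hypothetical names, not part of
Biopython.

```python
from collections import namedtuple

# Field names are assumptions based on the legacy consensus pileup layout.
PileupRecord = namedtuple(
    "PileupRecord",
    "chrom pos ref consensus cons_qual snp_qual map_qual depth bases quals",
)


def parse_pileup_line(line):
    """Split one whitespace-delimited pileup line into a PileupRecord."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(fields))
    # Columns 5-8 are integer scores and the read depth.
    cons_qual, snp_qual, map_qual, depth = (int(x) for x in fields[4:8])
    return PileupRecord(
        chrom=fields[0],
        pos=int(fields[1]),      # 1-based coordinate
        ref=fields[2].upper(),   # lower-case reference bases are normalised
        consensus=fields[3],
        cons_qual=cons_qual,
        snp_qual=snp_qual,
        map_qual=map_qual,
        depth=depth,
        bases=fields[8],
        quals=fields[9],
    )


def is_indel(record):
    """Indel lines carry '*' in the reference-base column."""
    return record.ref == "*"


example = """\
chr1    799195  *       */+G    115     115     33      37      *       +G
chr1    811750  a       G       36      36      60      3       Ggg     AB?
chr1    815761  C       A       2       33      46      3       A.a     CCC
chr1    815777  C       T       2       33      46      3       T.t     CCC
"""

records = [parse_pileup_line(l) for l in example.splitlines() if l.strip()]
snps = [r for r in records if not is_indel(r)]
for r in snps:
    print("%s:%d %s>%s depth=%d" % (r.chrom, r.pos, r.ref, r.consensus, r.depth))
```

This only extracts the candidate positions; the annotation itself
still needs gene models and database lookups.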


Now it would be very interesting to have a module that connects to
NCBI or UCSC servers and answers the following questions:

1. Identify the mutation type at a given position on a chromosome
(e.g. chr1:815777). The mutation could be synonymous, frame-shift, etc.

2. Get gene name, accession and protein accession.

3. Get the type of amino-acid change such as Gly -> Ser

4. Check whether this SNP is observed in dbSNP, 1000 Genomes data and
other mutation databases.

5. Get the allele frequencies from dbSNP if the SNP is found there.

6. Get the location of the SNP, viz. intron, 5'UTR, 3'UTR or splice site.
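
To make questions 1 and 3 concrete, here is a rough sketch of the
classification step only. It assumes the coding sequence of the
affected transcript is already in hand, in frame and on the coding
strand, and that the genomic position has been converted to a 0-based
offset within it. That mapping needs gene annotation and is not shown.
The standard genetic code is hard-coded; classify_substitution is a
hypothetical name. Frame-shifts, introns, UTRs and splice sites are
outside what this covers.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def classify_substitution(cds, offset, alt_base):
    """Classify a single-base change in an in-frame coding sequence.

    cds      -- coding sequence (DNA, coding strand, starting in frame)
    offset   -- 0-based index of the changed base within cds
    alt_base -- the observed (non-reference) base
    """
    frame = offset % 3
    start = offset - frame
    ref_codon = cds[start:start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("offset falls in an incomplete codon")
    alt_codon = ref_codon[:frame] + alt_base.upper() + ref_codon[frame + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop-lost"
    else:
        kind = "missense"
    return kind, "%s -> %s" % (THREE_LETTER[ref_aa], THREE_LETTER[alt_aa])


# Toy coding sequence: Met-Gly-stop
print(classify_substitution("ATGGGCTAA", 3, "A"))
print(classify_substitution("ATGGGCTAA", 5, "T"))
```

The database lookups in questions 4 and 5 would sit on top of this and
are not attempted here.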


A web service from the Shendure lab is available for this type of
question. Given MAQ or Pileup format, the website reports answers to
all the questions above.  However, the website is slow and cannot be
used in a pipeline.

Is any Biopython user or developer working on this kind of functionality?

thanks

Adrian


