[BioPython] EUtils strange behaviour

Bonis Sanz, Julio JBonis at imim.es
Tue Nov 9 11:58:55 EST 2004


Hi!

I am playing again with EUtils and have detected a strange behaviour....

Let me focus in HTR2A receptor in humans' gene:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=10835174

Well, what I want to do is to retrieve in a gb_record (GenBank python library) all the SNPs in that gene... And it is getting more complex than I guessed....

I have passed to EUtils efetch this parameters:

db=nucleotide
id=10835174 (is the gi for my mRNA: HTR2A)
retmode=html
rettype=gb (GenBank mode)
extrafeat=1 (to get the SNPs... if you dont send extrafeat=1 EUtils will not show you any SNP or variation)....


This is the URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

Well, there are 7 variations.... 4 of them are outside the limits of the CDS (so are not transcribed). I am only interested in the SNPs of the exons....

Finally I get these 3 SNPs:

     variation       219
                     /gene="HTR2A"
                     /replace="a"
                     /replace="c"
                     /db_xref="dbSNP:1805055"
     variation       734
                     /gene="HTR2A"
                     /replace="a"
                     /replace="g"
                     /db_xref="dbSNP:6304"
     variation       1407
                     /gene="HTR2A"
                     /replace="c"
                     /replace="t"
                     /db_xref="dbSNP:1058576"


BUT.... here is the problem....

If you go to LocusLink for HTR2A gene you will find it:

http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=coding

This is the list of SNPs for the exons in that HTR2A mRNA....

rs6314	
rs6308
rs1058576  
rs6304     
rs6305
rs6313
rs1805055  

That compared with the list we get from EUtils is:

LocusLink            EUtils

rs6314	
rs6308
rs1058576  ========= 1058576
rs6304     ========= 6304
rs6305
rs6313
rs1805055  ========= 1805055

You can see that there are 5 missing SNPs.... why?

Well, I have explanation for two of them (rs6305 and rs6313) are synonymous SNPs (no change in the protein sequence), but what about 6308 and rs6314? where have they gone?

I need to solve this mistery, to build an script that retrieves on-the-fly (from EUtils) all the SNPs of a given mRNA. I dont want to take all the contigs or FTPing the hole RefSeq human genome database. :S.

Of course I need to retrieve synonymous SNPs also (in fact one of the most clinically relevant SNPs in HTR2A is that synonymous SNPs rs6313... and nobody seems to know why).

Any comment, suggestion, idea (apart from RTFM... I promise I have read and read!!!!)



More information about the BioPython mailing list