[BioPython] EUtils strange behaviour

Bonis Sanz, Julio JBonis at imim.es
Wed Nov 10 08:56:18 EST 2004


Hi all, hi Andrew, 

I have been working on the problem of SNPs....

BTW: I found out about the "extrafeat" parameter by emailing NCBI staff. It is not documented anywhere! (Really, the EUtils documentation is very poor.)

Something curious about extrafeat: with extrafeat=0 you don't get any feature.variation entries, and with extrafeat=1 you do ... but (and this is the curious thing) with extrafeat=3, 5, 7 or 9 ... you get the same results as for extrafeat=1 ... :D ... only God knows how that parameter works internally in the Entrez logic :).
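
One guess that would fit these observations (pure speculation, nothing NCBI has confirmed) is that extrafeat is read as a set of bit flags and the lowest bit switches the variation features on. The function name has_variation_bit is made up for illustration:

```python
# Hypothesis only: treat extrafeat as bit flags, lowest bit = variation features.
def has_variation_bit(extrafeat):
    """Return True if the lowest bit of extrafeat is set."""
    return bool(extrafeat & 1)

# The values mentioned above: 0 gave no variations, the odd ones all did.
for value in (0, 1, 3, 5, 7, 9):
    print(value, has_variation_bit(value))
```

The observations above do not cover even values other than 0, so they cannot confirm or rule out this idea.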

Returning to the original problem:

One indirect way to solve it is by using elink, like this:

You send to elink.fcgi:

dbfrom=nucleotide
db=snp
id= __GI of the mRNA of your interest__

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174
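
The same URL can be put together from the three parameters with the Python standard library. This is only a sketch: build_elink_url and fetch are names I made up, and fetch is defined but not called because it needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def build_elink_url(gi, dbfrom="nucleotide", db="snp"):
    """Build the elink.fcgi query URL for one GI number."""
    params = {"dbfrom": dbfrom, "db": db, "id": gi}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)

def fetch(url):
    """Download the response body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")

url = build_elink_url(10835174)
print(url)
```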

so you get: 


<!-- PART OF XML RESULTS -->
<?xml version="1.0"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>10835174</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>

		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>633737</Id>
		</Link>
		<Link>
			<Id>9534508</Id>
		</Link>

		<Link>
			<Id>4142900</Id>
		</Link>
<!-- END OF PART OF XML RESULTS -->

Then you somehow have to extract the rs IDs into a list, rsList = [] ... I have not worked on it yet, but it should not be hard...

Is there any class or method already implemented in BioPython to do this task?
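
In the meantime, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree, not any BioPython class. The function name extract_linked_ids is my own, and SAMPLE is a shortened, closed-off copy of the result above (the real response is longer).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the elink result, with the closing tags added.
SAMPLE = """\
<eLinkResult>
<LinkSet>
  <DbFrom>nucleotide</DbFrom>
  <IdList><Id>10835174</Id></IdList>
  <LinkSetDb>
    <DbTo>snp</DbTo>
    <LinkName>nucleotide_snp</LinkName>
    <Link><Id>633737</Id></Link>
    <Link><Id>9534508</Id></Link>
    <Link><Id>4142900</Id></Link>
  </LinkSetDb>
</LinkSet>
</eLinkResult>
"""

def extract_linked_ids(xml_text, db_to="snp"):
    """Collect the <Link><Id> values from every LinkSetDb that targets db_to."""
    root = ET.fromstring(xml_text)
    ids = []
    for link_set_db in root.iter("LinkSetDb"):
        if link_set_db.findtext("DbTo") != db_to:
            continue  # links into some other database
        for link in link_set_db.findall("Link"):
            ids.append(link.findtext("Id"))
    return ids

rsList = extract_linked_ids(SAMPLE)
print(rsList)
```

Only the Id elements inside Link are collected, so the query GI in IdList does not end up in the list.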

Then you can loop over the list, something like:
	for rs in rsList:
		....

and for each rs request the following:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html

that is, you send the following to efetch.fcgi:

db = snp
id = __the rs__
rettype = flt
retmode = html
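
The loop over rsList could look roughly like this. It is a sketch that only builds the URLs, and build_efetch_urls is a made-up helper name; the parameter values are the ones listed above.

```python
from urllib.parse import urlencode

EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_urls(rs_ids, rettype="flt", retmode="html"):
    """Return one efetch.fcgi URL per rs ID, in the same order."""
    urls = []
    for rs in rs_ids:
        params = {"db": "snp", "id": rs, "rettype": rettype, "retmode": retmode}
        urls.append(EFETCH_URL + "?" + urlencode(params))
    return urls

rsList = ["6313"]  # would normally come from the elink step
for url in build_efetch_urls(rsList):
    print(url)
```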

so you get this: 


<!-- START OF SNP INFORMATION -->

1: rs6313 [Homo sapiens] 
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
ss7941 | WIAF-CSNP | WIAF-10853 | orient=+
ss4928286 | YUSUKE | IMS-JST093413 | orient=-
ss11087375 | BCM_SSAHASNP | chr13.NT_024524.12_16044432 | orient=-
ss13329237 | SC_SNP | NT_024524.12_16044432 | orient=-
ss19284906 | CSHL-HAPMAP | CSHL-HuDD-200402.chr13.NT_024524.13_28449941 | orient=-
ss21114665 | SSAHASNP | WGSA-200403-chr13.chr13.NT_024524.13_28449941 | orient=-
ss22887275 | IMCJ-GDT | IMCJ-HTR2A_4-CT | orient=+
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
VAL | validated=YES | min_prob=0 | max_prob=? | notwithdrawn
MAP | ncbi_num_chr=1 | ncbi_num_ctg=1 | ncbi_num_sec_loc=1 | ncbi_weight=1
CTG | chr=13 | chr-pos=45267941 | Hs13_24680_34:13 | ctg-start=28449941 | ctg-end=28449941 | loctype=2 | orient=-
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
GBL | HTR2A | locus_id=3356 | fxn-class=coding-synon
SEQ | AF498982:1 | source-db=gb-mrna | seq-pos=102 | orient=+
SEQ | AL160397:1 | source-db=hgs-finish | seq-pos=45162 | orient=-
SEQ | G28536:1 | source-db=gb-sts | seq-pos=247 | orient=+
SEQ | M86841:1 | source-db=gb-mrna | seq-pos=78 | orient=+
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
SEQ | S42165:1 | source-db=hgs-finish | seq-pos=102 | orient=+
SEQ | S71229:1 | source-db=gb-mrna | seq-pos=152 | orient=+
SEQ | X57830:1 | source-db=gb-mrna | seq-pos=247 | orient=+

<!-- END OF SNP INFORMATION -->

Again, some kind of script should extract the information you are interested in... and that's all!
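
As a starting point, the flat-file output can be split on the "|" separators. This sketch assumes that every data line is pipe-delimited, that the first field is the record tag, and that no value contains a "|" itself. I have only checked that against the rs6313 record above; parse_flt is a made-up name.

```python
# A few lines copied from the rs6313 record above.
SAMPLE = """\
1: rs6313 [Homo sapiens]
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
CTG | chr=13 | chr-pos=45267941 | Hs13_24680_34:13 | ctg-start=28449941 | ctg-end=28449941 | loctype=2 | orient=-
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
"""

def parse_flt(text):
    """Parse pipe-delimited lines into (tag, positional, keyed) tuples."""
    records = []
    for raw in text.splitlines():
        if "|" not in raw:
            continue  # blank lines and the "1: rs6313 [...]" title line
        fields = [field.strip() for field in raw.split("|")]
        tag, positional, keyed = fields[0], [], {}
        for field in fields[1:]:
            key, sep, value = field.partition("=")
            if sep:
                keyed[key] = value.strip("'")  # drop quotes around 'C/T'
            else:
                positional.append(field)  # bare values such as the gene name
        records.append((tag, positional, keyed))
    return records

records = parse_flt(SAMPLE)
for tag, positional, keyed in records:
    if tag == "LOC":
        print(positional[0], keyed["fxn-class"], keyed["allele"])
```

Everything is kept as strings; converting positions or heterozygosity to numbers is left to whoever needs them.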

I will work on it and send my script soon...

cheers, 

Julio Bonis Sanz MD
http://www.juliobonis.com/portal/



-----Original Message-----
From: Andrew Dalke [mailto:dalke at dalkescientific.com]
Sent: Wednesday, November 10, 2004 9:06
To: Bonis Sanz, Julio
CC: biopython at biopython.org
Subject: Re: [BioPython] EUtils strange behaviour


Hi again Julio,

> Any comment, suggestion, idea (apart from RTFM... I promise I have 
> read and read!!!!)

No doubt you were as frustrated as I on the documentation.  It's
opaque and incomplete.  I spent a lot of time doing what you were
doing, testing things and seeing if I could make sense of it all.

I didn't figure out the SNPs interface.  I just don't know enough
about that domain and there's pretty much no documentation for that.
You should try the NCBI EUtils help desk at eutilities at ncbi.nlm.nih.gov.


					Andrew
					dalke at dalkescientific.com



