[BioPython] EUtils strange behaviour
Bonis Sanz, Julio
JBonis at imim.es
Wed Nov 10 08:56:18 EST 2004
Hi all, hi Andrew,
I have been working on the problem of SNPs....
BTW: I found the "extrafeat" param by emailing NCBI staff. It is not a documented parameter! (Really, the EUtils documentation is very poor.)
Something curious about extrafeat: if you send extrafeat=0 you don't get any feature.variation entries, and if you send extrafeat=1 you do get them ... but (and this is the curious thing) if you send extrafeat=3 or 5 or 7 or 9 ... you get the same results as for extrafeat=1 ... :D .... only God knows how that param works internally in the Entrez logic :).
Returning to the original problem:
One indirect way to solve it is by using elink... in this way:
You send to elink.fcgi:
dbfrom=nucleotide
db=snp
id= __GI of the mRNA of your interest__
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174
so you get:
<!-- PART OF XML RESULTS -->
<?xml version="1.0"?>
<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD eLinkResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eLink_020511.dtd">
<eLinkResult>
<LinkSet>
<DbFrom>nucleotide</DbFrom>
<IdList>
<Id>10835174</Id>
</IdList>
<LinkSetDb>
<DbTo>snp</DbTo>
<LinkName>nucleotide_snp</LinkName>
<Link>
<Id>633737</Id>
</Link>
<Link>
<Id>9534508</Id>
</Link>
<Link>
<Id>4142900</Id>
</Link>
<!-- END OF PART OF XML RESULTS -->
Then you have to somehow extract the rs IDs into a list, rsList = []... I have not worked on it, but it should not be hard...
Is there any class or method already implemented in Biopython to do this task?
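Independently of whatever Biopython offers for this, the extraction step can be sketched with the Python standard library alone. This is only a sketch: the function name extract_link_ids is made up for illustration, and SAMPLE is a closed-off copy of the partial XML shown above (the DOCTYPE line is left out and the closing tags are assumed).

```python
import xml.etree.ElementTree as ET

# Closed-off copy of the partial elink result shown above.
# The closing tags are an assumption; the real response may hold more links.
SAMPLE = """<?xml version="1.0"?>
<eLinkResult>
<LinkSet>
<DbFrom>nucleotide</DbFrom>
<IdList>
<Id>10835174</Id>
</IdList>
<LinkSetDb>
<DbTo>snp</DbTo>
<LinkName>nucleotide_snp</LinkName>
<Link>
<Id>633737</Id>
</Link>
<Link>
<Id>9534508</Id>
</Link>
<Link>
<Id>4142900</Id>
</Link>
</LinkSetDb>
</LinkSet>
</eLinkResult>"""


def extract_link_ids(xml_text, db_to="snp"):
    """Return the linked IDs (as strings) for the given target database."""
    root = ET.fromstring(xml_text)
    ids = []
    for link_set_db in root.iter("LinkSetDb"):
        # Skip link sets that point at some other database.
        if link_set_db.findtext("DbTo") != db_to:
            continue
        # Only <Id> elements nested in <Link> are results;
        # the <Id> under <IdList> is the query GI.
        for id_elem in link_set_db.findall("Link/Id"):
            ids.append(id_elem.text.strip())
    return ids


rsList = extract_link_ids(SAMPLE)
print(rsList)
```

Restricting the search to Link/Id keeps the query GI (10835174) out of rsList.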
Then you can write a loop like
for rs in rsList:
....
and for each rs do the following:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html
sending to efetch.fcgi the following:
db = snp
id = __the rs__
rettype = flt
retmode = html
so you get this:
<!-- START OF SNP INFORMATION -->
1: rs6313 [Homo sapiens]
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
ss7941 | WIAF-CSNP | WIAF-10853 | orient=+
ss4928286 | YUSUKE | IMS-JST093413 | orient=-
ss11087375 | BCM_SSAHASNP | chr13.NT_024524.12_16044432 | orient=-
ss13329237 | SC_SNP | NT_024524.12_16044432 | orient=-
ss19284906 | CSHL-HAPMAP | CSHL-HuDD-200402.chr13.NT_024524.13_28449941 | orient=-
ss21114665 | SSAHASNP | WGSA-200403-chr13.chr13.NT_024524.13_28449941 | orient=-
ss22887275 | IMCJ-GDT | IMCJ-HTR2A_4-CT | orient=+
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
VAL | validated=YES | min_prob=0 | max_prob=? | notwithdrawn
MAP | ncbi_num_chr=1 | ncbi_num_ctg=1 | ncbi_num_sec_loc=1 | ncbi_weight=1
CTG | chr=13 | chr-pos=45267941 | Hs13_24680_34:13 | ctg-start=28449941 | ctg-end=28449941 | loctype=2 | orient=-
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
GBL | HTR2A | locus_id=3356 | fxn-class=coding-synon
SEQ | AF498982:1 | source-db=gb-mrna | seq-pos=102 | orient=+
SEQ | AL160397:1 | source-db=hgs-finish | seq-pos=45162 | orient=-
SEQ | G28536:1 | source-db=gb-sts | seq-pos=247 | orient=+
SEQ | M86841:1 | source-db=gb-mrna | seq-pos=78 | orient=+
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
SEQ | S42165:1 | source-db=hgs-finish | seq-pos=102 | orient=+
SEQ | S71229:1 | source-db=gb-mrna | seq-pos=152 | orient=+
SEQ | X57830:1 | source-db=gb-mrna | seq-pos=247 | orient=+
<!-- END OF SNP INFORMATION -->
Again, some kind of script should extract the information you are interested in... and that's all!
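A rough sketch of such a script, assuming the response arrives as the plain pipe-delimited lines shown above (with retmode=html the real reply may be wrapped in HTML that has to be stripped first). The helper names efetch_url, parse_flt_line and parse_flt are made up for illustration, and nothing here contacts NCBI; the URL is only built, not fetched.

```python
from urllib.parse import urlencode

EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(rs):
    """Build the efetch URL for one rs ID, using the parameters listed above."""
    params = {"db": "snp", "id": str(rs), "rettype": "flt", "retmode": "html"}
    return EFETCH + "?" + urlencode(params)


def parse_flt_line(line):
    """Split one line into (tag, positional fields, key=value fields)."""
    fields = [f.strip() for f in line.split("|")]
    tag, positional, keyed = fields[0], [], {}
    for field in fields[1:]:
        if "=" in field:
            key, value = field.split("=", 1)
            # Drop the quotes around values such as alleles='C/T'.
            keyed[key.strip()] = value.strip().strip("'")
        else:
            positional.append(field)
    return tag, positional, keyed


def parse_flt(text):
    """Parse every pipe-delimited line; other lines are ignored."""
    return [parse_flt_line(l) for l in text.splitlines() if "|" in l]


# A few lines copied from the rs6313 output above.
SAMPLE = """\
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
"""

records = parse_flt(SAMPLE)
# Example query: the functional class reported on each LOC line.
fxn_classes = [keyed["fxn-class"] for tag, _, keyed in records if tag == "LOC"]
print(efetch_url(6313))
print(fxn_classes)
```

Keeping the tag separate from the key=value pairs makes it easy to pick out, say, only the SEQ lines with source-db=ref-mrna.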
I will work on it and send my script soon...
cheers,
Julio Bonis Sanz MD
http://www.juliobonis.com/portal/
-----Original Message-----
From: Andrew Dalke [mailto:dalke at dalkescientific.com]
Sent: Wednesday, November 10, 2004 9:06
To: Bonis Sanz, Julio
CC: biopython at biopython.org
Subject: Re: [BioPython] EUtils strange behaviour
Hi again Julio,
> Any comment, suggestion, idea (apart from RTFM... I promise I have
> read and read!!!!)
No doubt you were as frustrated as I was with the documentation. It's
opaque and incomplete. I spent a lot of time doing what you were
doing, testing things and seeing if I could make sense of it all.
I didn't figure out the SNPs interface. I just don't know enough
about that domain and there's pretty much no documentation for that.
You should try the NCBI EUtils help desk at eutilities at ncbi.nlm.nih.gov.
Andrew
dalke at dalkescientific.com