[Biopython] Formatting of the xml return from Entrez DB Gene is not parsed correctly
Michael S. Koeris
michael.koeris at gmail.com
Fri Sep 11 09:52:40 EDT 2009
Hi Peter,
that helps a lot. Indeed, that's what I am really looking for. So the
NCACs I get back appear to be in the order in which they appear in the
GeneDB listing (from chromosome to the various mRNA variants).
Searching those, I can then easily narrow it down further to the NM_*
type listings I really need (since I am usually looking for the
full-length mRNA of all variants).
Thanks!
Mike
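
The narrowing step described above could be sketched roughly as follows. This is only an illustration: it assumes the linked IDs have already been resolved to accession strings (for example through a separate efetch or esummary call, not shown here), and both the helper name refseq_mrna_only and the sample accessions are hypothetical placeholders.

```python
def refseq_mrna_only(accessions):
    """Keep only accessions with the RefSeq mRNA prefix NM_."""
    return [acc for acc in accessions if acc.startswith("NM_")]


# Made-up placeholder accessions, for illustration only (not real records).
sample = ["NC_000000.0", "NM_000000.0", "XM_000000.0", "NM_111111.1"]
print(refseq_mrna_only(sample))
```

Filtering on the prefix keeps the original order, so the NM_* entries stay in the order in which the Gene listing returned them.
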
On Sep 11, 2009, at 9:37 AM, Peter wrote:
> Hi Michael,
>
> I've CC'd this to the list.
>
> On Fri, Sep 11, 2009 at 1:51 PM, Michael S. Koeris
> <michael.koeris at gmail.com> wrote:
>>
>> Yes indeed that does help - go dyslexia....
>
> Easily done. Actually, on looking a little closer the NCBI returned
> "XML presented with HTML" (full of &lt; and &gt; entities) - still quite
> unsuitable for parsing, but not actually an error page as I assumed.
>
>> what seems to happen though is that it's not a dictionary but a list
>> made up of multiple dictionaries is that right?
>
> Probably - the Bio.Entrez parser will turn the XML nested structure into
> lists and dictionaries as appropriate.
>
> Going back to your original email, you just wanted "to parse out the
> nucleic acid accession numbers from an Entrez.efetch query made
> to the Gene database.", so I would actually suggest you should be
> using elink instead of efetch. See for example,
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/elink_help.html
> http://lists.open-bio.org/pipermail/biopython/2009-August/005472.html
>
> In your case something like this:
>
> >>> from Bio import Entrez
> >>> data = Entrez.read(Entrez.elink(db="nuccore",
> ...                                 dbfrom="gene", id="90", retmode="xml"))
> >>> for db in data:
> ...     print "Links for", db["IdList"], "from database", db["DbFrom"]
> ...     for link in db["LinkSetDb"][0]["Link"]: print link["Id"]
> ...
> Links for ['90'] from database gene
> 224589811
> 224514625
> 194387497
> 190194409
> 187169269
> 187169268
> 164694819
> 157724517
> 157696421
> 89161198
> 88958353
> 74230050
> 50504351
> 22450871
> 21707501
> 18097079
> 15668129
> 2295237
> 402184
> 338218
>
> Peter
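
For readers on Python 3, the same idea might look roughly like the sketch below. The helper names collect_links and fetch_gene_links are hypothetical, the e-mail address is whatever the caller supplies, and the parsing helper assumes the parsed result has the shape seen in the session above (a list of link sets, each with IdList, DbFrom and LinkSetDb). The network call is kept separate so the parsing can be tried on a hand-built structure.

```python
def collect_links(link_sets):
    """Return (dbfrom, source_ids, linked_ids) tuples from a parsed elink result."""
    results = []
    for link_set in link_sets:
        linked = []
        # A link set may carry several LinkSetDb entries, or none at all.
        for link_db in link_set.get("LinkSetDb", []):
            linked.extend(str(link["Id"]) for link in link_db["Link"])
        source_ids = [str(i) for i in link_set["IdList"]]
        results.append((str(link_set["DbFrom"]), source_ids, linked))
    return results


def fetch_gene_links(gene_id, email):
    """Query elink for nuccore records linked to a Gene ID (needs network access)."""
    from Bio import Entrez  # imported here so collect_links works without Biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.elink(dbfrom="gene", db="nuccore", id=gene_id)
    try:
        return collect_links(Entrez.read(handle))
    finally:
        handle.close()


# Hand-built stand-in for a parsed result, reusing two IDs from the output above.
example = [{"DbFrom": "gene", "IdList": ["90"],
            "LinkSetDb": [{"Link": [{"Id": "224589811"}, {"Id": "224514625"}]}]}]
for dbfrom, ids, linked in collect_links(example):
    print("Links for", ids, "from database", dbfrom)
    for linked_id in linked:
        print(linked_id)
```

Unlike the session above, which reads only the first LinkSetDb entry, this sketch gathers the links from every entry; restrict it to the first one if only that link type is wanted.
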