[Biopython] Formatting of the xml return from Entrez DB Gene is not parsed correctly

Michael S. Koeris michael.koeris at gmail.com
Fri Sep 11 09:52:40 EDT 2009


Hi Peter,

that helps a lot. Indeed, that's what I am really looking for. So the
NCACs I get back appear to be in the order in which they appear in the
GeneDB listing (from chromosome to the various mRNA variants).
Searching those, I can then easily narrow it down further to the NM_*
type listings I really need (since I am usually looking for the
full-length mRNA of all variants).
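
A minimal sketch of that narrowing step, assuming the accession strings
have already been looked up (the elink call quoted below returns numeric
IDs, not accessions). The function name and the sample accessions are
made up for illustration; they are not real results for gene 90.

```python
def refseq_mrna_only(accessions):
    """Keep only accessions with the RefSeq mRNA prefix NM_."""
    return [acc for acc in accessions if acc.upper().startswith("NM_")]


# Placeholder accessions for illustration only (not real records).
example = ["NC_000000.0", "NM_000000.0", "XM_000000.0", "NM_000001.0"]

# The input order is preserved, so the chromosome-to-variant ordering
# of the original listing carries over to the filtered list.
print(refseq_mrna_only(example))
```

Matching on the prefix alone keeps every version of a record; stripping
the ".N" version suffix would be a separate step if it is needed.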

Thanks!

Mike


On Sep 11, 2009, at 9:37 AM, Peter wrote:

> Hi Michael,
>
> I've CC'd this to the list.
>
> On Fri, Sep 11, 2009 at 1:51 PM, Michael S. Koeris
> <michael.koeris at gmail.com> wrote:
>>
>> Yes indeed that does help - go dyslexia....
>
> Easily done. Actually, on looking a little closer the NCBI returned
> "XML presented with HTML" (full of &lt; and &gt; entities) - still quite
> unsuitable for parsing, but not actually an error page as I assumed.
>
>> What seems to happen, though, is that it's not a dictionary but a list
>> made up of multiple dictionaries, is that right?
>
> Probably - the Bio.Entrez parser will turn the XML nested structure into
> lists and dictionaries as appropriate.
>
> Going back to your original email, you just wanted "to parse out the
> nucleic acid accession numbers from an Entrez.efetch query made
> to the Gene database.", so I would actually suggest you should be
> using elink instead of efetch. See for example,
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/elink_help.html
> http://lists.open-bio.org/pipermail/biopython/2009-August/005472.html
>
> In your case something like this:
>
>>>> from Bio import Entrez
>>>> data = Entrez.read(Entrez.elink(db="nuccore", dbfrom="gene",
> ...                                 id="90", retmode="xml"))
>>>> for db in data:
> ...     print("Links for", db["IdList"], "from database", db["DbFrom"])
> ...     for link in db["LinkSetDb"][0]["Link"]:
> ...         print(link["Id"])
> ...
> Links for ['90'] from database gene
> 224589811
> 224514625
> 194387497
> 190194409
> 187169269
> 187169268
> 164694819
> 157724517
> 157696421
> 89161198
> 88958353
> 74230050
> 50504351
> 22450871
> 21707501
> 18097079
> 15668129
> 2295237
> 402184
> 338218
>
> Peter
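
The loop quoted above indexes db["LinkSetDb"][0] directly, which would
fail if a gene has no links. A minimal sketch of a more defensive walk
over the same list-of-dictionaries shape follows. The helper name
linked_ids is hypothetical, and the sample data is hand-built to mirror
the structure shown in the session above (using its first two IDs)
rather than fetched from NCBI.

```python
def linked_ids(elink_result):
    """Collect every linked ID from a parsed elink result.

    Expects a list of dicts, each with an optional "LinkSetDb" list
    whose entries hold a "Link" list of {"Id": ...} dicts.
    """
    ids = []
    for linkset in elink_result:
        # An empty or missing "LinkSetDb" simply contributes nothing.
        for linkset_db in linkset.get("LinkSetDb", []):
            for link in linkset_db.get("Link", []):
                ids.append(link["Id"])
    return ids


# Hand-built stand-in for the parsed result (assumption: same keys as
# in the session above).
sample = [
    {
        "DbFrom": "gene",
        "IdList": ["90"],
        "LinkSetDb": [
            {"Link": [{"Id": "224589811"}, {"Id": "224514625"}]},
        ],
    },
    {"DbFrom": "gene", "IdList": ["0"], "LinkSetDb": []},
]

print(linked_ids(sample))
```

Because the parsed result behaves like nested lists and dictionaries,
the same function should accept the object returned by Entrez.read, but
that has not been checked here.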


