[Biopython] Entrez.read return value is typed as a string??

Peter biopython at maubp.freeserve.co.uk
Thu Oct 29 10:29:43 UTC 2009


On Thu, Oct 29, 2009 at 3:19 AM, Ben O'Loghlin <bassbabyface at yahoo.com> wrote:
>
> Hi Peter,
>
> Many thanks for your post, you cleared up a world of confusion for me.
>
> A few answers/comments:
>
>>> Oh dear - were you working through the Entrez chapter in the Tutorial?
>>> If not, what were you looking at?
>
> No, I didn't find the tutorial until you mentioned it.

Did you look at the Biopython website at all? We do try and highlight
the Tutorial as it is the primary documentation, especially for newcomers.
Perhaps you can suggest how to make it more prominent? A fresh set
of eyes can give useful perspective.

> I came across
> BioPython by Googling "python pubmed", the most relevant hit on the first
> screenful seemed to be the first one, at
> http://baoilleach.blogspot.com/2008/02/searching-pubmed-with-python.html.
>
> This brief blog describes access via the Bio.EUtils package which seems to
> have disappeared, and it took me about 45 mins to realise that it was no
> longer in the distro and to track down Bio.Entrez.

Deprecations are recorded in the DEPRECATED file included with the
source code. The latest version can be viewed here:
http://github.com/biopython/biopython/blob/master/DEPRECATED

The removal of Bio.EUtils happened in Biopython 1.52, and was in this
case also noted in the NEWS file, but not the actual release notice:
http://github.com/biopython/biopython/blob/master/NEWS
http://news.open-bio.org/news/2009/09/biopython-release-152/

> Then Googling BioPython Entrez, the first hit took me to the documentation
> (I missed spotting the tutorial link!) and all subsequent attempts were
> based on reading this doco and the source code, and scratching my head and
> trying random things.

Do you mean the API documentation, available within Python through the help
command and viewable online here?

http://www.biopython.org/DIST/docs/api/Bio.Entrez-module.html

You can probably tell we put more effort into the Tutorial as an introductory
document.

>>So you see by default, the NCBI is returning HTML. We can ask for XML:
>>
>>>>> handle = Entrez.efetch(db="pubmed", id="17206916", retmode="XML")
>>>>> print handle.readline()
>><?xml version="1.0"?>
>
> This all makes sense now, I wasn't aware of the different 'retmode' options.
> The Bio.Entrez.efetch() documentation points me to
> http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetch_help.html, which
> doesn't mention the 'retmode' or 'rettype' parameters. In fact I couldn't
> find any explicit reference to it in the Tutorial either, just the use of
> 'rettype=text' in one of the example code snippets.
>
> I subsequently tracked down this page
> http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchlit_help.html which
> does at least indicate the different rettypes and retmodes available.

I agree the NCBI Entrez documentation is very unhelpful to beginners.
We do try to make this easier in our tutorial, but perhaps "retmode"
and "rettype" need to be discussed more in the EFetch section (they
are mentioned a little later in the chapter in the context of other formats).
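
To give a rough idea of what these two options are: they are just extra
parameters in the URL sent to the NCBI, and Bio.Entrez.efetch() passes its
keyword arguments along in that way. Here is a small sketch which only builds
the URLs and does not contact the NCBI. The helper efetch_url is made up for
illustration, the base address is my assumption of the current E-utilities
address, and the rettype="medline" with retmode="text" combination is the one
used in the Tutorial:

```python
from urllib.parse import urlencode

# Assumed E-utilities address for EFetch (check the NCBI documentation).
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(**params):
    """Hypothetical helper: turn keyword arguments into an EFetch URL."""
    # Sort the parameters so the resulting URL is reproducible.
    return BASE + "?" + urlencode(sorted(params.items()))


# Same options as the XML example in this thread.
xml_url = efetch_url(db="pubmed", id="17206916", retmode="xml")

# Plain text MEDLINE format, as used in the Tutorial.
text_url = efetch_url(db="pubmed", id="17206916",
                      rettype="medline", retmode="text")

print(xml_url)
print(text_url)
```

Which rettype and retmode values are accepted depends on the database, so
the NCBI help pages remain the place to check.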

>>You could parse this with Bio.Entrez.read() if you wanted to:
>>
>>>>> handle = Entrez.efetch(db="pubmed", id="17206916", retmode="XML")
>>>>> record = Entrez.read(handle)
>>>>> print record
>>[{u'MedlineCitation': ... ]
>
> I'm interested in using this format, however I don't understand how to
> read/write fields and subtrees of the object type
> 'Bio.Entrez.Parser.ListElement' returned by Entrez.read(handle) with retmode
> XML.
>
> I'm finding it hard to track down references to this [{u'x':['y']}] object
> format in Python, possibly due to the fact that I can't get Google to
> search for strings like [{u'. I am however appreciative that there appears
> to be a u'SpaceFlightMission' tag in Pubmed's default rettype. :)

Michiel has tried to answer this. Are you familiar with the basic Python
datatypes?
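
In case it helps, the [{u'x': ['y']}] notation is just how Python prints a
list containing a dictionary whose values are lists (the u prefix marks a
unicode string). The objects from Entrez.read() can be indexed like ordinary
lists and dictionaries. Here is a minimal sketch using a plain list and
dictionary as a stand-in. The data is made up, and a real PubMed record has
many more keys:

```python
# Made-up stand-in for a parsed record, not real PubMed data.
record = [
    {
        "MedlineCitation": {
            "PMID": "17206916",
            "Article": {"ArticleTitle": "An example title"},
            "SpaceFlightMission": [],
        }
    }
]

first = record[0]                     # list: index by position
citation = first["MedlineCitation"]   # dictionary: look up by key
title = citation["Article"]["ArticleTitle"]

print(sorted(citation.keys()))        # see which fields are available
print(title)

# Writing works the same way as for any list or dictionary.
citation["Article"]["ArticleTitle"] = "A corrected title"
citation["SpaceFlightMission"].append("Hypothetical mission")
```

Printing the keys at each level is usually the quickest way to find your way
around an unfamiliar record.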

> I'm also a little confused about why handle.read() returns a string in XML
> format whereas Entrez.read(handle) returns the
> Bio.Entrez.Parser.ListElement. In fact I only knew about this latter method
> from your email, since the example in the Bio.Entrez doco only uses the
> handle.read() syntax, and doesn't mention that there's any distinction, nor
> which might be more appropriate for which task.

In handle.read(), read is a method of an object called handle, in this
case a handle to a network connection.

In Entrez.read(), read is a function of the Entrez module.

In Python, xxx.yyy() is either a call to the "yyy" method of the object
"xxx" (where "xxx" is a variable), or a call to the function or class "yyy"
defined in the module "xxx".
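
The same distinction shows up in the standard library, which may make it
easier to see. This is only an analogy: io.StringIO stands in for the network
handle, the json module plays the role of Bio.Entrez, and the JSON text is
made up (real Entrez data is XML):

```python
import io
import json

# A file-like handle holding some made-up text.
handle = io.StringIO('[{"MedlineCitation": {"PMID": "17206916"}}]')

# read is a method of the handle object: it returns the raw text.
raw = handle.read()
print(type(raw))

# Rewind, since reading the handle consumed its contents.
handle.seek(0)

# load is a function of the json module: it takes the handle as an
# argument and returns parsed Python lists and dictionaries.
parsed = json.load(handle)
print(type(parsed))
print(parsed[0]["MedlineCitation"]["PMID"])
```

So handle.read() is what you want for the raw text, and a parsing function
like Entrez.read(handle) is what you want for structured data.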

>> Does that help?
>
> Immensely.
>
> If you (or any other Bio.Wizards) have the time and the inclination to help
> me further, I'd be very grateful for any thoughts relevant to my ponderings
> above.

I would suggest you read through some Python introductions, and then
go through the Biopython tutorial again. We have to assume our readers
know a bit of Python - and my guess from your questions is that many
of your issues are with Python itself rather than Biopython. But you are
learning :)

Peter


