[Biopython] Entrez.esearch sort by publication date

Renato Alves rjalves at igc.gulbenkian.pt
Mon Jun 1 13:49:47 EDT 2009


Quoting Peter on 06/01/2009 11:30 AM:
> On Sun, May 31, 2009 at 6:16 PM, Renato Alves <rjalves at igc.gulbenkian.pt> wrote:
>> Hi everyone,
>>
>> I've been using Entrez.esearch for a while without problems but today I
>> wanted to have the results sorted by publication date.
>>
>> According to the docs at:
>> http://www.ncbi.nlm.nih.gov/corehtml/query/static/esearch_help.html#Sort
>> I should use 'pub+date', however this doesn't work. If I use 'author'
>> and 'journal' I have no problems but if I use 'last+author' or
>> 'pub+date' I get an empty reply:
>>
>>>>> Entrez.esearch(db='pubmed', term=search, retmax=5,
>> sort='pub+date').read()
>> <?xml version="1.0" ?>\n<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD
>> eSearchResult, 11 May 2002//EN"
>> "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">\n<eSearchResult><Count/><RetMax/><RetStart/><TranslationSet/><QueryTranslation/></eSearchResult>\n'
>>
>> Any suggestions on how to make this work?
> 
> The NCBI documentation for "sort" says "Use in conjunction with Web
> Environment to display sorted results in ESummary and EFetch.", and in
> the example above you are not using the Web Environment (history)
> mode.
> 
> i.e. I think you need to do an ESearch with history="Y" and
> sort="pub+date", then an EFetch which will be in date order.
> 
> If you get this working, perhaps you could share a complete example?
> It would make a nice cookbook entry for the wiki.
> 
> Peter
Hi again Peter,

After further testing I came to the conclusion that this is a character
escaping problem. The '+' sign in 'pub+date' gets URL-encoded to '%2B',
so the server never sees the sort key it expects. Since a space (' ') is
itself encoded as '+', the correct value to pass is 'pub date' rather
than 'pub+date'.
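
The escaping can be checked without contacting NCBI. The snippet below is
only a sketch: it assumes the query string is built with the standard
library's urlencode (which the behaviour described above suggests), and it
uses the Python 3 import path.

```python
from urllib.parse import urlencode

# A literal '+' in the value is percent-encoded to %2B.
plus_query = urlencode({'sort': 'pub+date'})

# A space in the value is encoded as '+', which is what the server expects.
space_query = urlencode({'sort': 'pub date'})

print(plus_query)   # sort=pub%2Bdate
print(space_query)  # sort=pub+date
```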

A working example (feel free to add it to the cookbook):

#! /usr/bin/env python

from Bio import Entrez, Medline
from datetime import datetime

# Make sure you change this to your email
Entrez.email = 'somemail@somehost.domain'

def fetch(t, s):
    h = Entrez.esearch(db='pubmed', term=t, retmax=5, sort=s)
    idList = Entrez.read(h)['IdList']

    if idList:
        handle = Entrez.efetch(db='pubmed', id=idList,
                               rettype='medline', retmode='text')
        records = Medline.parse(handle)

        for record in records:
            title = record['TI']
            author = ', '.join(record['AU'])
            source = record['SO']
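            # 'DA' is the MEDLINE "Date Created" field (YYYYMMDD), used
            # here as a stand-in for the publication date.
            pub_date = datetime.strptime(record['DA'], '%Y%m%d').date()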
            pmid = record['PMID']

            print("Title: %s\nAuthor(s): %s\nSource: %s\n"
                  "Publication Date: %s\nPMID: %s\n"
                  % (title, author, source, pub_date, pmid))

print('-- Sort by publication date --\n')
fetch('Dmel wings', 'pub date')

print('-- Sort by first author --\n')
fetch('Dmel wings', 'author')

# EOF

--
Renato

