[Biopython] Searching for and downloading sequences using the history

Fri Sep 18 18:23:26 UTC 2009

Hi Carlos,
  I had a look at what the result of Entrez.esearch contains, and
I see a RetMax entry.

>>> search_results
{u'Count': '9279', u'RetMax': '20', u'IdList': ['189229275', '189229274',
'189229273', '189229272', '189229271', '189229270', '189229269', '189229268',
'189229267', '189229266', '189229265', '189229264', '189229263', '189229262',
'189229261', '189229260', '189229199', '189229198', '189229197', '189229196'],
u'TranslationStack': [{u'Count': '9279', u'Field': 'All Fields',
u'Term': 'Genus[All Fields]', u'Explode': 'Y'}, 'GROUP'],
u'QueryTranslation': 'Genus[All Fields]',
u'ErrorList': {u'FieldNotFound': [], u'PhraseNotFound': ['specie']},
u'TranslationSet': [], u'RetStart': '0', u'QueryKey': '1',
u'WebEnv': 'NCID_1_3207467_130.14.22.148_9001_1253297878'}
>>>

So RetMax is 20 here, which is why only 20 IDs come back. Asking for
more returns them all:

>>> search_handle = Entrez.esearch(db=dbname,term=query_term,usehistory="y",RetMax=99999)
>>> search_results = Entrez.read(search_handle)
>>> search_handle.close()
>>> len(search_results["IdList"])
9279
>>> 
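
Since the search was run with usehistory="y", the WebEnv and QueryKey
values in search_results can also be used to download the records in
batches, instead of pulling one huge ID list. Below is a rough sketch of
that idea, not a tested recipe: the names batch_windows and fetch_all,
the batch size of 500 and the FASTA output are my own assumptions, and
Entrez.email needs to be set before calling fetch_all.

```python
def batch_windows(count, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `count` records."""
    for start in range(0, count, batch_size):
        yield start, min(batch_size, count - start)


def fetch_all(search_results, out_path, db="nucest", batch_size=500):
    """Write every record of a history search to out_path as FASTA.

    `search_results` is the parsed esearch result (with usehistory="y").
    """
    # Imported here so batch_windows works even without Biopython installed.
    from Bio import Entrez

    count = int(search_results["Count"])
    with open(out_path, "w") as out:
        for start, size in batch_windows(count, batch_size):
            # webenv/query_key point efetch at the stored search result.
            handle = Entrez.efetch(db=db, rettype="fasta", retmode="text",
                                   retstart=start, retmax=size,
                                   webenv=search_results["WebEnv"],
                                   query_key=search_results["QueryKey"])
            out.write(handle.read())
            handle.close()
```

For the search above, fetch_all(search_results, "ests.fasta") would make
19 requests of at most 500 records each.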

BTW, don't make RetMax too large, or the NCBI backend overflows:
>>> search_handle = Entrez.esearch(db=dbname,term=query_term,usehistory="y",RetMax=9999999999)
>>> search_results = Entrez.read(search_handle)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 297, in read
    record = handler.run(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
  File "/usr/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 141, in endElement
    raise RuntimeError(value)
RuntimeError: Search Backend failed: NCBI C++ Exception:
    Error:        CORELIB(CStringException::eConvert)
    "/pubmed_gen/rbuild/version/20090819/entrez/c++/src/corelib/ncbistr.cpp",
    line 411: --- Cannot convert string '9999999999' to int, overflow (m_Pos = 0)

>>>
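
One way to avoid this is to clamp the value before sending it. The
sketch below assumes the backend parses RetMax as a signed 32-bit int,
which is only my guess from the "overflow" message; safe_retmax is a
hypothetical helper, not part of Biopython, and the server may well
enforce a lower limit of its own.

```python
# Assumed upper bound, inferred from the "to int, overflow" error message.
INT32_MAX = 2**31 - 1


def safe_retmax(requested, limit=INT32_MAX):
    """Clamp a requested RetMax value into the range [0, limit]."""
    return max(0, min(int(requested), limit))
```

The result can then be passed along as RetMax=safe_retmax(9999999999).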

Hope this helps,
M.

Carlos Javier Borroto wrote:
> Hi all,
> 
> I'm trying to download all of the ESTs from a species. I'm following the
> example in the tutorial, which seems to be exactly what I need, but I'm
> running into this problem:
> 
>>>> from Bio import Entrez
>>>> Entrez.email = "carlos.borroto at gmail.com"
>>>> dbname = "nucest"
>>>> query_term = "Genus specie"
>>>> search_handle = Entrez.esearch(db=dbname,term=query_term,usehistory="y")
>>>> search_results = Entrez.read(search_handle)
>>>> search_handle.close()
>>>> len(search_results["IdList"])
> 20
>>>> print search_results["Count"]
> 193951
> 
> So the assert statement is failing:
>>>> gi_list = search_results["IdList"]
>>>> count = int(search_results["Count"])
>>>> assert count == len(gi_list)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AssertionError
> 
> And most importantly, I'm not getting all of the IDs.
> 
> Does anyone know what I'm doing wrong?
> 
> thanks in advance