[Biopython] History, Efetch, and returned records limits

Fields, Christopher J cjfields at illinois.edu
Sat Apr 14 09:22:55 EDT 2012


On Apr 14, 2012, at 7:54 AM, Peter Cock wrote:

> On Sat, Apr 14, 2012 at 1:35 PM, Mariam Reyad Rizkallah
> <mrrizkalla at gmail.com> wrote:
>> Dear community,
>> 
>> I aim to get sequences by a list of gi (using efetch and history
>> variables), for a certain taxid (using esearch). I always get the first
>> 10,000 records. For example, I need 10,300 gi_ids; I split them into lists of
>> 10,000 and submit them consecutively, and still get only the first 10,000
>> records. I tried the batch approach in the Biopython tutorial, but didn't even
>> reach 10,000 sequences.
>> 
>> Is there a limit for NCBI's returned sequences?
>> 
>> Thank you.
>> 
>> Mariam
> 
> It does sound like you've found some sort of Entrez limit;
> it might be worth emailing the NCBI to clarify this.
> 
> Have you considered downloading the GI/taxid mapping
> table from their FTP site instead? e.g.
> http://lists.open-bio.org/pipermail/biopython/2009-June/005295.html
> 
> Peter

This wouldn't surprise me; NCBI has long suggested breaking up record retrieval into batches of a few thousand or more, using retstart/retmax.
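
Something along these lines might work as a rough sketch of the retstart/retmax batching. It assumes Biopython's Bio.Entrez, the nucleotide database, and FASTA output, none of which were specified in the thread. The helper names batch_windows and fetch_taxid_sequences are made up for illustration, and the default batch size of 500 is an arbitrary choice.

```python
def batch_windows(total, batch_size=500):
    """Yield (retstart, retmax) pairs that together cover `total` records."""
    for start in range(0, total, batch_size):
        # The last window is shortened so it does not run past the end.
        yield start, min(batch_size, total - start)


def fetch_taxid_sequences(taxid, email, out_path, batch_size=500):
    """Write all records found for a taxid to out_path, one batch at a time.

    Hypothetical helper: the database, search term, and return type
    below are assumptions, not details taken from the original question.
    """
    # Imported here so batch_windows can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    # Run the search once and keep the result set on the history server.
    handle = Entrez.esearch(
        db="nucleotide",
        term=f"txid{taxid}[Organism:exp]",
        usehistory="y",
    )
    search = Entrez.read(handle)
    handle.close()

    total = int(search["Count"])
    with open(out_path, "w") as out:
        for retstart, retmax in batch_windows(total, batch_size):
            # Each call retrieves one slice of the stored result set.
            fetch = Entrez.efetch(
                db="nucleotide",
                rettype="fasta",
                retmode="text",
                retstart=retstart,
                retmax=retmax,
                webenv=search["WebEnv"],
                query_key=search["QueryKey"],
            )
            out.write(fetch.read())
            fetch.close()
    return total
```

For 10,300 records at 5,000 per batch, batch_windows(10300, 5000) gives (0, 5000), (5000, 5000), (10000, 300), so the final call asks only for the remaining 300 records rather than repeating the first block.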

chris



