[Bioperl-l] Downloading refseq genomes in batch

shalabh sharma shalabh.sharma7 at gmail.com
Fri Apr 6 18:27:29 UTC 2012


Hi Chris,
I am using the method you suggested, but I have a question: the number of
UIDs that "esearch" returns is not the same as the number of proteins in
that genome.

For example, for "Homo sapiens" esearch returns 3277310 UIDs, but the NCBI
genome page lists only about 32,00 proteins:
http://www.ncbi.nlm.nih.gov/genome?term=Homo%20sapiens

Thanks
Shalabh
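
For reference, the batch approach quoted below (run esearch once, then page
through efetch with the WebEnv/query key) can be sketched roughly as follows.
This is only an illustration in Python that builds the request URLs and does
not contact NCBI. The function names, the batch size, and the
`srcdb_refseq[PROP]` restriction are assumptions for the sketch, not something
stated in this thread, and whether that restriction brings the count in line
with the genome page is not verified here.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid, db="protein", refseq_only=True):
    """Build an esearch URL for one taxonomy ID (hypothetical helper).

    usehistory=y asks NCBI to keep the result set on its history server,
    so the reply carries a WebEnv and query_key instead of a long ID list.
    """
    term = f"txid{taxid}[Organism:exp]"
    if refseq_only:
        # Assumption: restrict hits to RefSeq records to narrow the count.
        term += " AND srcdb_refseq[PROP]"
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_urls(webenv, query_key, count, db="protein",
                batch=500, rettype="fasta"):
    """Build one efetch URL per batch of `batch` records (hypothetical helper).

    webenv, query_key and count come from the esearch reply.
    """
    urls = []
    for start in range(0, count, batch):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": start,
            "retmax": batch,
            "rettype": rettype,
            "retmode": "text",
        }
        urls.append(f"{EUTILS}/efetch.fcgi?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    print(esearch_url(9606))
    # Placeholder WebEnv/query_key values; real ones come from esearch.
    for url in efetch_urls("WEBENV_FROM_ESEARCH", "1", 1200):
        print(url)
```

Fetching in a few large batches rather than one request per sequence keeps the
number of requests low, which is the concern raised in the quoted reply.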


On Thu, Apr 5, 2012 at 10:40 AM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:

> Hi All,
> Thanks for all the suggestions.
> Thanks a lot Chris, I am using your method to pull out genomes. It's
> working fine.
>
> Thanks
> Shalabh
>
>
> On Tue, Apr 3, 2012 at 5:19 PM, Fields, Christopher J <
> cjfields at illinois.edu> wrote:
>
>> 500 sequences isn't too bad for a remote lookup (I have run about 20K
>> myself).  It's much easier if you can grab them as a batch, e.g. run
>> esearch for the IDs, then use efetch with the webenv/key to grab the
>> sequences.  NCBI is more worried about the number of requests made, the
>> length of time between requests, and the time of day requests are made.
>>
>> In fact, I recall updating EUtilities recently so it can use a POST, so
>> you can grab ~2000 seqs at a time w/o having to iterate through them.
>>
>> chris
>>
>> On Apr 3, 2012, at 4:02 PM, Juan Jovel wrote:
>>
>> >
>> > Hi Shalabh
>> > You can try using Bio::DB::GenBank, but I believe NCBI does not like
>> > people doing many remote lookups. I would advise you to download the
>> > whole database you are interested in and then parse it locally.
>> > Cheers, Juan
>> >> Date: Tue, 3 Apr 2012 14:15:16 -0400
>> >> From: shalabh.sharma7 at gmail.com
>> >> To: carandraug+dev at gmail.com
>> >> CC: Bioperl-l at lists.open-bio.org
>> >> Subject: Re: [Bioperl-l] Downloading refseq genomes in batch
>> >>
>> >> Hi Carnë,
>> >> Thanks for your reply.
>> >> I tried to get the UIDs from the genome names, but I can't find them
>> >> using EUtilities. I have taxa IDs for those genomes; can I download
>> >> genomes by taxa ID in batch?
>> >>
>> >> Thanks
>> >> Shalabh
>> >>
>> >>
>> >> On Tue, Apr 3, 2012 at 11:53 AM, Carnë Draug <carandraug+dev at gmail.com> wrote:
>> >>
>> >>> On 3 April 2012 16:34, shalabh sharma <shalabh.sharma7 at gmail.com> wrote:
>> >>>> Hi All,
>> >>>> I am trying to download refseq genomes in batch, but instead of
>> >>>> accession numbers I have genome names (~500).
>> >>>> Is there any way I can download them using some bioperl module?
>> >>>
>> >>> If you have their name/official symbol, then searching on the database
>> >>> should only return one hit, therefore one UID. Make the search, get
>> >>> that number, and use it for download. The EUtilities module should do
>> >>> that.
>> >>>
>> >>> Carnë
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Shalabh Sharma
>> >> Scientific Computing Professional Associate (Bioinformatics Specialist)
>> >> Department of Marine Sciences
>> >> University of Georgia
>> >> Athens, GA 30602-3636
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>>
>
>
> --
> Shalabh Sharma
> Scientific Computing Professional Associate (Bioinformatics Specialist)
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
>



-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636



