[Bioperl-l] Best method for downloading 100 sequences

Barry Moore bmoore at genetics.utah.edu
Fri Sep 2 14:53:17 EDT 2005


Most of the BioPerl remote database modules have a batch method,
something like this one from Bio::DB::GenBank:

    my $seqio = $gb->get_Stream_by_acc(['AC013798', 'AC021953'] );

You should put some sort of sleep in there between requests.  It runs in
my mind that NCBI asks for 3 seconds, but I could be wrong about that.
100 sequences shouldn't be a problem for them.  For 1000+ you might want
to think about downloading directly by ftp and parsing at home.
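
For what it's worth, here is a rough sketch of how the batching and
sleeping could fit together for a file of IDs.  The sub names (batches,
fetch_file), the batch size of 20, the 3-second pause and the FASTA
output are all just assumptions for illustration, not anything NCBI or
BioPerl prescribes, so adjust to taste:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size (pure Perl, no network).
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Fetch every ID listed in $id_file (one per line), one request per batch,
# pausing $pause seconds between requests. Writes FASTA to STDOUT.
sub fetch_file {
    my ($id_file, $batch_size, $pause) = @_;
    require Bio::DB::GenBank;    # loaded only when we actually fetch
    require Bio::SeqIO;

    open my $fh, '<', $id_file or die "Cannot open $id_file: $!";
    my @ids = map { /(\S+)/ ? $1 : () } <$fh>;    # first token per line
    close $fh;

    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

    my @todo = batches($batch_size, @ids);
    while (my $batch = shift @todo) {
        my $seqio = $gb->get_Stream_by_acc($batch);
        while (my $seq = $seqio->next_seq) {
            $out->write_seq($seq);
        }
        sleep $pause if @todo;    # no need to wait after the last batch
    }
}

# Assumed values: 20 IDs per request, 3 seconds between requests.
fetch_file($ARGV[0], 20, 3) if @ARGV;
```

You would run it as something like "perl fetch_batches.pl ids.txt >
seqs.fasta" (the script name is made up).  Other remote database modules
may or may not offer the same get_Stream_by_acc call, so check each
module's documentation before swapping Bio::DB::GenBank for another one.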

Barry

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Amir Karger
Sent: Friday, September 02, 2005 7:55 AM
To: 'bioperl-l at portal.open-bio.org'
Subject: [Bioperl-l] Best method for downloading 100 sequences

Hi.

I'm using Bioperl's nice get_sequence in my Scriptome toolbox, to fetch
a single sequence. What would be the best method for downloading 100
sequences? Do I write a loop to call get_sequence N times? Will the
various websites get angry at me for doing that? Would they be less
angry if I did a 1-second sleep after each download? I know NCBI has
methods to pull in N sequences, but I don't know whether Swiss et al. do
too. I'm happy to use other Bioperl code, rather than get_sequence. I
just need to have a script that people can cut and paste, where they
just input the filename with sequence IDs and the database to download
from (sort of like
http://cgr.harvard.edu/cbg/scriptome/Tools/Fetch.html#fetch_a_sequence_from_a_popular_internet_database__fetch_sequence_web_)

Thanks,

-Amir Karger