[Bioperl-l] Downloading a sequence in genbank format - related problem

Chris Fields cjfields at uiuc.edu
Wed May 16 13:05:59 UTC 2007


This is likely a timeout on the remote server.  One thing that will
speed things up is to retrieve the remote sequences in fasta format
to begin with (described in the Bio::DB::GenBank POD):

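# $query_obj is the Bio::DB::Query::GenBank object from your script
my $gb_obj = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                   -format        => 'fasta');
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);

# open the output file once, outside the loop
my $out = Bio::SeqIO->new(-format => 'fasta',
                          -file   => '>>output.fas');

while (my $seq_obj = $stream_obj->next_seq) {
   $out->write_seq($seq_obj);
}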

I also suggest using the direct ftp downloads if at all possible
(i.e. if you are downloading WGS or contig sequences).  They are much
faster.
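
On resuming after a dropped connection: here is a rough, untested
sketch, assuming you already have the list of IDs in hand (for
instance from $query_obj->ids, which may return only a capped number
of IDs; check the Bio::DB::Query::GenBank POD).  The helper names
batches and fetch_batch, the batch size of 500, the retry count and
the 5-second pause are all made up for illustration.  The idea is to
request small batches with get_Stream_by_id and to write a batch only
after it has arrived completely, so a failure costs one batch rather
than the whole run.

```perl
use strict;
use warnings;

# Split a list of IDs into array refs of at most $size IDs each (no network).
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Fetch one batch, retrying up to $tries times.  $gb is a Bio::DB::GenBank
# object and $out a Bio::SeqIO writer.  Sequences are written only once the
# whole batch has arrived, so a failed attempt leaves no partial output.
# Returns 1 on success, 0 if every attempt failed.
sub fetch_batch {
    my ($gb, $out, $batch, $tries) = @_;
    for my $attempt (1 .. $tries) {
        my @seqs;
        my $ok = eval {
            my $stream = $gb->get_Stream_by_id($batch);
            while (my $seq = $stream->next_seq) { push @seqs, $seq }
            # Assumption: one record per requested ID; fewer means truncation.
            die "got " . @seqs . " of " . @$batch . " sequences\n"
                unless @seqs == @$batch;
            1;
        };
        if ($ok) {
            $out->write_seq($_) for @seqs;
            return 1;
        }
        warn "attempt $attempt failed: $@";
        sleep 5 if $attempt < $tries;
    }
    return 0;
}

# Runs only when called as a script with IDs on the command line.
if (!caller and @ARGV) {
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    my $gb  = Bio::DB::GenBank->new(-retrievaltype => 'tempfile',
                                    -format        => 'fasta');
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => '>>output.fas');
    for my $batch (batches(500, @ARGV)) {
        fetch_batch($gb, $out, $batch, 3)
            or die "giving up at batch starting with ID $batch->[0]\n";
    }
}
```

If the script dies, the message names the first ID of the batch that
failed, so you can restart it with the remaining IDs and it will keep
appending to the same output file.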

chris

On May 16, 2007, at 4:19 AM, Georg Otto wrote:

>
> Dear all,
>
> I have a problem that also has to do with downloading data from
> GenBank, therefore I put it in this thread.
>
> I am trying to get all entries for the organism Danio rerio using
> something like this:
>
>
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $query = "Danio rerio[ORGN]";
> my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
> 					       -query => $query);
> my $gb_obj = Bio::DB::GenBank->new;
> my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
>
>
> while (my $seq_obj = $stream_obj->next_seq) {
>   my $out = Bio::SeqIO->new(-format => 'fasta',
> 			    -file => '>>output.fas');
>   $out->write_seq($seq_obj);
> }
>
>
> However, the download process aborts after a few thousand entries. I
> do not think that this is due to the request itself or problems with
> specific entries, since the number of transferred sequences varies
> before the stop. It might rather have to do with GenBank terminating
> the connection.
>
> Does anybody have a suggestion for a better strategy to achieve what
> I want (e.g. a different kind of query, a method to resume the
> download at the point where it terminated etc.)?
>
> Best,
>
> Georg
>
>
> "Diogo Tschoeke" <diogoat at gmail.com> writes:
>
>> Dear All,
>>
>> I need to download a lot of Leishmania major sequences in GenBank
>> format, but I can't download them from the NCBI page: when I use a
>> browser to download these sequences, the downloaded files are
>> corrupted. So I looked for a script to download those files and
>> found something like this:
>>
>>
>> #########################################################
>> use strict;
>> use warnings;
>>
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> my $query = Bio::DB::Query::GenBank->new
>>                                 (-query   => 'Leishmania major [Organism]',
>>                                 -db      => 'nucleotide');
>> my $gb = new Bio::DB::GenBank;
>> my $seqio = $gb->get_Stream_by_query($query);
>>
>> my $out = Bio::SeqIO->new(-format => 'genbank',
>>                           -file => '>>teste6.gb');
>> $out->write_seq($seqio);
>> #########################################################
>>
>> And the system returns these errors:
>> [diogo1 at genome perl]$ perl teste6.pl
>>
>> -------------------- WARNING ---------------------
>> MSG:  Bio::SeqIO::genbank=HASH(0x96c0f08) is not a SeqI compliant module.
>> Attempting to dump, but may fail!
>> ---------------------------------------------------
>> Can't locate object method "seq" via package "Bio::SeqIO::genbank" at
>> /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/genbank.pm line 692.
>>
>> Any idea?
>>
>> Thanks
>>
>> Diogo Tschoeke
>> Laboratory of Molecular Biology of Trypanosomatides
>> Fundação Osvaldo Cruz - Fiocruz RJ, Brazil
>> http:biowebdb.org <http://www.ncbs.res.in/>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign







More information about the Bioperl-l mailing list