[Bioperl-l] downloading multiple contigs from ncbi nucleotide database
Rohit Ghai
ghai.rohit at gmail.com
Fri Aug 21 07:34:49 EDT 2009
Hello all,
I would like to download the WGS sequences of the unfinished genomes
(genomes in progress) from NCBI, listed at http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Here's an example accession:
NZ_ACVD00000000
and here's the link to the accession at GenBank:
http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000
The GenBank output for this record lists the accessions that belong to it
in the following line:
WGS NZ_ACVD01000001-NZ_ACVD01000139
NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers of the
individual contigs.
Here's a link to that range:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]
The BioPerl-related question is this: since these are unassembled genomes,
there are several contigs for each one, and they are all available through
this record. Is it possible to download a whole range without reconstructing
each accession number?
Alternatively, I could download each contig individually, which would mean
generating the following accessions from NZ_ACVD01000001-NZ_ACVD01000139:
NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
...
NZ_ACVD01000139
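A minimal sketch of reconstructing the accessions from the range string, assuming both ends share the same letter prefix and the same number of digits (true for NZ_ACVD01000001-NZ_ACVD01000139; I have not checked this against NCBI's accession rules in general). The helper name expand_accession_range is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a WGS accession range such as
# "NZ_ACVD01000001-NZ_ACVD01000139" into the individual accessions.
# Assumes both ends share the same prefix and the same digit count.
sub expand_accession_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range, 2;
    die "Not an accession range: $range\n" unless defined $last;

    # Split each end into a non-numeric prefix and its trailing digits.
    my ($prefix1, $digits1) = $first =~ /^(.*?)(\d+)$/
        or die "Cannot parse start accession: $first\n";
    my ($prefix2, $digits2) = $last =~ /^(.*?)(\d+)$/
        or die "Cannot parse end accession: $last\n";

    die "Prefixes differ: $prefix1 vs $prefix2\n" unless $prefix1 eq $prefix2;
    die "Digit counts differ: $digits1 vs $digits2\n"
        unless length($digits1) == length($digits2);
    die "Range is reversed: $range\n" if $digits1 > $digits2;

    # Count numerically, then zero-pad back to the original width.
    my $width = length $digits1;
    return map { sprintf('%s%0*d', $prefix1, $width, $_) }
        ($digits1 + 0) .. ($digits2 + 0);
}
```

For example, expand_accession_range('NZ_ACVD01000001-NZ_ACVD01000139') returns the 139 accessions NZ_ACVD01000001 through NZ_ACVD01000139. Counting on the numeric value rather than the string avoids relying on Perl's magic string increment for the zero-padded digits.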
I can reconstruct these numbers and download each one separately. However,
sometimes I get a timeout exception and the whole run stops.
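To keep a single timeout from stopping the whole batch, one option is to wrap each download in Perl's built-in eval and retry a few times. This is a minimal sketch; the helper name with_retries and the retry count and wait time in the commented usage are my own assumptions, not part of BioPerl. The usage comment reuses the efetch call from the code in this message, with -rettype 'gb' assumed for the individual contigs.

```perl
use strict;
use warnings;

# Hypothetical helper: call $code up to $max_tries times, sleeping $wait
# seconds between failed attempts. eval catches any die, whether a plain
# string or an exception object, so one failure does not end the program.
# Returns (1, undef) on success, or (0, $last_error) if every attempt failed.
sub with_retries {
    my ($code, $max_tries, $wait) = @_;
    my $last_err;
    for my $try (1 .. $max_tries) {
        my $ok = eval { $code->(); 1 };
        return (1, undef) if $ok;
        $last_err = $@ || "unknown error\n";
        warn "attempt $try of $max_tries failed: $last_err";
        sleep $wait if $wait && $try < $max_tries;
    }
    return (0, $last_err);
}

# Possible use (assumes @accessions holds the individual contig accessions
# and that Bio::DB::EUtilities has been loaded):
# for my $acc (@accessions) {
#     my ($ok, $err) = with_retries(sub {
#         my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
#                                                -db      => 'nucleotide',
#                                                -id      => $acc,
#                                                -rettype => 'gb');
#         $factory->get_Response(-file => "$acc.gb");
#     }, 3, 10);
#     warn "giving up on $acc: $err" unless $ok;
# }
```

Failed accessions are only reported, so a rerun could be limited to those.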
Here is the code (copied shamelessly from the BioPerl website); it works
great for fetching single accessions:
use Bio::DB::EUtilities;

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');
$factory->get_Response(-file => 'fullcontig.gb');
I did try to catch the exceptions from get_Response, but it's not working as
expected. Maybe someone can point out what I'm doing wrong here: for some
reason, the code never seems to reach any print statement in the catch
block.
$ele = "somecontig id";
try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => "$genbank_file");
} catch Bio::Root::Exception with {
    my $err = shift;
    if (! defined $err) {
        print "MAY HAVE DOWNLOADED $ele..\n";
    } else {
        print "PROBABLE TIMEOUT ERROR\n";
        print "$err\n";
    }
};
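For what it's worth, the try { ... } catch CLASS with { ... } syntax comes from the Error module and has to be imported into the script with use Error qw(:try); a catch block also only runs when something was actually thrown, so $err is always defined inside it. Below is a minimal sketch of the same idea using Perl's built-in eval instead, which catches both plain die strings and exception objects such as a Bio::Root::Exception. The helper name run_and_classify is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: run $code and report what happened.
# Returns 'ok' if nothing was thrown. If an exception object was thrown
# (for example a Bio::Root::Exception), its class is reported; a plain
# die() string is reported as-is.
sub run_and_classify {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return 'ok' if $ok;

    my $err = $@;
    if (blessed($err)) {
        return 'exception object of class ' . ref($err);
    }
    return "plain error: $err";
}
```

Because eval does not depend on the exception's class, it also catches errors that are raised as plain strings rather than as Bio::Root::Exception objects.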
Or is it possible to increase the timeout used by the get_Response method?
Thanks in advance!
Regards,
Rohit