[Bioperl-l] downloading multiple contigs from ncbi nucleotide database

Rohit Ghai ghai.rohit at gmail.com
Fri Aug 21 07:34:49 EDT 2009


Hello all

I would like to download from NCBI the WGS sequences of the unfinished genomes
(genomes in progress) listed at http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Here's an example accession:

NZ_ACVD00000000

And here's the link to the accession at GenBank:

http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000

This record lists the accessions of the contigs that belong to it in the
following line of the GenBank output:

WGS         NZ_ACVD01000001-NZ_ACVD01000139

NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers of the
individual contig records that make up this genome.

Here's a link to an Entrez search for that range:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]


The BioPerl-related question is this:

Since these are unassembled genomes, each one consists of several contigs,
and they are all listed in this record.

Is it possible to download the whole range without reconstructing each
accession number?
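The Entrez link above suggests one way to avoid rebuilding the accession list: run the same `[PACC]` range query through esearch with the history server, then efetch the matching records in batches. The sketch below follows the esearch/history/efetch pattern from the BioPerl EUtilities cookbook; `fetch_query_to_file` and `batch_starts` are hypothetical helper names, the batch size of 50 and `-rettype => 'gb'` are assumptions, and it is a sketch rather than a tested implementation.

```perl
use strict;
use warnings;

# retstart offsets needed to page through $count records, $batch at a time.
sub batch_starts {
    my ($count, $batch) = @_;
    die "Batch size must be positive\n" unless $batch > 0;
    my @starts;
    for (my $start = 0; $start < $count; $start += $batch) {
        push @starts, $start;
    }
    return @starts;
}

# Hypothetical helper: esearch with the history server, then efetch the
# matching records in batches into one GenBank file.
sub fetch_query_to_file {
    my (%args) = @_;
    my $email = $args{email} or die "NCBI asks E-utilities users for a contact email\n";
    my $batch = $args{batch} || 50;
    # Loaded at run time so batch_starts() can be used without BioPerl installed
    require Bio::DB::EUtilities;

    my $factory = Bio::DB::EUtilities->new(
        -eutil      => 'esearch',
        -db         => 'nucleotide',
        -term       => $args{term},
        -email      => $email,
        -usehistory => 'y',
    );
    my $count = $factory->get_count;
    my $hist  = $factory->next_History or die "No history data returned\n";

    # Switch the same factory to efetch; -db carries over and the query
    # itself is taken from the history object.
    $factory->set_parameters(
        -eutil   => 'efetch',
        -rettype => 'gb',    # assumption: plain GenBank records for the contigs
        -history => $hist,
    );

    open my $out, '>', $args{file} or die "Cannot open $args{file}: $!\n";
    for my $start (batch_starts($count, $batch)) {
        $factory->set_parameters(-retstart => $start, -retmax => $batch);
        # Stream each chunk of the response into the same output file
        $factory->get_Response(-cb => sub { print {$out} $_[0] });
    }
    close $out or die "Cannot close $args{file}: $!\n";
    return $count;
}
```

A call might look like `fetch_query_to_file(term => 'NZ_ACVD01000001:NZ_ACVD01000139[PACC]', file => 'NZ_ACVD_contigs.gb', email => $your_address)`, where `$your_address` stands in for a real contact address. Fetching in batches keeps each request small, so a single slow response costs one batch rather than the whole range.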

On the other hand, it is possible to download each contig individually, but
this means generating the following accessions

NZ_ACVD01000001
NZ_ACVD01000002
NZ_ACVD01000003
.
.
.
NZ_ACVD01000139

from the range NZ_ACVD01000001-NZ_ACVD01000139.
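Generating that list is mechanical. A minimal sketch, assuming both ends of a WGS range share the same prefix followed by a fixed-width, zero-padded number (true for the NZ_ACVD example); `expand_accession_range` is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a WGS range such as
# "NZ_ACVD01000001-NZ_ACVD01000139" into every accession it covers.
# Assumes both ends share a prefix followed by a zero-padded number.
sub expand_accession_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range;
    die "Not a range: $range\n" unless defined $first && defined $last;

    # Split each end into its prefix ("NZ_ACVD") and digits ("01000001")
    my ($prefix, $from) = $first =~ /^(.*?)(\d+)$/ or die "Bad accession: $first\n";
    my ($prefix2, $to)  = $last  =~ /^(.*?)(\d+)$/ or die "Bad accession: $last\n";
    die "Prefixes differ in $range\n" unless $prefix eq $prefix2;

    my $width = length $from;    # keep the original zero padding
    my ($lo, $hi) = ($from + 0, $to + 0);
    die "Range runs backwards: $range\n" if $hi < $lo;
    return map { sprintf '%s%0*d', $prefix, $width, $_ } $lo .. $hi;
}

my @ids = expand_accession_range('NZ_ACVD01000001-NZ_ACVD01000139');
print scalar(@ids), " accessions: $ids[0] .. $ids[-1]\n";
# prints "139 accessions: NZ_ACVD01000001 .. NZ_ACVD01000139"
```

The resulting list can also be handed to Bio::DB::EUtilities as `-id => \@ids`, which fetches several contigs per request instead of one at a time; for large ranges it is probably wiser to send the IDs in smaller batches.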


I can generate these numbers and download each one separately. However,
sometimes I get a timeout exception and the whole run stops.
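Independently of catching the timeout, writing each contig to its own file makes a crashed run cheap to restart: the next run can skip any contig that is already on disk. A minimal sketch, assuming a hypothetical layout of one `<accession>.gb` file per contig in a single directory; `pending_accessions` is a made-up helper name.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the accessions that still need downloading,
# i.e. those whose "<accession>.gb" file in $dir is missing or empty.
# Looping over this list after a crash resumes where the last run stopped.
sub pending_accessions {
    my ($ids, $dir) = @_;
    return grep { ! -s File::Spec->catfile($dir, "$_.gb") } @$ids;
}
```

Checking for a non-empty file (`-s`) rather than mere existence skips empty files, but a partially written file would still count as done; a stricter check, for example that the file ends with the GenBank `//` record terminator, may be worth adding.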

Here is the code (copied shamelessly from the BioPerl website); it works great
for single accessions:

use strict;
use warnings;
use Bio::DB::EUtilities;

my $id = "NZ_ACVD00000000";
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -id      => $id,
                                       -rettype => 'gbwithparts');

$factory->get_Response(-file => 'fullcontig.gb');


I did try to catch the exceptions thrown by get_Response, but it's not working
as expected; maybe someone can point out what I'm doing wrong here. For some
reason, the code never seems to reach any print statement in the catch block:

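use Error qw(:try);   # the try/catch ... with syntax needs this import

# $factory, $numtries and $genbank_file are set earlier in the script
my $ele = "somecontig id";

try {
    print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
    $factory->get_Response(-file => $genbank_file);
    print "DOWNLOADED $ele..\n";   # only reached if get_Response did not throw
} catch Bio::Root::Exception with {
    my $err = shift;   # always defined here: catch runs only on an exception
    print "PROBABLE TIMEOUT ERROR\n";
    print "$err\n";
};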
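A plain `eval` is an alternative to Error.pm's try/catch. Error.pm's `catch ... with` only handles exceptions of the named class (and needs `use Error qw(:try)`), whereas `eval` catches any `die`, whether the error is a Bio::Root::Exception object or a plain string, which also makes retrying straightforward. A minimal sketch; `with_retries` is a hypothetical helper, not part of BioPerl, and the retry count and delay are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $code up to $max_tries
# times, sleeping $delay seconds between failed attempts. A plain eval
# catches any die, whether the error is a Bio::Root::Exception object or
# a string, so no Error.pm try/catch is needed.
sub with_retries {
    my ($max_tries, $delay, $code) = @_;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my $result;
        # eval returns 1 only if $code finished without dying
        my $ok = eval { $result = $code->($attempt); 1 };
        return $result if $ok;
        $last_error = $@ || 'unknown error';
        warn "Attempt $attempt of $max_tries failed: $last_error\n";
        sleep $delay if $delay && $attempt < $max_tries;
    }
    die "Giving up after $max_tries attempts: $last_error\n";
}
```

For example, `with_retries(5, 10, sub { $factory->get_Response(-file => $genbank_file) })` would try a contig up to five times, waiting ten seconds between attempts, so a single timeout no longer stops the whole run; if all attempts fail, the final `die` can itself be caught by an outer `eval` to log the contig and move on.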


Alternatively, is it possible to increase the timeout used by the
get_Response method?
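If the requests are merely slow rather than failing, raising the HTTP timeout may be enough. A minimal sketch, assuming the factory's `ua` method returns the LWP::UserAgent that get_Response uses (Bio::DB::EUtilities inherits such a method from Bio::DB::GenericWebAgent in BioPerl releases built on that class; check the POD of the installed version). `set_request_timeout` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: raise the HTTP timeout on a Bio::DB::EUtilities
# factory. Assumes $factory->ua returns the LWP::UserAgent used by
# get_Response; LWP::UserAgent's own default timeout is 180 seconds.
sub set_request_timeout {
    my ($factory, $seconds) = @_;
    die "Timeout must be a positive number of seconds\n"
        unless defined $seconds && $seconds > 0;
    my $ua = $factory->ua;
    $ua->timeout($seconds);    # LWP::UserAgent::timeout, in seconds
    return $ua->timeout;
}
```

Used with the factory from the efetch example, this would be `set_request_timeout($factory, 300);` before `$factory->get_Response(-file => 'fullcontig.gb');`. A longer timeout and a retry loop address different failures (slow responses versus dropped ones), so combining both is reasonable.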

Thanks in advance!


Regards

Rohit


