[Bioperl-l] downloading multiple contigs from ncbi nucleotide database

Fri Aug 21 08:50:08 EDT 2009

Hi Rohit-
Re: the timeout, you could try
$factory->ua->timeout($number_greater_than_180_sec)
before issuing the request (180 seconds is the LWP::UserAgent default).
cheers MAJ
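
A longer timeout may still fail now and then, so it can help to retry a failed download instead of letting the whole run stop. The sketch below is only an illustration: with_retries is a hypothetical helper, not part of BioPerl, and the flaky sub stands in for the real download call. It assumes the download dies on a timeout, as the "timeout exception" in the original message suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run $action, retrying up to
# $max_tries times if it dies, sleeping $delay seconds between attempts.
sub with_retries {
    my ($action, $max_tries, $delay) = @_;
    my $last_error;
    for my $try (1 .. $max_tries) {
        # eval returns 1 only if $action finished without dying
        my $ok = eval { $action->(); 1 };
        return $try if $ok;    # number of the attempt that succeeded
        $last_error = $@ || 'unknown error';
        warn "attempt $try of $max_tries failed: $last_error";
        sleep $delay if $delay && $try < $max_tries;
    }
    die "giving up after $max_tries attempts: $last_error";
}

# Stand-in for a flaky download: dies twice, then succeeds.
my $calls = 0;
my $used  = with_retries(sub { die "timeout\n" if ++$calls < 3 }, 5, 0);
print "succeeded on attempt $used\n";
```

In the real script the code ref would wrap the call from the original message, for example sub { $factory->get_Response(-file => $genbank_file) }, with $factory->ua->timeout(...) set once beforehand and a delay of a few seconds between attempts.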
----- Original Message ----- 
From: "Rohit Ghai" <ghai.rohit at gmail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Friday, August 21, 2009 7:34 AM
Subject: [Bioperl-l] downloading multiple contigs from ncbi nucleotide database


> Hello all
>
> I would like to download the WGS sequences of the unfinished genomes
> (genomes in progress) from NCBI:
> http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
>
> here's an example accession
>
> NZ_ACVD00000000
>
> and here's the link to the accession at genbank
>
> http://www.ncbi.nlm.nih.gov/nuccore/NZ_ACVD00000000
>
> This record lists the accessions that belong to it in the following line
> of the GenBank output
>
> WGS         NZ_ACVD01000001-NZ_ACVD01000139
>
> NZ_ACVD01000001-NZ_ACVD01000139 is the range of accession numbers that
> belong to this record.
>
> here's a link
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide&cmd=Search&term=NZ_ACVD01000001:NZ_ACVD01000139[PACC]
>
>
> The bioperl related question is...
>
> Since these are unassembled genomes, there are several contigs for each one,
> and they are all available in this record.
>
> Is it possible to download a range without trying to recreate each accession
> number?
>
> On the other hand, it is possible to download each one individually. This
> would mean generating the following
>
> NZ_ACVD01000001
> NZ_ACVD01000002
> NZ_ACVD01000003
> .
> .
> .
> NZ_ACVD01000139
>
> from  NZ_ACVD01000001-NZ_ACVD01000139
>
>
> I can recreate these numbers and download each one separately. However,
> sometimes I get a timeout exception
> and the whole thing stops.
>
> the code ( copied shamelessly from the bioperl website, works great to get
> single accessions)
>
> use Bio::DB::EUtilities;
>
> my $id = "NZ_ACVD00000000";
> my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
>                                        -db      => 'nucleotide',
>                                        -id      => $id,
>                                        -rettype => 'gbwithparts');
>
> $factory->get_Response(-file => 'fullcontig.gb');
>
>
> I did try to catch the exceptions from get_Response, but it's not
> working as expected... maybe someone can point out what I'm doing wrong
> here. For some reason, the code never seems to reach any print statement in
> the catch construct...
>
> $ele = "somecontig id";
>
> try {
>     print "\t[$numtries] TRYING TO DOWNLOAD $ele...\n";
>     $factory->get_Response(-file => "$genbank_file");
> } catch Bio::Root::Exception with {
>     my $err = shift;
>     if (! defined $err) {
>         print "MAY HAVE DOWNLOADED $ele..\n";
>     } else {
>         print "PROBABLE TIMEOUT ERROR\n";
>         print "$err\n";
>     }
> };
>
>
> Or is it possible to somehow increase the timeout for the get_Response
> method?
>
> thanks in advance!
>
>
> regards
>
> Rohit
>
>
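
On recreating the accession numbers from the WGS line: a minimal sketch, assuming both ends of the range share the same prefix and the same number of digits, as in the example record. expand_wgs_range is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "PREFIXnnnn-PREFIXmmmm" into every accession
# in between, keeping the zero padding of the numeric part.
sub expand_wgs_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range, 2;
    die "not a range: $range\n" unless defined $last;

    # Split each end into a non-digit-terminated prefix and trailing digits.
    my ($prefix,  $start) = $first =~ /^(.*?)(\d+)$/
        or die "no trailing digits in $first\n";
    my ($prefix2, $end)   = $last  =~ /^(.*?)(\d+)$/
        or die "no trailing digits in $last\n";
    die "prefixes differ: $prefix vs $prefix2\n" unless $prefix eq $prefix2;
    die "digit counts differ in $range\n"
        unless length $start == length $end;

    my $width = length $start;
    # %0*d takes the field width from the argument list
    return map { sprintf '%s%0*d', $prefix, $width, $_ } $start .. $end;
}

my @accessions = expand_wgs_range('NZ_ACVD01000001-NZ_ACVD01000139');
printf "%d accessions, from %s to %s\n",
    scalar @accessions, $accessions[0], $accessions[-1];
```

Each element of @accessions can then be used as the -id value in the efetch call from the original message, one request per contig.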