[Bioperl-l] Bio::DB::GenBank batch mode usage

Chris Fields cjfields at illinois.edu
Thu Jul 2 15:29:29 EDT 2009


If you are just downloading the records to a file, it might be better  
to retrieve the raw records with Bio::DB::EUtilities, provided you  
have either the accession number or the GI.  Downloading through  
Bio::DB::GenBank means each record is first parsed into a sequence  
object and then written back out to a file with Bio::SeqIO.

---------------------------

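use strict;
use warnings;
use Bio::DB::EUtilities;

my @ids = (); # your GI/acc here

# efetch hands back the raw GenBank flat files; nothing is parsed locally
my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

# write the server response straight to disk
$factory->get_Response(-file => "records.gb");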

---------------------------
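For comparison, the Bio::DB::GenBank route would look roughly like  
this when written with get_Stream_by_acc() (get_Stream_by_id() takes a  
reference to a GI list in the same way) rather than the deprecated  
batch call.  This is an untested sketch: the helper genbank_to_file  
and the one-accession-per-line input file are hypothetical.  Note the  
extra parse/write step through Bio::SeqIO that the EUtilities version  
avoids.

---------------------------

use strict;
use warnings;

# Hypothetical helper: every record becomes a sequence object before
# Bio::SeqIO serializes it back to GenBank format.
sub genbank_to_file {
    my ($accs, $outfile) = @_;
    require Bio::DB::GenBank;    # loaded at run time so this sub compiles
    require Bio::SeqIO;          # even where BioPerl is not installed
    my $gb     = Bio::DB::GenBank->new;
    my $out    = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
    my $stream = $gb->get_Stream_by_acc($accs);    # arrayref of accessions
    while (my $seq = $stream->next_seq) {
        $out->write_seq($seq);
    }
    $out->close;
}

# Usage (hypothetical): perl genbank_to_file.pl accessions.txt
if (@ARGV) {
    chomp(my @accs = <>);
    genbank_to_file([grep { /\S/ } @accs], 'records.gb');
}

---------------------------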

If you have a long list of IDs, you can post them with epost first  
and then run efetch against the resulting search history.  This page  
has a few recipe scripts:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
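A minimal sketch of the epost-then-efetch approach follows.  It is not  
a tested recipe: it assumes Bio::DB::EUtilities provides next_History(),  
reset_parameters() and get_Response(-cb => ...) as its documentation  
describes, and the helper names fetch_by_history and batch_windows, the  
one-ID-per-line input file and the window size of 500 are hypothetical  
choices.  The IDs are posted once, and efetch then pulls records from  
the server-side history in retstart/retmax windows, so no single  
request has to carry the whole list.

---------------------------

use strict;
use warnings;

# Hypothetical helper: split $total history records into
# [retstart, retmax] windows of at most $size records each.
sub batch_windows {
    my ($total, $size) = @_;
    my @windows;
    for (my $start = 0; $start < $total; $start += $size) {
        my $left = $total - $start;
        push @windows, [$start, $left < $size ? $left : $size];
    }
    return @windows;
}

# Hypothetical helper: epost the IDs once, then efetch window by window.
sub fetch_by_history {
    my ($ids, $outfile, $size) = @_;
    require Bio::DB::EUtilities;    # loaded at run time
    my $factory = Bio::DB::EUtilities->new(
        -eutil          => 'epost',
        -db             => 'nucleotide',
        -id             => $ids,
        -keep_histories => 1);
    my $hist = $factory->next_History
        or die "epost returned no history\n";
    open my $out, '>', $outfile or die "Cannot write $outfile: $!\n";
    for my $w (batch_windows(scalar @$ids, $size)) {
        # fresh parameters, so only the history (not the -id list) is sent
        $factory->reset_parameters(
            -eutil    => 'efetch',
            -db       => 'nucleotide',
            -rettype  => 'genbank',
            -history  => $hist,
            -retstart => $w->[0],
            -retmax   => $w->[1]);
        # the callback receives the raw response body in chunks
        $factory->get_Response(-cb => sub { print {$out} $_[0] });
    }
    close $out or die "Cannot close $outfile: $!\n";
}

# Usage (hypothetical): perl fetch_by_history.pl ids.txt
if (@ARGV) {
    chomp(my @ids = <>);
    fetch_by_history([grep { /\S/ } @ids], 'records.gb', 500);
}

---------------------------

All windows are written to the same records.gb, one after another.  If  
NCBI drops a request partway through, that window can be retried  
without re-posting the IDs, since the history is still on the server.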

chris

On Jul 2, 2009, at 1:50 PM, John Tyree wrote:

> I'm trying to use Bio::DB::GenBank to download a large number of files
> by accession number. The docs say not to do this in normal mode to
> reduce server load. There is some kind of helper function associated
> with this.
>
>   %params = Bio::DB::GenBank->get_params('batch');
>
> But I don't understand how to use it. If you pass the hash using:
>
>    Bio::DB::GenBank->new(%params);
>
> it raises the following and dies:
>
> --------------------- WARNING ---------------------
> MSG: invalid retrieval type tool must be one of  
> (pipeline,io_string,tempfile
> ---------------------------------------------------
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: seq_start() must be integer value if set
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib64/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::DB::NCBIHelper::seq_start
> /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:416
> STACK: Bio::DB::NCBIHelper::new
> /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:117
> STACK: Find_Patient_By_AccNo.pl:93
>
> There is a deprecated method called get_Stream_by_batch() but how does
> one achieve batch mode using the proper get_Stream_by_id() ?
>
> Thanks,
> John



