[Bioperl-l] How to set "complexity" param using EUtilities

Phillip San Miguel pmiguel at purdue.edu
Wed Mar 24 09:49:55 EDT 2010


Just a little FYI that might help someone using GenBank efetch (here 
with bioperl EUtilities) and, contrary to expectation, retrieving a 
bunch of accessions (or GIs) when a single accession is what is 
wanted. The trick is to change the "complexity" parameter from its 
apparent default of "0" to "1".

Actually, this parameter might be worth adding to the HOWTO, because it 
makes the EUtilities efetch behave like a normal Entrez search, which, 
to me, would be the expected behavior.

Details below.

Some accessions/GIs appear to be embedded in bundles of related 
sequences. Here is an example:

gi|158819346|gb|EU011641.1|


If I search Entrez Nucleotide

http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar

with either "158819346" (the GI) or "EU011641.1", I get a single 
record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal RNA 
gene, partial sequence". This is what I want.

If I use the following code derived from the Eutils HOWTO:

    use strict;
    use warnings;
    use Bio::DB::EUtilities;
    use Bio::SeqIO;

    # One accession, passed to efetch as an array reference
    my @ids = ('gb|EU011641.1|');

    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'nucleotide',
        -rettype => 'genbank',
        -id      => \@ids,
    );

    # Write the raw GenBank response to a file
    my $file = "test.gb";
    $factory->get_Response(-file => $file);

I get a bundle of accessions: EU011584-EU011663.
Same result using the GI number instead.

From reading:

http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam

it looks like I would get what I want were I to set the efetch 
"complexity" parameter to "1".
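For reference, at the level of the raw efetch request this is just one 
more query-string field. A rough sketch in plain Perl, assuming the 
efetch base URL and the "gb" rettype from the NCBI E-utilities 
documentation; the helper name efetch_url is made up for illustration, 
and the values are assumed simple enough not to need URL escaping:

```perl
use strict;
use warnings;

# Base URL of the efetch CGI (assumption: taken from the NCBI docs)
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: join name=value pairs into a query string.
# Keys are sorted only so that the output is reproducible.
sub efetch_url {
    my (%param) = @_;
    my $query = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$base?$query";
}

# complexity => 1 asks for the single requested record
print efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',
    rettype    => 'gb',
    complexity => 1,
), "\n";
```

This only prints the URL; nothing is sent to NCBI.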

But how do I set that parameter?  Below is how I did it. Not the most 
efficient path, but did not take that long to traverse...

The HowTo does not mention it. I usually look to the Deobfuscator:

http://bioperl.org/cgi-bin/deob_interface.cgi

to help me when I want some documentation for a method. But this is a 
parameter not a class. What class sets this parameter? Not sure. So I 
googled:

complexity eutil site:bioperl.org

The top-ranked hit is actually to the deprecated 1.5.2 version of 
EUtilities. But the 2nd hit is to the (auto-generated?) email posted 
to the bioperl-guts email list by Chris Fields upon his commit of the 
new EUtilities overhaul:

http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html


From there it looks like the obvious way to set the parameter, passing 
it to the constructor, should be possible. And indeed:


    use strict;
    use warnings;
    use Bio::DB::EUtilities;
    use Bio::SeqIO;

    # One accession, passed to efetch as an array reference
    my @ids = ('gb|EU011641.1|');

    my $factory = Bio::DB::EUtilities->new(
        -eutil      => 'efetch',
        -db         => 'nucleotide',
        -rettype    => 'genbank',
        -complexity => 1,    # return only the requested record
        -id         => \@ids,
    );

    # Write the raw GenBank response to a file
    my $file = "test.gb";
    $factory->get_Response(-file => $file);

works!

It is also a good idea to add the -email parameter so that GenBank might 
chastise me via email, rather than banning my IP, if I try to send more 
than 100 requests in a series outside of the acceptable 9PM-5AM Eastern 
Time hours.
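
A minimal sketch combining -complexity and -email. The helper name 
efetch_args and the output file name are made up for illustration, the 
email address is whatever you pass on the command line, and a request 
is only sent when both an address and at least one accession are given:

```perl
use strict;
use warnings;

# Hypothetical helper: collect the named arguments for
# Bio::DB::EUtilities->new in one place.
sub efetch_args {
    my ($email, @ids) = @_;
    return (
        -eutil      => 'efetch',
        -db         => 'nucleotide',
        -rettype    => 'genbank',
        -complexity => 1,         # only the requested record, not the bundle
        -email      => $email,    # lets NCBI contact you instead of blocking you
        -id         => \@ids,
    );
}

# Usage: perl fetch_one.pl you@example.org EU011641.1 [more accessions...]
if (@ARGV >= 2) {
    require Bio::DB::EUtilities;  # loaded only when a request is made
    my ($email, @ids) = @ARGV;
    my $factory = Bio::DB::EUtilities->new(efetch_args($email, @ids));
    $factory->get_Response(-file => 'single_record.gb');
}
```

Keeping the arguments in a helper makes it easy to check what will be 
sent before anything goes over the network.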

Phillip




