[Bioperl-l] How to set "complexity" param using EUtilities

Phillip San Miguel pmiguel at purdue.edu
Wed Mar 24 10:59:50 EDT 2010


Sorry, I got that backwards. The default is apparently "0". To get
Entrez-like behavior, you want "complexity" set to "1".
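For reference, here is a minimal Perl sketch of what the setting presumably amounts to on the wire. efetch is a plain HTTP GET against NCBI's E-utilities, so `-complexity => 1` should end up as a `complexity=1` query parameter. The endpoint URL, the `rettype=gb`/`retmode=text` pair, and the helper names `efetch_url` and `uri_escape_simple` are my own assumptions for illustration, not BioPerl API. The sketch only builds the URL and does not fetch anything.

```perl
use strict;
use warnings;

# Assumed base URL of NCBI's efetch E-utility.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: percent-encode one query-string component,
# keeping only RFC 3986 unreserved characters as-is.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build an efetch URL from ordered key/value pairs.
sub efetch_url {
    my @pairs = @_;
    my @parts;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_simple($k) . '=' . uri_escape_simple($v);
    }
    return $EFETCH . '?' . join('&', @parts);
}

print efetch_url(
    db         => 'nucleotide',
    id         => 'EU011641.1',
    rettype    => 'gb',
    retmode    => 'text',
    complexity => 1,    # 1 = just the requested Bioseq, not the whole set
), "\n";
```

Run as-is, it prints
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=EU011641.1&rettype=gb&retmode=text&complexity=1
which can be pasted into a browser to compare against the default (complexity omitted).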
Phillip

Phillip San Miguel wrote:
> Just a little FYI that might help someone who uses GenBank efetch (here
> via BioPerl EUtilities) and, contrary to expectation, retrieves a
> bunch of accessions (or GIs) when a single accession is what is
> wanted. The trick is to change the "complexity" parameter from its
> apparent default of "1" to "0".
>
> Actually, this parameter might be worth adding to the HOWTO, because it
> makes EUtilities efetch behave like a normal Entrez search, which, to
> me, is the expected behavior.
>
> Details below.
>
> Some accessions/GIs appear to be embedded in bundles of related 
> sequences. Here is an example:
>
> gi|158819346|gb|EU011641.1|
>
>
> If I search Entrez Nucleotide
>
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&itool=toolbar
>
> with either "158819346" (the GI) or "EU011641.1", I get a single
> record for "Pachysolen tannophilus strain NRRL Y-2460 26S ribosomal
> RNA gene, partial sequence". This is what I want.
>
> If I use the following code derived from the Eutils HOWTO:
>
>   use strict;
>   use warnings;
>   use Bio::DB::EUtilities;
>
>   my @ids = ('gb|EU011641.1|');
>   my $factory = Bio::DB::EUtilities->new(
>       -eutil   => 'efetch',
>       -db      => 'nucleotide',
>       -rettype => 'genbank',
>       -id      => \@ids);
>
>   # save the raw GenBank response to a local file
>   my $file = "test.gb";
>   $factory->get_Response(-file => $file);
>
> I get a bundle of accessions: EU011584-EU011663.
> Same result using the GI number instead.
>
> From reading:
>
> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html#seqparam 
>
>
> it looks like I would get what I want were I to set the efetch 
> "complexity" parameter to "1".
>
> But how do I set that parameter? Below is how I found out. Not the most
> efficient path, but it did not take long to traverse...
>
> The HOWTO does not mention it. I usually turn to the Deobfuscator:
>
> http://bioperl.org/cgi-bin/deob_interface.cgi
>
> when I want documentation for a method. But this is a parameter, not
> a class. Which class sets this parameter? I was not sure, so I
> googled:
>
> complexity eutil site:bioperl.org
>
> The top-ranked hit is actually to the deprecated 1.5.2 version of
> EUtilities. But the second hit is to the (auto-generated?) email posted
> to the bioperl-guts mailing list by Chris Fields upon his commit of the
> new EUtilities overhaul:
>
> http://bioperl.org/pipermail/bioperl-guts-l/2007-May/025717.html
>
>
> From this, it looked like the obvious way of setting the parameter
> (passing it to new() along with the others) should work. And indeed:
>
>
>   use strict;
>   use warnings;
>   use Bio::DB::EUtilities;
>
>   my @ids = ('gb|EU011641.1|');
>   my $factory = Bio::DB::EUtilities->new(
>       -eutil      => 'efetch',
>       -db         => 'nucleotide',
>       -rettype    => 'genbank',
>       -complexity => 1,    # return only the requested Bioseq
>       -id         => \@ids);
>
>   # save the raw GenBank response to a local file
>   my $file = "test.gb";
>   $factory->get_Response(-file => $file);
>
> works!
>
> It is also a good idea to add the -email parameter, so that GenBank might
> chastise me via email, rather than ban my IP, if I send more than 100
> requests in a series outside the acceptable 9 PM to 5 AM Eastern Time
> window.
>
> Phillip
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
