[Bioperl-l] Downloading from dbEST by taxon range

Albert Vilella avilella at gmail.com
Sun Jan 3 04:08:33 EST 2010


Thanks Jason!
For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::Taxonomy;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

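GetOptions('keyword_type=s' => \$keyword_type,
           't|taxon_name=s' => \$taxon_name,
           'db_type=s' => \$db_type,
           'outdir=s' => \$outdir);

# A taxon name is required to build the Entrez query
die "Usage: $0 --taxon_name <name> [--keyword_type EST] [--db_type nucest] [--outdir .]\n"
  unless defined $taxon_name && length $taxon_name;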

my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $db = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');

# Replace spaces with underscores so the taxon name is safe in a file name
(my $taxon_name_string = $taxon_name) =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

print $db->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
  # Skip sequences of 800 bases or fewer
  next unless ($seq->length > 800);
  $out->write_seq($seq);
}
$out->close;
---------------------
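
As Jason notes below, dates can also go into the Entrez query itself rather than being passed as -mindate/-maxdate. A minimal sketch of building such a query string, assuming the [PDAT] (publication date) field tag with a from:to range; the helper name build_query is hypothetical and not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds an Entrez query string with the date range
# embedded in the term itself. Assumes the [PDAT] field tag and the
# "from:to" range syntax.
sub build_query {
    my ($taxon, $keyword, $from, $to) = @_;
    my $query = sprintf('%s[Organism] AND %s[Keyword]', $taxon, $keyword);
    # Append the date range only when both bounds are given
    $query .= sprintf(' AND %s:%s[PDAT]', $from, $to)
        if defined $from && defined $to;
    return $query;
}

print build_query('Clupeocephala', 'EST', '2007', '2010'), "\n";
```

The resulting string would presumably be passed as -query to Bio::DB::Query::GenBank->new, so a periodic update script only has to change the date bounds.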

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich <jason at bioperl.org> wrote:
> Did you try Bio::DB::Query::GenBank?
> You'd want to use -db => 'nucest' and then you just put in an Entrez query
> as per the example.  You can include dates in the query so you can do
> updates to your locally retrieved data in a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences from dbEST. For that, my idea is to use
>> Bio::DB::GenBank and get the sequences by GI id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST, I can browse them on the dbEST web page using
>> "clupeocephala[taxonomy]", so I am thinking there should be a way to do it
>> programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>



