[Bioperl-l] Re: Trying to get sequences from a DB

Jason Stajich jason@chg.mc.duke.edu
Sun, 2 Sep 2001 11:43:04 -0400 (EDT)


[cc-ing bioperl list since others may have similar problems]

This is caused by NCBI not providing a very reliable sequence retrieval
service through their web servers; it happens regularly to many people.
We have efforts underway to use the NetEntrez retrieval system by linking
against the NCBI toolkit, but I suspect that will be a number of months
off.

You can either
a) install the bioperl-0.9.0 DEVELOPMENT (unstable) modules
   and use the module Bio::DB::EMBL.  This will run a query against a
   server in England (even further than Bethesda from Chile, but a more
   reliable query CGI script), or
b) download nr, nt, or whatever db you are querying from
   ftp://ftp.ncbi.nlm.nih.gov/blast/db
   and use the Bio::Index::Fasta module to build a local index file for
   querying.
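
A minimal sketch of option (b), assuming a BioPerl release whose
Bio::Index::Fasta provides new(), make_index() and fetch().  The
two-record FASTA file and the IDs seq1/seq2 are made up for illustration
and stand in for a real download such as nr or nt:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::Index::Fasta;

# Hypothetical stand-in for a downloaded FASTA database.
# tempdir() returns an absolute path; use absolute paths for real files too.
my $dir        = tempdir(CLEANUP => 1);
my $fasta_file = "$dir/demo.fa";
my $index_file = "$dir/demo.idx";

open(my $fh, '>', $fasta_file) or die "Cannot write $fasta_file: $!";
print $fh ">seq1 first demo record\nACGTACGTAC\n";
print $fh ">seq2 second demo record\nTTGGCCAATT\n";
close($fh);

# Build the index once; -write_flag opens the index file for writing.
my $writer = Bio::Index::Fasta->new(-filename   => $index_file,
                                    -write_flag => 1);
$writer->make_index($fasta_file);
undef $writer;    # release the index before reopening it read-only

# Later queries only reopen the index and fetch by ID
# (by default, the first word after '>' on the header line).
my $index = Bio::Index::Fasta->new(-filename => $index_file);
my $seq   = $index->fetch('seq2');

# fetch() returns a sequence object; call seq() to get the residues.
print $seq->display_id, "\t", $seq->seq, "\n";
```

The index stores file offsets rather than the sequences themselves, so
the FASTA file has to stay where it was when the index was built.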
 
Are you shooting for sequence or sequence with annotation?

If you want sequence with annotation, (b) will not work for you.  You will
need to download an entire GenBank (ftp://ftp.ncbi.nlm.nih.gov/genbank)
division - primate, rodent, vertebrate, etc. - depending on your sequence
type of interest.  Then index these files with Bio::Index::GenBank.
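
A rough sketch of the GenBank variant, assuming Bio::Index::GenBank
offers the same new(), make_index() and fetch() calls and that the
returned object supports get_SeqFeatures() (older releases may name that
method differently).  The one-record flat file and the name DEMO0001 are
invented for illustration; a real division file would take its place:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::Index::GenBank;

# Hypothetical one-record flat file standing in for a GenBank division.
my $dir        = tempdir(CLEANUP => 1);
my $gb_file    = "$dir/demo.seq";
my $index_file = "$dir/demo_gb.idx";

open(my $fh, '>', $gb_file) or die "Cannot write $gb_file: $!";
print $fh <<'END_GB';
LOCUS       DEMO0001                  12 bp    DNA     linear   SYN 01-JAN-2001
DEFINITION  Made-up record for illustration only.
ACCESSION   DEMO0001
VERSION     DEMO0001.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="synthetic construct"
                     /mol_type="other DNA"
ORIGIN
        1 acgtacgtac gt
//
END_GB
close($fh);

# Build the index once (the slow, disk-hungry step for a real division).
my $writer = Bio::Index::GenBank->new(-filename   => $index_file,
                                      -write_flag => 1);
$writer->make_index($gb_file);
undef $writer;    # release the index before reopening it read-only

# Fetch by the LOCUS name; the record comes back with its annotation.
my $index = Bio::Index::GenBank->new(-filename => $index_file);
my $seq   = $index->fetch('DEMO0001');

print $seq->display_id, ": ", $seq->desc, "\n";
print "length ", $seq->length, "\n";
for my $feat ($seq->get_SeqFeatures) {
    print $feat->primary_tag, "\t", $feat->start, "..", $feat->end, "\n";
}
```

Which identifiers end up as index keys (LOCUS name, accession, version)
depends on the BioPerl release, so check the module documentation for
the version you have installed.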

This process can take a lot of disk space and a fair amount of time, but
if you are doing many queries it will allow them to run extremely
fast.

I need to write up a tutorial on how the indexing is done, unless someone
reading this has already done so.  Sanger Centre folks use the Bio::Index
modules regularly, so these modules have a good track record.

-jason

On Sat, 1 Sep 2001, Felipe Veloso wrote:
> 
> 	Hi Jason!
> 
> 	Thank you very much for your help. I installed all the modules you
> told me to, but when trying to run this EVEN SIMPLER script:
> 
> 	#!/usr/bin/perl
> 	use Bio::DB::GenBank;
> 	$gb = new Bio::DB::GenBank;
> 	$seq = $gb->get_Seq_by_acc('P11665');
> 	print "$seq\n";   
> 
> It returned this:
> 
> 	Bio::Seq::RichSeq=HASH(0x478e88)
> 
> 
> 
> Well, I tried some changes, such as (in line 3):
> 
> 	$gb = new Bio::DB::GenBank(-retrievaltype => 'tempfile' , 
>                                -format => 'Fasta');
> 
> And then it printed this:
> 
> -------------------- EXCEPTION --------------------
> MSG: Attempting to set the sequence to [<html] which does not look healthy
> STACK Bio::PrimarySeq::seq /Library/Perl/Bio/PrimarySeq.pm:243
> STACK Bio::PrimarySeq::new /Library/Perl/Bio/PrimarySeq.pm:218
> STACK Bio::Seq::new /Library/Perl/Bio/Seq.pm:132
> STACK Bio::SeqIO::fasta::next_primary_seq /Library/Perl/Bio/SeqIO/fasta.pm:130
> STACK Bio::SeqIO::fasta::next_seq /Library/Perl/Bio/SeqIO/fasta.pm:85
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc /Library/Perl/Bio/DB/WebDBSeqI.pm:159
> STACK toplevel /Users/fveloso/bin/vemos.pl:5
> -------------------------------------------
> 
> Well, "does not look healthy" made me try many accession numbers,
> and the results were the same in all cases.
> 
> What could be wrong now?
> 
> 	Sorry if I'm bothering you; I hope this time it will work.
> 
> 
> 					Felipe 
> 
>