[Bioperl-l] accessing EMBL database

Chris Fields cjfields at illinois.edu
Thu Nov 19 13:47:16 UTC 2009


On Nov 19, 2009, at 7:23 AM, Hotz, Hans-Rudolf wrote:

> 
> Sandipan
> 
> 
>> I have 3 questions, all related to the retrieval of sequences from online
>> databases.
>> 
>> (1) I have been trying to download a protein sequence from the EMBL database
>> and trying to write the sequence into a text file, as a string. I am using the
>> following code: 
>> 
>> use Bio::DB::EMBL;
>> open b,">","s.txt";
>> $em_obj = Bio::DB::EMBL->new;
>>  $seq_obj = $em_obj->get_Seq_by_acc("CAB95729");
>>  $s_str = $seq_obj->seq;
>>  print b "$s_str\n";
>> close b;
>> 
>> The script is not working and gives the message:
>> "MSG: EMBL stream with no ID. Not embl in my book
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm: 368
>> STACK: Bio::SeqIO::embl::next_seq C:/Perl/site/lib/Bio\SeqIO\embl.pm: 203
>> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc
>> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm: 194
>> STACK: trial2.pl"
>> 
>> I am not sure what this means. A similar version of the script works for the
>> Swissprot, GenBank and RefSeq databases but not for the EMBL. What is the way
>> around this so that I can download the embl sequence?
> 
> "CAB95729" is a protein sequence, i.e. a translation of the CDS of
> 'AJ277028.1'.
> 
> As far as I know, Bio::DB::EMBL is only designed to get EMBL entries, i.e.
> nucleotide sequences.
> 
> 
> 
>> (2) Also, is there any way I can download sequences from DDBJ (DNA Data
>> Bank of Japan)?
> 
> Unless it is for network/speed reasons, why do you want to download data
> from DDBJ? It contains the same data as GenBank and EMBL. Those three
> databases exchange their data on a daily basis.
> 
>> (3) Can GI numbers be used to retrieve the sequences? If so, then how?
> 
> Have you looked at Bio::DB::EUtilities? See the 'HOWTOs' page in the
> Bioperl Wiki.
> 
> 
> 
> Regards, Hans
> 
> 
> 
>> Answers to these questions would be greatly appreciated. I am very new to
>> Perl/Bioperl and am not really familiar with the advanced programming
>> features, so I would need your help to find my way out of this situation.
>> 
>> Many Thanks
>> Sandipan

To add to that: if you want the protein sequence as a Bio::Seq object, you can use Bio::DB::GenPept (Bio::DB::EUtilities will retrieve raw data only).
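
As a rough sketch of that suggestion (not tested against the poster's setup), the original script could be adapted as below. The accession and the output file name s.txt come from the question; everything else is an assumption, including that the NCBI service is reachable and that the accession resolves there.

```perl
use strict;
use warnings;
use Bio::DB::GenPept;

# Protein accession from the original question.
my $acc = 'CAB95729';

# Bio::DB::GenPept queries NCBI over the network and returns Bio::Seq objects.
my $db  = Bio::DB::GenPept->new;
my $seq = $db->get_Seq_by_acc($acc)
    or die "No sequence returned for $acc\n";

# Write the raw sequence string to a text file, as in the original script.
open my $out, '>', 's.txt' or die "Cannot open s.txt: $!";
print {$out} $seq->seq, "\n";
close $out or die "Cannot close s.txt: $!";
```

For question (3), the same database object should also accept a GI number through get_Seq_by_gi in place of get_Seq_by_acc.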

chris




