[Bioperl-l] Fwd: Why cannot I get sequences for some entries?

Warren Gallin wgallin at ualberta.ca
Thu Aug 22 19:25:05 UTC 2013



> I think I've seen this before.
> 
> NCBI has changed the way that highly redundant protein sequences from bacterial genomes are stored.  Although a sequence appears when you access the NCBI web site, that protein sequence is not retrieved by the up-to-now-functional BioPerl approaches.
> 
> The give-away is the line:
> 
> CONTIG      join(WP_011348599.1:1..564)
> 
> The WP designation is for these problematic sequences.
> 
> The work-around that I used was to do the sequence retrieval within an eval block and, if no sequence was forthcoming, to use the GI number to retrieve the sequence in FASTA format and grab it that way.
> 
> Not pretty, but it will make your pipeline work.
> 
> Given that this is the standard way of things at NCBI, would it be a good thing if those wiser than me added this extra check and alternative retrieval to the GenPept object class?
> 
> Warren Gallin
> 
> On 2013-08-21, at 9:06 PM, cacaucenturion2 <cacaucenturion2 at gmail.com> wrote:
> 
>> Hi,
>> 
>> Thanks for your help! I used GenPept but it did not work (also did not work when I was using Bio::DB::GenPept). I am pasting the code as follows.
>> 
>> use Bio::DB::GenPept;
>> $gb = Bio::DB::GenPept->new();
>> $seq = $gb->get_Seq_by_id('162138530');
>> print $seq->seq."\n";
>> 
>> But it does work if I ask for the ID of this sequence instead, by typing
>> print $seq->display_id."\n";
>> 
>> From: Jason Stajich
>> Sent: 2013-08-23 01:01
>> To: cacaucenturion2
>> Cc: Bioperl-l
>> Subject: Re: [Bioperl-l] Why cannot I get sequences for some entries?
>> Are you using Bio::DB::GenPept as your object, since it is a protein ID?
>> 
>> On Aug 19, 2013, at 1:31 AM, cacaucenturion2 <cacaucenturion2 at gmail.com> wrote:
>> 
>>> Hi all, 
>>> 
>>> I tried 
>>> 
>>> my $seq = $gb->get_Seq_by_gi('162138530');
>>> print $seq->seq."\n";
>>> 
>>> I am sure that the sequence whose GI is 162138530 exists in the database (http://www.ncbi.nlm.nih.gov/protein/162138543). However, no sequence is returned. Has anyone encountered a similar problem? Thanks.
>>> 
>>> Sincerely yours,
>>> Cacau
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> 
> 
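
The work-around described above (try the normal retrieval inside an eval block, then fall back to FASTA retrieval by GI) might be sketched roughly as follows. This is only an illustration, not the code actually used in the thread: the sub names (sequence_with_fallback, fasta_to_sequence, fetch_via_genpept, fetch_fasta_via_eutils) are hypothetical, the e-mail address is a placeholder, and the use of Bio::DB::EUtilities for the FASTA step is an assumption, since the message does not say which module performed it. The two fetchers are passed in as code references so that the fallback logic can be exercised without BioPerl or network access.

```perl
use strict;
use warnings;

# Extract the bare sequence from FASTA text: drop header lines and whitespace.
# Assumes the text holds a single record.
sub fasta_to_sequence {
    my ($fasta) = @_;
    return '' unless defined $fasta;
    my @lines = grep { !/^>/ } split /\r?\n/, $fasta;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;
    return $seq;
}

# Try the primary fetcher inside eval; if it dies or yields no sequence,
# fetch FASTA text with the fallback fetcher and parse that instead.
sub sequence_with_fallback {
    my ($gi, $primary, $fallback) = @_;
    my $seq = eval { $primary->($gi) };
    return $seq if defined $seq && length $seq;
    return fasta_to_sequence($fallback->($gi));
}

# Primary: the usual GenPept retrieval (needs BioPerl and network access).
sub fetch_via_genpept {
    my ($gi) = @_;
    require Bio::DB::GenPept;
    my $seq_obj = Bio::DB::GenPept->new->get_Seq_by_id($gi);
    return $seq_obj ? $seq_obj->seq : undef;
}

# Fallback (assumption): ask efetch for the same record as FASTA text.
sub fetch_fasta_via_eutils {
    my ($gi) = @_;
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => 'protein',
        -id      => [$gi],
        -rettype => 'fasta',
        -email   => 'you@example.org',    # placeholder address
    );
    return $factory->get_Response->content;
}

# Usage (requires BioPerl and network access):
# print sequence_with_fallback('162138530',
#     \&fetch_via_genpept, \&fetch_fasta_via_eutils), "\n";
```

Keeping the fetchers as arguments also means a different FASTA source can be swapped in without touching the eval logic.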




