[Bioperl-l] genbank
Dimitar Kenanov
dimitark at bii.a-star.edu.sg
Tue Nov 30 02:54:50 UTC 2010
On 11/30/2010 10:19 AM, Jason Stajich wrote:
> Dimitar -
>
> In terms of your question - a GenBank db query previously (ie 4-5
> years ago when this was written) WOULD NOT return a sequence if a
> GenPept ID was specified so we had to have a separate module for
> GenBank and GenPept db querying since there was a different set of
> parameters - I think that changed so that most of the queries can run
> through GenBank
>
> I see that must have been improved at NCBI. For the record if you
> want the full GenPept record with features and annotations you just
> request a different db, in this case 'gb' for genbank instead of the
> fasta source
> As in: http://gist.github.com/721012
>
> But maybe you already figured it out?
>
> -jason
> Chris Fields wrote, On 11/29/10 6:39 AM:
>> On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote:
>>
>>> Hi again,
>>> it seems that when i download (with 'download_query_genbank.pl') the
>>> whole proteome from NCBI in fasta format it is first being
>>> downloaded and from it is being created some kind of
>>> SeqFastaSpeedFactory and after that from it is being copied to the
>>> output file. But i want to download and write to output file one by
>>> one so i can see the download progress(which is working for genbank
>>> data).
>>>
>>> Its frustrating :)
>>>
>>> Any ideas where to look for solution
>>> Cheers
>>> Dimitar
>>
>> You can't do this with the default script, but you can use a modified
>> version and, where you are retrieving a sequence stream, in the last
>> four lines:
>>
>> my $stream = $dbh->get_Stream_by_query($query);
>> while( my $seq = $stream->next_seq ) {
>> $out->write_seq($seq);
>> }
>>
>> insert an iterator in the loop that indicates progress. Realize the
>> sequence data is processed through Bio::SeqIO, so it won't be exactly
>> the same as what is retrieved from GenBank, but it should be very close.
>>
>> If you want raw sequence, you can use Bio::DB::EUtilities, but it's a
>> bit more complicated.
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
Thank you Jason,
i figured that out yes :)
I know am wandering how to get the GI list so i can download by ID or
ACC and not by query when i download fasta. I have the simple code:
-------------
my $query_str='Caenorhabditis elegans[organism] AND refseq[filter]';
my $query=Bio::DB::Query::GenBank->new(-db=>'protein',
-query=>$query_str);
my $count=$query->count;
my @ids=$query->ids; <------- error msg here
------------
but i get an error msg:
MSG: Id list has been truncated even after maxids requested.
How can i get the ID/ACC? Any idea?
Thank you
Dimitar
--
Dimitar Kenanov
Post doctoral fellow
Bioinformatics Institute
A*STAR Singapore
tel: +65 6478 8514
email: dimitark at bii.a-star.edu.sg
More information about the Bioperl-l
mailing list