[Bioperl-l] Bio::DB::GenBank question (acc vs. version)

bill at genenformics.com bill at genenformics.com
Wed Sep 16 17:22:56 UTC 2009


>
> As for generic accession w/o version, efetch does support it but it
> does have problems (pulling up more than one sequence in rare cases,
> for instance).
>

This is probably because the NCBI ID servers are not completely
synchronized, or are still in the process of synchronizing. For that reason
get_Seq_by_acc is not as safe as get_Seq_by_gi or get_Seq_by_version.
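
For a tool that has to accept identifiers with or without a version, one
option is to pick the method from the shape of the identifier. What follows
is only a rough sketch under stated assumptions: method_for_id and fetch_seq
are hypothetical helper names (not part of BioPerl), an all-digit string is
assumed to be a GI, and anything ending in ".N" is assumed to be a versioned
accession.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the name of the
# Bio::DB::GenBank method that matches the shape of the identifier.
sub method_for_id {
    my ($id) = @_;
    return 'get_Seq_by_gi'      if $id =~ /\A\d+\z/;   # assumption: all digits means a GI
    return 'get_Seq_by_version' if $id =~ /\.\d+\z/;   # ends in .N, e.g. J00522.1
    return 'get_Seq_by_acc';                           # bare accession, e.g. J00522
}

# Hypothetical wrapper: fetch one sequence with the method chosen above.
# Bio::DB::GenBank is loaded lazily, so method_for_id can be used without
# BioPerl installed. Calling fetch_seq needs BioPerl and network access.
sub fetch_seq {
    my ($id) = @_;
    require Bio::DB::GenBank;
    my $db     = Bio::DB::GenBank->new;
    my $method = method_for_id($id);
    return $db->$method($id);
}
```

With this, fetch_seq('J00522.1') would go through get_Seq_by_version and
fetch_seq('J00522') through get_Seq_by_acc, so the less safe path is only
taken when no version is supplied.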

Bill

>
> On Sep 13, 2009, at 10:47 AM, bill at genenformics.com wrote:
>
>> I would like to make a few comments about get_Seq_by_version and
>> get_Seq_by_acc. Although both functions use the same NCBI eUtils API,
>> the requests are interpreted differently depending on whether the
>> Seq_id has a version.
>>
>> 1. If the Seq_id has a version, the GenBank ID server will locate the
>> corresponding GI and emit the correct sequence.
>> 2. If the Seq_id does not have a version, GBDataLoader will try to find
>> the latest version number for that Seq_id, which is relatively slower,
>> and the version number the ID server finds may NOT always be the
>> latest.
>>
>> IMHO, for both efficiency and consistency,
>> get_Seq_by_gi > get_Seq_by_version >> get_Seq_by_acc
>>
>> Bill
>>
>>
>>>
>>> It looks like Bio::DB::GenBank::get_Seq_by_{version,acc} are
>>> functionally identical.  They seem to trickle down to the same place,
>>> and walking through these two requests yields almost identical HTTP
>>> requests:
>>>
>>>  $db->get_Seq_by_version('J00522.1')
>>>  GET
>>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=J00522.1&usehistory=n
>>>
>>>  $db->get_Seq_by_acc('J00522')
>>>  GET
>>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&rettype=gbwithparts&db=nucleotide&tool=bioperl&id=J00522&usehistory=n
>>>
>>> The only difference that I can see is that they index into different
>>> sections of %PARAMSTRING defined in Bio::DB::GenBank, but those
>>> sections contain the same information.
>>>
>>> I'd like a general purpose tool that does The Right Thing whether
>>> there's a .1 on the end of an identifier or not, and am just trying to
>>> make sure I'm not doing something troublesome.
>>>
>>> Am I correct about the above?
>>>
>>> While I'm at it, I think that the comment
>>>
>>>  # note that get_Stream_by_version is not implemented
>>>
>>> in Bio::DB::GenBank was made obsolete by whoever commented out the
>>>
>>>  $self->throw(...)
>>>
>>> in get_Stream_by_version in Bio::DB::WebDBSeqI.
>>>
>>> I'll happily commit the trivial doc fix if no one shoots down the
>>> idea. (can't help big, might as well help small...).
>>>
>>> Thanks,
>>>
>>> g.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>




