[Bioperl-l] Annotation-DBLink- version numbers repeating

Jason Stajich jason at bioperl.org
Thu Oct 19 17:44:51 UTC 2006


Yikes - I was worried that it might have been me.....

Okay, I'll look into fixing it -- ChrisF, check in with me before
diving in, in case I've already gotten it done, and I expect your
enzyme assays might take up your time anyway.

-jason
On Oct 19, 2006, at 10:11 AM, Hilmar Lapp wrote:

> Actually you did that Jason: http://tinyurl.com/ye2edk
>
> Apparently the motivation was to "parse swissprot fields in genpept  
> file (dbsource)"?
>
> It clearly looks wrong to add the version. You probably had a
> reason for doing this at the time, but if we (you :) can't recover
> it, I guess it's best to just fix it to do the right thing (in
> both places, obviously).
>
> 	-hilmar
>
> On Oct 19, 2006, at 11:50 AM, Jason Stajich wrote:
>
>> Well there is explicit addition of the version to the primary id  
>> so it isn't so much a parsing error as a deliberate decision to  
>> append it.
>> see Bio::SeqIO::genbank
>>
>> to make the dblink
>>     $annotation->add_Annotation
>>         ('dblink',
>>          Bio::Annotation::DBLink->new
>>              (-primary_id => $id . "." . $version,
>>               -version    => $version,
>>               -database   => $db,
>>               -tagname    => 'dblink'));
>>
>> and the code to print the dblink back out in the writer already  
>> assumes the version number is appended...
>>
>>         foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
>>             # if ($ref->comment eq 'DBSOURCE') {
>>             $self->_print('DBSOURCE    accession ',
>>                           $ref->primary_id, "\n");
>>             # }
>>         }
>>
>> On Oct 19, 2006, at 6:56 AM, Hilmar Lapp wrote:
>>
>>> Here is the overload code:
>>>
>>> use overload '""' => sub {
>>> 	(($_[0]->database ? $_[0]->database . ':' : '' )
>>> 	. ($_[0]->primary_id ? $_[0]->primary_id : '')
>>> 	. ($_[0]->version ? '.' . $_[0]->version : ''))
>>> 	|| '' };
>>>
>>> Except that the last '||' is redundant and unnecessary (it either  
>>> does nothing or replaces an empty string with an empty string), I  
>>> don't see the potential for duplicating the version number here -  
>>> unless primary_id() did that, which I don't see it doing.
>>>
>>> So, to me this seems to come from a parsing error in the  
>>> beginning, rather than an erroneous mangling of version into  
>>> primary_id later.
>>>
>>> Is someone in a position to confirm this?
>>>
>>> 	-hilmar
>>>
>>> On Oct 19, 2006, at 1:00 AM, Jason Stajich wrote:
>>>
>>>> So I'm unsure what we should do here.
>>>>
>>>> We can certainly fix the problem you report, which comes from
>>>> relying on the "" method -- if you were to do instead:
>>>> print $_->database, ":", $_->primary_id, "\n";
>>>>
>>>> you'll get the right answer.  At a minimum we can just fix the
>>>> auto-string-converting method to do The Right Thing.
>>>>
>>>> But I am not sure if we should keep the version out of the
>>>> primary_id field.  This will require some rejiggering in several
>>>> modules when it comes to printing DBlinks and I don't want to do
>>>> this before the release. I also am not sure if there was an
>>>> explicit reason why someone did put the version information in the
>>>> primary_id. (I hope it wasn't me because I don't think I'm going
>>>> to remember why).
>>>>
>>>> Does anyone else have a strong feeling?
>>>>
>>>> -jason
>>>> On Oct 17, 2006, at 12:01 PM, Erikjan wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I noticed a little problem with the Annotation "DBLink" from
>>>>> GenBank entries
>>>>>
>>>>> When I run:
>>>>>
>>>>> perl -MBio::DB::GenBank -e '
>>>>>   my $gi    = 56205924;
>>>>>   my $db    = Bio::DB::GenBank->new(-format => "genbank");
>>>>>   my $seqio = $db->get_Stream_by_id($gi);
>>>>>   my $seq   = $seqio->next_seq;
>>>>>   my $ac    = $seq->annotation();
>>>>>   my @annotations = $ac->get_Annotations("dblink");
>>>>>   for (@annotations) { print $_, "\n"; }
>>>>>   print $INC{"Bio/Annotation/DBLink.pm"}, "\n";
>>>>> '
>>>>>
>>>>> This yields:
>>>>>
>>>>>    GenBank:AL591065.17.17
>>>>>
>>>>> and the place where the used Bio/Annotation/DBLink.pm resides.
>>>>>
>>>>> Can others repeat this?
>>>>>
>>>>> I have dug into the source a little and Bio::Annotation::DBLink
>>>>> seems to be the place where this happens: it has a concatenation
>>>>> which leads to that repeated version number.
>>>>>
>>>>> Is this something that I should fix "client-side", so to speak, or
>>>>> is it worthwhile to add some logic to that concatenation to prevent
>>>>> this?
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich, PhD
>>>> Miller Research Fellow
>>>> University of California
>>>> Dept of Plant and Microbial Biology
>>>> 321 Koshland Hall #3102
>>>> Berkeley, CA 94720-3102
>>>> lab: 510.642.8441
>>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>>
>>> -- 
>>> ===========================================================
>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>> ===========================================================
>>>
>>>
>>>
>>>
>>>
>>
>
>
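
For reference, the duplication discussed in this thread can be reproduced
without BioPerl. The sketch below is only an illustration: ToyDBLink is a
hypothetical stand-in class, not Bio::Annotation::DBLink, and it simply
copies the stringification logic from the overload quoted above (minus the
redundant trailing ||) to show what happens when primary_id already carries
the version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; NOT the real
# Bio::Annotation::DBLink. It holds three plain fields and mirrors
# the stringification overload quoted in the thread.
package ToyDBLink;

use overload '""' => sub { $_[0]->as_string }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

sub database   { $_[0]{database} }
sub primary_id { $_[0]{primary_id} }
sub version    { $_[0]{version} }

# database:primary_id.version, skipping any part that is empty
sub as_string {
    my $self = shift;
    return ($self->database   ? $self->database . ':' : '')
         . ($self->primary_id ? $self->primary_id     : '')
         . ($self->version    ? '.' . $self->version  : '');
}

package main;

my ($db, $id, $version) = ('GenBank', 'AL591065', 17);

# What the parser snippet in the thread does: version appended to primary_id
my $appended = ToyDBLink->new(
    database   => $db,
    primary_id => $id . "." . $version,
    version    => $version,
);

# Alternative: primary_id holds only the bare accession
my $bare = ToyDBLink->new(
    database   => $db,
    primary_id => $id,
    version    => $version,
);

print "$appended\n";   # GenBank:AL591065.17.17
print "$bare\n";       # GenBank:AL591065.17
```

With the bare accession in primary_id the same overload prints the version
once. Any code that prints primary_id directly (like the DBSOURCE writer
quoted above) would then need to add the version itself.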

--
Jason Stajich, PhD
Miller Research Fellow
University of California
Dept of Plant and Microbial Biology
321 Koshland Hall #3102
Berkeley, CA 94720-3102
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html




