[Biojava-l] GappedSymbolList behaviour is weird, bug?
Kalle Näslund
Kalle.Naslund@genpat.uu.se
Wed, 12 Dec 2001 14:05:22 +0100
Hi!
I am writing a small app that uses GappedSymbolList and I see some
weird behaviour.
The first "problem" occurs when I have a GappedSymbolList and insert a
gap into the view sequence (the one that shows gaps). As long as I
insert a gap or gaps at a position where there isn't already a gap, all
is fine. On the other hand, if I insert a gap at a position where there
already is a gap, the gap gets inserted into the NEXT block of gaps, and
if there isn't any next block of gaps, the gap gets appended at the end
of the sequence. A simple text example will describe this much better.
The example basically just inserts a gap at position 3 in the view a
couple of times and then prints the output, which looks like this:
aattggcc Initial sequence
aa-ttggcc 1 gap inserted at position 3
aa-ttggcc- 1 additional gap inserted at position 3
aa-ttggcc-- 1 additional gap inserted at position 3
aa-ttggcc--- 1 additional gap inserted at position 3
To me, this is not how anyone would expect it to work. I think most
people would expect gap insertion to work the same way, irrespective of
which symbol is at the position where the gap gets inserted, and that
the end result should look like this:
aa----ttggcc
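The expected behaviour can be modelled with a plain string standing in
for the view. This is only a toy sketch: the class GapInsertSketch and
its method insertGapInView are hypothetical names and do not call
BioJava. Positions are 1-based, as in the example above.

```java
// Toy model of the expected behaviour; not BioJava code.
final class GapInsertSketch {

    /** Inserts one gap so that it occupies 1-based view position pos. */
    static String insertGapInView(String view, int pos) {
        if (pos < 1 || pos > view.length() + 1) {
            throw new IndexOutOfBoundsException("view position " + pos);
        }
        // The symbol (gap or not) currently at pos is shifted one step right.
        return new StringBuilder(view).insert(pos - 1, '-').toString();
    }

    public static void main(String[] args) {
        String view = "aattggcc";
        for (int i = 0; i < 4; i++) {
            view = insertGapInView(view, 3);
            System.out.println(view);
        }
        // The last line printed is aa----ttggcc
    }
}
```

Here the insertion does not look at what is already at the position, so
repeated insertions at position 3 simply grow the same block of gaps.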
The second thing I have some thoughts about is the viewToSource
method. If you try to convert from view to source coordinates, and the
view coordinate contains a gap, you get a return value of -1. The
JavaDoc doesn't mention what happens in this case, but it returns -1,
and that is OK, I guess. But this gives me lots of problems, as I have
users graphically specify an interval on the GappedSymbolList, and I
then want the source coordinates. If the user chooses an endpoint that
is a gap, I have to start scanning symbol by symbol in the view
coordinates and then use the first non-gap symbol.
So would it be wrong to change the viewToSource method to not return
-1, but to actually return the source position where the gap is
inserted, multiplied by -1? This would most likely not break any code
that just checks whether viewToSource returns -1, as most people will
have written the check as if( x < 0 ) and not as if( x == -1 ). Then
you can get a meaningful conversion from view to source, and if you
don't care, you can just check whether the return value is negative.
To clarify what I mean, I will give a short example here as well.
aa---ttggcc
As it is now, viewToSource( 4 ) will return -1. I would propose that it
should return -3 instead, because the gaps are inserted at position
three in the source sequence. The value should be negative to indicate
that there is no direct link between the view position and the source,
as the view position is a gap.
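A rough sketch of the proposed return value, again on a plain string
rather than on a real GappedSymbolList. The class ViewToSourceSketch is
a hypothetical name, positions are 1-based, and the handling of
trailing gaps (negated source length plus one) is my own assumption,
since the proposal above does not cover that case.

```java
// Toy model of the proposed return value; not BioJava code.
final class ViewToSourceSketch {

    /**
     * Maps a 1-based view position to a 1-based source position.
     * For a gap, returns the negated source position in front of which
     * the gap sits. Assumption: trailing gaps give -(sourceLength + 1).
     */
    static int viewToSource(String view, int viewPos) {
        if (viewPos < 1 || viewPos > view.length()) {
            throw new IndexOutOfBoundsException("view position " + viewPos);
        }
        int symbolsBefore = 0; // non-gap symbols strictly before viewPos
        for (int i = 0; i < viewPos - 1; i++) {
            if (view.charAt(i) != '-') {
                symbolsBefore++;
            }
        }
        boolean isGap = view.charAt(viewPos - 1) == '-';
        int sourcePos = symbolsBefore + 1;
        return isGap ? -sourcePos : sourcePos;
    }

    public static void main(String[] args) {
        System.out.println(viewToSource("aa---ttggcc", 4)); // prints -3
        System.out.println(viewToSource("aa---ttggcc", 6)); // prints 3
    }
}
```

A caller that only tests if( x < 0 ) still detects the gap, while a
caller that wants the position can take -x.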
I do understand that this little proposal might have unwanted effects
on other parts, so it should only be seen as a small question /
proposal and nothing more. If there is a reason to return only -1 and
nothing else, I will just use the dirty solution of walking along the
view sequence until I find a non-gap symbol.
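For reference, the walking workaround could look roughly like the
sketch below. It uses a plain string and a stand-in method,
currentViewToSource, that mimics the -1 behaviour described above. Both
that method and the class GapScanSketch are hypothetical and not part
of BioJava. Walking to the right is an assumption; for the right-hand
end of an interval one would probably walk to the left instead.

```java
// Toy model of the scanning workaround; not BioJava code.
final class GapScanSketch {

    /** Stand-in for the current behaviour: -1 for a gap, else the 1-based source position. */
    static int currentViewToSource(String view, int viewPos) {
        if (view.charAt(viewPos - 1) == '-') {
            return -1;
        }
        int count = 0; // non-gap symbols up to and including viewPos
        for (int i = 0; i < viewPos; i++) {
            if (view.charAt(i) != '-') {
                count++;
            }
        }
        return count;
    }

    /** Walks right from viewPos to the first non-gap symbol; -1 if only gaps remain. */
    static int firstSourceAtOrAfter(String view, int viewPos) {
        for (int p = viewPos; p <= view.length(); p++) {
            int src = currentViewToSource(view, p);
            if (src >= 0) {
                return src;
            }
        }
        return -1;
    }
}
```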
Anyway, I have tested this on Linux (JDK 1.3.1 from Sun) and Windows
(JDK 1.4.0b3), using both the binary biojava-20010920.jar release and
one of the older releases, and the behaviour is the same in all
combinations.
To finish this off, I would just like to say thanks to all who have
contributed to BioJava, as it simplifies many nasty tasks a lot.
Sincerely, Kalle Näslund