[Biojava-l] Wrapping SimpleGappedSequence
Ditlev Egeskov Brodersen
deb at mb.au.dk
Mon Nov 19 09:46:01 EST 2007
Dear Richard and all,
I've been dissecting the delegation problem encountered when instantiating
SimpleGappedSequence(Sequence) with an already gapped sequence. The
constructor calls the parent SimpleGappedSymbolList(), which in Richard's
CVS update of 16 November 2007 now contains a separate overloaded constructor for the
gapped case:
public SimpleGappedSymbolList(GappedSymbolList gappedSource)
However, when instantiating a new SimpleGappedSequence based on an
existing gapped sequence (with several blocks), the blocks were lost.
After tracing the code path, it turned out that, for some reason, the old
SimpleGappedSymbolList(SymbolList) constructor was called instead. So I
modified SimpleGappedSequence.java to add an overloaded constructor to the
descendant class as well, identical to the existing one but taking a
GappedSequence argument:
public SimpleGappedSequence(GappedSequence seq) {
super(seq);
this.sequence = seq;
createOnUnderlying = false;
}
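The behaviour described above is consistent with how Java resolves overloaded constructors: the choice is made at compile time from the declared (static) type of the argument, not from the run-time class of the object. Inside SimpleGappedSequence(Sequence), seq is declared as a Sequence, so super(seq) can only bind to the SymbolList overload even when a gapped object is passed in. The sketch below reproduces the effect with hypothetical stand-in types (Symbols, GappedSymbols, Base, Derived), not the real BioJava classes:

```java
// Hypothetical stand-ins for SymbolList / GappedSymbolList and the two
// SimpleGappedSymbolList constructors; these are not the BioJava classes.
interface Symbols {}
interface GappedSymbols extends Symbols {}

class Base {
    final String chosen;
    Base(Symbols s)       { chosen = "Base(Symbols)"; }       // like the old constructor
    Base(GappedSymbols s) { chosen = "Base(GappedSymbols)"; } // like the new overload
}

class Derived extends Base {
    // Only this constructor exists. Inside it, s is statically a Symbols,
    // so super(s) always binds to Base(Symbols), whatever object is passed.
    Derived(Symbols s) { super(s); }
}

public class OverloadDemo {
    public static void main(String[] args) {
        GappedSymbols gapped = new GappedSymbols() {};
        System.out.println(new Base(gapped).chosen);    // Base(GappedSymbols)
        System.out.println(new Derived(gapped).chosen); // Base(Symbols)
    }
}
```

Under this reading, the added SimpleGappedSequence(GappedSequence) overload helps only when the caller's reference is declared with a gapped type; code holding the same object through a plain Sequence reference would still reach the old constructor.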
Now the correct parent constructor
(SimpleGappedSymbolList(GappedSymbolList)) was called. However, there are
two other problems with the new SimpleGappedSymbolList constructor that
need to be corrected for it to work as expected. First, the initial
introduction of a single, large block is missing from the new code, so
insert:
Block b = new Block(1, length, 1, length);
blocks.add(b);
Second, the code that transfers the gaps from the gapped source needs
to use two separate indices; otherwise the gaps will be placed wrongly,
because their positions are shifted by the gaps inserted before them:
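int n = 1;  // position in the ungapped source
for (int i = 1; i <= this.length(); i++) {  // position in the gapped string
    if (this.alpha.getGapSymbol().equals(gappedSource.symbolAt(i))) {
        this.addGapInSource(n);  // one gap before source position n
    } else {
        n++;  // only real symbols advance the source index
    }
}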
In other words, the source index n, which gives the position of each
gap, should only be incremented when there is NO gap at the corresponding
position i in the gapped string.
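The two-index rule can be checked on plain strings, independently of BioJava. In this sketch, gapSourcePositions, naivePositions and replay are hypothetical helpers and '-' is assumed to be the gap character. Each gap is recorded as "insert one gap before source position n", mirroring the addGapInSource(n) call above; replaying the recorded positions on the ungapped string should reproduce the original alignment only for the two-index version.

```java
import java.util.ArrayList;
import java.util.List;

public class GapTransferDemo {
    static final char GAP = '-';

    // Two-index walk: i runs over the gapped string, n over the ungapped
    // source. A gap at i means "one gap before source position n";
    // n only advances when position i holds a real symbol.
    static List<Integer> gapSourcePositions(String gapped) {
        List<Integer> positions = new ArrayList<>();
        int n = 1;
        for (int i = 1; i <= gapped.length(); i++) {
            if (gapped.charAt(i - 1) == GAP) {
                positions.add(n);
            } else {
                n++;
            }
        }
        return positions;
    }

    // Naive variant for comparison: reuses the gapped index i as the source position.
    static List<Integer> naivePositions(String gapped) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 1; i <= gapped.length(); i++) {
            if (gapped.charAt(i - 1) == GAP) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Replays "insert one gap before source position n" on the ungapped string.
    static String replay(String source, List<Integer> positions) {
        StringBuilder view = new StringBuilder(source);
        for (int n : positions) {
            view.insert(gappedIndexOf(view, n), GAP);
        }
        return view.toString();
    }

    // 0-based index in view of the n-th (1-based) non-gap symbol,
    // or view.length() if n is one past the last symbol.
    private static int gappedIndexOf(CharSequence view, int n) {
        int seen = 0;
        for (int k = 0; k < view.length(); k++) {
            if (view.charAt(k) != GAP && ++seen == n) {
                return k;
            }
        }
        return view.length();
    }

    public static void main(String[] args) {
        String gapped = "MSE--KLMPRT---TWAKG";
        String source = gapped.replace("-", "");
        System.out.println(replay(source, gapSourcePositions(gapped))); // MSE--KLMPRT---TWAKG
        System.out.println(replay(source, naivePositions(gapped)));     // a shifted, wrong alignment
    }
}
```

For this example the two-index walk records the positions [4, 4, 10, 10, 10], i.e. two gaps before K and three before the second T of the ungapped MSEKLMPRTTWAKG.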
Following these changes, the GappedSequenceTest program from last week now
works as expected:
aSymbolList = MSE--KLMPRT---TWAKG
aSequence = MSE--KLMPRT---TWAKG
Gaps are not parsed when a SimpleGappedSequence is constructed from a
gapped Sequence object:
aGapped = MSE--KLMPRT---TWAKG
Gapped position 10 = Plain position 10
aSymbolList = MSEKLMPRTTWAKG
aSequence = MSEKLMPRTTWAKG
Gaps introduced through addGapsInSource work ok:
aGapped = MS--EKLMPR---TTWAKG
Gapped position 10 = Plain position 8
Now a new SimpleGappedSequence object is created from the previous one:
aGapped2 = MS--EKLMPR---TTWAKG
Gapped position 10 = Plain position 8
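The "Gapped position 10 = Plain position 8" result can be reproduced with a simplified block model. This sketch assumes that the four arguments of new Block(1, length, 1, length) are source start, source end, view start and view end, so that a gapped list is a series of blocks and the gaps are the view positions that no block covers. Run, buildRuns and gappedToSource are hypothetical names; this is not BioJava's internal Block class, and returning -1 for a gap position is an arbitrary choice made here.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockMappingDemo {
    // Hypothetical block: source positions srcStart..srcEnd are shown at
    // view (gapped) positions viewStart..viewEnd; 1-based and inclusive.
    static final class Run {
        final int srcStart, srcEnd, viewStart, viewEnd;

        Run(int srcStart, int srcEnd, int viewStart, int viewEnd) {
            this.srcStart = srcStart;
            this.srcEnd = srcEnd;
            this.viewStart = viewStart;
            this.viewEnd = viewEnd;
        }
    }

    // One run per maximal stretch of non-gap symbols in the gapped string.
    static List<Run> buildRuns(String gapped) {
        List<Run> runs = new ArrayList<>();
        int src = 0;          // real symbols seen so far
        int runStartView = 0; // 0 means "not inside a run"
        for (int v = 1; v <= gapped.length(); v++) {
            boolean gap = gapped.charAt(v - 1) == '-';
            if (!gap) {
                src++;
                if (runStartView == 0) runStartView = v;
            }
            if (runStartView != 0 && (gap || v == gapped.length())) {
                int viewEnd = gap ? v - 1 : v;
                int len = viewEnd - runStartView + 1;
                runs.add(new Run(src - len + 1, src, runStartView, viewEnd));
                runStartView = 0;
            }
        }
        return runs;
    }

    // Gapped (view) position -> source position, or -1 if it is a gap.
    static int gappedToSource(List<Run> runs, int viewPos) {
        for (Run r : runs) {
            if (viewPos >= r.viewStart && viewPos <= r.viewEnd) {
                return r.srcStart + (viewPos - r.viewStart);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Run> runs = buildRuns("MS--EKLMPR---TTWAKG");
        for (Run r : runs) {
            System.out.printf("source %d-%d -> view %d-%d%n",
                    r.srcStart, r.srcEnd, r.viewStart, r.viewEnd);
        }
        // Gapped position 10 = Plain position 8
        System.out.println("Gapped position 10 = Plain position " + gappedToSource(runs, 10));
    }
}
```

For MS--EKLMPR---TTWAKG it finds three blocks (source 1-2, 3-8 and 9-14, shown at view 1-2, 5-10 and 14-19), and view position 10 falls in the second one, at source position 3 + (10 - 5) = 8.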
-- Ditlev
--
Ditlev E. Brodersen, Ph.D.
Lektor, Associate Professor
Department of Molecular Biology Office: +45 89425259
University of Aarhus Lab: +45 89425022
Gustav Wieds Vej 10c Fax: +45 86123178
DK-8000 Aarhus C Email: deb at mb.au.dk
Denmark Lab WWW: www.bioxray.dk/~deb
-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Richard Holland
Sent: 18 November 2007 18:12
To: Ditlev Egeskov Brodersen
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] Wrapping SimpleGappedSequence
Interesting stuff. I'm not sure why it isn't working so I'll have to have a closer look.
I'm currently on annual leave but will investigate when I return (Nov 27th).
cheers,
Richard
On Sun, November 18, 2007 10:50 am, Ditlev Egeskov Brodersen wrote:
Hi Richard,
I also thought what you say was correct, but I can't get it to work.
Below is a small test program to check this. First, I create a
SimpleGappedSequence through the chain text with gaps -> SymbolList ->
Sequence -> GappedSequence. The gaps are there but not "understood", as
expected. Next, I create the same sequence without gaps in the same way,
then introduce gaps with addGapsInSource. A gapped location is now
properly translated to a non-gapped sequence position. Finally, I create
a new SimpleGappedSequence based on the working one; as you can see, the
gaps are still there but not "understood"...
aSymbolList = MSE--KLMPRT---TWAKG
aSequence = MSE--KLMPRT---TWAKG
Gaps are not parsed when a SimpleGappedSequence is constructed from a gapped Sequence object:
aGapped = MSE--KLMPRT---TWAKG
Gapped position 10 = Plain position 10
aSymbolList = MSEKLMPRTTWAKG
aSequence = MSEKLMPRTTWAKG
Gaps introduced through addGapsInSource work ok:
aGapped = MS--EKLMPR---TTWAKG
Gapped position 10 = Plain position 8
Now a new SimpleGappedSequence object is created from the previous one:
aGapped2 = MS--EKLMPR---TTWAKG
Gapped position 10 = Plain position 10
This should have been compiled with the new biojava.jar of 16 November
2007 (updated via CVS), but perhaps I made a mistake updating?
Any clues?
Thanks,
Ditlev
---
package gappedsequencetest;

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.impl.*;
import org.biojava.bio.symbol.*;

public class Main {
    public static void main(String[] args) {
        SymbolList aSymbolList;
        try {
            aSymbolList = ProteinTools.createProtein("MSE--KLMPRT---TWAKG");
        } catch (BioException ex) {
            // Fail loudly instead of continuing with a null SymbolList
            ex.printStackTrace();
            return;
        }
        System.out.println("aSymbolList = " + aSymbolList.seqString());
        Sequence aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
        System.out.println("aSequence = " + aSequence.seqString() + "\n");

        // Case 1: gaps already present in a plain (non-gapped) Sequence
        SimpleGappedSequence aGapped = new SimpleGappedSequence(aSequence);
        System.out.println("Gaps are not parsed when a SimpleGappedSequence "
                + "is constructed from a gapped Sequence object:");
        System.out.println("aGapped = " + aGapped.seqString());
        System.out.println("Gapped position 10 = Plain position "
                + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");

        // Case 2: ungapped sequence, gaps added through addGapsInSource
        try {
            aSymbolList = ProteinTools.createProtein("MSEKLMPRTTWAKG");
        } catch (BioException ex) {
            ex.printStackTrace();
            return;
        }
        System.out.println("aSymbolList = " + aSymbolList.seqString());
        aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
        System.out.println("aSequence = " + aSequence.seqString() + "\n");
        aGapped = new SimpleGappedSequence(aSequence);
        aGapped.addGapsInSource(9, 3);
        aGapped.addGapsInSource(3, 2);
        System.out.println("Gaps introduced through addGapsInSource work ok:");
        System.out.println("aGapped = " + aGapped.seqString());
        System.out.println("Gapped position 10 = Plain position "
                + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");

        // Case 3: copy of an already gapped SimpleGappedSequence
        SimpleGappedSequence aGapped2 = new SimpleGappedSequence(aGapped);
        System.out.println("Now a new SimpleGappedSequence object is created "
                + "from the previous one:");
        System.out.println("aGapped2 = " + aGapped2.seqString());
        System.out.println("Gapped position 10 = Plain position "
                + aGapped2.gappedToLocation(new PointLocation(10)).getMin() + "\n");
    }
}
--
Ditlev Egeskov Brodersen
Lektor
Bakkefaldet 30, Hasle
8210 Århus V
www.lindeman-brodersen.dk
-----Original Message-----
From: Richard Holland [mailto:holland at ebi.ac.uk]
Sent: 16 November 2007 13:46
To: Ditlev Egeskov Brodersen
Cc: biojava-l at biojava.org
Subject: Re: Wrapping SimpleGappedSequence
SimpleGappedSequence extends SimpleGappedSymbolList, and the constructor
delegates to the SimpleGappedSymbolList constructor.
When you extend SimpleGappedSequence you should delegate in your new
constructor to the existing SimpleGappedSequence constructor, which in
turn will delegate as above and preserve the gaps.
By passing any object which implements GappedSymbolList to the
SimpleGappedSequence constructor, e.g. SimpleGappedSequence or
SimpleGappedSymbolList, it will automatically choose the new constructor
from SimpleGappedSymbolList which you hopefully should be able to see in
the code you have just checked out. If passed any other
non-GappedSymbolList object, it will use the old constructor that already
existed from before.
cheers,
Richard
Ditlev Egeskov Brodersen wrote:
Hi again,
I updated CVS and got the new SimpleGappedSymbolList class, but there
seem to be no changes to the SimpleGappedSequence class, which is the one
I need to extend... have I missed something?
Ditlev
-----Original Message-----
From: Richard Holland [mailto:holland at ebi.ac.uk]
Sent: 16 November 2007 11:47
To: Ditlev Egeskov Brodersen
Cc: biojava-l at biojava.org
Subject: Re: Wrapping SimpleGappedSequence
The easiest way is simply for me to alter the constructor of
SimpleGappedSequence (and equivalently that of SimpleGappedSymbolList) to
copy all gaps if passed another instance of GappedSymbolList as the
parameter. I've just done this in CVS, so you should be able to update
your copy and observe the new behaviour.
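Copying the gaps "if passed another instance of GappedSymbolList" could also be done with a run-time instanceof test inside a single constructor instead of a separate overload. Because instanceof looks at the object's actual class, such a constructor would keep the gaps even when the caller's reference is declared with the plain type. A minimal sketch with hypothetical stand-in types (SymList, GappedSymList, PlainList, GappedList); it is not necessarily how the CVS change was written:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for SymbolList / GappedSymbolList; not BioJava types.
interface SymList {
    String seqString();
}

interface GappedSymList extends SymList {
    String sourceString();              // the ungapped symbols
    List<Integer> gapSourcePositions(); // one entry per gap: "gap before source position n"
}

class PlainList implements SymList {
    private final String symbols;
    PlainList(String symbols) { this.symbols = symbols; }
    public String seqString() { return symbols; }
}

class GappedList implements GappedSymList {
    private final String source;
    private final List<Integer> gaps = new ArrayList<>();

    // Single constructor: the branch depends on the argument's run-time class,
    // so the declared type at the call site does not matter.
    GappedList(SymList src) {
        if (src instanceof GappedSymList) {
            GappedSymList g = (GappedSymList) src;
            source = g.sourceString();
            gaps.addAll(g.gapSourcePositions()); // copy the existing gaps
        } else {
            source = src.seqString();            // plain input: start without gaps
        }
    }

    void addGapInSource(int n) { gaps.add(n); }

    public String sourceString() { return source; }
    public List<Integer> gapSourcePositions() { return new ArrayList<>(gaps); }

    // For each source position p, emit its gaps and then symbol p;
    // gaps recorded at length + 1 end up after the last symbol.
    public String seqString() {
        StringBuilder sb = new StringBuilder();
        for (int p = 1; p <= source.length() + 1; p++) {
            for (int n : gaps) {
                if (n == p) sb.append('-');
            }
            if (p <= source.length()) sb.append(source.charAt(p - 1));
        }
        return sb.toString();
    }
}

public class RuntimeCheckDemo {
    public static void main(String[] args) {
        GappedList aGapped = new GappedList(new PlainList("MSEKLMPRTTWAKG"));
        for (int k = 0; k < 3; k++) aGapped.addGapInSource(9); // mirrors addGapsInSource(9, 3)
        for (int k = 0; k < 2; k++) aGapped.addGapInSource(3); // mirrors addGapsInSource(3, 2)

        SymList declaredPlain = aGapped; // static type is the plain interface
        GappedList aGapped2 = new GappedList(declaredPlain);
        System.out.println(aGapped.seqString());  // MS--EKLMPR---TTWAKG
        System.out.println(aGapped2.seqString()); // MS--EKLMPR---TTWAKG
    }
}
```

The trade-off is that the type test is hidden inside the constructor rather than visible in the signature, but callers no longer have to hold a reference of the right declared type.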
cheers,
Richard
Ditlev Egeskov Brodersen wrote:
Hi again,
thanks for the info; I will do the check just to be proper. I have
another question: in my application, I would like to wrap the retrieved
SimpleGappedSequence objects inside another object that extends their
functionality with application-specific stuff. Ideally, I would do this
by extending the SimpleGappedSequence class and creating the object by
passing the SimpleGappedSequence from the alignment import to the
constructor of the parent, like so:
class AlignedSequence extends SimpleGappedSequence {
    public AlignedSequence(SimpleGappedSequence aGapped) {
        super(aGapped);
    }
    // ...custom stuff...
}
However, the problem is that there is only one constructor for the
SimpleGappedSequence, one which takes a simple Sequence object. I can pass
the derived class alright, but all gap information is lost again,
presumably because the SimpleGappedSequence constructor just takes out the
seqString() and puts it into its own sequence object.
Shouldn't the constructor of the SimpleGappedSequence class recognise when
a derived (and gapped) sequence object is passed, and process it
accordingly?
As it stands, I am forced to include the SimpleGappedSequence as a
private member of the AlignedSequence class, which is not nearly as nice,
since every statement using the class will have to do something like:
// Composition: holds the gapped sequence instead of extending it
class AlignedSequence {
    private SimpleGappedSequence gapped_sequence;
    public AlignedSequence(SimpleGappedSequence aGapped) {
        gapped_sequence = aGapped;
    }
    public SimpleGappedSequence getGappedSequence() {
        return gapped_sequence;
    }
    // ...custom stuff...
}
...
AlignedSequence aAligned = new AlignedSequence(aGapped);
aAligned.getGappedSequence().seqString();
rather than simply:
AlignedSequence aAligned = new AlignedSequence(aGapped);
aAligned.seqString();
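A middle ground, not proposed in the thread, is to keep the composition approach but forward the calls, so that callers can write aAligned.seqString() directly. This sketch uses a hypothetical two-method interface, GappedView, and a hypothetical SimpleView class in place of BioJava's much larger GappedSequence interface; with the real interface every method would need a forwarding implementation, which is the main cost of this design.

```java
// Hypothetical, much-reduced interface standing in for GappedSequence.
interface GappedView {
    String seqString();
    int length();
}

class SimpleView implements GappedView {
    private final String s;
    SimpleView(String s) { this.s = s; }
    public String seqString() { return s; }
    public int length() { return s.length(); }
}

// Wraps any GappedView and forwards to it, so the gaps are exactly those of
// the wrapped object; application-specific state sits alongside.
class AlignedSequence implements GappedView {
    private final GappedView gapped;
    private final String label; // example of "custom stuff"

    AlignedSequence(GappedView gapped, String label) {
        this.gapped = gapped;
        this.label = label;
    }

    public String seqString() { return gapped.seqString(); }
    public int length() { return gapped.length(); }
    String label() { return label; }
}

public class ForwardingDemo {
    public static void main(String[] args) {
        AlignedSequence aAligned =
                new AlignedSequence(new SimpleView("MS--EKLMPR---TTWAKG"), "mySequence");
        System.out.println(aAligned.seqString()); // MS--EKLMPR---TTWAKG
        System.out.println(aAligned.length());    // 19
    }
}
```

Because nothing is copied, the wrapper cannot lose the gap information; the price is one forwarding method per interface method.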
In other words, is there any solution with the current setup that would
allow me to extend SimpleGappedSequence and not lose the gap information?
-- Ditlev
-----Original Message-----
From: Richard Holland [mailto:holland at ebi.ac.uk]
Sent: 16 November 2007 10:50
To: Ditlev Egeskov Brodersen
Cc: biojava-l at biojava.org
Subject: Re: [Biojava-l] Parsing existing gaps
The returned gapped sequences are all properly set up with gaps, name
etc. But as for other users, I think there may be some problems: since
the SimpleAlignment object only has a general symbol list iterator, the
user will have to cast each statement extracting a sequence object, and
SimpleSequence aSimple = (SimpleSequence)aSequences.next();
throws a ClassCastException at run time. So old code might not run with
the update, as far as I can see.
This is true. However, such code would be unsupported by us as the API
clearly states that SimpleAlignment returns SymbolList instances, and does
not make any guarantees about the exact implementation details of the
objects it returns. To attempt to cast it to anything other than
SymbolList would be a mistake! (Although actually it is now returning a
guarantee of GappedSymbolList, which is what your code can now take
advantage of). To assume it will return SimpleSequence is outside the
behaviour defined by the API and therefore should not be relied upon.
A more correct behaviour would be to test each item returned:
SymbolList symlist = (SymbolList) aSequences.next(); // raw Iterator: next() returns Object
if (symlist instanceof SimpleSequence) {
    SimpleSequence seq = (SimpleSequence) symlist;
    // Do simple-sequence stuff
} else {
    // Do something else!
}
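The failure described above happens only at run time: a cast from Object compiles because the iterator's element type is unknown to the compiler, and it fails only when the object turns out to be a different class. A minimal illustration with hypothetical classes (Seq, PlainSeq and GappedSeq standing in for SymbolList, SimpleSequence and a gapped implementation):

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-ins: Seq ~ SymbolList, PlainSeq ~ SimpleSequence,
// GappedSeq ~ the gapped implementation the alignment now returns.
interface Seq {
    String seqString();
}

class PlainSeq implements Seq {
    public String seqString() { return "MSEKLMPRTTWAKG"; }
}

class GappedSeq implements Seq {
    public String seqString() { return "MS--EKLMPR---TTWAKG"; }
}

public class CastDemo {
    public static void main(String[] args) {
        Iterator<?> aSequences = Arrays.asList(new GappedSeq()).iterator();
        Object next = aSequences.next();

        // Casting to the promised interface works for any implementation.
        Seq asSeq = (Seq) next;
        System.out.println(asSeq.seqString());

        // Casting to one concrete class compiles but fails at run time.
        try {
            PlainSeq aSimple = (PlainSeq) next;
            System.out.println(aSimple.seqString());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

Casting to the interface the API promises succeeds; casting to the concrete class is exactly what breaks when the implementation behind the iterator changes, which the instanceof test guards against.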
In future, I will modify the API to change the SymbolList guarantee to a
GappedSymbolList guarantee, but I can't do this right now as this really
would break everyone's code!
We are currently planning a redesign, as you may be aware, so issues like
this will hopefully be resolved as part of that process. For a start, if
we use Java 5 generics in future as we plan, we can strictly specify what
kinds of objects will be returned by things such as the alignment API,
making it easier for us to enforce API-compliant behaviour in users'
code.
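With generics, the element type becomes part of the method signature, so the compiler rather than a run-time cast checks what an alignment hands back. A minimal sketch with hypothetical types (GappedRow, TypedAlignment); it is not BioJava's actual API, and the lambdas in main need Java 8 or later:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for GappedSymbolList.
interface GappedRow {
    String seqString();
}

// Hypothetical alignment whose element type is part of its signature.
class TypedAlignment implements Iterable<GappedRow> {
    private final List<GappedRow> rows = new ArrayList<>();

    void add(GappedRow row) { rows.add(row); }

    @Override
    public Iterator<GappedRow> iterator() { return rows.iterator(); }
}

public class GenericsDemo {
    public static void main(String[] args) {
        TypedAlignment alignment = new TypedAlignment();
        alignment.add(() -> "MS--EKLMPR---TTWAKG");
        alignment.add(() -> "MSE--KLMPRT---TWAKG");
        // alignment.add("plain string"); would be rejected by the compiler.

        // No cast needed: each element is statically known to be a GappedRow.
        for (GappedRow row : alignment) {
            System.out.println(row.seqString());
        }
    }
}
```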
cheers,
Richard
--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416
http://www.biomart.org/
http://www.biojava.org/
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l