[Biojava-l] Wrapping SimpleGappedSequence
Richard Holland
holland at ebi.ac.uk
Mon Nov 26 07:55:23 EST 2007
I have made the changes you suggest below in CVS. Hopefully it will work
for you now.
cheers,
Richard
Ditlev Egeskov Brodersen wrote:
> Dear Richard and all,
>
> I've been dissecting the delegation problem encountered when instantiating
> SimpleGappedSequence(Sequence) with an already gapped sequence. The
> constructor calls the parent SimpleGappedSymbolList(), which in Richard's
> CVS update of 161107 now contains a separate overloaded constructor for the
> gapped case:
>
> public SimpleGappedSymbolList(GappedSymbolList gappedSource)
>
> However, when instantiating a new SimpleGappedSequence based on an
> existing gapped sequence (with several blocks), the blocks were lost.
>
> After checking the path of code execution it appeared that for some
> reason, the old SimpleGappedSymbolList(SymbolList) was called. So I modified
> SimpleGappedSequence.java to include an overloaded constructor also for the
> descendant class, identical to the other constructor but with a
> GappedSequence argument:
>
> public SimpleGappedSequence(GappedSequence seq) {
> super(seq);
> this.sequence = seq;
> createOnUnderlying = false;
> }
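A note on the "for some reason" above: Java chooses between overloaded constructors at compile time, from the declared type of the argument, not from the runtime class of the object. Inside SimpleGappedSequence(Sequence seq) the argument is declared as a Sequence, so super(seq) would be expected to bind to SimpleGappedSymbolList(SymbolList) even when a gapped object is passed in. The sketch below reproduces this with stand-in types; all names in it are hypothetical and are not the BioJava classes.

```java
// Stand-in types for illustration only; these are not the BioJava interfaces.
interface SymList {}
interface GappedSymList extends SymList {}
interface Seq extends SymList {}
interface GappedSeq extends Seq, GappedSymList {}

class ParentList {
    final String chosen; // records which overload actually ran

    ParentList(SymList source) { chosen = "SymList"; }
    ParentList(GappedSymList source) { chosen = "GappedSymList"; }
}

class ChildSeq extends ParentList {
    // 'seq' is declared as Seq, so super(seq) is bound to ParentList(SymList)
    // at compile time, whatever the runtime class of the argument is.
    ChildSeq(Seq seq) { super(seq); }

    // 'seq' is declared as GappedSeq, so super(seq) is bound to the more
    // specific ParentList(GappedSymList).
    ChildSeq(GappedSeq seq) { super(seq); }
}

class OverloadSketch {
    static String viaSeqReference() {
        Seq s = new GappedSeq() {};      // gapped object, declared as plain Seq
        return new ChildSeq(s).chosen;
    }

    static String viaGappedReference() {
        GappedSeq g = new GappedSeq() {}; // same kind of object, gapped type
        return new ChildSeq(g).chosen;
    }

    public static void main(String[] args) {
        System.out.println(viaSeqReference());    // SymList
        System.out.println(viaGappedReference()); // GappedSymList
    }
}
```

This is why the extra constructor in the descendant class makes a difference: it gives the compiler a declared type from which the gapped parent constructor can be selected.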
>
> Now, the correct parent constructor
> (SimpleGappedSymbolList(GappedSymbolList)) was called. However, there are
> two other problems with the new SimpleGappedSymbolList constructor that
> need to be corrected for it to work as expected: First, the initial
> introduction of a single, large block is missing from the new code, so
> insert:
>
> Block b = new Block(1, length, 1, length);
> blocks.add(b);
>
> Secondly, the code for transferring the gaps from the sequence string needs
> to use two separate indices, otherwise the gaps will be placed wrongly
> because their position is affected by previously inserted gaps:
>
> int n = 1; // position in the ungapped source
> for (int i = 1; i <= this.length(); i++) { // position in the gapped view
>   if (this.alpha.getGapSymbol().equals(gappedSource.symbolAt(i)))
>     this.addGapInSource(n);
>   else
>     n++;
> }
>
> In other words, the index giving the position of the gaps should only
> increment when there are NO gaps at the corresponding position in the gapped
> string.
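The two-index idea can be tried out without BioJava. The following is only a sketch on plain strings: the class, the method names and the gapsBefore array are assumptions made for illustration, and the array stands in for whatever block structure the real constructor maintains.

```java
class GapTransferSketch {
    static final char GAP = '-';

    // Returns gapsBefore[n] = number of gaps sitting immediately in front of
    // source position n (1-based); index sourceLength + 1 holds trailing gaps.
    static int[] transferGaps(String gapped) {
        int sourceLength = gapped.replace("-", "").length();
        int[] gapsBefore = new int[sourceLength + 2];
        int n = 1;                                    // index into the ungapped source
        for (int i = 1; i <= gapped.length(); i++) {  // index into the gapped view
            if (gapped.charAt(i - 1) == GAP) {
                gapsBefore[n]++;  // gap goes in front of source symbol n
            } else {
                n++;              // advance only when a real symbol is seen
            }
        }
        return gapsBefore;
    }

    // Rebuilds the gapped string from the ungapped source plus the gap counts.
    static String render(String source, int[] gapsBefore) {
        StringBuilder sb = new StringBuilder();
        for (int n = 1; n <= source.length() + 1; n++) {
            for (int g = 0; g < gapsBefore[n]; g++) {
                sb.append(GAP);
            }
            if (n <= source.length()) {
                sb.append(source.charAt(n - 1));
            }
        }
        return sb.toString();
    }

    // Strips the gaps, transfers them with two indices, and renders again.
    static String roundTrip(String gapped) {
        return render(gapped.replace("-", ""), transferGaps(gapped));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("MSE--KLMPRT---TWAKG"));
    }
}
```

If the source index were advanced on every character instead, each gap would be recorded at its view position, which drifts further from the intended source position with every gap already seen.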
>
> Following these changes, the GappedSequenceTest program from last week now
> works as expected:
>
> aSymbolList = MSE--KLMPRT---TWAKG
> aSequence = MSE--KLMPRT---TWAKG
>
> Gaps are not parsed when a SimpleGappedSequence is constructed from a
> gapped Sequence object:
> aGapped = MSE--KLMPRT---TWAKG
> Gapped position 10 = Plain position 10
>
> aSymbolList = MSEKLMPRTTWAKG
> aSequence = MSEKLMPRTTWAKG
>
> Gaps introduced through addGapsInSource work ok:
> aGapped = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 8
>
> Now a new SimpleGappedSequence object is created from the previous one:
> aGapped2 = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 8
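For readers checking the numbers by hand, the mapping reported as "Gapped position 10 = Plain position 8" can be reproduced on the printed string alone. This is a hypothetical helper on plain strings, not the gappedToLocation implementation; returning -1 for a gap position is a choice made for this sketch only.

```java
class ViewToSourceSketch {
    // Maps a 1-based position in the gapped view to the 1-based position in
    // the ungapped source, or -1 if the view position holds a gap.
    static int viewToSource(String gapped, int viewPos) {
        if (viewPos < 1 || viewPos > gapped.length()) {
            throw new IndexOutOfBoundsException("view position " + viewPos);
        }
        if (gapped.charAt(viewPos - 1) == '-') {
            return -1;
        }
        int sourcePos = 0;
        for (int i = 0; i < viewPos; i++) {
            if (gapped.charAt(i) != '-') {
                sourcePos++; // count real symbols up to and including viewPos
            }
        }
        return sourcePos;
    }

    public static void main(String[] args) {
        System.out.println(viewToSource("MS--EKLMPR---TTWAKG", 10)); // 8
    }
}
```

When the gaps are not understood, every character counts as a symbol, so position 10 maps to 10, which matches the failing output quoted further down the thread.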
>
> -- Ditlev
>
> --
>
> Ditlev E. Brodersen, Ph.D.
> Lektor, Associate Professor
>
> Department of Molecular Biology Office: +45 89425259
> University of Aarhus Lab: +45 89425022
> Gustav Wieds Vej 10c Fax: +45 86123178
> DK-8000 Aarhus C Email: deb at mb.au.dk
> Denmark Lab WWW: www.bioxray.dk/~deb
>
>
> -----Original Message-----
> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Richard Holland
> Sent: 18 November 2007 18:12
> To: Ditlev Egeskov Brodersen
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Wrapping SimpleGappedSequence
>
> Interesting stuff. I'm not sure why it isn't working so I'll have to have
> a closer look.
>
> I'm currently on annual leave but will investigate when I return (Nov
> 27th).
>
> cheers,
> Richard
>
> On Sun, November 18, 2007 10:50 am, Ditlev Egeskov Brodersen wrote:
> Hi Richard,
>
> I also thought that what you say was correct, but I can't get it to work.
> Below is a small test program to check this. First, I create a
> SimpleGappedSequence via the route text with gaps -> SymbolList ->
> Sequence -> GappedSequence. The gaps are there but not "understood", as
> expected. Next, I create the same sequence without gaps in the same way,
> then introduce gaps with addGapsInSource. A gapped location is now
> properly translated to a non-gapped sequence position. Finally, I create
> a new SimpleGappedSequence based on the working one; as you can see, the
> gaps are still there but not "understood"...
>
> aSymbolList = MSE--KLMPRT---TWAKG
> aSequence = MSE--KLMPRT---TWAKG
>
> Gaps are not parsed when a SimpleGappedSequence is constructed from a
> gapped Sequence object:
> aGapped = MSE--KLMPRT---TWAKG
> Gapped position 10 = Plain position 10
>
> aSymbolList = MSEKLMPRTTWAKG
> aSequence = MSEKLMPRTTWAKG
>
> Gaps introduced through addGapsInSource work ok:
> aGapped = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 8
>
> Now a new SimpleGappedSequence object is created from the previous one:
> aGapped2 = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 10
>
> This should have been compiled with the new biojava.jar of 161107
> (updated via CVS), but perhaps I made a mistake updating?
>
> Any clues?
>
> Thanks,
>
> Ditlev
>
> ---
>
> package gappedsequencetest;
>
> import org.biojava.bio.*;
> import org.biojava.bio.seq.*;
> import org.biojava.bio.seq.impl.*;
> import org.biojava.bio.symbol.*;
>
> public class Main {
>
>     public static void main(String[] args) {
>         SymbolList aSymbolList = null;
>         try {
>             aSymbolList = ProteinTools.createProtein("MSE--KLMPRT---TWAKG");
>         }
>         catch (BioException ex) {}
>
>         System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>         Sequence aSequence = new SimpleSequence(aSymbolList, "",
>                 "mySequence", null);
>         System.out.println("aSequence = " + aSequence.seqString() + "\n");
>
>         SimpleGappedSequence aGapped = new SimpleGappedSequence(aSequence);
>         System.out.println("Gaps are not parsed when a SimpleGappedSequence"
>                 + " is constructed from a gapped Sequence object:");
>         System.out.println("aGapped = " + aGapped.seqString());
>         System.out.println("Gapped position 10 = Plain position " +
>                 aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>         try {
>             aSymbolList = ProteinTools.createProtein("MSEKLMPRTTWAKG");
>         }
>         catch (BioException ex) {}
>
>         System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>         aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
>         System.out.println("aSequence = " + aSequence.seqString() + "\n");
>
>         aGapped = new SimpleGappedSequence(aSequence);
>         aGapped.addGapsInSource(9, 3);
>         aGapped.addGapsInSource(3, 2);
>         System.out.println("Gaps introduced through addGapsInSource work ok:");
>         System.out.println("aGapped = " + aGapped.seqString());
>         System.out.println("Gapped position 10 = Plain position " +
>                 aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>         SimpleGappedSequence aGapped2 = new SimpleGappedSequence(aGapped);
>         System.out.println("Now a new SimpleGappedSequence object is created"
>                 + " from the previous one:");
>         System.out.println("aGapped2 = " + aGapped2.seqString());
>         System.out.println("Gapped position 10 = Plain position " +
>                 aGapped2.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>     }
>
> }
>
> --
>
> Ditlev Egeskov Brodersen
> Lektor
> Bakkefaldet 30, Hasle
> 8210 Århus V
>
> www.lindeman-brodersen.dk
>
>
> -----Original Message-----
> From: Richard Holland [mailto:holland at ebi.ac.uk]
> Sent: 16 November 2007 13:46
> To: Ditlev Egeskov Brodersen
> Cc: biojava-l at biojava.org
> Subject: Re: Wrapping SimpleGappedSequence
>
> SimpleGappedSequence extends SimpleGappedSymbolList, and the constructor
> delegates to the SimpleGappedSymbolList constructor.
>
> When you extend SimpleGappedSequence you should delegate in your new
> constructor to the existing SimpleGappedSequence constructor, which in
> turn will delegate as above and preserve the gaps.
>
> By passing any object which implements GappedSymbolList to the
> SimpleGappedSequence constructor, e.g. SimpleGappedSequence or
> SimpleGappedSymbolList, it will automatically choose the new constructor
> from SimpleGappedSymbolList which you hopefully should be able to see in
> the code you have just checked out. If passed any other
> non-GappedSymbolList object, it will use the old constructor that
> already existed from before.
>
> cheers,
> Richard
>
> Ditlev Egeskov Brodersen wrote:
> Hi again,
>
> I updated CVS and got the new SimpleGappedSymbolList class, but there
> seem to be no changes to the SimpleGappedSequence class, which is the one
> I need to extend... have I missed something?
>
> Ditlev
>
> --
>
> Ditlev E. Brodersen, Ph.D.
> Lektor, Associate Professor
>
> Department of Molecular Biology Office: +45 89425259
> University of Aarhus Lab: +45 89425022
> Gustav Wieds Vej 10c Fax: +45 86123178
> DK-8000 Aarhus C Email: deb at mb.au.dk
> Denmark Lab WWW: www.bioxray.dk/~deb
>
>
> -----Original Message-----
> From: Richard Holland [mailto:holland at ebi.ac.uk]
> Sent: 16 November 2007 11:47
> To: Ditlev Egeskov Brodersen
> Cc: biojava-l at biojava.org
> Subject: Re: Wrapping SimpleGappedSequence
>
> The easiest way is simply for me to alter the constructor to
> SimpleGappedSequence (and equivalently to SimpleGappedSymbolList) to
> copy all gaps if passed another instance of GappedSymbolList as the
> parameter. I've just done this in CVS so you should be able to update
> your copy and observe the new behaviour.
>
> cheers,
> Richard
>
> Ditlev Egeskov Brodersen wrote:
> Hi again,
>
> thanks for the info - will do the check just to be proper. I have another
> question: In my application, I would like to wrap the retrieved
> SimpleGappedSequence objects inside another object that extends the
> functionality with application-specific stuff. Ideally, I would do this
> by extending the SimpleGappedSequence object and creating it by passing
> the SimpleGappedSequence from the alignment import to the constructor of
> the parent, like so:
>
> class AlignedSequence extends SimpleGappedSequence {
> public AlignedSequence(SimpleGappedSequence aGapped) {
> super(aGapped);
> }
>
> ..custom stuff..
> }
>
> However, the problem is that there is only one constructor for the
> SimpleGappedSequence, one which takes a simple Sequence object. I can
> pass the derived class all right, but all gap information is lost again,
> presumably because the SimpleGappedSequence constructor just takes out
> the seqString() and puts it into its own sequence object.
>
> Shouldn't the constructor of the SimpleGappedSequence class recognise
> when a derived (and gapped) sequence object is passed, and process it
> accordingly? As it stands, I am forced to include the
> SimpleGappedSequence as a private member of the AlignedSequence class,
> which is not nearly as nice since all statements using the class will
> have to do something like
>
> class AlignedSequence {
> private SimpleGappedSequence gapped_sequence;
>
> public AlignedSequence(SimpleGappedSequence aGapped) {
> gapped_sequence = aGapped;
> }
>
> public SimpleGappedSequence getGappedSequence() {
> return(gapped_sequence);
> }
>
> ..custom stuff..
> }
>
> ...
>
> AlignedSequence aAligned = new AlignedSequence(aGapped);
> aAligned.getGappedSequence().seqString();
>
> rather than simply:
>
> AlignedSequence aAligned = new AlignedSequence(aGapped);
> aAligned.seqString();
>
> In other words, is there any solution with the current setup that would
> allow me to extend SimpleGappedSequence and not lose the gap
> information?
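If the wrapper has to stay a separate object, one common way to get the shorter call back is to forward the methods that callers need. The sketch below assumes a hypothetical one-method interface, GappedLike, in place of the real BioJava types, so it only illustrates the pattern and says nothing about how many methods a full wrapper would need to forward.

```java
// Hypothetical stand-in for the part of the gapped-sequence API used here.
interface GappedLike {
    String seqString();
}

class AlignedSequenceSketch implements GappedLike {
    // The wrapped object is kept as-is, so its gap information is untouched.
    private final GappedLike gappedSequence;

    AlignedSequenceSketch(GappedLike gappedSequence) {
        this.gappedSequence = gappedSequence;
    }

    // Forwarding method: callers no longer have to go through a getter.
    @Override
    public String seqString() {
        return gappedSequence.seqString();
    }

    // Small demonstration with a fixed string standing in for a real sequence.
    static String demo() {
        GappedLike gapped = () -> "MS--EKLMPR---TTWAKG";
        return new AlignedSequenceSketch(gapped).seqString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost is one forwarding method per operation, which is why a gap-preserving constructor in the parent class is the more convenient fix for the case discussed here.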
> -- Ditlev
>
> --
>
> Ditlev E. Brodersen, Ph.D.
> Lektor, Associate Professor
>
> Department of Molecular Biology Office: +45 89425259
> University of Aarhus Lab: +45 89425022
> Gustav Wieds Vej 10c Fax: +45 86123178
> DK-8000 Aarhus C Email: deb at mb.au.dk
> Denmark Lab WWW: www.bioxray.dk/~deb
>
>
> -----Original Message-----
> From: Richard Holland [mailto:holland at ebi.ac.uk]
> Sent: 16 November 2007 10:50
> To: Ditlev Egeskov Brodersen
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Parsing exising gaps
>
> The returned gapped sequences are all properly set up with gaps, name
> etc. But as for other users, I think there may be some problems: since
> the SimpleAlignment object only has a general symbol list iterator, the
> user will have to cast each statement extracting a sequence object, and
>
> SimpleSequence aSimple = (SimpleSequence)aSequences.next();
>
> throws a ClassCastException at run time. So old code might not run with
> the update as far as I can see.
>
> This is true. However, such code would be unsupported by us as the API
> clearly states that SimpleAlignment returns SymbolList instances, and
> does not make any guarantees about the exact implementation details of
> the objects it returns. To attempt to cast it to anything other than
> SymbolList would be a mistake! (Although actually it is now returning a
> guarantee of GappedSymbolList, which is what your code can now take
> advantage of). To assume it will return SimpleSequence is outside the
> behaviour defined by the API and therefore should not be relied upon.
>
> A more correct behaviour would be to test each item returned:
>
> SymbolList symlist = (SymbolList)aSequences.next();
> if (symlist instanceof SimpleSequence) {
> SimpleSequence seq = (SimpleSequence)symlist;
> // Do simple-sequence stuff
> } else {
> // Do something else!
> }
>
> In future, I will modify the API to change the SymbolList guarantee to a
> GappedSymbolList guarantee, but I can't do this right now as this really
> would break everyone's code!
>
> We are currently planning a redesign as you may be aware, so issues like
> this will hopefully be resolved as part of that process. For a start, if
> we use Java 5 generics in future as we plan, we can strictly specify
> what kinds of objects will be returned by things such as the alignment
> API, making it easier for us to enforce API-compliant behaviour in
> users' code.
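To make the generics point concrete, here is a rough idea of what a typed alignment could look like. Every name in it (Row, GappedRow, TypedAlignment, rowIterator) is invented for this sketch; it is not a proposal for the actual redesigned API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal row type; stands in for SymbolList.
interface Row {
    String seqString();
}

// Hypothetical gapped row; stands in for a GappedSymbolList implementation.
final class GappedRow implements Row {
    private final String text;

    GappedRow(String text) { this.text = text; }

    @Override
    public String seqString() { return text; }

    // Counts '-' characters in the row.
    int gapCount() {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '-') count++;
        }
        return count;
    }
}

// The type parameter states exactly what the iterator hands back.
class TypedAlignment<T extends Row> {
    private final List<T> rows = new ArrayList<>();

    void add(T row) { rows.add(row); }

    Iterator<T> rowIterator() { return rows.iterator(); }
}

class GenericsSketch {
    static int totalGaps() {
        TypedAlignment<GappedRow> alignment = new TypedAlignment<>();
        alignment.add(new GappedRow("MSE--KLMPRT---TWAKG"));
        alignment.add(new GappedRow("MS--EKLMPR---TTWAKG"));
        int total = 0;
        Iterator<GappedRow> it = alignment.rowIterator();
        while (it.hasNext()) {
            total += it.next().gapCount(); // no cast, checked by the compiler
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalGaps());
    }
}
```

With a declaration like this, a cast to the wrong implementation class would be rejected at compile time instead of failing with a ClassCastException at run time.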
>
> cheers,
> Richard
>
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/
> --
> Richard Holland
> BioMart (http://www.biomart.org/)
> EMBL-EBI
> Hinxton, Cambridgeshire CB10 1SD, UK
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416
http://www.biomart.org/
http://www.biojava.org/