[Biojava-l] Wrapping SimpleGappedSequence
Richard Holland
holland at ebi.ac.uk
Sun Nov 18 17:12:04 UTC 2007
Interesting stuff. I'm not sure why it isn't working, so I'll have to have
a closer look.
I'm currently on annual leave but will investigate when I return (Nov 27th).
cheers,
Richard
On Sun, November 18, 2007 10:50 am, Ditlev Egeskov Brodersen wrote:
> Hi Richard,
>
> I also thought what you say was correct, but I can't get it to work.
> Below is a small test program to check this. First, I create a
> SimpleGappedSequence from text with gaps, via
> text->SymbolList->Sequence->GappedSequence. The gaps are there but not
> "understood", as expected. Next, I create the same sequence without gaps
> in the same way, then introduce gaps with addGapsInSource. A gapped
> location is now properly translated to an ungapped sequence position.
> Finally, I create a new SimpleGappedSequence based on the working one; as
> you can see, the gaps are still there but not "understood"...
>
> aSymbolList = MSE--KLMPRT---TWAKG
> aSequence = MSE--KLMPRT---TWAKG
>
> Gaps are not parsed when a SimpleGappedSequence is constructed from a gapped Sequence object:
> aGapped = MSE--KLMPRT---TWAKG
> Gapped position 10 = Plain position 10
>
> aSymbolList = MSEKLMPRTTWAKG
> aSequence = MSEKLMPRTTWAKG
>
> Gaps introduced through addGapsInSource work ok:
> aGapped = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 8
>
> Now a new SimpleGappedSequence object is created from the previous one:
> aGapped2 = MS--EKLMPR---TTWAKG
> Gapped position 10 = Plain position 10
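As a point of reference for the output above, the point mapping that `gappedToLocation` is expected to perform can be sketched without BioJava. The helper `GapMap.gappedToPlain` below is hypothetical (not part of BioJava): it simply counts the non-gap characters up to a 1-based gapped position in a string where `-` marks a gap. How a position that falls on a gap is reported is an assumption of this sketch, not BioJava's behaviour.

```java
/** Hypothetical helper illustrating gapped-to-ungapped coordinate mapping (not BioJava code). */
final class GapMap {
    private GapMap() {}

    /**
     * Maps a 1-based position in a gapped string ('-' = gap) to the 1-based
     * position in the ungapped sequence by counting residues up to it.
     * Assumption of this sketch: a position on a gap maps to the last
     * residue before it (0 if there is none).
     */
    static int gappedToPlain(String gapped, int gappedPos) {
        if (gappedPos < 1 || gappedPos > gapped.length()) {
            throw new IndexOutOfBoundsException("position " + gappedPos);
        }
        int plain = 0;
        for (int i = 0; i < gappedPos; i++) {
            if (gapped.charAt(i) != '-') {
                plain++;
            }
        }
        return plain;
    }

    public static void main(String[] args) {
        System.out.println(gappedToPlain("MS--EKLMPR---TTWAKG", 10)); // prints 8
    }
}
```

Under this counting convention, position 10 of `MSE--KLMPRT---TWAKG` also maps to 8, so a gap-aware object would be expected to report 8 rather than 10 in both of the cases above that print 10.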
>
> This should have been compiled with the new biojava.jar of 161107 (updated
> via CVS), but perhaps I made a mistake updating?
>
> Any clues?
>
> Thanks,
>
> Ditlev
>
> ---
>
> package gappedsequencetest;
>
> import org.biojava.bio.*;
> import org.biojava.bio.seq.*;
> import org.biojava.bio.seq.impl.*;
> import org.biojava.bio.symbol.*;
>
> public class Main {
>
>     public static void main(String[] args) {
>         SymbolList aSymbolList = null;
>         try {
>             aSymbolList = ProteinTools.createProtein("MSE--KLMPRT---TWAKG");
>         }
>         catch (BioException ex) {
>             // Report the parse failure instead of hitting a NullPointerException below.
>             ex.printStackTrace();
>             return;
>         }
>
>         System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>         Sequence aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
>         System.out.println("aSequence = " + aSequence.seqString() + "\n");
>
>         // Case 1: wrap a Sequence whose symbols already contain gaps.
>         SimpleGappedSequence aGapped = new SimpleGappedSequence(aSequence);
>         System.out.println("Gaps are not parsed when a SimpleGappedSequence "
>                 + "is constructed from a gapped Sequence object:");
>         System.out.println("aGapped = " + aGapped.seqString());
>         System.out.println("Gapped position 10 = Plain position "
>                 + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>         try {
>             aSymbolList = ProteinTools.createProtein("MSEKLMPRTTWAKG");
>         }
>         catch (BioException ex) {
>             ex.printStackTrace();
>             return;
>         }
>
>         System.out.println("aSymbolList = " + aSymbolList.seqString());
>
>         aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
>         System.out.println("aSequence = " + aSequence.seqString() + "\n");
>
>         // Case 2: wrap an ungapped Sequence, then add gaps explicitly.
>         aGapped = new SimpleGappedSequence(aSequence);
>         aGapped.addGapsInSource(9, 3);  // 3 gaps before residue 9
>         aGapped.addGapsInSource(3, 2);  // 2 gaps before residue 3
>         System.out.println("Gaps introduced through addGapsInSource work ok:");
>         System.out.println("aGapped = " + aGapped.seqString());
>         System.out.println("Gapped position 10 = Plain position "
>                 + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>
>         // Case 3: wrap the working gapped object from case 2.
>         SimpleGappedSequence aGapped2 = new SimpleGappedSequence(aGapped);
>         System.out.println("Now a new SimpleGappedSequence object is "
>                 + "created from the previous one:");
>         System.out.println("aGapped2 = " + aGapped2.seqString());
>         System.out.println("Gapped position 10 = Plain position "
>                 + aGapped2.gappedToLocation(new PointLocation(10)).getMin() + "\n");
>     }
>
> }
>
> --
>
> Ditlev Egeskov Brodersen
> Lektor
> Bakkefaldet 30, Hasle
> 8210 Århus V
>
> www.lindeman-brodersen.dk
>
>
>> -----Original Message-----
>> From: Richard Holland [mailto:holland at ebi.ac.uk]
>> Sent: 16 November 2007 13:46
>> To: Ditlev Egeskov Brodersen
>> Cc: biojava-l at biojava.org
>> Subject: Re: Wrapping SimpleGappedSequence
>>
>> SimpleGappedSequence extends SimpleGappedSymbolList, and its
>> constructor delegates to the SimpleGappedSymbolList constructor.
>>
>> When you extend SimpleGappedSequence, your new constructor should
>> delegate to the existing SimpleGappedSequence constructor, which in
>> turn will delegate as above and preserve the gaps.
>>
>> If you pass any object that implements GappedSymbolList (e.g. a
>> SimpleGappedSequence or SimpleGappedSymbolList) to the
>> SimpleGappedSequence constructor, it will automatically choose the new
>> constructor from SimpleGappedSymbolList, which you should be able to
>> see in the code you have just checked out. If passed any other
>> non-GappedSymbolList object, it will use the old constructor that
>> already existed.
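One Java detail may be relevant when a class offers overloaded constructors or methods, one taking a general type and one taking a gapped subtype: the compiler selects the overload from the argument's declared (compile-time) type, not its runtime type. The toy types below (`Plain`, `Gapped`, `ToyGapped`, `Overloads`) are hypothetical stand-ins, not BioJava classes. They show that a gapped object passed through a parameter declared as the general type selects the general overload, while an explicit `instanceof` check dispatches on the runtime type. Whether this matters for the BioJava constructors depends on how they are actually written.

```java
/** Hypothetical stand-in for a general symbol list. */
interface Plain {
    String seqString();
}

/** Hypothetical stand-in for a gapped symbol list. */
interface Gapped extends Plain {
    int gapCount();
}

/** Hypothetical gapped implementation backed by a string with '-' gaps. */
final class ToyGapped implements Gapped {
    private final String s;

    ToyGapped(String s) {
        this.s = s;
    }

    public String seqString() {
        return s;
    }

    public int gapCount() {
        return (int) s.chars().filter(c -> c == '-').count();
    }
}

final class Overloads {
    // Two overloads: Java chooses between them at compile time.
    static String describe(Plain p) {
        return "plain overload";
    }

    static String describe(Gapped g) {
        return "gapped overload";
    }

    // Dispatches on the runtime type by testing it explicitly.
    static String describeAtRuntime(Plain p) {
        if (p instanceof Gapped) {
            return describe((Gapped) p);
        }
        return describe(p);
    }

    public static void main(String[] args) {
        Plain declaredAsPlain = new ToyGapped("MS--EK");
        System.out.println(describe(declaredAsPlain));          // plain overload
        System.out.println(describeAtRuntime(declaredAsPlain)); // gapped overload
    }
}
```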
>>
>> cheers,
>> Richard
>>
>> Ditlev Egeskov Brodersen wrote:
>> > Hi again,
>> >
>> > I updated CVS and got the new SimpleGappedSymbolList class, but there
>> > seem to be no changes to the SimpleGappedSequence class, which is the
>> > one I need to extend... have I missed something?
>> >
>> > Ditlev
>> >
>> > --
>> >
>> > Ditlev E. Brodersen, Ph.D.
>> > Lektor, Associate Professor
>> >
>> > Department of Molecular Biology Office: +45 89425259
>> > University of Aarhus Lab: +45 89425022
>> > Gustav Wieds Vej 10c Fax: +45 86123178
>> > DK-8000 Aarhus C Email: deb at mb.au.dk
>> > Denmark Lab WWW: www.bioxray.dk/~deb
>> >
>> >
>> >> -----Original Message-----
>> >> From: Richard Holland [mailto:holland at ebi.ac.uk]
>> >> Sent: 16 November 2007 11:47
>> >> To: Ditlev Egeskov Brodersen
>> >> Cc: biojava-l at biojava.org
>> >> Subject: Re: Wrapping SimpleGappedSequence
>> >>
>> > The easiest way is simply for me to alter the constructor of
>> > SimpleGappedSequence (and likewise that of SimpleGappedSymbolList) to
>> > copy all gaps if passed another instance of GappedSymbolList as the
>> > parameter. I've just done this in CVS, so you should be able to update
>> > your copy and observe the new behaviour.
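The behaviour described here (copy the gaps when the argument is itself gapped, otherwise fall back to the old behaviour), combined with a subclass whose constructor only delegates to its superclass, can be sketched with hypothetical toy classes. `ToySeq`, `ToyGappedSeq`, `addGapsBeforeResidue` and `ToyAligned` are illustrations only; the real BioJava classes store gaps differently and have a different API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain (ungapped) sequence; not a BioJava class. */
class ToySeq {
    final String residues;

    ToySeq(String residues) {
        this.residues = residues;
    }

    String seqString() {
        return residues;
    }
}

/** Hypothetical gapped view over a sequence; not a BioJava class. */
class ToyGappedSeq extends ToySeq {
    // One entry per column of the gapped view: true = gap, false = residue.
    private final List<Boolean> layout = new ArrayList<>();

    ToyGappedSeq(ToySeq source) {
        super(source.residues);
        if (source instanceof ToyGappedSeq) {
            // Runtime type check: the source is already gapped, so copy its gaps.
            layout.addAll(((ToyGappedSeq) source).layout);
        } else {
            // Plain source ("old" behaviour): one residue column per residue, no gaps.
            for (int i = 0; i < residues.length(); i++) {
                layout.add(false);
            }
        }
    }

    /** Inserts {@code length} gaps before 1-based residue {@code residuePos}; appends if past the end. */
    void addGapsBeforeResidue(int residuePos, int length) {
        int index = layout.size();
        int seen = 0;
        for (int i = 0; i < layout.size(); i++) {
            if (!layout.get(i) && ++seen == residuePos) {
                index = i;
                break;
            }
        }
        for (int k = 0; k < length; k++) {
            layout.add(index, true);
        }
    }

    @Override
    String seqString() {
        StringBuilder sb = new StringBuilder();
        int r = 0;
        for (boolean gap : layout) {
            sb.append(gap ? '-' : residues.charAt(r++));
        }
        return sb.toString();
    }
}

/** Hypothetical application subclass: it only delegates to the superclass constructor. */
class ToyAligned extends ToyGappedSeq {
    ToyAligned(ToyGappedSeq gapped) {
        super(gapped);
    }
}
```

For example, adding 3 gaps before residue 9 and then 2 before residue 3 of `MSEKLMPRTTWAKG` gives `MS--EKLMPR---TTWAKG`, and a `ToyAligned` built from that object reports the same gapped string, because the superclass constructor checks the runtime type of its argument.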
>> >
>> > cheers,
>> > Richard
>> >
>> > Ditlev Egeskov Brodersen wrote:
>> >>>> Hi again,
>> >>>>
>> >>>> Thanks for the info; I will do the check just to be thorough. I have
>> >>>> another question: in my application, I would like to wrap the
>> >>>> retrieved SimpleGappedSequence objects inside another object that
>> >>>> extends the functionality with application-specific features. Ideally,
>> >>>> I would do this by extending SimpleGappedSequence and creating the new
>> >>>> object by passing the SimpleGappedSequence from the alignment import to
>> >>>> the constructor of the parent, like so:
>> >>>>
>> >>>> class AlignedSequence extends SimpleGappedSequence {
>> >>>>     public AlignedSequence(SimpleGappedSequence aGapped) {
>> >>>>         super(aGapped);
>> >>>>     }
>> >>>>
>> >>>>     ..custom stuff..
>> >>>> }
>> >>>>
>> >>>> However, the problem is that SimpleGappedSequence has only one
>> >>>> constructor, which takes a plain Sequence object. I can pass the
>> >>>> derived class all right, but all gap information is lost again,
>> >>>> presumably because the SimpleGappedSequence constructor just takes the
>> >>>> seqString() and puts it into its own sequence object.
>> >>>>
>> >>>> Shouldn't the constructor of the SimpleGappedSequence class recognise
>> >>>> when a derived (and gapped) sequence object is passed, and process it
>> >>>> accordingly? As it stands, I am forced to include the
>> >>>> SimpleGappedSequence as a private member of the AlignedSequence class,
>> >>>> which is not nearly as nice, since every statement using the class
>> >>>> will have to do something like
>> >>>>
>> >>>> class AlignedSequence {
>> >>>>     private SimpleGappedSequence gapped_sequence;
>> >>>>
>> >>>>     public AlignedSequence(SimpleGappedSequence aGapped) {
>> >>>>         gapped_sequence = aGapped;
>> >>>>     }
>> >>>>
>> >>>>     public SimpleGappedSequence getGappedSequence() {
>> >>>>         return gapped_sequence;
>> >>>>     }
>> >>>>
>> >>>>     ..custom stuff..
>> >>>> }
>> >>>>
>> >>>> ...
>> >>>>
>> >>>> AlignedSequence aAligned = new AlignedSequence(aGapped);
>> >>>> aAligned.getGappedSequence().seqString();
>> >>>>
>> >>>> rather than simply:
>> >>>>
>> >>>> AlignedSequence aAligned = new AlignedSequence(aGapped);
>> >>>> aAligned.seqString();
>> >>>>
>> >>>> In other words, is there any solution with the current setup that
>> >>>> would allow me to extend SimpleGappedSequence without losing the gap
>> >>>> information?
>> >>>> -- Ditlev
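If subclassing is not an option, the wrapper does not have to expose a `getGappedSequence()` accessor to every caller: it can implement the relevant interface itself and forward each call to the wrapped object (delegation). A minimal sketch, assuming a hypothetical two-method interface `SeqView` standing in for the much larger BioJava interfaces; wrapping a real interface such as `GappedSequence` this way would need one forwarding method per interface method.

```java
/** Hypothetical, deliberately tiny stand-in for a sequence interface. */
interface SeqView {
    String seqString();
    int length();
}

/** Hypothetical existing implementation that we want to wrap. */
final class GappedImpl implements SeqView {
    private final String gapped;

    GappedImpl(String gapped) {
        this.gapped = gapped;
    }

    public String seqString() {
        return gapped;
    }

    public int length() {
        return gapped.length();
    }
}

/** Wrapper that forwards the interface methods and adds its own behaviour. */
final class AlignedView implements SeqView {
    private final SeqView delegate;

    AlignedView(SeqView delegate) {
        this.delegate = delegate;
    }

    // Forwarded methods: callers use the wrapper directly, no accessor needed.
    public String seqString() {
        return delegate.seqString();
    }

    public int length() {
        return delegate.length();
    }

    // Example of application-specific behaviour: fraction of columns that are gaps.
    double gapFraction() {
        String s = delegate.seqString();
        if (s.isEmpty()) {
            return 0.0;
        }
        long gaps = s.chars().filter(c -> c == '-').count();
        return (double) gaps / s.length();
    }
}
```

Callers can then write `aAligned.seqString()` directly, and an `AlignedView` can be passed anywhere a `SeqView` is expected.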
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Richard Holland [mailto:holland at ebi.ac.uk]
>> >>>>> Sent: 16 November 2007 10:50
>> >>>>> To: Ditlev Egeskov Brodersen
>> >>>>> Cc: biojava-l at biojava.org
>> >>>>> Subject: Re: [Biojava-l] Parsing existing gaps
>> >>>>>
>> >>>>>>> The returned gapped sequences are all properly set up with gaps,
>> >>>>>>> name etc. But for other users, I think there may be some problems:
>> >>>>>>> since the SimpleAlignment object only has a general symbol list
>> >>>>>>> iterator, the user will have to cast the result of each statement
>> >>>>>>> that extracts a sequence object, and
>> >>>>>>>
>> >>>>>>> SimpleSequence aSimple = (SimpleSequence)aSequences.next();
>> >>>>>>>
>> >>>>>>> throws a ClassCastException at run time. So old code might not run
>> >>>>>>> with the update, as far as I can see.
>> >>>> This is true. However, such code would be unsupported by us, as the
>> >>>> API clearly states that SimpleAlignment returns SymbolList instances
>> >>>> and makes no guarantees about the exact implementation of the objects
>> >>>> it returns. Attempting to cast them to anything other than SymbolList
>> >>>> would be a mistake! (Although in practice it now returns
>> >>>> GappedSymbolList instances, which your code can take advantage of.)
>> >>>> Assuming it will return SimpleSequence is outside the behaviour
>> >>>> defined by the API and therefore should not be relied upon.
>> >>>>
>> >>>> A more correct approach would be to test each item returned:
>> >>>>
>> >>>> SymbolList symlist = (SymbolList) aSequences.next();
>> >>>> if (symlist instanceof SimpleSequence) {
>> >>>>     SimpleSequence seq = (SimpleSequence) symlist;
>> >>>>     // Do simple-sequence stuff
>> >>>> } else {
>> >>>>     // Do something else!
>> >>>> }
>> >>>>
>> >>>> In future, I will modify the API to change the SymbolList guarantee to
>> >>>> a GappedSymbolList guarantee, but I can't do this right now as it
>> >>>> really would break everyone's code!
>> >>>>
>> >>>> We are currently planning a redesign, as you may be aware, so issues
>> >>>> like this will hopefully be resolved as part of that process. For a
>> >>>> start, if we use Java 5 generics in future as we plan, we can strictly
>> >>>> specify what kinds of objects will be returned by things such as the
>> >>>> alignment API, making it easier for us to enforce API-compliant
>> >>>> behaviour in users' code.
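To illustrate the point about generics: when an iterator's element type is declared, the compiler enforces it and no cast is needed at the call site, whereas with a raw, pre-generics `Iterator`, `next()` returns `Object` and a wrong cast only fails at run time. The types below (`GappedItem`, `TypedAlignment`) are hypothetical stand-ins, not the BioJava alignment API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a gapped symbol list. */
interface GappedItem {
    String seqString();
}

/** Hypothetical alignment whose iterator promises GappedItem elements. */
final class TypedAlignment {
    private final List<GappedItem> rows = new ArrayList<>();

    void add(GappedItem row) {
        rows.add(row);
    }

    // The element type is part of the signature, so callers need no cast.
    Iterator<GappedItem> rowIterator() {
        return rows.iterator();
    }

    // Raw, pre-generics style for comparison: next() returns Object and callers must cast.
    @SuppressWarnings("rawtypes")
    Iterator rawIterator() {
        return rows.iterator();
    }
}
```

With the typed iterator, `GappedItem row = aln.rowIterator().next();` compiles without a cast, and an attempt to treat the elements as an unrelated type is rejected by the compiler rather than surfacing as a `ClassCastException`.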
>> >>>>
>> >>>> cheers,
>> >>>> Richard
>>
>> --
>> Richard Holland (BioMart)
>> EMBL EBI, Wellcome Trust Genome Campus,
>> Hinxton, Cambridgeshire CB10 1SD, UK
>> Tel. +44 (0)1223 494416
>>
>> http://www.biomart.org/
>> http://www.biojava.org/
>
>
--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK