[Biojava-l] Wrapping SimpleGappedSequence

Ditlev Egeskov Brodersen deb at mb.au.dk
Mon Nov 19 09:46:01 EST 2007


Dear Richard and all,

  I've been dissecting the delegation problem encountered when instantiating
SimpleGappedSequence(Sequence) with an already gapped sequence. The
constructor calls the parent SimpleGappedSymbolList(), which in Richard's
CVS update of 161107 now contains a separate overloaded constructor for the
gapped case:

  public SimpleGappedSymbolList(GappedSymbolList gappedSource)

  However, when instantiating a new SimpleGappedSequence based on an
existing gapped sequence (with several blocks), the blocks were lost. 

  After checking the path of code execution it appeared that for some
reason, the old SimpleGappedSymbolList(SymbolList) was called. So I modified
SimpleGappedSequence.java to include an overloaded constructor also for the
descendant class, identical to the other constructor but with a
GappedSequence argument:

  public SimpleGappedSequence(GappedSequence seq) {
    super(seq);
    this.sequence = seq;
    createOnUnderlying = false;
  }
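
  A likely explanation for the old constructor being picked (an editorial
sketch, not taken from the BioJava sources): Java resolves overloaded
constructors at compile time from the declared type of the argument, not
from the run-time class of the object. The types below (SymList,
GappedSymList, Seq, GappedSeq, ParentList, ChildSeq) are hypothetical
stand-ins that only mirror the shape of the real hierarchy.

```java
class OverloadDemo {
    // Hypothetical stand-ins; none of these are the real BioJava types.
    interface SymList {}
    interface GappedSymList extends SymList {}
    interface Seq extends SymList {}
    interface GappedSeq extends Seq, GappedSymList {}

    static class ParentList {
        final String chosen;
        ParentList(SymList source)       { chosen = "ParentList(SymList)"; }
        ParentList(GappedSymList source) { chosen = "ParentList(GappedSymList)"; }
    }

    static class ChildSeq extends ParentList {
        // The declared type of 'seq' is Seq, so super(seq) is bound to
        // ParentList(SymList) at compile time, even when the object passed
        // in also implements GappedSymList.
        ChildSeq(Seq seq)       { super(seq); }

        // The declared type is GappedSeq, so the more specific
        // ParentList(GappedSymList) overload is selected.
        ChildSeq(GappedSeq seq) { super(seq); }
    }

    // Calls the constructor with the argument declared as plain Seq.
    static String viaSeq(Seq s) { return new ChildSeq(s).chosen; }

    // Calls the constructor with the argument declared as GappedSeq.
    static String viaGapped(GappedSeq s) { return new ChildSeq(s).chosen; }

    public static void main(String[] args) {
        GappedSeq g = new GappedSeq() {};
        System.out.println(viaSeq(g));    // prints ParentList(SymList)
        System.out.println(viaGapped(g)); // prints ParentList(GappedSymList)
    }
}
```

  The same object reaches a different parent constructor depending only on
the declared parameter type of the child constructor, which is consistent
with an extra overload in the descendant class making the difference.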

  Now, the correct parent constructor
(SimpleGappedSymbolList(GappedSymbolList)) was called. However, there are
two other problems with the new SimpleGappedSymbolList constructor that
need to be corrected for it to work as expected. First, the initial
introduction of a single, large block is missing from the new code, so
insert:

  Block b = new Block(1, length, 1, length);
  blocks.add(b);

  Secondly, the code for transferring the gaps from the sequence string
needs to use two separate indices, otherwise the gaps will be placed wrongly
because their position is affected by previously inserted gaps:

  int n=1;  // position in the ungapped source; i is the position in the gapped string
  for(int i=1;i<=this.length();i++) {
    if(this.alpha.getGapSymbol().equals(gappedSource.symbolAt(i)))
      this.addGapInSource(n);
    else
      n++;
  }

  In other words, the index giving the position of the gaps should only
increment when there are NO gaps at the corresponding position in the gapped
string.
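
  The effect of the second index can be tried out on plain strings. The
sketch below is a simplified model and not the BioJava implementation: it
assumes a gap is recorded as "inserted before source position n", and the
class and method names (GapTransferSketch, gapsBeforeTwoIndex,
gapsBeforeSingleIndex, render) are made up for the illustration.

```java
class GapTransferSketch {
    static final char GAP = '-';

    // Strip the gap characters to obtain the ungapped source string.
    static String ungap(String gapped) {
        return gapped.replace("-", "");
    }

    // Two indices: i walks the gapped string, n is the source position and
    // only advances when the current column is NOT a gap.
    static int[] gapsBeforeTwoIndex(String gapped) {
        int sourceLength = ungap(gapped).length();
        int[] gapsBefore = new int[sourceLength + 2]; // 1-based; last slot holds trailing gaps
        int n = 1;
        for (int i = 1; i <= gapped.length(); i++) {
            if (gapped.charAt(i - 1) == GAP) {
                gapsBefore[n]++;
            } else {
                n++;
            }
        }
        return gapsBefore;
    }

    // Single index: the gapped column i is (wrongly) used as the source position.
    static int[] gapsBeforeSingleIndex(String gapped) {
        int sourceLength = ungap(gapped).length();
        int[] gapsBefore = new int[sourceLength + 2];
        for (int i = 1; i <= gapped.length(); i++) {
            if (gapped.charAt(i - 1) == GAP) {
                gapsBefore[Math.min(i, sourceLength + 1)]++;
            }
        }
        return gapsBefore;
    }

    // Rebuild the gapped string from the source and the recorded gap counts.
    static String render(String source, int[] gapsBefore) {
        StringBuilder sb = new StringBuilder();
        for (int n = 1; n <= source.length() + 1; n++) {
            for (int k = 0; k < gapsBefore[n]; k++) {
                sb.append(GAP);
            }
            if (n <= source.length()) {
                sb.append(source.charAt(n - 1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gapped = "MS--EKLMPR---TTWAKG";
        String source = ungap(gapped);
        System.out.println("two indices : " + render(source, gapsBeforeTwoIndex(gapped)));
        System.out.println("single index: " + render(source, gapsBeforeSingleIndex(gapped)));
    }
}
```

  In this model the two-index version reproduces the original gapped
string, while the single-index version places the later gaps too far to the
right.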

  Following these changes, the GappedSequenceTest program from last week now
works as expected:

 aSymbolList = MSE--KLMPRT---TWAKG
 aSequence   = MSE--KLMPRT---TWAKG

 Gaps are not parsed when a SimpleGappedSequence is constructed from a 
 gapped Sequence object:
 aGapped     = MSE--KLMPRT---TWAKG
 Gapped position 10 = Plain position 10

 aSymbolList = MSEKLMPRTTWAKG
 aSequence   = MSEKLMPRTTWAKG

 Gaps introduced through addGapsInSource work ok:
 aGapped     = MS--EKLMPR---TTWAKG
 Gapped position 10 = Plain position 8

 Now a new SimpleGappedSequence object is created from the previous one:
 aGapped2    = MS--EKLMPR---TTWAKG
 Gapped position 10 = Plain position 8
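
 The mapping "Gapped position 10 = Plain position 8" can be checked by hand
 with a small block table. This is only a sketch of the idea: the Block
 class here is a hypothetical stand-in whose four fields follow the
 argument order assumed for Block(1, length, 1, length) above (source
 start, source end, view start, view end), and blocksOf and viewToSource
 are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class BlockMapSketch {
    // One run of ungapped symbols: source [sourceStart, sourceEnd] is shown
    // at gapped (view) positions [viewStart, viewEnd]. Hypothetical class.
    static final class Block {
        final int sourceStart, sourceEnd, viewStart, viewEnd;
        Block(int sourceStart, int sourceEnd, int viewStart, int viewEnd) {
            this.sourceStart = sourceStart;
            this.sourceEnd = sourceEnd;
            this.viewStart = viewStart;
            this.viewEnd = viewEnd;
        }
    }

    // Split a gapped string into blocks of consecutive non-gap columns.
    static List<Block> blocksOf(String gapped) {
        List<Block> blocks = new ArrayList<>();
        int source = 0;            // last source position consumed so far
        int blockViewStart = -1;   // -1 means no block is currently open
        int blockSourceStart = -1;
        for (int view = 1; view <= gapped.length(); view++) {
            boolean gap = gapped.charAt(view - 1) == '-';
            if (!gap) {
                source++;
                if (blockViewStart < 0) {
                    blockViewStart = view;
                    blockSourceStart = source;
                }
            }
            boolean lastColumn = view == gapped.length();
            if (blockViewStart > 0 && (gap || lastColumn)) {
                int viewEnd = gap ? view - 1 : view;
                blocks.add(new Block(blockSourceStart, source, blockViewStart, viewEnd));
                blockViewStart = -1;
            }
        }
        return blocks;
    }

    // Translate a gapped position to a source position; -1 if it is a gap.
    static int viewToSource(List<Block> blocks, int view) {
        for (Block b : blocks) {
            if (view >= b.viewStart && view <= b.viewEnd) {
                return b.sourceStart + (view - b.viewStart);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Block> blocks = blocksOf("MS--EKLMPR---TTWAKG");
        System.out.println("Gapped position 10 = Plain position " + viewToSource(blocks, 10));
    }
}
```

 With a single block covering the whole sequence, position 10 maps to 10,
 which matches the behaviour seen when the blocks are lost.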

  -- Ditlev

--
 
Ditlev E. Brodersen, Ph.D.
Lektor, Associate Professor
 
Department of Molecular Biology   Office:  +45 89425259
University of Aarhus              Lab:     +45 89425022
Gustav Wieds Vej 10c              Fax:     +45 86123178
DK-8000 Aarhus C                  Email:   deb at mb.au.dk
Denmark                           Lab WWW: www.bioxray.dk/~deb


 -----Original Message-----
 From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l-
 bounces at lists.open-bio.org] On Behalf Of Richard Holland
 Sent: 18 November 2007 18:12
 To: Ditlev Egeskov Brodersen
 Cc: biojava-l at biojava.org
 Subject: Re: [Biojava-l] Wrapping SimpleGappedSequence
 
 Interesting stuff. I'm not sure why it isn't working so I'll have to have
 a closer look.
 
 I'm currently on annual leave but will investigate when I return (Nov
 27th).
 
 cheers,
 Richard
 
 On Sun, November 18, 2007 10:50 am, Ditlev Egeskov Brodersen wrote:
  Hi Richard,
 
    I also thought what you say was correct, but I can't get it to work.
  Below is a small test program to check this. First, I create a
  SimpleGappedSequence via the chain text with gaps -> SymbolList ->
  Sequence -> GappedSequence. The gaps are there but not "understood", as
  expected. Next, I create the same sequence without gaps in the same way,
  then introduce gaps with addGapsInSource. A gapped location is now
  properly translated to a non-gapped sequence position. Finally, I create
  a new SimpleGappedSequence based on the working one - as you can see, the
  gaps are still there but not "understood"...
 
  aSymbolList = MSE--KLMPRT---TWAKG
  aSequence   = MSE--KLMPRT---TWAKG
 
  Gaps are not parsed when a SimpleGappedSequence is constructed from a
  gapped Sequence object:
  aGapped     = MSE--KLMPRT---TWAKG
  Gapped position 10 = Plain position 10
 
  aSymbolList = MSEKLMPRTTWAKG
  aSequence   = MSEKLMPRTTWAKG
 
  Gaps introduced through addGapsInSource work ok:
  aGapped     = MS--EKLMPR---TTWAKG
  Gapped position 10 = Plain position 8
 
  Now a new SimpleGappedSequence object is created from the previous one:
  aGapped2    = MS--EKLMPR---TTWAKG
  Gapped position 10 = Plain position 10
 
  This should have been compiled with the new biojava.jar of 161107 (updated
  via CVS), but perhaps I made a mistake updating?
 
  Any clues?
 
  Thanks,
 
    Ditlev
 
  ---
 
  package gappedsequencetest;
 
  import org.biojava.bio.*;
  import org.biojava.bio.seq.*;
  import org.biojava.bio.seq.impl.*;
  import org.biojava.bio.symbol.*;
 
  public class Main {
 
      public static void main(String[] args) {
          SymbolList aSymbolList = null;
          try {
              aSymbolList = ProteinTools.createProtein("MSE--KLMPRT---TWAKG");
          }
          catch(BioException ex) {}
 
          System.out.println("aSymbolList = " + aSymbolList.seqString());
 
          Sequence aSequence = new SimpleSequence(aSymbolList, "",
                  "mySequence", null);
          System.out.println("aSequence   = " + aSequence.seqString() + "\n");
 
          SimpleGappedSequence aGapped = new SimpleGappedSequence(aSequence);
          System.out.println("Gaps are not parsed when a SimpleGappedSequence "
                  + "is constructed from a gapped Sequence object:");
          System.out.println("aGapped     = " + aGapped.seqString());
          System.out.println("Gapped position 10 = Plain position "
                  + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
 
          try {
              aSymbolList = ProteinTools.createProtein("MSEKLMPRTTWAKG");
          }
          catch(BioException ex) {}
 
          System.out.println("aSymbolList = " + aSymbolList.seqString());
 
          aSequence = new SimpleSequence(aSymbolList, "", "mySequence", null);
          System.out.println("aSequence   = " + aSequence.seqString() + "\n");
 
          aGapped = new SimpleGappedSequence(aSequence);
          aGapped.addGapsInSource(9, 3);
          aGapped.addGapsInSource(3, 2);
          System.out.println("Gaps introduced through addGapsInSource work ok:");
          System.out.println("aGapped     = " + aGapped.seqString());
          System.out.println("Gapped position 10 = Plain position "
                  + aGapped.gappedToLocation(new PointLocation(10)).getMin() + "\n");
 
          SimpleGappedSequence aGapped2 = new SimpleGappedSequence(aGapped);
          System.out.println("Now a new SimpleGappedSequence object is created "
                  + "from the previous one:");
          System.out.println("aGapped2    = " + aGapped2.seqString());
          System.out.println("Gapped position 10 = Plain position "
                  + aGapped2.gappedToLocation(new PointLocation(10)).getMin() + "\n");
      }
 
  }
 
  --
 
  Ditlev Egeskov Brodersen
  Lektor
  Bakkefaldet 30, Hasle
  8210 Århus V
 
  www.lindeman-brodersen.dk
 
 
  -----Original Message-----
  From: Richard Holland [mailto:holland at ebi.ac.uk]
  Sent: 16 November 2007 13:46
  To: Ditlev Egeskov Brodersen
  Cc: biojava-l at biojava.org
  Subject: Re: Wrapping SimpleGappedSequence
 
 
  SimpleGappedSequence extends SimpleGappedSymbolList, and the constructor
  delegates to the SimpleGappedSymbolList constructor.
 
  When you extend SimpleGappedSequence you should delegate in your new
  constructor to the existing SimpleGappedSequence constructor, which in
  turn will delegate as above and preserve the gaps.
 
  By passing any object which implements GappedSymbolList to the
  SimpleGappedSequence constructor, e.g. SimpleGappedSequence or
  SimpleGappedSymbolList, it will automatically choose the new constructor
  from SimpleGappedSymbolList which you hopefully should be able to see in
  the code you have just checked out. If passed any other
  non-GappedSymbolList object, it will use the old constructor that
  already existed from before.
 
  cheers,
  Richard
 
  Ditlev Egeskov Brodersen wrote:
   Hi again,
  
     I updated CVS and got the new SimpleGappedSymbolList class, but there
   seem to be no changes to the SimpleGappedSequence class, which is the
   one I need to extend...have I missed something?
  
     Ditlev
  
  
   -----Original Message-----
   From: Richard Holland [mailto:holland at ebi.ac.uk]
   Sent: 16 November 2007 11:47
   To: Ditlev Egeskov Brodersen
   Cc: biojava-l at biojava.org
   Subject: Re: Wrapping SimpleGappedSequence
  
   The easiest way is simply for me to alter the constructor to
   SimpleGappedSequence (and equivalently to SimpleGappedSymbolList) to
   copy all gaps if passed another instance of GappedSymbolList as the
   parameter. I've just done this in CVS so you should be able to update
   your copy and observe the new behaviour.
  
   cheers,
   Richard
  
   Ditlev Egeskov Brodersen wrote:
   Hi again,
  
     thanks for the info - will do the check just to be proper. I have
   another question: In my application, I would like to wrap the retrieved
   SimpleGappedSequence objects inside another object that extends the
   functionality with application-specific stuff. Ideally, I would do this
   by extending the SimpleGappedSequence object and create it by passing
   the SimpleGappedSequence from the alignment import to the constructor of
   the parent, like so:
  
     class AlignedSequence extends SimpleGappedSequence {
       public AlignedSequence(SimpleGappedSequence aGapped) {
         super(aGapped);
       }
  
       ..custom stuff..
     }
  
   However, the problem is that there is only one constructor for the
   SimpleGappedSequence, one which takes a simple Sequence object. I can
   pass the derived class alright, but all gap information is lost again,
   presumably because the SimpleGappedSequence constructor just takes out
   the seqString() and puts it into its own sequence object.
  
   Shouldn't the constructor of the SimpleGappedSequence class recognise
   when a derived (and gapped) sequence object is passed, and process it
   accordingly? As it stands, I am forced to include the
   SimpleGappedSequence as a private member of the AlignedSequence class,
   which is not nearly as nice since all statements using the class will
   have to do something like
  
     // Wrapper variant: holds the gapped sequence as a member instead of
     // extending SimpleGappedSequence.
     class AlignedSequence {
       private SimpleGappedSequence gapped_sequence;
  
       public AlignedSequence(SimpleGappedSequence aGapped) {
         gapped_sequence = aGapped;
       }
  
       public SimpleGappedSequence getGappedSequence() {
         return(gapped_sequence);
       }
  
       ..custom stuff..
     }
  
     ...
  
     AlignedSequence aAligned = new AlignedSequence(aGapped);
     aAligned.getGappedSequence().seqString();
  
   rather than simply:
  
     AlignedSequence aAligned = new AlignedSequence(aGapped);
     aAligned.seqString();
  
   In other words, is there any solution with the current setup that would
   allow me to extend SimpleGappedSequence and not lose the gap
   information?
   --  Ditlev
  
  
   -----Original Message-----
   From: Richard Holland [mailto:holland at ebi.ac.uk]
   Sent: 16 November 2007 10:50
   To: Ditlev Egeskov Brodersen
   Cc: biojava-l at biojava.org
   Subject: Re: [Biojava-l] Parsing exising gaps
  
     The returned gapped sequences are all properly set up with gaps, name
   etc. But as for other users, I think there may be some problems, since
   the SimpleAlignment object only has a general symbol list iterator, the
   user will have to cast each statement extracting a sequence object, and
  
         SimpleSequence aSimple = (SimpleSequence)aSequences.next();
  
   throws a ClassCastException at run time. So old code might not run with
   the update as far as I can see.
  
   This is true. However, such code would be unsupported by us as the API
   clearly states that SimpleAlignment returns SymbolList instances, and
   does not make any guarantees about the exact implementation details of
   the objects it returns. To attempt to cast it to anything other than
   SymbolList would be a mistake! (Although actually it is now returning a
   guarantee of GappedSymbolList, which is what your code can now take
   advantage of). To assume it will return SimpleSequence is outside the
   behaviour defined by the API and therefore should not be relied upon.
  
   A more correct behaviour would be to test each item returned:
  
   	SymbolList symlist = (SymbolList)aSequences.next();
   	if (symlist instanceof SimpleSequence) {
   		SimpleSequence seq = (SimpleSequence)symlist;
   		// Do simple-sequence stuff
   	} else {
   		// Do something else!
   	}
  
   In future, I will modify the API to change the SymbolList guarantee to a
   GappedSymbolList guarantee, but I can't do this right now as this really
   would break everyone's code!
  
   We are currently planning a redesign as you may be aware, so issues like
   this will hopefully be resolved as part of that process. For a start, if
   we use Java 5 generics in future as we plan, we can strictly specify
   what kinds of objects will be returned by things such as the alignment
   API, making it easier for us to enforce API-compliant behaviour in
   users' code.
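  
   As a rough illustration of what a generics-based signature could buy
   (an editorial sketch, not the planned BioJava API): when the iterator
   declares its element type, callers need neither a cast nor an instanceof
   test. Symbols, GappedSymbols, SimpleGapped and TypedAlignmentSketch are
   hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TypedAlignmentSketch {
    // Hypothetical stand-ins for the symbol list interfaces.
    interface Symbols { String seqString(); }
    interface GappedSymbols extends Symbols { int gapCount(); }

    // Minimal gapped implementation backed by a plain string.
    static final class SimpleGapped implements GappedSymbols {
        private final String text;
        SimpleGapped(String text) { this.text = text; }
        public String seqString() { return text; }
        public int gapCount() {
            int count = 0;
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == '-') count++;
            }
            return count;
        }
    }

    private final List<GappedSymbols> rows;

    TypedAlignmentSketch(List<GappedSymbols> rows) { this.rows = rows; }

    // The element type is part of the signature, so the compiler enforces
    // what callers may assume about the returned objects.
    Iterator<GappedSymbols> symbolListIterator() { return rows.iterator(); }

    // Sums the gaps over all rows without any cast.
    static int totalGaps(TypedAlignmentSketch alignment) {
        int total = 0;
        for (Iterator<GappedSymbols> it = alignment.symbolListIterator(); it.hasNext();) {
            total += it.next().gapCount();
        }
        return total;
    }

    public static void main(String[] args) {
        TypedAlignmentSketch alignment = new TypedAlignmentSketch(
                Arrays.<GappedSymbols>asList(
                        new SimpleGapped("MSE--KLMPRT---TWAKG"),
                        new SimpleGapped("MS--EKLMPR---TTWAKG")));
        System.out.println("total gaps = " + totalGaps(alignment));
    }
}
```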
  
   cheers,
   Richard
 
  - --
  Richard Holland (BioMart)
  EMBL EBI, Wellcome Trust Genome Campus,
  Hinxton, Cambridgeshire CB10 1SD, UK
  Tel. +44 (0)1223 494416
 
  http://www.biomart.org/
  http://www.biojava.org/
 
 
 
 
 --
 Richard Holland
 BioMart (http://www.biomart.org/)
 EMBL-EBI
 Hinxton, Cambridgeshire CB10 1SD, UK
 
 _______________________________________________
 Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
 http://lists.open-bio.org/mailman/listinfo/biojava-l



