[Biojava-dev] Pairwise Alignment methods

Mark Schreiber markjschreiber at gmail.com
Fri Jan 25 05:40:20 UTC 2008


On Jan 25, 2008 12:25 PM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> Hello again :-)
>
>
> On Jan 25, 2008 1:43 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> >
> > On Jan 25, 2008 10:06 AM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> > > Hi,
> > >
> > > Is it not possible to add something like this to the SequenceAlignment
> > > interface: "double doAlignmentAndGetTheScore(SymbolList symbolList1,
> > > SymbolList symbolList2)"?
> > > Okay, the name is horrible, but you know what it means.
> > >

> If I'm correct, SequenceAlignment is an abstract class, so we can define
> the method there with an empty implementation and have SmithWaterman and
> the other classes override it. Anyone who has already extended
> SequenceAlignment will not see any difference.
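A minimal sketch of the idea above, assuming a hypothetical abstract class `AlignerBase` in place of BioJava's real `SequenceAlignment` (whose actual signatures are not reproduced here) and plain `String` residues instead of `SymbolList`. The new score-only method is concrete in the base class, so a subclass written before it existed still compiles unchanged; subclasses that can compute the score cheaply override it. Rather than a truly empty body (a `double` method must return something), this sketch throws `UnsupportedOperationException`, which is one possible design choice. Note that Java of that era had no interface default methods (they arrived in Java 8), which is why an abstract class matters here.

```java
// Hypothetical stand-in for an abstract aligner base class; not BioJava's real API.
abstract class AlignerBase {
    /** Existing contract: full alignment (rendered as a String for simplicity). */
    public abstract String align(String query, String subject);

    /**
     * New score-only method with a default body, so older subclasses
     * that do not override it still compile unchanged.
     */
    public double scoreOnly(String query, String subject) {
        throw new UnsupportedOperationException(
            getClass().getSimpleName() + " does not provide a score-only alignment");
    }
}

// An "old" subclass written before scoreOnly existed: still compiles.
class LegacyAligner extends AlignerBase {
    @Override
    public String align(String query, String subject) {
        return query + "\n" + subject;
    }
}

// A newer subclass that overrides the score-only path.
class IdentityScorer extends AlignerBase {
    @Override
    public String align(String query, String subject) {
        return query + "\n" + subject;
    }

    // Toy score: number of identical positions over the shorter length.
    @Override
    public double scoreOnly(String query, String subject) {
        int n = Math.min(query.length(), subject.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (query.charAt(i) == subject.charAt(i)) same++;
        }
        return same;
    }
}
```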

OK, in that case adding the method would be fine, even desirable.
This would probably be the best way to merge in your code.

> Okay, now I understand: biojava is not a library for bioinformatics
> applications, but for interconnecting bioinformatics applications. So, biojava
Actually it is a library for bioinformatics that you use to build
bioinformatics applications.  It is possibly not as loosely coupled as
you might like for your purpose. It is definitely not as loosely
coupled as the Unix collection of executables or an SOA system.
Because it makes heavy use of interfaces and abstract classes, there is
room for custom code. For example, you can recode the
SmithWaterman object to be optimal for your needs and then create an
application where you use your class in place of the normal biojava
SmithWaterman.
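The substitution described above works when application code holds a reference to the abstract type and only the construction site names a concrete class. A sketch with hypothetical types (`PairwiseScorer`, `StockScorer`, `TunedScorer`, `BestHitFinder`), none of which are BioJava classes; the toy score just counts identical positions so the two implementations agree.

```java
import java.util.List;

// Hypothetical interface standing in for an abstract aligner type; not BioJava's API.
interface PairwiseScorer {
    int score(String a, String b);
}

// Stand-in for a stock library implementation.
class StockScorer implements PairwiseScorer {
    @Override
    public int score(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) same++;
        }
        return same;
    }
}

// A user's hand-tuned replacement: same contract, different internals
// (here it copies both inputs to char arrays once, up front).
class TunedScorer implements PairwiseScorer {
    @Override
    public int score(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        int n = Math.min(x.length, y.length);
        int same = 0;
        for (int i = 0; i < n; i++) {
            same += (x[i] == y[i]) ? 1 : 0;
        }
        return same;
    }
}

// Application code depends only on the abstraction, so either
// implementation can be plugged in without touching this class.
class BestHitFinder {
    private final PairwiseScorer scorer;

    BestHitFinder(PairwiseScorer scorer) {
        this.scorer = scorer;
    }

    String bestHit(String query, List<String> targets) {
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (String t : targets) {
            int s = scorer.score(query, t);
            if (s > bestScore) {
                bestScore = s;
                best = t;
            }
        }
        return best;
    }
}
```

Switching implementations is then a one-line change: `new BestHitFinder(new TunedScorer())` instead of `new BestHitFinder(new StockScorer())`.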

> in its current form is not appropriate for the application that I am
> developing. I will develop some "optimized" classes and functions for my
> own use, and when they are ready I will announce them on this mailing list
> and ask if you want to merge them into biojava. If the biojava team needs
> somebody to improve biojava functions, especially sequences and sequence
> IO, you can ask me.

Code improvements and optimizations are always welcome, especially if
current interfaces can be preserved (that way the end user gets the
improvement without having to change their code).  I always advise
potential optimizers to use a profiler, because it is sometimes hard to
predict how the JVM will behave. For example, once the JIT compiler has
compiled them, parts of the code that are theoretically CPU intensive may
turn out not to be the bottleneck.
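In the spirit of measuring rather than guessing, here is a minimal sketch of a timing helper that discards warm-up iterations before it starts the clock, since early calls may run before the JIT has compiled the hot loop. `WarmTimer` is a hypothetical name; its numbers are rough, and a real profiler or a benchmark harness such as JMH remains the better tool.

```java
import java.util.function.IntSupplier;

// Naive wall-clock timing with an untimed warm-up phase, so early
// (possibly interpreted) runs are not mixed into the measurement.
// A rough sketch only: a profiler or JMH is far more reliable.
class WarmTimer {
    // Results are written here so the timed work is not trivially dead code.
    static volatile int blackhole;

    /** Runs task `warmup` times untimed, then returns mean nanoseconds over `runs` timed calls. */
    static double meanNanos(IntSupplier task, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int sink = 0;
        for (int i = 0; i < warmup; i++) {
            sink += task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += task.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / runs;
    }
}
```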

- Mark

>
> Thank you
>
> Felipe Albrecht
>
> >
> > - Mark
> >
> > > On Jan 24, 2008 11:26 PM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > Hi Felipe -
> > > >
> > > > I agree your method is more efficient, but I think it violates the
> > > > SequenceAlignment interface, which would cause compatibility problems.
> > > > I also wonder what should happen if a user calls the getAlignment()
> > > > method when only a score has been calculated.
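One possible answer to the getAlignment() question, sketched with hypothetical names (`TwoModeAligner`, `scoreOnly`, `align`) rather than BioJava's real classes: the score-only path records that no traceback was done, and getAlignment() then fails loudly with an `IllegalStateException` instead of returning stale or empty data. The scoring is a toy identity count, used only to keep the example short.

```java
// Hypothetical aligner with a full mode and a score-only mode; illustrative names.
class TwoModeAligner {
    private String lastAlignment; // null unless the last call did a traceback

    /** Score-only mode: no traceback, so no alignment is kept. */
    int scoreOnly(String a, String b) {
        lastAlignment = null;
        return identities(a, b);
    }

    /** Full mode: computes the score and keeps a (placeholder) alignment. */
    int align(String a, String b) {
        lastAlignment = a + "\n" + b; // stand-in for a real traceback result
        return identities(a, b);
    }

    /** Fails loudly rather than returning stale or empty data after scoreOnly(). */
    String getAlignment() {
        if (lastAlignment == null) {
            throw new IllegalStateException(
                "No alignment available: the last call computed a score only");
        }
        return lastAlignment;
    }

    // Toy score: identical positions over the shorter length.
    private static int identities(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) same++;
        }
        return same;
    }
}
```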
> > > >
> > > > instanceof is potentially expensive but it is nothing compared to
> > > > actually performing the SmithWaterman.
> > > >
> > > > Biojava is somewhat memory heavy but this is largely because it is
> > > > object oriented. Certainly something in C would be lighter and faster
> > > > but the whole point of using Java is the benefit of object
> > > > oriented design. While ultra-optimized algorithms were once a major
> > > > feature of bioinformatics, they are becoming less necessary, as standard
> > > > desktops are now equivalent to the supercomputers of 5 years ago.
> > > >
> > > > I actually find the SW and NW to be reasonably fast. This is because
> > > > all the heavy lifting is done in loops that the JVM presumably
> > > > compiles and executes natively.
> > > >
> > > > - Mark
> > > >
> > > > On Jan 25, 2008 3:40 AM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> > > > > Hello,
> > > > >
> > > > > I saw the commit and I think that this solution is not the best.
> > > > > I think so because you are creating two Sequences internally, and
> > > > > probably the programmer will not use the other alignment information;
> > > > > he will use only the score.
> > > > >
> > > > > Because of this, I think that if you have 2 SymbolLists, the method
> > > > > should just do the alignment and return the score, as I did.
> > > > > Otherwise, if the programmer wants the "visual alignment", he should
> > > > > create the SimpleSequences externally; the method should not do it.
> > > > >
> > > > > IMHO, one [serious] problem in biojava is its memory consumption: it
> > > > > does not have "lightweight" classes or methods that do things quickly.
> > > > > Because of this, it may be a good choice to have a method that simply
> > > > > returns the alignment score and does not do the other things, like
> > > > > backtracking. Another thing: the cost of "instanceof" is high.
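Skipping the backtracking has a concrete memory payoff: without a traceback, the full (m+1) x (n+1) matrix is unnecessary, because each Smith-Waterman cell depends only on the previous row and the cell to its left, so two rows suffice (O(n) memory instead of O(mn)). A minimal sketch, assuming simple match/mismatch scores and a linear gap penalty rather than BioJava's substitution matrices and gap parameters, with plain Strings in place of SymbolLists; `ScoreOnlySW` is a hypothetical name.

```java
// Score-only local alignment (Smith-Waterman) keeping just two DP rows.
// Assumptions: match/mismatch scores and a linear (negative) gap penalty,
// not BioJava's substitution-matrix scoring; illustrative only.
class ScoreOnlySW {
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[] prev = new int[b.length() + 1]; // row i-1 of the DP matrix
        int[] curr = new int[b.length() + 1]; // row i
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0; // local alignment: column 0 is always 0
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;       // gap in b
                int left = curr[j - 1] + gap; // gap in a
                int h = Math.max(0, Math.max(diag, Math.max(up, left)));
                curr[j] = h;
                if (h > best) best = h; // local score = best cell anywhere
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return best;
    }
}
```

For example, with match = 2, mismatch = -1 and gap = -2, `ScoreOnlySW.score("TTACGTTT", "ACG", 2, -1, -2)` returns 6, the score of the shared ACG. Passing the shorter sequence as `b` keeps the rows as small as possible.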
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Felipe Albrecht
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Jan 24, 2008 11:35 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > > > Hi -
> > > > > >
> > > > > > I have just committed changes that let you use SymbolLists in all
> > > > > > parts of the NW and SW SequenceAlignment objects.
> > > > > >
> > > > > > As you suggested, I made the matrix a method-local variable. I also
> > > > > > removed calls to the garbage collector.
> > > > > >
> > > > > > This can be checked out from SVN.
> > > > > >
> > > > > > - Mark
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Jan 24, 2008 9:05 PM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> > > > > > > If you prefer, I can send a diff. Should I do the same thing in the
> > > > > > > SequenceAlignment and NeedlemanWunsch classes?
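Doing "the same thing" for NeedlemanWunsch works with the same two-row idea, with two differences from the local case: the first row and column start from cumulative gap penalties instead of zeros, and the answer is the last cell rather than the best cell seen. A sketch with a hypothetical class name, assuming match/mismatch scores and a linear gap penalty; it is not BioJava's NeedlemanWunsch.

```java
// Score-only global alignment (Needleman-Wunsch) keeping two DP rows.
// Assumptions: match/mismatch scores and a linear (negative) gap penalty;
// illustrative only.
class ScoreOnlyNW {
    static int score(String a, String b, int match, int mismatch, int gap) {
        int n = b.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        for (int j = 0; j <= n; j++) prev[j] = j * gap; // row 0: leading gaps
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * gap; // column 0: leading gaps in the other sequence
            for (int j = 1; j <= n; j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                curr[j] = Math.max(diag, Math.max(prev[j] + gap, curr[j - 1] + gap));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[n]; // after the final swap, prev holds the last row
    }
}
```

For instance, `ScoreOnlyNW.score("ACGT", "AGT", 1, -1, -2)` gives 1: three matches and one gap.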
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > Felipe Albrecht
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Jan 24, 2008 5:50 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > > > > > Hi Felipe -
> > > > > > > >
> > > > > > > > Thanks for the input on this. As a general rule, the GC should
> > > > > > > > never be called explicitly from code; doing so generally degrades
> > > > > > > > the performance of the JVM. Unless there is a very good reason, I
> > > > > > > > will remove this. You are probably right that a method-local
> > > > > > > > variable may work better.
> > > > > > > >
> > > > > > > > - Mark
> > > > > > > >
> > > > > > > > On Jan 24, 2008 1:47 PM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > I think that it can be solved in a simple way: implement (or just
> > > > > > > > > copy and paste) a pairwiseAlignment that takes SymbolLists as
> > > > > > > > > parameters and does not create an alignment, but just calculates
> > > > > > > > > it and returns the score.
> > > > > > > > >
> > > > > > > > > Another thing that is a bit strange to me is the direct use of the
> > > > > > > > > garbage collector: the field "scoreMatrix" is a class field, so why,
> > > > > > > > > at the end of pairwiseAlignment, is it set to null and the garbage
> > > > > > > > > collector run? Is it not better (and simpler) to make scoreMatrix a
> > > > > > > > > method variable?
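The change proposed here, sketched with hypothetical names: when the DP matrix is a method-local variable instead of a field, it becomes unreachable as soon as the method returns, so there is nothing to null out and no reason to call System.gc(). A side benefit is that two threads sharing one aligner instance cannot overwrite each other's matrix, which a shared field would allow. The scoring (match/mismatch, linear gap, global alignment) is an assumption for illustration.

```java
// Sketch of an aligner with no mutable instance state: the score matrix
// lives only for the duration of one call.
class LocalMatrixScorer {
    int globalScore(String a, String b, int match, int mismatch, int gap) {
        // Method-local matrix: unreachable once this method returns.
        int[][] matrix = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) matrix[i][0] = i * gap;
        for (int j = 0; j <= b.length(); j++) matrix[0][j] = j * gap;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch;
                matrix[i][j] = Math.max(matrix[i - 1][j - 1] + s,
                        Math.max(matrix[i - 1][j] + gap, matrix[i][j - 1] + gap));
            }
        }
        // No `matrix = null;` and no System.gc(): the JVM reclaims it on its own schedule.
        return matrix[a.length()][b.length()];
    }
}
```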
> > > > > > > > >
> > > > > > > > > I'm attaching the class code with my changes, which handles well
> > > > > > > > > the (4^8) * (4^8) SymbolList pairwise alignments that I need :-)
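The workload size is worth spelling out. Reading 4^8 as all DNA words of length 8 (an interpretation; the thread does not say so explicitly), there are 4^8 = 65,536 SymbolLists, and aligning every ordered pair means 65,536^2 = 4,294,967,296 (about 4.3 billion) alignments, which is why per-call overhead such as creating dummy Sequences matters. A sketch of enumerating that space, with the hypothetical class `KmerSpace`:

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates every DNA word of length k over {A, C, G, T} by treating each
// index as a base-4 number; with k = 8 this gives 4^8 = 65,536 words.
class KmerSpace {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    static List<String> allWords(int k) {
        int total = 1 << (2 * k); // 4^k, valid for k <= 15 in an int
        List<String> words = new ArrayList<>(total);
        char[] buf = new char[k];
        for (int code = 0; code < total; code++) {
            int c = code;
            for (int pos = k - 1; pos >= 0; pos--) {
                buf[pos] = BASES[c & 3]; // lowest two bits pick the base
                c >>>= 2;
            }
            words.add(new String(buf));
        }
        return words;
    }

    /** Number of ordered pairs, self-pairs included: (4^k)^2. */
    static long orderedPairs(int k) {
        long n = 1L << (2 * k);
        return n * n;
    }
}
```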
> > > > > > > > >
> > > > > > > > > Thank you,
> > > > > > > > >
> > > > > > > > > Felipe Albrecht
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Jan 23, 2008 6:50 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
> > > > > > > > > > Hi Felipe -
> > > > > > > > > >
> > > > > > > > > > I agree this is a barrier to ease of use. Even if Sequences are
> > > > > > > > > > required internally for some obscure reason, there is no reason
> > > > > > > > > > why dummy Sequences cannot be made inside the aligner. These
> > > > > > > > > > sequences could be given names like 'query' and 'subject' or
> > > > > > > > > > even 'seq1' and 'seq2'.
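The dummy-Sequence suggestion amounts to an overload that wraps the bare residues in named placeholders and delegates to the existing entry point. A sketch with hypothetical types: `NamedSeq` stands in for a Sequence and `WrappingAligner` for the aligner; neither is a BioJava class. Because Java picks the overload at compile time, this also avoids an instanceof check on the arguments.

```java
// Hypothetical holder standing in for a named Sequence.
final class NamedSeq {
    final String name;
    final String residues;

    NamedSeq(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }
}

class WrappingAligner {
    /** Existing-style entry point that needs names for its output. */
    String align(NamedSeq query, NamedSeq subject) {
        return query.name + ": " + query.residues + "\n"
             + subject.name + ": " + subject.residues;
    }

    /** Convenience overload: wraps bare residues in dummy 'query'/'subject' holders. */
    String align(String query, String subject) {
        return align(new NamedSeq("query", query), new NamedSeq("subject", subject));
    }
}
```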
> > > > > > > > > >
> > > > > > > > > > I will take a look at adding some methods.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > - Mark
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Jan 23, 2008 2:58 PM, Felipe Albrecht <felipe.albrecht at gmail.com> wrote:
> > > > > > > > > > > Hello all,
> > > > > > > > > > >
> > > > > > > > > > > I have a simple question about the pairwise alignment classes
> > > > > > > > > > > (SmithWaterman and NeedlemanWunsch): why are two Sequences
> > > > > > > > > > > necessary for an alignment, and not two SymbolLists?
> > > > > > > > > > >
> > > > > > > > > > > For example, I have a collection of SymbolLists to align against
> > > > > > > > > > > each other, so I need to create some "dummy" Sequences to do the
> > > > > > > > > > > alignment.
> > > > > > > > > > >
> > > > > > > > > > > Reading the source, I saw that the only Sequence-specific field the
> > > > > > > > > > > aligner uses is the name, for the alignment output, but if I need
> > > > > > > > > > > only the alignment result, it is useless.
> > > > > > > > > > >
> > > > > > > > > > > Is it not possible to overload pairwiseAlignment to accept
> > > > > > > > > > > SymbolLists, or maybe to add a new method that takes 2 SymbolLists
> > > > > > > > > > > and returns the alignment score?
> > > > > > > > > > >
> > > > > > > > > > > Thank you
> > > > > > > > > > >
> > > > > > > > > > > Felipe Albrecht
> > > > > > > > > > > _______________________________________________
> > > > > > > > > > > biojava-dev mailing list
> > > > > > > > > > > biojava-dev at lists.open-bio.org
> > > > > > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> > > > > > > > > > >


