[Biojava-l] Evolutionary distances
Andy Yates
ayates at ebi.ac.uk
Wed Oct 24 12:28:21 UTC 2007
Yes, a very good point, and one I was going to make beforehand but forgot :)
Not to mention that micro-benchmarks and profiling in Java are notorious
for giving misleading results because of VM warmup & JIT compilation
optimisations. There is a framework hosted on Java.net somewhere which
can perform VM warmups and repeated code iterations to produce more
accurate benchmarking results, but the name escapes me at the moment.
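To illustrate the warmup idea, here is a minimal sketch of a hand-rolled timing loop. It is not the Java.net framework mentioned above; the class name WarmupBench and the method averageNanos are hypothetical, and it assumes a Java version with lambdas. It runs the task untimed first so the JIT has a chance to compile it, then times a second batch of runs.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, for illustration only.
final class WarmupBench {

    // Runs `task` `warmup` times without timing, then returns the average
    // wall-clock nanoseconds per call over `measured` timed runs.
    static double averageNanos(LongSupplier task, int warmup, int measured) {
        long sink = 0; // accumulate results so the calls are not dead code

        // Warmup phase: gives the JIT a chance to compile the hot path.
        for (int i = 0; i < warmup; i++) {
            sink += task.getAsLong();
        }

        // Measured phase.
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += task.getAsLong();
        }
        long elapsed = System.nanoTime() - start;

        // Use the sink so the accumulated value stays observable.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return (double) elapsed / measured;
    }
}
```

Even with a warmup phase, numbers from a loop like this are only a rough guide; a dedicated benchmarking harness or a profiler remains the more reliable option.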
However, looking at this particular code, I get the feeling that this is
about as fast as it's going to get without someone doing bitwise XOR
operations or some C code ... that's not an open invitation for people
to start recoding this in C :). At the end of the day, the key to
optimisation is to ask the question "is it fast enough already?". If it
is, then there's no point :)
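For reference, the equal-or-not shortcut described further down the thread can be sketched roughly as follows. This is not the BioJava implementation; the names K2PSketch, baseClass and k2p are hypothetical. It assumes pre-aligned sequences, skips any site where either character is not A, C, G or T, and treats a mismatch within the same class (purine or pyrimidine) as a transition and any other mismatch as a transversion.

```java
// Hypothetical sketch, for illustration only.
final class K2PSketch {

    // Returns 0 for purines (A, G), 1 for pyrimidines (C, T), -1 otherwise.
    static int baseClass(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': case 'G': return 0;
            case 'C': case 'T': return 1;
            default: return -1;
        }
    }

    // Kimura 2-parameter distance between two aligned sequences.
    static double k2p(String s1, String s2) {
        int n = Math.min(s1.length(), s2.length());
        long transitions = 0, transversions = 0, sites = 0;
        for (int i = 0; i < n; i++) {
            char a = Character.toUpperCase(s1.charAt(i));
            char b = Character.toUpperCase(s2.charAt(i));
            int ca = baseClass(a);
            int cb = baseClass(b);
            if (ca < 0 || cb < 0) {
                continue; // gap or ambiguity code: not an aligned site
            }
            sites++;
            if (a == b) {
                continue; // identical bases: no change
            }
            if (ca == cb) {
                transitions++;   // A<->G or C<->T
            } else {
                transversions++; // purine <-> pyrimidine
            }
        }
        double p = (double) transitions / sites;
        double q = (double) transversions / sites;
        return -0.5 * Math.log(1.0 - 2.0 * p - q)
                - 0.25 * Math.log(1.0 - 2.0 * q);
    }
}
```

The sketch does no error handling: with no aligned sites, or with sequences so divergent that a logarithm argument is not positive, the result is NaN or infinite rather than an exception.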
Andy
Mark Schreiber wrote:
> Hi -
>
> From experience, the best way to optimize Java code is to run a
> profiler. The one in NetBeans is quite good.
>
> The reason is that the HotSpot or JIT compilers might natively compile
> the part of the code that you think is slow and actually make it
> faster than something else, which then becomes the bottleneck. Using a
> good profiler you can detect how much time is spent in each method and
> pinpoint some candidate methods for optimization. You can also see if
> there is a burden due to the creation of lots of objects.
>
> - Mark
>
> On 10/24/07, Andy Yates <ayates at ebi.ac.uk> wrote:
>> Our code is very similar but not identical. The original programmer
>> shortcutted a lot of else if conditions by considering if the two bases
>> were equal or not. It can then calculate the transitional changes &
>> assume the rest are transversional.
>>
>> In terms of the speed of both pieces of code, I can't see an obvious way
>> to speed it up. Probably in our code replacing the 10 or so calls to
>> String.charAt() with two calls & referencing those chars might help,
>> but in all honesty I cannot say.
>>
>> Andy
>>
>> Richard Holland wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Thanks.
>>>
>>> Your code is similar to the code we have in
>>> org.biojavax.bio.phylo.MultipleHitCorrection. I haven't checked it to
>>> see if it is identical, but it probably is.
>>>
>>> You can call our code like this:
>>>
>>> // import statement for biojava phylo stuff
>>> import org.biojavax.bio.phylo.*;
>>>
>>> // ...rest of code goes here
>>>
>>> // call Kimura2P
>>> String seq1 = ...; // Get seq1 and seq2 from somewhere
>>> String seq2 = ...;
>>> double result = MultipleHitCorrection.Kimura2P(seq1, seq2);
>>>
>>> Note that our implementation expects sequence strings to be in upper
>>> case, so you'll need to make sure your data is upper case or has been
>>> converted to upper case before calling our method.
>>>
>>> cheers,
>>> Richard
>>>
>>> vineith kaul wrote:
>>>> This is what I have... Thanks a lot for the help.
>>>>
>>>>
>>>> // Method to calculate the Kimura 2-parameter distance.
>>>> // p = transitional differences (A<->G & T<->C);
>>>> // q = transversional differences (A/G<->C/T)
>>>> public static double K2P(String sequence1,String sequence2){
>>>> long p=0,q=0,numberOfAlignedSites=0;
>>>>
>>>>
>>>> char[] seq1array=sequence1.toCharArray();
>>>> char[] seq2array=sequence2.toCharArray();
>>>>
>>>> for(int i=0;i<seq1array.length;i++){
>>>> // Number of aligned sites
>>>> if(((seq1array[i]=='a') ||
>>>> (seq1array[i]=='A')||(seq1array[i]=='g') ||
>>>> (seq1array[i]=='G')||(seq1array[i]=='c') || (seq1array[i]=='C') ||
>>>> (seq1array[i]=='t') || (seq1array[i]=='T')) && ((seq2array[i]=='a') ||
>>>> (seq2array[i]=='A')||(seq2array[i]=='c') ||
>>>> (seq2array[i]=='C')||(seq2array[i]=='t') ||
>>>> (seq2array[i]=='T')||(seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>>
>>>> numberOfAlignedSites++;
>>>> }
>>>>
>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>> p++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>> p++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>> p++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>> p++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='a') || (seq1array[i]=='A')) &&
>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>> ((seq2array[i]=='c') || (seq2array[i]=='C'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='g') || (seq1array[i]=='G')) &&
>>>> ((seq2array[i]=='t') || (seq2array[i]=='T'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='t') || (seq1array[i]=='T')) &&
>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>> ((seq2array[i]=='a') || (seq2array[i]=='A'))) {
>>>> q++;
>>>> }
>>>> else
>>>> if(((seq1array[i]=='c') || (seq1array[i]=='C')) &&
>>>> ((seq2array[i]=='g') || (seq2array[i]=='G'))) {
>>>> q++;
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> }
>>>>
>>>> double P = 1.0 - (2.0 * ((double)p)/numberOfAlignedSites) -
>>>> (((double)q)/numberOfAlignedSites);
>>>> double Q = 1.0 - (2.0 * ((double)q)/numberOfAlignedSites);
>>>> System.out.print(numberOfAlignedSites+"\t"+p+"\t"+q+"\t");
>>>> double dist = (-0.5 * Math.log(P)) - ( 0.25 * Math.log(Q));
>>>> return dist;
>>>> }
>>>>
>>>> On 10/22/07, *Richard Holland* <holland at ebi.ac.uk> wrote:
>>>>
>>>> You should take a look at the latest 1.5 release, in the
>>>> org.biojavax.bio.phylo packages. This code is the beginnings of some
>>>> phylogenetics code that will perform tasks as you describe. The future
>>>> plan is to extend this code to cover a wider range of use cases.
>>>> Kimura2P is already implemented here, in
>>>> org.biojavax.bio.phylo.MultipleHitCorrection.
>>>>
>>>> If you can't find code that will do what you want, but have written some
>>>> before, then please do feel free to contribute it. Even if it is slow,
>>>> I'm sure someone out there will be able to help optimise it!
>>>>
>>>> cheers,
>>>> Richard
>>>>
>>>> On Sun, October 21, 2007 5:30 pm, vineith kaul wrote:
>>>> > Hi,
>>>> >
>>>> > Are there functions to calculate evolutionary pairwise distances like
>>>> > Kimura2P, Finkelstein, etc. in BioJava?
>>>> > I did write something on my own, but on large sequences it runs terribly
>>>> > slowly and I am not even sure if it is right.
>>>> > --
>>>> > Vineith Kaul
>>>> > Masters Student Bioinformatics
>>>> > The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>>>> > Georgia Tech, Atlanta
>>>> > _______________________________________________
>>>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>> >
>>>>
>>>>
>>>> --
>>>> Richard Holland
>>>> BioMart ( http://www.biomart.org/)
>>>> EMBL-EBI
>>>> Hinxton, Cambridgeshire CB10 1SD, UK
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Vineith Kaul
>>>> Masters Student Bioinformatics
>>>> The Parker H. Petit Institute for Bioengineering and Bioscience (IBB)
>>>> Georgia Tech, Atlanta
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFHHvm34C5LeMEKA/QRAlc3AJ9GAMML/z5+BBl12PA2a/Zyz/CHDQCdFWKa
>>> 4iKvsyBj2uznhhjTF9EYDFE=
>>> =LALE
>>> -----END PGP SIGNATURE-----