[Biojava-dev] Pairwise similarity score speed

Andy Yates ayates at ebi.ac.uk
Wed Jun 30 11:48:55 UTC 2010


Hi Radwen,

I would have said that this is more of a problem because of the type of algorithm you are using. It's impossible (as far as I am aware) to calculate the score matrices in one step for multiple sequences & even if it did I don't quite see where the speed increase would come from.

As for the All vs. All problem don't forget that really your total number of comparisons is  ((n * (n-1)) / 2) where n is the number of sequences you are comparing so a simple 2D for loop will have you spending twice the amount of time on this than needs to occur. When I've done this before (in Perl so excuse the usage of it) the code looks like this:

my @output;
my @elements = ('some','elements','something');
while(scalar(@elements) > 1) {
  my $target = pop(@elements);
  foreach my $remaining_element (@elements) {
    push(@output, [$target, $remaining_element]);
  }
}

So this would have emitted:

[
	['some','elements'],
	['some','something'],
	['elements','something']
]

Try doing something similar to this using the Java Deque objects which can act as a stack.

Hope this helps to answer your question

Andy

On 30 Jun 2010, at 12:18, Radhouane Aniba wrote:

> Hello Biojava people,
> 
> I have a question concerning Needlman Wunsh or Smith waterman algorithms.
> I am using Biojava 1.7 and I read sequences from proteins fasta file then I
> store my sequences into an array to calculate pairwise similarity scores
> using a for loop.
> The problem is that it is very time consuming if we want to calculate all
> pairwise for a big number of protein sequences. I would like to know if
> there is way to do a kind of "All against All" comparisons in one single
> step ?
> Do someone have a solution for this kind of problem ?
> 
> Thanks for help.
> 
> Radwen
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/








More information about the biojava-dev mailing list