[BioPython] kcluster and distances

Scott Rifkin scott.rifkin at yale.edu
Fri Mar 4 17:15:08 EST 2005


The body of the Euclidean distance function in cluster.c is:

{ double result = 0.;
  double tweight = 0;
  int i;
  if (transpose==0) /* Calculate the distance between two rows */
  { for (i = 0; i < n; i++)
    { if (mask1[index1][i] && mask2[index2][i])
      { double term = data1[index1][i] - data2[index2][i];
        result = result + weight[i]*term*term;
        tweight += weight[i];
      }
    }
  }
  else
  { for (i = 0; i < n; i++)
    { if (mask1[i][index1] && mask2[i][index2])
      { double term = data1[i][index1] - data2[i][index2];
        result = result + weight[i]*term*term;
        tweight += weight[i];
      }
    }
  }
  if (!tweight) return 0; /* usually due to empty clusters */
  result /= tweight;
  result *= n;
  return result;
}

Why is the result multiplied by n at the end? And why isn't the square
root of the result returned as the distance?
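
To make the question concrete, here is a minimal standalone sketch of the
row case (transpose==0) only. The name euclid_sketch and its flat-array
signature are my own simplification for illustration, not the actual
cluster.c interface; the arithmetic is copied from the code above.

```c
#include <stddef.h>

/* Hypothetical simplified version of the row case (transpose == 0).
   It takes the two vectors directly instead of matrices plus row indices. */
double euclid_sketch(int n, const double x[], const double y[],
                     const int maskx[], const int masky[],
                     const double weight[])
{
    double result = 0.0;
    double tweight = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        if (maskx[i] && masky[i]) {     /* skip entries missing in either vector */
            double term = x[i] - y[i];
            result += weight[i] * term * term;
            tweight += weight[i];
        }
    }
    if (tweight == 0.0) return 0.0;     /* no entries present in both vectors */
    result /= tweight;                  /* weighted mean of squared differences */
    result *= n;                        /* scaled by the full vector length n */
    return result;                      /* note: no square root is taken */
}
```

With unit weights and nothing masked, x = {1, 2, 3} against y = {0, 0, 0}
gives 14 (up to rounding), which is the plain sum of squared differences.
Masking the third entry gives (1 + 4) / 2 * 3 = 7.5, so in that case the
mean over the two present entries is scaled back up to the full length n.
Either way the value is a squared quantity, not its square root.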

Thanks,
Scott Rifkin




