Hexaly vs Gurobi on the K-Means Clustering Problem (MSSC)

The K-Means Clustering problem is defined as follows: given a set of points in a space of arbitrary dimension, find a partition of these points into k clusters that minimizes the total sum of squared Euclidean distances between each point and the centroid of its cluster. This problem is also known as the Minimum Sum-of-Squares Clustering (MSSC) problem.
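
Formally, denoting by p_1, …, p_n the points to cluster, the problem can be stated as follows (a standard statement of MSSC, consistent with the models presented below):

\[
\min_{C_1, \dots, C_k} \; \sum_{c=1}^{k} \sum_{p_i \in C_c} \lVert p_i - \mu_c \rVert^2
\quad \text{with} \quad
\mu_c = \frac{1}{|C_c|} \sum_{p_i \in C_c} p_i,
\]

where C_1, …, C_k is the partition of the points and μ_c is the centroid of cluster C_c.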

Hexaly Optimizer reaches gaps below 1% within 60 seconds of running time on most instances of the K-Means Clustering Problem. This page illustrates how Hexaly Optimizer outperforms traditional general-purpose optimization solvers, such as Gurobi 9.1, on this challenging problem.

Input data

The instances of this benchmark come from two sources: the UCI Machine Learning Repository, which gathers a wide variety of datasets used in machine learning, with points of arbitrary dimension, and the TSPLIB, the reference instance repository for the Traveling Salesman Problem (TSP), which contains only 2-dimensional points.

From these two repositories, we extracted the instances used in the research articles [1][2][3][4][5] presenting state-of-the-art dedicated algorithms for the K-Means Clustering problem. Since each instance can be solved for any value of k, the number of clusters, we decided to keep only the instances solved with k ≤ 25, as typical clustering applications rarely involve a larger number of clusters.

Mathematical models of the K-Means Clustering Problem (MSSC)

Mixed-Integer Quadratically Constrained Programming (MIQCP) model

The results reported for Gurobi are obtained with the Mixed-Integer Quadratically Constrained Programming (MIQCP) formulation presented here. Each cluster center is represented by continuous variables, one per dimension, and quadratic constraints compute the squared Euclidean distance between each point and its assigned center. In addition, binary variables model the assignment of points to clusters. Note that, due to numerical issues on instances with large coordinates, we had to scale all coordinates into the range [0, 1] for Gurobi.
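
To give a concrete picture of this formulation, here is a minimal gurobipy sketch of a classical big-M variant of it. The function name, the big-M value, and the data handling are illustrative assumptions, and the exact formulation used in the benchmark may differ in such details:

import gurobipy as gp
from gurobipy import GRB
import numpy as np

def solve_mssc_miqcp(points, k, time_limit=60):
    # points: (n, dim) NumPy array; k: number of clusters.
    n, dim = points.shape

    # Scale all coordinates into [0, 1], as done for the Gurobi runs.
    lo, hi = points.min(axis=0), points.max(axis=0)
    pts = ((points - lo) / np.where(hi > lo, hi - lo, 1.0)).tolist()

    model = gp.Model("mssc")
    model.Params.TimeLimit = time_limit

    # Continuous center coordinates: one variable per cluster and dimension.
    y = model.addVars(k, dim, lb=0.0, ub=1.0, name="center")
    # Binary assignment variables: x[i, c] == 1 iff point i is in cluster c.
    x = model.addVars(n, k, vtype=GRB.BINARY, name="assign")
    # Squared distance paid by each point in the objective.
    s = model.addVars(n, lb=0.0, name="sqdist")

    # Each point must belong to exactly one cluster.
    model.addConstrs((x.sum(i, "*") == 1 for i in range(n)), name="one")

    # Quadratic big-M constraints: s[i] covers the squared distance to the
    # chosen center. Coordinates lie in [0, 1], so M = dim is a valid bound.
    M = float(dim)
    for i in range(n):
        for c in range(k):
            dist2 = gp.quicksum((pts[i][d] - y[c, d]) * (pts[i][d] - y[c, d])
                                for d in range(dim))
            model.addConstr(s[i] >= dist2 - M * (1 - x[i, c]))

    model.setObjective(s.sum(), GRB.MINIMIZE)
    model.optimize()
    return model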

Hexaly model

The Hexaly model relies on set variables representing the assignment of points to clusters. The centroid of each cluster is then computed using a lambda expression, and the squared distance between the points assigned to a cluster and its centroid is computed following the same approach. Compared to the MIQCP model, the Hexaly model is much more compact: the only decisions to take are the assignments of points to clusters. Set variables combined with lambda expressions enable a simple and intuitive model of the problem without limiting possible extensions.

function model() {
  // Set variables: clusters[c] represents the points in cluster c
  clusters[1..k] <- set(nbPoints);

  // Each point must belong to one and only one cluster
  constraint partition[c in 1..k](clusters[c]);

  for [c in 1..k] {
      local cluster <- clusters[c];
      local size <- count(cluster);

      // Compute the centroid of the cluster
      centroid[d in 0..nbDimensions-1] <- size == 0 ? 0 : sum(cluster, o => coordinates[o][d]) / size;

      // Compute the variance of the cluster
      squares[d in 0..nbDimensions-1] <- sum(cluster, o => pow(coordinates[o][d] - centroid[d], 2));
      variances[c] <- sum[d in 0..nbDimensions-1](squares[d]);
  }

  // Minimize the total variance
  obj <- sum[c in 1..k](variances[c]);
  minimize obj;
}
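
To make the objective concrete, here is a short NumPy sketch, purely illustrative and not part of the benchmark, that evaluates the MSSC objective both models minimize for a given assignment of points to clusters:

import numpy as np

def mssc_objective(points, labels, k):
    # Total sum of squared distances of each point to its cluster centroid.
    total = 0.0
    for c in range(k):
        cluster = points[labels == c]
        if len(cluster) == 0:
            continue  # an empty cluster contributes nothing, as in the model
        centroid = cluster.mean(axis=0)
        total += ((cluster - centroid) ** 2).sum()
    return total

# Example: two obvious clusters in the plane; the optimal MSSC value is 1.0.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
print(mssc_objective(pts, np.array([0, 0, 1, 1]), k=2))  # prints 1.0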

Gurobi and Hexaly results on the K-Means Clustering Problem (MSSC)

We compare both solvers’ performance for three running times: 1 minute, 10 minutes, and 1 hour. At the end of the running time, we measure the gap to the best known solutions reported in the cited research papers. Each instance is solved for several values of k, the number of clusters: the first eight instances, from “german” to “pr2392”, are solved for k = 2, 3, 4, 5, 6, 7, 8, 9, 10, and the remaining, larger instances, from “breast” to “pla85900”, are solved for k = 2, 5, 10, 15, 20, 25. For each instance, the gap reported in the table is the arithmetic mean, over all values of k, of the gaps in % between the solver’s solution and the best known solution. When a solver finds no feasible solution within the time limit for at least one value of k, the letter “X” is reported instead of an average gap.
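
For a single instance and value of k, the gap follows the usual relative-deviation convention (our reading of the definition above):

\[
\text{gap} = 100 \times \frac{f_{\text{solver}} - f_{\text{best}}}{f_{\text{best}}},
\]

where f_solver is the objective value reached by the solver within the time limit and f_best is the best known value from the literature.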

We use Hexaly 10.0 and Gurobi 9.1, a state-of-the-art MIP solver, both with default parameters. We ran them on a server equipped with an Intel Core i5-6500 processor (4 cores, 3.6 GHz, 6 MB cache) and 16 GB RAM.

                                     Hexaly                     Gurobi
Instance     Points  Dim.    1 min  10 min  1 hour     1 min  10 min  1 hour
german           59     2      0.0     0.0     0.0       5.9     3.9     3.6
ruspini          75     2      0.0     0.0     0.0       7.9     7.3     5.6
spath            89     3      0.0     0.0     0.0      11.9     3.8     1.6
hatco           100     7      0.0     0.0     0.0      16.8    10.0     9.0
iris            150     4      0.0     0.0     0.0      24.9    14.7    14.4
gr202           202     2      0.2     0.0     0.0      26.0    20.9    18.2
gr666           666     2      0.3     0.1     0.1      33.8    18.4    18.1
pr2392        2,392     2      0.2     0.0     0.0         X    35.6    10.8
breast          683     9      0.8     0.1     0.1      40.3    21.3    14.1
u1060         1,060     2      0.8     0.4     0.0      62.8    39.4    21.6
pcb3038       3,038     2      0.5     0.3     0.3         X    63.2    34.9
pendigit     10,992    16      0.8     0.8     0.4         X    61.4    54.2
d15112       15,112     2      0.1     0.1     0.0         X    73.3    71.7
letter       20,000    16      0.8     0.4     0.2         X       X    44.4
kegg         53,413    23     25.7     5.9     3.1         X       X       X
shuttle      58,000     9      4.5     1.4     0.9         X       X       X
pla85900     85,900     2      0.9     0.4     0.1         X       X    78.6

Average gaps in % to best known solutions.

Hexaly obtains high-quality solutions in short running times, with gaps below 1% within 1 minute on almost all instances, including the largest ones. Gurobi, on the other hand, only finds near-optimal solutions on instances with fewer than 100 points and quickly struggles as the size grows: from a few thousand points onward, it often fails to find any feasible solution within 1 minute, and on the largest instances, such as kegg and shuttle, it finds none even within 1 hour.

Conclusion

Hexaly offers an innovative modeling approach based on set variables, making the mathematical modeling of the K-Means Clustering Problem (MSSC) much more compact than the MIQCP formulation handled by traditional MIP solvers. Extra constraints can be added easily, offering flexibility to extend the model to suit real-world clustering needs. Beyond simplicity, Hexaly delivers near-optimal solutions in minutes.

You can find the complete model of the K-Means Clustering Problem (MSSC) and many other Clustering problems in Python, Java, C#, and C++ in our Example Tour.

Are you interested in trying it out? Get free trial licenses here. In the meantime, feel free to contact us; we will be glad to discuss your optimization problems.


[1] D. Aloise, P. Hansen (2009). A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering. Pesquisa Operacional 29(3), pp. 503–516.

[2] D. Aloise, P. Hansen, L. Liberti (2012). An improved column generation algorithm for minimum sum-of-squares clustering. Mathematical Programming 131, pp. 195–220.

[3] A.M. Bagirov, B. Ordin, G. Ozturk, A.E. Xavier (2015). An incremental clustering algorithm based on hyperbolic smoothing. Computational Optimization and Applications 61, pp. 219–241.

[4] T. Pereira, D. Aloise, J. Brimberg, N. Mladenović (2018). Review of basic local searches for solving the minimum sum-of-squares clustering problem. In P. Pardalos, A. Migdalas (eds), Open Problems in Optimization and Data Analysis. Springer Optimization and Its Applications, vol. 141. Springer, Cham.

[5] P. Kalczynski, J. Brimberg, Z. Drezner (2020). The importance of good starting solutions in the minimum sum of squares clustering problem. arXiv preprint (submitted 6 April 2020).
