Selecting a cluster based on minimum average distance

Question

I have a symmetric $N \times N$ matrix of non-Euclidean distances (say, $N = 500$), and I would like to select one cluster of a fixed size $K$ (say, 25) that has the smallest average within-cluster distance. What is a good algorithm for doing that, given the combinatorial complexity of the problem?
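For concreteness, one way to write the objective is the following. The symbols $d_{ij}$ (entry of the distance matrix) and $S$ (the selected index set) are notation introduced here for illustration and do not appear in the original post; the average is assumed to run over distinct pairs.

$$
\min_{S \subseteq \{1,\dots,N\},\ |S| = K} \quad \frac{2}{K(K-1)} \sum_{\substack{i, j \in S \\ i < j}} d_{ij}
$$

An exhaustive search would have to examine $\binom{N}{K}$ subsets, which is what makes the problem combinatorial.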

Currently I have implemented the following algorithm, which does not always find the optimum:

1) Take $K$ points at random to form the initial cluster.
2) Find the $K$ points with the smallest average distance to the points in the current cluster. Call these $K$ points the new cluster.
3) Repeat 2), using the new cluster in place of the cluster from step 1), until the selected $K$ points stop changing or until the new cluster has a larger average distance than the old cluster.
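The steps above can be sketched in Python roughly as follows. This is only one reading of the description, not the asker's actual code: the function names `avg_within_distance` and `greedy_cluster`, the `max_iter` cap, and the choice to include a member's zero self-distance when averaging in step 2 are all assumptions made for illustration.

```python
import numpy as np


def avg_within_distance(D, members):
    """Mean pairwise distance among the points indexed by `members`."""
    sub = D[np.ix_(members, members)]
    k = len(members)
    # The diagonal is zero, so summing all entries counts each pair twice.
    return sub.sum() / (k * (k - 1))


def greedy_cluster(D, K, rng=None, max_iter=100):
    """Heuristic search for K points with a small average pairwise distance."""
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    # Step 1: random initial cluster.
    cluster = np.sort(rng.choice(n, size=K, replace=False))
    score = avg_within_distance(D, cluster)
    for _ in range(max_iter):
        # Step 2: average distance from every point to the current cluster,
        # then keep the K points with the smallest averages.
        mean_to_cluster = D[:, cluster].mean(axis=1)
        candidate = np.sort(np.argsort(mean_to_cluster, kind="stable")[:K])
        cand_score = avg_within_distance(D, candidate)
        # Step 3: stop if the set is unchanged or the score did not improve
        # (">=" rather than ">" is an assumption, to avoid cycling on ties).
        if np.array_equal(candidate, cluster) or cand_score >= score:
            break
        cluster, score = candidate, cand_score
    return cluster, score
```

Because the result depends on the random start, a common workaround is to call `greedy_cluster` several times with different seeds and keep the run with the lowest score; this does not guarantee the global optimum either.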

score 0 · Accepted Answer · edited Sep 04 '19 at 10:06

It seems like you're re-inventing the k-means algorithm, which has been around for a while (Lloyd's algorithm dates from 1957). Although k-means normally minimizes Euclidean distances, that shouldn't be a problem as long as your distance function is a metric.

Thanks to its careful seeding, k-means++ provides more stable results in fewer iterations (original paper).
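As a rough illustration of what "careful seeding" means, here is a minimal sketch of the k-means++ seeding rule (each new seed is drawn with probability proportional to the squared distance to the nearest seed already chosen). Applying it directly to a precomputed distance matrix `D` is an assumption made to match the question; the function name `kmeanspp_seeds` is hypothetical and this is not taken from any particular library.

```python
import numpy as np


def kmeanspp_seeds(D, n_seeds, rng=None):
    """Pick `n_seeds` row indices of D using k-means++ style sampling."""
    rng = np.random.default_rng(rng)
    n = D.shape[0]
    # First seed: uniformly at random.
    seeds = [int(rng.integers(n))]
    # Squared distance from every point to its nearest chosen seed.
    closest_sq = D[seeds[0]] ** 2
    while len(seeds) < n_seeds:
        total = closest_sq.sum()
        if total == 0:
            raise ValueError("all points coincide with an already chosen seed")
        # Points far from all current seeds are more likely to be picked;
        # already chosen seeds have probability zero.
        nxt = int(rng.choice(n, p=closest_sq / total))
        seeds.append(nxt)
        closest_sq = np.minimum(closest_sq, D[nxt] ** 2)
    return seeds
```

The seeds would then serve as initial centers for the iterative refinement, in place of a purely random start.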

answered Apr 09 '19 at 14:32 by Tombart · edited Sep 04 '19 at 10:06 by Pang

Thank you for the answer! It applies well to the problem, with one exception. What I aim to find is just one optimal cluster with minimum distance, whereas k-means and k-means++ would minimize the sum of distances over all clusters. Are you aware of a modification of k-means, or of separate algorithms, in which we just aim to find one cluster with minimum within-cluster distance? – Andrey Zubanov Apr 19 '19 at 20:58
