I'm currently struggling with a field that is new to me: clustering. I would really appreciate any help!
The starting point is a given data set $(x_k)_{k\in\{1,\dots,n\}} \subseteq \mathbb{R}^N$. The task is to partition this set into clusters $C_1,\dots, C_m$ (where $m$ is not fixed in advance) such that, for a given $c \in \mathbb{R}_{>0}$, $$ \begin{aligned} &\forall i \in \{1,\dots,m\} \ \forall x,y \in C_i \colon \ \Vert x-y \Vert \leq c, \\ &\forall i,j \in \{1,\dots,m\} \ \forall x \in C_i \ \forall y \in C_j \colon \ i \neq j \ \Longrightarrow \ \Vert x - y \Vert > c, \end{aligned} $$ and such that $m$ is minimal. In other words: how can I divide the initial data set into as few clusters as possible so that any two elements within the same cluster have distance at most $c$ and any two elements of distinct clusters have distance greater than $c$? (One could also ask the same question with distances replaced by similarities.)
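To make the two conditions concrete, here is a minimal Python sketch that checks whether a candidate partition satisfies them. The function name `is_valid_partition`, the representation of clusters as lists of point indices, and the precomputed distance matrix `dist` are only my assumptions for illustration. It verifies a given partition; it does not search for one with minimal $m$.

```python
from itertools import combinations


def is_valid_partition(dist, clusters, c):
    """Check both conditions for a candidate partition.

    dist[i][j] is the (precomputed) distance between points i and j.
    clusters is a list of lists of point indices (hypothetical layout).
    """
    # Map every point index to the id of the cluster that contains it.
    label = {}
    for cluster_id, members in enumerate(clusters):
        for i in members:
            label[i] = cluster_id

    for i, j in combinations(sorted(label), 2):
        same_cluster = label[i] == label[j]
        close = dist[i][j] <= c
        # Same cluster requires distance <= c,
        # distinct clusters require distance > c.
        if same_cluster != close:
            return False
    return True


# Three points on the real line at 0, 1 and 10, with c = 2.
dist = [[0, 1, 10],
        [1, 0, 9],
        [10, 9, 0]]
print(is_valid_partition(dist, [[0, 1], [2]], 2.0))
print(is_valid_partition(dist, [[0], [1], [2]], 2.0))
```

In this toy example the first partition satisfies both conditions, while the second does not, because the points at 0 and 1 are closer than $c$ but sit in different clusters.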
Does anybody know some keywords I could search for? It would be great if there already were an algorithm or a simple implementation for this. I would also be happy to hear about anything that solves a closely related problem.
Also, is there a method that would allow replacing "$\Vert x-y\Vert$" by an arbitrary distance measure $d(x,y)$ and that relies only on the distances between the given points and no others? By that I mean that some of my ideas use custom distances (or similarities) for which it would be too expensive to compute distances involving new points (for example, the distance from the mean of some of the given points to another point).
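To illustrate what I mean by "only the distances between the given points": the only input I would like such a method to need is the $n \times n$ matrix of pairwise distances, computed once. A rough Python sketch of building that matrix follows; the helper name `pairwise_distances` is made up, and I am assuming here that $d$ is symmetric with $d(x,x) = 0$.

```python
def pairwise_distances(points, d):
    """Evaluate a custom distance d once per unordered pair of given points.

    Assumes d is symmetric and d(x, x) == 0, so only pairs with i < j
    are evaluated. No points other than the given ones are ever used.
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = d(points[i], points[j])
    return dist


# Example with a cheap stand-in for an expensive custom distance.
points = [0.0, 1.0, 10.0]
dist = pairwise_distances(points, lambda x, y: abs(x - y))
```

A method of the kind I am looking for would then work on `dist` alone and never call $d$ on a mean or any other newly constructed point.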
Regards Murp