I'm stuck expressing the following scenario in mathematical terms. I have the code written in Python; an example can be found at the bottom of this post.
Scenario: I have a set $D$ of multiple data sets. Each data set $d \in D$ has a day associated with it, such that $d^{t}$ is the data set of the 'current day' and $d^{t-1}$ is the data set of the previous day. Each data set contains clusters, each with a label (e.g. 0 or 1). I am comparing each cluster in data set $d^{t}$ to the clusters in the previous data set $d^{t-1}$. Each cluster has features (e.g. BMI, length, age) with numerical values.
I loop over all clusters in data set $d^{t}$ and compare each of them to all clusters in data set $d^{t-1}$. Each comparison computes the sum of the absolute differences between the numeric features of the two clusters; the clusters always have the same set of features. I want to minimize this sum. The formula should take as input a cluster $c$ in $d^{t}$ and return as output the cluster in $d^{t-1}$ with the smallest summed absolute difference to $c$.
Something like: $\min_{c \in D^{t}} ...$
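One possible formalization, offered as a sketch: assume each cluster $c$ has a numeric value $x_k(c)$ for every feature $k = 1, \dots, K$ (this notation, and the names $\delta$ and $m$ below, are introduced here for illustration). The quantity being compared is then the sum of absolute feature differences, which is the Manhattan ($L_1$) distance $\delta$. Because the desired output is a cluster rather than the smallest distance itself, an $\operatorname{arg\,min}$ taken over the clusters of $d^{t-1}$ fits the description better than a $\min$:

$$
\delta(c, c') = \sum_{k=1}^{K} \lvert x_k(c) - x_k(c') \rvert,
\qquad
m(c) = \underset{c' \in d^{t-1}}{\operatorname{arg\,min}}\ \delta(c, c'), \quad c \in d^{t}.
$$

If several clusters of $d^{t-1}$ attain the minimum, $\operatorname{arg\,min}$ returns a set, so a tie-breaking rule (for instance, the smallest label) would be needed to obtain a single cluster.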
Example: $d^{t}$ contains clusters 0, 1, and 2, with the following feature values:
| Cluster | Age | BMI | length |
|---|---|---|---|
| 0 | 20 | 25 | 170 |
| 1 | 30 | 20 | 180 |
| 2 | 40 | 30 | 160 |
And $d^{t-1}$ contains the clusters 0 and 1, with the following feature values:
| Cluster | Age | BMI | length |
|---|---|---|---|
| 0 | 50 | 20 | 150 |
| 1 | 25 | 30 | 180 |
The summed absolute difference between cluster 0 of $d^{t}$ and cluster 1 of $d^{t-1}$ would be $|20-25| + |25-30| + |170-180| = 5 + 5 + 10 = 20$.
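Written with the summed absolute difference $\delta$ (the Manhattan distance; the symbols $\delta$, $c_0^{t}$ for cluster 0 of $d^{t}$, and $c_1^{t-1}$ for cluster 1 of $d^{t-1}$ are notation assumed here, not taken from the tables), the same calculation reads term by term, in the feature order Age, BMI, length:

$$
\delta\left(c_0^{t}, c_1^{t-1}\right) = \lvert 20 - 25 \rvert + \lvert 25 - 30 \rvert + \lvert 170 - 180 \rvert = 5 + 5 + 10 = 20.
$$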
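In Python (feature values taken from the tables above):

    # Feature values per cluster, copied from the two tables above
    d_t = {0: {"age": 20, "bmi": 25, "length": 170},
           1: {"age": 30, "bmi": 20, "length": 180},
           2: {"age": 40, "bmi": 30, "length": 160}}
    d_t_minus_1 = {0: {"age": 50, "bmi": 20, "length": 150},
                   1: {"age": 25, "bmi": 30, "length": 180}}

    differences = {}  # cluster in d^t -> list of (cluster in d^(t-1), difference)
    for c1, features_c1 in d_t.items():
        differences[c1] = []
        for c2, features_c2 in d_t_minus_1.items():
            abs_difference = 0
            for feature in features_c1.keys() & features_c2.keys():  # shared features
                abs_difference += abs(features_c1[feature] - features_c2[feature])
            differences[c1].append((c2, abs_difference))  # store the pair as a tuple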
For the example above, this results in {c0:[(c0, 55), (c1, 20)], c1:[(c0, 50), (c1, 15)], c2:[(c0, 30), (c1, 35)]}. The minimization would then give the pairs c0:c1, c1:c1, and c2:c0. Thus, given cluster c0 from $d^{t}$ as input, the formula would return c1 (or simply 1).
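A minimal Python sketch of the selection step, assuming each data set is stored as a dict that maps a cluster label to a dict of feature values (this layout and the names `l1_distance` and `best_match` are hypothetical, chosen for illustration). `best_match` returns the label of the cluster in $d^{t-1}$ with the smallest summed absolute difference, i.e. the $\operatorname{arg\,min}$ over $d^{t-1}$:

```python
def l1_distance(features_a: dict, features_b: dict) -> float:
    """Sum of absolute differences over the features both clusters share."""
    shared = features_a.keys() & features_b.keys()
    return sum(abs(features_a[f] - features_b[f]) for f in shared)


def best_match(cluster: dict, previous: dict):
    """Return the label of the cluster in `previous` closest to `cluster`.

    `previous` maps cluster labels to feature dicts (hypothetical layout).
    On ties, `min` keeps the first minimal label in iteration order.
    """
    return min(previous, key=lambda label: l1_distance(cluster, previous[label]))


# Example data from the tables in the question
d_t = {
    0: {"age": 20, "bmi": 25, "length": 170},
    1: {"age": 30, "bmi": 20, "length": 180},
    2: {"age": 40, "bmi": 30, "length": 160},
}
d_t_minus_1 = {
    0: {"age": 50, "bmi": 20, "length": 150},
    1: {"age": 25, "bmi": 30, "length": 180},
}

# Match every cluster of d^t to its closest cluster of d^(t-1)
matches = {c: best_match(d_t[c], d_t_minus_1) for c in d_t}
print(matches)  # {0: 1, 1: 1, 2: 0}
```

Using `min` with a `key` function avoids building the full `differences` dictionary when only the best match is needed; the dictionary is still useful if the other distances are also of interest.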
Any help converting this problem into a mathematical formula is very welcome.