I'm stuck expressing the following scenario in mathematical terms. I have the code written in Python; an example can be found at the bottom of this post.

Scenario: I have a set $D$ of data sets. Each data set $d \in D$ has a day associated with it, such that $d^{t}$ is the data set of the 'current day' and $d^{t-1}$ is the data set of the previous day. Each data set contains clusters with a label (e.g. 0 or 1). I am comparing each cluster in data set $d^{t}$ to the clusters in the previous data set $d^{t-1}$. Each cluster has features (e.g. BMI, length, age) that have a numerical value.

I loop over all clusters in data set $d^{t}$ and compare each of them to all clusters in data set $d^{t-1}$. Each comparison computes the sum of the absolute differences between the numeric features of the two clusters. The clusters always have the same set of features. I want to minimize this summed absolute difference. The formula should take as input a cluster $c$ in $d^{t}$ and give as output the cluster in $d^{t-1}$ whose summed absolute difference from $c$ is smallest.

Something like: $\min_{c \in D^{t}} ...$
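One way this might be written down, as a sketch only: the symbols $F$, $x_f(\cdot)$, $\delta$ and $m$ are notation introduced here for illustration. It assumes $F$ is the feature set shared by all clusters and $x_f(c)$ is the value of feature $f$ for cluster $c$.

$$\delta(c, c') = \sum_{f \in F} \left| x_f(c) - x_f(c') \right|, \qquad m(c) = \operatorname*{arg\,min}_{c' \in d^{t-1}} \delta(c, c') \quad \text{for } c \in d^{t}.$$

Under these assumptions $\delta$ is the $L_1$ (Manhattan) distance between the two feature vectors. The $\arg\min$ returns the minimizing cluster rather than the minimum value, and it ranges over $d^{t-1}$ because $c$ itself is fixed. If two clusters in $d^{t-1}$ are equally close, $m(c)$ is a set unless a tie-breaking rule is added.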

Example: $d^{t}$ contains clusters 0, 1, and 2, with the following feature values:

| Cluster | Age | BMI | length |
|---------|-----|-----|--------|
| 0       | 20  | 25  | 170    |
| 1       | 30  | 20  | 180    |
| 2       | 40  | 30  | 160    |

And $d^{t-1}$ contains the clusters 0 and 1, with the following feature values:

| Cluster | Age | BMI | length |
|---------|-----|-----|--------|
| 0       | 50  | 20  | 150    |
| 1       | 25  | 30  | 180    |

The absolute difference between cluster 0 from $d^{t}$ and cluster 1 from $d^{t-1}$ would be: abs(20-25) + abs(25-30) + abs(170-180) = 20.

In pseudocode:

differences = {} # dictionary with key:value pairs
for each cluster c1 in d^t:
    differences[c1] = [] # empty list
    for each cluster c2 in d^(t-1):
        abs_difference = 0
        for each feature: # that is both in c1 and c2
            abs_difference = abs_difference + abs(feature_c1 - feature_c2)
        differences[c1].append((c2, abs_difference)) # store the difference between c1 and c2

For the example above, this results in {c0: [(c0, 55), (c1, 20)], c1: [(c0, 50), (c1, 15)], c2: [(c0, 30), (c1, 35)]}. Minimizing over each list then gives the pairs c0:c1, c1:c1, c2:c0. Thus, given c0 from data set $d^{t}$ as input, the formula would return c1 (or simply 1).
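For reference, the pseudocode can be turned into a small runnable Python sketch using the feature values from the two tables. The names `d_t`, `d_prev`, `abs_difference` and `closest_cluster` are illustrative choices for this sketch, not taken from any existing code, and the nested-dictionary layout is an assumption about how the clusters might be stored.

```python
# Feature values from the example tables, keyed by cluster label.
d_t = {
    0: {"Age": 20, "BMI": 25, "length": 170},
    1: {"Age": 30, "BMI": 20, "length": 180},
    2: {"Age": 40, "BMI": 30, "length": 160},
}
d_prev = {
    0: {"Age": 50, "BMI": 20, "length": 150},
    1: {"Age": 25, "BMI": 30, "length": 180},
}


def abs_difference(c1, c2):
    """Sum of absolute differences over the features present in both clusters."""
    return sum(abs(c1[f] - c2[f]) for f in c1.keys() & c2.keys())


def closest_cluster(c1, previous):
    """Label of the cluster in `previous` with the smallest difference to c1.

    On a tie, min() keeps the first label in the dictionary's iteration order.
    """
    return min(previous, key=lambda label: abs_difference(c1, previous[label]))


# All pairwise differences, mirroring the `differences` dictionary in the pseudocode.
differences = {
    label_t: [(label_p, abs_difference(c1, c2)) for label_p, c2 in d_prev.items()]
    for label_t, c1 in d_t.items()
}

# Best match in d^(t-1) for every cluster in d^t.
matches = {label: closest_cluster(c1, d_prev) for label, c1 in d_t.items()}

print(differences)
print(matches)  # {0: 1, 1: 1, 2: 0}
```

Calling `closest_cluster(d_t[0], d_prev)` returns `1`, which matches the pairing c0:c1 described above.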

Any help converting this problem into a mathematical formula is very welcome.
