0

In my work, I came across the following problem: Given a similarity matrix D, where $d_{i,j} \in \Re$ represents the similarity between objects $i$ and $j$, I would like to select $k$ objects, for $k \in \{1, \dots, n\}$, in such a way that minimizes the similarity between the selected objects. My first attempt to formally formulate this problem was using the following integer program:

$$\text{minimize}\quad d_{1,2}X_1X_2 + d_{1,3}X_1X_3 + \dots + d_{1,n}X_1X_n + d_{2,1}X_2X_1 + \dots + d_{n,n-1}X_nX_{n-1} $$

such that $X_1 + X_2 + \dots + X_n = k$ and $X_y \in \{0,1\}$, for $y=1,\dots,n$.

In the above program, $X_y$ indicates whether or not object $y$ was selected. Clearly, the above program is not linear. I tried to make the objective function linear by using variables $X_{1,2} $, which indicates whether or not both objects $X_1$ and $X_2$ were selected. However, I am struggling to formulate the constraint that exactly $k$ objects must be chosen, i.e., the previous constraint $X_1 + X_2 + \dots + X_n = k$.

Since I am not an expert in mathematical programming, I wonder if you could help me with this.

Thank you in advance! All the best,

Arthur

1 Answers1

0

Depending on whether you allow variables of the type $X_{i,i}$ or not, you could try the following:

if you allow the "diagonal type" variables $X_{i,i}$ then the sum $\sum_i^n X_{1,i}$ is $0$ if $X_1$ was not selected and equals the number of selected items if $X_1$ was selected.

As you don't know a priori, which variables will be selected, you can come up with $n$ constraints of the above form. I.e. use $\sum_i^n X_{j,i}=k$ for $j \in \{1,\dots ,n\}$. this should do the trick.

If you don't allow those "diagonal type" variables, you have to use $\sum_i^n X_{j,i}=k-1$ for $j \in \{1,\dots ,n\}$ instead, however this will make problems for the case $k=1$ (and obviously also for the case $k=0$ but I'd consider that one as irrelevant) so I'd recommend using the first proposal.

Barkas
  • 908