Optimization with 'sort', or bilevel optimization with permutation matrix

Question

Original problem: $$ \begin{aligned} \mathbf{w}^*&= \underset{\mathbf{w}}{\mathrm{Argmax}} ~ \text{Sort} \left[ R_1\mathbf{w} \right]^T \mathbf{p} - c||\mathbf{w}-\mathbf{w}_0||_1 \\s.t.~\\ ||\mathbf{w}||_\infty&\leq 1 \end{aligned} $$

Second problem $(R_2=-R_1)$: $$ \begin{aligned} \mathbf{w}^*&= \underset{\mathbf{w}}{\mathrm{Argmin}} ~ \text{Sort} \left[ R_2\mathbf{w} \right]^T \mathbf{p} + c||\mathbf{w}-\mathbf{w}_0||_1 \\s.t.~\\ ||\mathbf{w}||_\infty&\leq 1 \end{aligned} $$

Bilevel optimization: $$ \begin{aligned} \mathbf{w}^* &= \underset{\mathbf{w}}{\mathrm{Argmin}} ~ \left[ M (R\mathbf{w}) \right]^T \mathbf{p} + c||\mathbf{w}-\mathbf{w}_0||_1 \\ M &\in \underset{M}{\mathrm{Argmin}}~\mathbf{v}^T M(R\mathbf{w}) \\ \mathbf{v}&= (1,2,3,\ldots,nrow(M))^T \\s.t.~\\ ||\mathbf{w}||_\infty&\leq 1 \\\sum_i M_{i,j} &= 1 \\\sum_j M_{i,j} &= 1 \\M_{i,j} &\in \{0,1\} \end{aligned} $$

$R$ is a tall random matrix of shape around $100000\times1000$.

$\mathbf{p}$ is a univariate probability mass function of an arbitrary distribution within $(0,1)$. $\mathbf{p}$ matches with each row of the sorted $R\mathbf{w}$. All elements in $\mathbf{p}$ are positive and ideally, they have a sum equal to $1$. $\mathbf{p}$ can be sorted as well.

Here I rewrite the 'Sort' part using permutation matrix $M$. It is okay to assume the 'Sort' as increasing or decreasing ($\text{Sort} \left[R\mathbf{w} \right]^T \mathbf{p}$ as largest element with largest weight or largest element with smallest weight). The 'Sort' part can be relaxed, a weak approximation is acceptable.

Please let me know if there's an algo or solver for it, or if you have any ideas. Thx a lot!

Any sign properties on $p$ and $R$? Is $p$ arbitrary or sorted? — Johan Löfberg, Apr 29 '21 at 18:40
Had you minimized with reversed sum on the norm (assuming positive $c$) it would actually be LP-representable (i.e minimize norm + weighted sort with decreasing non-negative weights is convex and LP-representable) — Johan Löfberg, Apr 30 '21 at 06:46
Did you change the objective? Now you are minimizing with positive sign on both the convex norm and the sum of weighted sorted vector? This is LP representable — Johan Löfberg, Apr 30 '21 at 09:57
Yes! Following the hint I think adding a negative sign to $R$ is also acceptable and simplifies the objfun. — user29988, Apr 30 '21 at 10:01
Note that switching the sign on $R$ switch the order of the sort, sort(1,3,2) is (3,2,1) while sort(-1,-3,-2) yields (-1,-2,-3) which is not equal to -(sort(-(1,3,2))), i.e. be careful — Johan Löfberg, Apr 30 '21 at 10:06

score 2 · Accepted Answer · answered Apr 30 '21 at 13:03

Listing a second answer to address the convex case.

With $p>0$ sorted in the same order as the sort operator in the objective, the problem is convex and LP-representable. The sum of (decreasingly) weighted sorted values can be seen as the limiting case of the weighted sum of k largest values which is a classical example of a weird operator which actually is LP representable.

Although it is LP-representable, the straightforward LP model is no fun, as the size grows quadratically in the length of the sorted vector. However, since this shows that the problem indeed is convex despite its intimidating form, it might encourage you to employ some other convex optimization strategy to optimize the objective (ADMM/sub-gradient stuff etc)

If you live on the bleeding edge with YALMIP, the following code runs on the develop branch which I just updated to include weights in the sumk operator.

% Random data
n = 50;
m = 5;
p = sort(rand(n,1),'descend');
R = randn(n,m);
w0 = randn(m,1);
w = sdpvar(m,1);
% LP
objective = sumk(R*w,n,p)+norm(w-w0,1);
Model = [norm(w,inf)<=1];

optimize(Model,objective)
% MILP (maybe reduce n first...)
objective = sort(Rw,'descend')'p+norm(w-w0,1);
Model = [norm(w,inf)<=1];

optimize(Model,objective)

Hi Johan! Would like to consult you the category of this optimization problem. is it k-sum optimization? Do you have more details? — user29988, Jun 13 '22 at 15:27
Thanks for your correction. Then what's the correct name for this type of problem? May I have references to cite your package and also the origin of this type of problem? — user29988, Jun 13 '22 at 17:03
It's nothing specially named, it's just linear programming representation of sum-of-k-largest . You can probably find references for it and generalizations in Lectures on modern convex optimization by Ben-Tal and Nemirovski — Johan Löfberg, Jun 13 '22 at 18:46

Johan Löfberg · Answer 2 · 2021-04-30T12:49:42.387

This answer was supplied before the question was adjusted (switch sign before sort and structure on $p$) to a case which actually is convex and LP representable. It is kept though as it might be useful for the general case

Neither of them can be represented using linear programming.

The first one is MILP-representable (the norm is LP-representable, but sort is in general a nasty combinatorial object)

Here is proof-of-concept code in the MATLAB toolbox YALMIP. You need a good MILP solver for this to be solved

% Random data
n = 10;
p = randn(n,1);
R = randn(n);
w0 = randn(n,1);
% Solve
w = sdpvar(n,1);
objective = -sort(Rw)'p+norm(w-w0,1);
Model = [norm(w,inf) <= 1];

optimize(Model,-objective)

The bilevel program you list is extremely hard (as there are binaries in the inner problem), but I guess that is just your attempt to rewrite the first model. You seem to be on the right track, but it can be done more efficiently and does not require a bilevel formulation.

To model $s = \text{sort}(z)$ you can introduce a binary permutation matrix $M$ and the model $s = Mz, s_i \leq s_{i+1}$ with the row-sum and column-sum constraints, and then you apply a big-M model on the bilinear products in $Mz$. This is the model you obtain in YALMIP.

The model quickly becomes huge though...

Thank you so much Johan! I tried the code in MATLAB and yes it works but it's truly hard... Curious about "Had you minimized with reversed sum on the norm (assuming positive c) it would actually be LP-representable (i.e minimize norm + weighted sort with decreasing non-negative weights is convex and LP-representable) – Johan Löfberg" Sorry i'm a noob on this and cant see the link... Could you kindly add more hints? thx! — user29988, Apr 30 '21 at 08:39
Minimizing $\text{sort}(Rw)^Tp + ||w-w_0||_1$ is linear-programming representable, assuming non-negative $p$ is sorted in same order as the sort operator (i.e. largest element has largest weight). It is not trivial to see or implement though (of course) and requires a rather complicated LP model (I just generalized the convex operators sumk and sumabsk in YALMP to support this weighted sum of $k$ largest elements). Alas, as you have written it now, you have the wrong sign on the sorted term for it to be applicable) — Johan Löfberg, Apr 30 '21 at 08:49

Optimization with 'sort', or bilevel optimization with permutation matrix

2 Answers2