1

I am having trouble understanding min/max problems in Linear Algebra. I am normally used to solving these types of problems using some form of calculus. The question I am stuck on is minimizing:

$$b^{T}x + \frac{1}{2}x^{T}x$$

subject to

$$Ax = 0$$

I think it's a fairly straightforward question when solving it using gradients and lagrange but I am stuck on how to even start questions like this.

1 Answers1

2

Find a basis for the null space of $A$, then if matrix $N$ has as its columns the vectors of this basis, you can express $x$ as follows

$x = N u $

where $N$ is $n \times k$ and $u $ is a $k \times 1 $ vector. So now the optimization problem becomes unconstrained as follows. Find $u \in \mathbb{R}^k $ such that

$ f = \dfrac{1}{2} u^T N^T N u + b^T N u \hspace{15pt}(1) $

is minimized. And the solution of this problem is well-known. It can be found by completing the square. We want to absorb the linear term $b^T N u = u^T N^T b$, so write $f$ as follows

$ f = \dfrac{1}{2} (u - u_0)^T N^T N (u - u_0) - \dfrac{1}{2} u_0 N^T N u_0 \hspace{15pt}(2) $

Expanding the form we just wrote, we get

$ f = \dfrac{1}{2} u^T N^T N u - u^T N^T N u_0 $

Comparing this to the original form (Eq. $(1)$), we must select $u_0$ such that

$ - u^T N^T N u_0 = u^T N^T b $

Therefore,

$ u_0 = - (N^T N)^{-1} N^T b $

From Eq. $(2)$, and since $N^T N$ as positive semi-definite, the minimum value of $f$ is

$ f_{min} = - \dfrac{1}{2} u_0^T (N^T N) u_0 = - \dfrac{1}{2} b^T N (N^T N)^{-1} N^T b $

The optimal value of vector $x$ is

$ x = N u_0 = - N (N ^T N )^{-1} N^T b $

Note that $f_{min} = - \dfrac{1}{2} b^T x $


Another way to proceed is directly through Lagrange multipliers

The target function to be minimized is

$g (x, \lambda) = \frac{1}{2} x^T x + b^T x + \lambda^T (A x) $

The gradient with respect to $x$ is

$ \nabla_x g = x + b + A^T \lambda $

And the gradient with respect to $\lambda$ is

$\nabla_\lambda g = A x $

From $\nabla_x g = 0 $ , it follows that $ x = - b - A^T \lambda $

From $\nabla_\lambda g = 0 $ , it then follows that

$A (- b - A^T \lambda) = 0 $

Therefore,

$ A A^T \lambda = - A b $

At this point we have to assume that $(A A^T)$ has full rank, (i.e. if $A$ is $m \times n$, then we want the rank of $A A^T$ to be $m$)

Then,

$\lambda = - (A A^T)^{-1} A b = - A^{\#} b $

where $A^\#$ is the Penrose pseudo-inverse of $A$.

And, therefore,

$ x = - b - A^T \lambda = - ( I - A^T (A A^T)^{-1} A ) b $

And finally the minimum value of $f$ is

$ f_{min} = - \frac{1}{2} b^T ( I - A^T (A A^T)^{-1} A ) b $

The advantage of this method is that we can express the minimizing $x$ and $f_{min} $ directly in terms of $A$ and $b$.

Finally, one notes that if $A$ is $m \times n$ where $ m \lt n $ has rank $m$, then $N$ will be of dimension $n \times (n - m)$; therefore given a vector $y \in \mathbb{R}^n$, it can be written as

$ y = [ A^T : N ] [u_1 , u_2 ]^T $

where $u_1 \in \mathbb{R}^m , u_2 \in \mathbb{R}^{n-m} $

And we will have

$(I - A^T (A A^T)^{-1} A) y = [ 0 : N] [u_1 , u_2]^T $

and also,

$ (N (N^T N) N^T ) y = [ 0 : N ] [u_1 , u_2]^T $

Thus both are projectors into the null space of matrix $A$, so they're equal.

Hosam Hajeer
  • 21,978
  • How did you know that $\u_{0} = -(N^{T}N)^{-1}N^{T}b$ is the minimum to f? – Tigertron Apr 13 '22 at 21:44
  • Please check my updated solution. – Hosam Hajeer Apr 13 '22 at 22:06
  • Okay and to express the solution in terms of x, A, and b, I would just multiply $u_{0}$ with N to get x and simply the rest to remove the N's and get it in terms of A – Tigertron Apr 13 '22 at 22:23
  • You cannot express the solution $x$ and the minimum value $f_{min}$ in terms of matrix $A$, it has to be expressed in terms of matrix $N$. – Hosam Hajeer Apr 13 '22 at 22:32
  • @Tigertron Do you have a better solution, where the minimizing $x$ and the minimum value can be expressed of $A$ directly ? (Without the need for $N$) ? – Hosam Hajeer Apr 13 '22 at 22:47
  • but a solution can be expressed in A and b correct? If I solve it using Lagrange, I can get a solution that is composed only of A and b – Tigertron Apr 13 '22 at 22:49
  • Yes. That's right. I've added the Lagrange method as a second method in my updated solution now. – Hosam Hajeer Apr 13 '22 at 23:09
  • Clearly, one should expect that $(I - A^T (A A^T)^{-1} A ) = N (N^T N)^{-1} N^T $, but I am unsure how to prove that. – Hosam Hajeer Apr 13 '22 at 23:20
  • I was thinking that you could say that $A^{T}(AA^{T})^{-1}A$ is the projection($P_{1}$) of $A^{T}$ which is $N(A)$ but since it is $(I - P_{1})$, it's the projection of $R(A)$ which contradicts $N(N^{T}N)^{-1}N^{T}$ – Tigertron Apr 14 '22 at 00:31
  • I have added explaining why $ (I - A^T (A A^T)^{-1} A) = N (N^T N)^{-1} N^T$, as they're both projectors onto the null space of $A$. – Hosam Hajeer Apr 14 '22 at 04:32