1

I am told that the definition of a polyhedron is

$$P = \{ x \vert a_j^T x \le b_j, j = 1, \dots, m, c_j^T x = d_j, j = 1, \dots, p \}.$$

I am then told that the compact notation is

$$P = \{ x \vert Ax \preceq b, Cx = d \},$$

where

$$A = \begin{bmatrix} a_1^T \\ \vdots \\ a_m^T \end{bmatrix}, \ \ \ \ \ \ \ \ \ \ C = \begin{bmatrix} c_1^T \\ \vdots \\ c_p^T \end{bmatrix},$$

and the symbol $\preceq$ denotes vector inequality or compenentwise inequality in $\mathbb{R}^m$: $u \preceq v$ means $u_i \le v_i$ for $i = 1, \dots, m$.

I am then told that a simplex is a family of polyhedra. If the $k + 1$ points $v_0, \dots, v_k \in \mathbb{R}^n$ are affinely independent, which means $v_1 - v_0, \dots, v_k - v_0$ are linearly independent, then the simplex determined by them is given by

$$C = \mathbf{\text{conv}}\{v_0, \dots, v_k\} = \{ \theta_0 v_0+ \dots + \theta_k v_k \vert \theta \succeq 0, \mathbf{1}^T \theta = 1 \},$$

where $\mathbf{\text{conv}}$ is the convex hull, and $\mathbf{1}$ denotes the vector with all entries one.

It is then said that, to describe the simplex as a polyhedron (that is, to describe it in the form $P = \{ x \vert Ax \preceq b, Cx = d \}$), we proceed as follows. by definition, $x \in C$ if and only if $x = \theta_0 v_0 + \theta_1 v_1 + \dots + \theta_k v_k$ for some $\theta \succeq 0$ with $\mathbf{1}^T \theta = 1$. Equivalently, if we define $y = (\theta_1, \dots, \theta_k)$ and

$$B = \begin{bmatrix} v_1 - v_0 \dots v_k - v_0 \end{bmatrix} \in \mathbb{R}^{n \times k},$$

we can say that $x \in C$ if and only if

$$x = v_0 + By$$

for some $y \succeq 0$ with $\mathbf{l}^Ty \le 1$. Now we note that affine independence of the points $v_0, \dots, v_k$ implies that the matrix $B$ has rank $k$. Therefore there exists a nonsingular matrix $A = (A_1, A_2) \in \mathbb{R}^{n \times n}$ such that

$$AB = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} B = \begin{bmatrix} I \\ 0 \end{bmatrix}.$$

I was unable to understand a number of steps here. First of all, I don't understand why, if we define $y = (\theta_1, \dots, \theta_k)$ and

$$B = \begin{bmatrix} v_1 - v_0 \dots v_k - v_0 \end{bmatrix} \in \mathbb{R}^{n \times k},$$

then we can say that $x \in C$ if and only if

$$x = v_0 + By.$$

It is not clear to me why the $x = v_0 + By$ follows from this, and nor do I understand why $x \in C$ if and only if it follows. Secondly, I was unable to understand why affine independence of the points $v_0, \dots, v_k$ implies that the matrix $B$ has rank $k$. If $B$ is subtracting the points as $v_1 - v_0 \dots v_k - v_0$, then does this not create a clear linear dependence? (I'm a novice to affine geometry, so there might be some property of affine spaces that I am not understanding here; I'm just trying to use my basic linear algebra understanding of linear independence in Euclidean spaces to understand this.) Thirdly, how does the author get that

$$AB = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} B = \begin{bmatrix} I \\ 0 \end{bmatrix}?$$

Sorry for the lengthy question, but I've been trying to understand this for a while, and I have as of yet been unable to make any progress. Thank you.

1 Answers1

2

I will use the usual inequalities, it should be clear what they mean. So let us look at your first question. You want to show that $x \in C$ iff $\exists y\geq 0:\mathbf{1}^Ty\leq1 \text{ and }x=v_0+By$. Let us take a $x \in C$. By definition there exists positive $\theta_i's$ that sum to unity, st. $x=\sum_{i=0}^k\theta_iv_i$. Notice that since $\sum_{i=0}^k \theta_i=1,$ we have that $\theta_0=1-\sum_{i=1}^k \theta_i$. With this in mind let us expand our expression for $x$:

$$x=\sum_{i=0}^k\theta_iv_i=\theta_0v_0+\theta_1v_1+\dots+\theta_kv_k=(1-\sum_{i=1}^k \theta_i)v_0+\theta_1v_1+\dots+\theta_kv_k$$

$$= v_0-\theta_1v_0-\theta_2v_0-\dots-\theta_kv_0+\theta_1v_1+\dots+\theta_kv_k$$

$$=v_0+\theta_1(v_1-v_0)+\theta_2(v_2-v_0)+\dots+\theta_k(v_k-v_0) $$ $$=v_0+B \begin{bmatrix} \theta_1,&\cdots, &\theta_k \end{bmatrix}^T$$ Notice that $\begin{bmatrix} \theta_1,&\cdots, &\theta_k \end{bmatrix}^T$ has non-negative entries since each $\theta_j$ was non-negative. and since the sum of all $\theta_j$ sum to unity, the lack of $\theta_0$ must mean that $$ \mathbf{1}^T\begin{bmatrix} \theta_1,&\cdots, &\theta_k \end{bmatrix}^T \leq 1. $$ The proof of the "if" statement can be done similarly, backwards.

For your second question you ask whether or not the columns of $B$ aren't clearly linearly dependant. Counter-examples to this are easy and you should find them yourself, but the independence of the vectors $\{v_1-v_0,v_2-v_0,\dots,v_k-v_0\}$ comes from requiring affine independence of $\{v_0,v_1,\dots,v_k\}$. You write this yourself before defining the convex hull of a set of vectors.

For your last question we have that $B \in \mathbb{R}^{n\times k}$ has rank $k$ by assumption. Notice $k\leq n$ since we can't have $k>n$ and still have them to be linearly independant. From your first course in Linear Algebra, you know that it is possible to reduce the matrix $B$ to be in reduced row echelon form, where the last $n-k$ rows are zero-rows since all your columns are linearly independent. Each time you perform a row-operation on a matrix, you actually do a right-multiplication with a $n \times n$-elementary matrix. Each elementary matrix is invertible. The product of all your elementary matrices - that correspond to all your row-operations, is your matrix $A$. The following notes gives a good example of using elementary matrices to construct such a matrix: https://people.math.carleton.ca/~kcheung/math/notes/MATH1107/wk05/05_elementary_matrices_example.html

ms_
  • 460
  • Thanks you. Your demonstration for the first point is excellent, and it clarifies my misunderstanding. – Dom Fomello Feb 13 '20 at 04:19
  • With regards to the second point, I've read through it again after reading your post, and I think I was misinterpreting the statement; so my original thoughts about it being clearly linearly dependent were wrong. – Dom Fomello Feb 13 '20 at 04:20
  • Affine spaces have creeped into a lot of the work I've been doing lately, and I have noticed what seems to be a common theme: The affine space has dimension $n$, but it contains $n + 1$ affinely independent elements. In my introductory linear algebra studies, we always had that a dimension of $n$ corresponds to $n$ linearly independent elements. This Wikipedia article seems to discuss it: https://en.wikipedia.org/wiki/Affine_space#Affine_span_and_bases This answer seems to clear my misunderstanding: https://math.stackexchange.com/a/2597663/749638 – Dom Fomello Feb 13 '20 at 04:21
  • With regards to my third question, I am referencing page 33 of Convex Optimization from Boyd & Vandenberghe: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf Any idea what the author is doing here? – Dom Fomello Feb 13 '20 at 04:22
  • @DomFomello I have edited the above solution to include an answer to your final question, if you have any more questions feel free. – ms_ Feb 13 '20 at 11:22
  • Thank you very much for this. To be clear, $k\leq n$ since the $k + 1$ points $v_0, \dots, v_k \in \mathbb{R}^n$ are affinely independent? Also, can you please elaborate on what you mean by the used row-operations forming an $n \times n$ matrix, which is invertible? Thanks again. – Dom Fomello Feb 14 '20 at 05:25
  • @DomFomello I have edited the last question again explain this. – ms_ Feb 14 '20 at 05:36
  • Thanks for that. But why does multiplying this matrix by $B$ result in $\begin{bmatrix} I \ 0 \end{bmatrix}$? Is it because $A$ is actually the inverse of the matrix that is the product of all of the row operations? Or am I mixing this up? – Dom Fomello Feb 14 '20 at 06:35
  • @DomFomello You are very close to understanding the point, you are misunderstanding the fact that $B$ never changes even during row operations. Remember when you perform a row operation with elementary matrix $E_j$ on $B$, the new matrix will be $E_jB$. Thus the rref of $B$ is actually a sequence of row operations right-multiplied on $B$, and it can be expressed $E_{\text{last}}E_{\text{last}-1}\cdots E_2E_1B$. Here $A=E_{\text{last}}E_{\text{last}-1}\cdots E_2E_1$. – ms_ Feb 14 '20 at 06:58
  • Ohhh, I see now: So the entire point of the part $AB = \begin{bmatrix} A_1 \ A_2 \end{bmatrix} B = \begin{bmatrix} I \ 0 \end{bmatrix}$ is just to symbolise the performing of row-reduction operations, which, of course, leads to the identity matrix on top (the reduced rows that have a $1$ in unique column positions) and rows of all $0$ at the bottom (since $k\leq n$, which means that, if $k < n$, then there will be some rows of all $0$), which is what we'd expect for rref if the rows of $B$ are linearly independent. Am I understanding this now? – Dom Fomello Feb 14 '20 at 08:55
  • @DomFomello You are completely right. – ms_ Feb 14 '20 at 10:43