Why are we choosing the Krylov projection method like this?

Question

I am attending a course on numerical linear algebra, where we talk about Krylov-Methods right now. We want to construct a sequence which converges to a solution of a system of linear equations $Ax = b$.

Lets say that $x_0$ is our starting point. In an iterative manner we want to construct $x_k$ such that $x_k \in x_0 + \mathcal{K}^k(A,r_0)$. Here we just say that $x_0 = 0$ and with that $r_0 = b$. Here is now the thing I don't understand:

We construct the $x_k$ in such a way that $r_k \perp \mathcal{K}^k(A,r_0)$ so that we get the constraints $$B_k^T(b - Ax_k) = 0$$ where $B_k$ is a Basis of the k-th Krylov subsapce. How is this orthogonality constraint helpful in finding a good sequence $\{x_k\}$?

score 3 · Accepted Answer · answered Aug 01 '22 at 14:40

3

In general, there is no guarantee that the sequence $x_0, x_1, x_2 \dotsc$ will converge rapidly towards the solution $x$ of $Ax=b$. In that sense, the orthogonality condition is not useful at all.

However, there are practical considerations that are important when the sequence converges so fast that it is actually useful.

If you generate the basis $V_k = [v_1, v_2, \dots v_k ]$ of $K_k(A,b)$ using the celebrated Arnoldi method, then you achieve a factorization of the form $$A V_k = V_{k+1} \bar{H}_k$$ where $\bar{H}_{k+1} \in \mathbb{R}^{(k+1) \times k}$ is an upper Hessenberg matrix. Now if we seek $x_k = V_k y_k$, then the orthogonality condition reduces to $$ V_k^T A V_k y_k = H_k y_k = V_k^T b = \|b\|_2 e_1^{(k)}$$ where $H_k \in \mathbb{R}^{k \times k}$ consists of the first $k$ rows of $\bar{H}_k$ and $e_1^{(k)}$ is the first column of the $k$-by-$k$ identity matrix. It is now straight forward to solve the linear system $$H_k y_k = \|b\|_2 e_1^{(k)}$$ because it is almost upper triangular. The standard procedure is to use Givens rotations to reduce it to upper triangular form. The beauty of this procedure is that the work can be recycled if we decided that we need to increase $k$.

In summary, the orthogonality condition is not useful for choosing a good sequence, but it simplifies the problem of computing the sequence.

Good sequences can be often be obtained by preconditioning, i.e., by effectively replacing the linear system $Ax=b$ with another linear system, often of the form $M^{-1} Ax = M^{-1} b$, which has the same solution, but a sequence that converges rapidly.

answered Aug 01 '22 at 14:40

Carl Christian

12,583
1
14
37

I don't understand why this sequence converges at all with this orthogonality constraint. I understand that the actual solution is inside that Krylov space but I don't see how this sequence we get in the end by assuming this orthogonality constraint is a converging series and is even converging to the right solution. – Josh.K Aug 01 '22 at 16:05
1

@Josh.K There is always a smallest $k$ such that $K_k(A,b)$ is the $A$-invariant subspace that contains the vector $b$. In this case $AV_k = V_k H_k$ and for this $k$ the exact solution is $x = V_k y_k$ where $H_k y_k = |b|_2 e_1^{(k)}$. However, and this is critical, typically $k =n$ and any strictly positive residual history is possible. This is the reason why we must always use preconditioning. – Carl Christian Aug 01 '22 at 17:44
Hmm... I am missing something. If I take the smallest $k$ such that $K_k(A,b)$ is $A$-invariant and lets say $k<n$, why do we get $x$ as the solution of the constraint $V_k^T(b - Ax_k) = 0$. If $A$ has full rank, then this equation system should be underdetermined so that there could be more potenial solutions than $x$, why do we still certainly get $x$ as a solution doing the procedure described by you? – Josh.K Aug 01 '22 at 20:22
1

@Josh.K You need to look at the monic polynomial $p$ of minimal degree such that $p(A)b = 0$. You need to use the axiom of choice for here. You will be able to show that $p$ has degree $k$ and that the constant term is nonzero. Here you will need that to assume that $A$ is nonsingular. You can now manipulate the expression for $p(A)b=0$ into $Aq(A)b = b$, where $q$ has degree $k-1$. Hence $x = q(A)b$ is in $K_k(A,b)$. – Carl Christian Aug 01 '22 at 21:36
Maybe my text was a bit misleading. I absolutely understand that $x$ is in $K_k(A,b)$. I kind of don't understand how this orthogonality condition leads us to the solution $x$. – Josh.K Aug 01 '22 at 22:31
Lets say $k<n$ and $K_k(A,b)$ is $A$-invariant. Is it the case that now $x_k = x$ (we compute $x_k$ just like you described above) and if so why? Why can't $x_k$ be another vector? Like I have written above $V_k$ isn't quadratic and in this case $V_k^T(b - Ax_k) = 0$ is underdetermined and so there could be an $x_k \neq x$ which also fulfills this equation. – Josh.K Aug 01 '22 at 22:39
1

@Josh.K I am not sure how to help you. The orthogonality condition is used to define the Arnoldi method for solving linear systems of equations on the basis of the Arnoldi method for computing an orthogonal basis for the Krylov subspace. In the best case, the method converges rapidly, but in the worst case there is no progress until the very last iteration. We are typically somewhere in between the two extremes and preconditioning is absolutely necessary to ensure rapid convergence. – Carl Christian Aug 02 '22 at 13:06
1

@Josh.K You are mistaken when you write that $V_k^T(b-Ax_k)=0$ is underdetermined. This is because we search for $x_k = V_k y_k$ where $y_k \in \mathbb{R}^k$ and $V_k^T A V_k \in \mathbb{R}^{k \times k}$ is nonsingular. This is due to the fact that while $V_k \in \mathbb{R}^{n \times k}$ is a tall matrix it has full rank $k$. – Carl Christian Aug 02 '22 at 13:10
Thanks for your patience with me :). I think I have understood it. I totally forgot that we of course assume that that $x_k$ is in the column-space of $V_k$ and with this, the system has a unique solution. So the process goes as follow: We are computing all these $x_k$ and know that if we reached this $\hat{k}$ where $K_{\hat{k}} = K_{\hat{k}+1}$ that our solution has to be $A^{-1}b$. The crucial point is that $\hat{k} = n$ can happen but we want $\hat{k} << n$ for fast convergence and therefore use preconditioning to ensure that we reach a stationary krylov-space faster. Is that all right? – Josh.K Aug 02 '22 at 15:52
1

@Josh.K Yes, this is exactly message that I wanted to convey. – Carl Christian Aug 02 '22 at 16:48
I have one more general question: Lets say I just don't do the preconditioning. You said that typically then one reaches a stationary krylov-space after $k = n$ steps. How does the performance in such a worst case compares to direct methods using QR or SVD? – Josh.K Aug 03 '22 at 12:52
1

@Josh.K You should ask this as a separate question so that it is seen by everyone else. – Carl Christian Aug 03 '22 at 12:58

score 1 · Answer 2 · answered Aug 01 '22 at 14:04

So im going to Argument this for any Subspace $U_k$ you can substitute it with the Krylow-Subspaces its equivalent:

We are trying to solve $$||x_k-x||_A=\min_{y\in U_k}||y-x||_A.$$ Because we are in Spaces with a dotproduct with the Projection-Theorem this is equivalent to: $$\langle x-x_k,u\rangle_A \forall u\in U_k$$ Since we are using the dot-product given my the Matrix $A$ we can rewrite it as: $$\langle x-x_k,u\rangle_A=\langle A(x-x_k),u\rangle=\langle b-Ax_k,u\rangle.$$

So basically we are using the Projection-Theorem in Spaces with dot-Products to find our Approximation, which naturally gives us the orthogonality constraint.

Note that you are making the assumption that the $A$ inner-product generates a norm! This is not true in the generality of the original question, but requires that $A$ is symmetric positive definite. If $A$ is symmetric positive definite, then your answer is correct. — Carl Christian, Aug 01 '22 at 14:44

Why are we choosing the Krylov projection method like this?

2 Answers2