1

I have a following task of determining whether this problem is well conditioned. I tried to solve it but I am stuck because I don't know how to understand the last step.

Is a problem of calculating the root of the polynomial $p(x) = ax + b$ well conditioned for variables $a,x$ and $b$?

Here is my attempt:

I calculate maximal relative error of result, denoted with $U(a,x,b)$ where $a,x,b$ are arguments with a relative error not greater than arithmetic precision denoted by $\nu$. Here is equation with distorted data for relative error: $$\frac{|a(1+\epsilon_1)x(1+\epsilon_2) + b(1+\epsilon_3) - ax - b|}{|ax + b|} = U(a,x,b)$$ And

$$\sup_{|\epsilon|\leq \nu}U(a,x,b) =\frac{|ax(2 \nu + \nu^2)+ b\nu|}{|ax + b|} $$ $$cond(a,x,b)=\frac{U(a,x,b)}{\nu} = \frac{|ax(2+\nu) + b|}{|ax + b|}$$

How this translates to statement that problem is well defined or not? I don't specifically understand this step.

Cahir7
  • 13
  • Of course! Thank you, if there's more to elaborate I'd like to see it and accept an answer. Like some explanation on this matter. – Cahir7 Nov 15 '21 at 17:35

1 Answers1

0

We are investigating the sensitivity of the root $x = -b/a$ of the linear equation $$ax + b = 0$$ to small changes in the input, i.e., the vector $(a,b) \in \mathbb{R}^2$ and $a \not = 0$. We need to decide how to measure the size of changes, i.e., what is a small change? I will consider changes that are small relative to the components. To that end, let $\epsilon > 0$ be given and consider perturbations $\Delta a$ and $\Delta b$ such that $$|\Delta a| \leq \epsilon |a|, \quad |\Delta b| \leq \epsilon |b|.$$ We are ultimately interested in the limit where $\epsilon$ tends to zero, so there is no harm in assuming that $\epsilon < 1$. Since $a \not = 0$ we are certain that $$a + \Delta a \not = 0.$$ It follows that the perturbed equation $$(a + \Delta a) z = b + \Delta b$$ has a unique solution $$z = x + \Delta x.$$ Our objective is to determine the largest possible size of the relative error $\frac{\Delta x}{x}$ and the limiting behavior of the worst case scenario as $\epsilon$ tends to zero. From $$(a + \Delta a)(x + \Delta x) = b + \Delta b$$ it follows immediately that $$ (a+\Delta a) \Delta x = b + \Delta b - ax - (\Delta a)x = \Delta b - (\Delta a) x $$ I now assume that $b \not = 0$ so that $x = -b/a \not = 0$. Division by $x$ produces $$(a + \Delta a) \frac{ \Delta x}{x} = -a \frac{\Delta b}{b} - \Delta a.$$ Division by $a + \Delta a$ produces $$ \frac{\Delta x}{x} = - \frac{a}{a + \Delta a} \frac{\Delta b}{b} - \frac{\Delta a}{a + \Delta a}$$ We can now start bounding the relative error $\frac{\Delta x}{x}$ subject to the constraints that $$|\Delta a| \leq \epsilon |a|, \quad |\Delta b| \leq \epsilon |b|.$$ By the triangle inequality we have $$ \left| \frac{\Delta x}{x} \right| \leq \frac{|a|}{|a|(1-\epsilon)} \epsilon + \epsilon \frac{|a|}{|a|(1-\epsilon)} = \frac{2 \epsilon}{1 - \epsilon}.$$ It is important to recognize that equality is achieved for the choice of $$\Delta a = - \epsilon a, \quad \Delta b = -\epsilon b.$$ We can now conclude that $$ \sup \left \{ \frac{1}{\epsilon} \left| \frac{\Delta x}{x} \right| : (a + \Delta a)(x + \Delta x) = (b + \Delta b) \wedge |\Delta a| \leq \epsilon |a| \wedge |\Delta b| \leq \epsilon |b| \right \} = \frac{2}{1 - \epsilon}$$ This characterizes the worst case behavior under the given constraints. We are interested in the case of small values of $\epsilon$. It is clear that $$ \sup \left \{ \frac{1}{\epsilon} \left| \frac{\Delta x}{x} \right| : (a + \Delta a)(x + \Delta x) = (b + \Delta b) \wedge |\Delta a| \leq \epsilon |a| \wedge |\Delta b| \leq \epsilon |b| \right \} \rightarrow 2, \quad \epsilon \rightarrow 0, \quad \epsilon > 0.$$ We conclude that the root $$x = -\frac{b}{a}$$ is well-conditioned in the componentwise relative sense. This completes the investigation.


Concluding remarks: It is entirely possible to solve this problem by treating $x$ as a function of two variables, i.e., $$ x = f(a,b) = -\frac{b}{a}$$ and invoking the general theory for the condition numbers of functions of multiple variables. But this is overkill and it defeats the purpose of the exercise. The above derivation mirrors the analysis of the solution of the standard linear system $Ax=b$ where $A \in \mathbb{R}^{n \times n}$ is a nonsingular matrix and $b \in \mathbb{R}^n$.
Carl Christian
  • 12,583
  • 1
  • 14
  • 37