6

I am given the cost function: mean absolute error $$ \frac{1}{N} \sum_{n=1}^{N} \big|y_n - f(x) \big|$$

If $f(x)$ is a linear regression: $f(x) = x^Tw$, where $w$ are the parameters of the regression function and $x$ the input, how can I prove that the mean absolute error is convex as a function of the parameters $w$ ?

convexity is defined here: http://mathworld.wolfram.com/ConvexFunction.html

james
  • 303
  • 1
    When you say "is convex" you mean "is a convex function of the parameters", don't you? Try evaluating the loss at parameter vectors $w_1$, $w_2$, and $w=(w_1+w_2)/2$ and compare what you get. – kimchi lover Dec 25 '17 at 12:43
  • @MichaelHardy Thanks for the note. I edited my question – james Dec 25 '17 at 13:44
  • @kimchilover thanks for the note. Please see the edited question. I don't exactly understand how to use your hint... – james Dec 25 '17 at 13:46
  • 1
    Let $A(w)$ be the absolute error resulting from $w$, given by the displayed expression in your question. You need to show that for all $t\in[0,1]$ that $A(tw_1+(1-t)w_2)\le t A(w_1)+(1-t)A(w_2)$ for all $w_1$ and $w_2$. It's easier, often, to start with $t=1/2$. That's all I suggested. – kimchi lover Dec 25 '17 at 13:53

1 Answers1

3

I too am in the process of working it out, but here is what I have:

We only need to verify that $g(w)=\lvert{y_n-f(x_n)}\rvert$ is convex, since we know that the sum of convex functions is also a convex function.

The condition for convexity is the following: for $0\leq\lambda\leq1$ a function g is convex iff $g(\lambda w_1 + (1-\lambda)w_2) \leq \lambda g(w_1)+(1-\lambda)g(w_2)$

So we have for our function $g(w)=\lvert{y_n-x_n^{T}w}\rvert$,

$\lvert y_n-x_n^{T}(\lambda w_1 + (1-\lambda)w_2)\rvert \leq \lambda \lvert{y_n-x_n^{T}w_1}\rvert + (1-\lambda)\lvert{y_n-x_n^{T}w_2}\rvert$

Since $0\leq\lambda\leq1$, I can put the $\lambda$ and $(1-\lambda)$ inside the absolute value, since these are positive and therefore won't make a difference to the equation.

$\lvert y_n-x_n^{T}(\lambda w_1 + (1-\lambda)w_2)\rvert \leq \lvert{\lambda y_n -\lambda x_n^{T}w_1}\rvert + \lvert{(1-\lambda)y_n- (1-\lambda)x_n^{T}w_2}\rvert$

This looks familiar, since I can see that if I take the right hand side of my inequality and denote the insides of the absolute values by $a$ and $b$

$a = {\lambda y_n -\lambda x_n^{T}w_1}$

$b = {(1-\lambda)y_n- (1-\lambda)x_n^{T}w_2}$

And the inside of the absolute value on the left side of my inquality is:

$a+b = y_n-x_n^{T}(\lambda w_1 + (1-\lambda)w_2)$

Then I see that my inequality is actually $\lvert a+b \rvert \leq \lvert a \rvert + \lvert b \rvert $, which is true. Therefore $g(w)$ is convex and $MAE(w)$ is convex.