0

I want to show that $f_n(\zeta) = \frac{1}{n} \log \sum_{w \in W_n} e^{\zeta K_n(w)} P_n (w)$ , with $\zeta \in \mathbb{R}$ is convex. I will not explain what $W_n, K_n$ and $P_n$ are, because this is not necessary according to the question I guess.

So I want to show that $f_n(\lambda \zeta' +(1-\lambda) \zeta'') \leq \lambda f_n(\zeta') + (1-\lambda) f_n(\zeta'')$ for all $\zeta', \zeta'' \in \mathbb{R}$ and $\lambda \in [0,1]$. I use the Hölder inequality.

What I got:

$\displaystyle f_n(\lambda \zeta' +(1-\lambda) \zeta'')$

$\displaystyle = \frac{1}{n} \log \sum_{w \in W_n} e^{(\lambda \zeta' +(1-\lambda) \zeta'') K_n(w)} P_n (w)$

$\displaystyle = \frac{1}{n} \log \sum_{w \in W_n} e^{(\lambda \zeta' ) K_n(w)} e^{((1-\lambda) \zeta'') K_n(w)} P_n (w)$

$\displaystyle \leq \frac{1}{n} \log [ ( \sum_{w \in W_n} e^{p(\lambda \zeta' ) K_n(w)} P_n (w) )^{\frac{1}{p}} \cdot (\sum_{w \in W_n} e^{(q(1-\lambda) \zeta'') K_n(w)} P_n (w))^{\frac{1}{q}}]$

$\displaystyle = \frac{1}{n} \log ( \sum_{w \in W_n} e^{p(\lambda \zeta' ) K_n(w)} P_n (w) )^{\frac{1}{p}} + \frac{1}{n} \log (\sum_{w \in W_n} e^{(q(1-\lambda) \zeta'') K_n(w)} P_n (w))^{\frac{1}{q}} $

$\displaystyle = \frac{1}{np} \log ( \sum_{w \in W_n} e^{p(\lambda \zeta' ) K_n(w)} P_n (w) ) + \frac{1}{nq} \log (\sum_{w \in W_n} e^{(q(1-\lambda) \zeta'') K_n(w)} P_n (w)) $

And this must somehow equal the following expression:

$\displaystyle \lambda \cdot (\frac{1}{n} \log \sum_{w \in W_n} e^{\zeta' K_n(w)} P_n (w)) + (1-\lambda) \cdot (\frac{1}{n} \log \sum_{w \in W_n} e^{\zeta'' K_n(w)} P_n (w))$

But I don't see how I can do this.

clubkli
  • 759

1 Answers1

0

Assumptions: I suppose you're implicitly assuming $P_n(w) > 0$ for all $w$.

Now, after removing unnecessary details, your problem can be succinctly formulated as follows: Given constants $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$, show that the function $f : \mathbb{R} \rightarrow \mathbb{R}$ $\zeta \mapsto \log \sum_{1 \le k \le n}\exp(\alpha_k \zeta)$, is convex.

For $ 1 \le j \le n$, define $s_j(\zeta):= \mathrm{softmax}(\alpha \zeta)_j := \dfrac{\exp(\alpha_j \zeta)}{ \sum_{1 \le k \le n}\exp(\alpha_k \zeta)}$. One readily computes $f'(\zeta) = \sum_{1 \le k \le n}\alpha_k s_k(\zeta)$, and $f''(\zeta) = \sum_{1 \le k \le n}\alpha_k^2 s_k(x) - (\sum_{1 \le k \le n}\alpha_k s_k(\zeta))^2 = $ variance of a discrete random variable supported on the $\alpha$'s with mass $s_k(\zeta)$ on each $\alpha_k$. Thus $f''(\zeta) \ge 0$, and so $f$ is convex.

dohmatob
  • 9,535