10

The Kullback-Leibler divergence is convex in both $p,q$, where

$$\mathrm{KL}(p||q) = \sum_i p_i \ln \frac{p_i}{q_i}$$

for discrete probability distributions $p_i,q_i$.

Is it strictly convex?

a06e
  • 6,665

1 Answers1

12

No, it is not:

Let $p$, $q$ be two distinct discrete probability distributions on the same alphabet. Then,

$\operatorname{KL}(\lambda p + (1-\lambda) q \Vert \lambda p + (1-\lambda) q) = \lambda \operatorname{KL}(p \Vert p) + (1-\lambda) \operatorname{KL}(q \Vert q) = 0$

for any $\lambda \in [0,1]$.

However, $\operatorname{KL}(p \Vert q)$ is strictly convex in $p$ for fixed $q$, where it is finite. It is also strictly convex in $q$ for fixed $p$ if $\operatorname{supp}(q) \subseteq \operatorname{supp}(p)$. The proof follows from the condition for equality in the log-sum inequality.

Georg Pichler
  • 136
  • 2
  • 4
  • Hi, thanks for the answer. What about convexity w.r.t. one of its arguments only? $p$ or $q$ – a06e Sep 17 '18 at 12:07
  • 1
    That would work. But you have to restrict the support $\operatorname{supp}(p) \subseteq \operatorname{supp}(q)$ so $\operatorname{KL}(p\Vert q)$ is strictly convex in $p$ (otherwise KL might be infinite). Also $\operatorname{supp}(q) \subseteq \operatorname{supp}(p)$ suffices for strict convexity in $q$ (modifying $q$ merely on $\operatorname{supp}(p)^c$ does not change $\operatorname{KL}(p\Vert q)$). – Georg Pichler Sep 18 '18 at 14:35
  • Great, thanks. This clears it up. If you want please include this last comment in the answer and I'll accept it! Ping me with a comment. – a06e Sep 18 '18 at 15:12
  • Alright, I added the comment to the answer. – Georg Pichler Sep 23 '18 at 19:11
  • Is it not true though that if $p_1 \neq p_2$ and $q_1 \neq q_2$ then the inequality is strict? @GeorgPichler? – Vladimir May 28 '20 at 02:46
  • What $p_1, p_2, q_1, q_2$ are you referring to? – Georg Pichler Jun 07 '20 at 22:54
  • Just about your last statement that KL divergence is (strictly) convex in q for a fixed p. I am looking at the log sum inequality, and if p and q are both probability measures, then they both sum to one, so all the inequality states is that KL divergence $\ge 1 \log(1/1)=0$. How does this help us to show that it is convex in q if we fix p? – jeffery_the_wind Jan 28 '21 at 11:51
  • Take $$ \lambda \mathrm{KL}(p||q_1) +(1-\lambda) \mathrm{KL}(p||q_2) = \sum_{x \in \mathrm{supp}(p)} p(x) \left( \lambda \log\frac{\lambda p(x)}{\lambda q_1(x)} + (1-\lambda) \log\frac{(1-\lambda)p(x)}{(1-\lambda)q_2(x)} \right) \ \ge \mathrm{KL}(p||\lambda q_1 + (1-\lambda) q_2) , $$ with equality if and only if $q_1(x) = q_2(x)$ for all $x \in \mathrm{supp}(p)$. – Georg Pichler Jan 31 '21 at 14:26
  • @Vladimir: I assume that you are talking about $p_{1,2}$ and $q_{1,2}$ in Theorem 2.7.2 of (Cover and Thomas, 2006). In this case, the counterexample given in the answer would correspond to $p_1 = q_1 \neq p_2 = q_2$.

    Using the condition for equality in the Log-Sum inequality, it is possible to construct four pairwise distinct distributions $p_{1,2}$ and $q_{1,2}$ on an alphabet of size 3, such that equality holds in eq. (2.105) of (Cover and Thomas, 2006) for all $\lambda \in [0,1]$.

    – Georg Pichler Aug 12 '22 at 12:31