Prove that the normalizing constant of the continous categorical distribution is positive

Question

Problem statement

Given a vector $\mathbf{x} \in \mathbb{R}^{K}$, where $x_{i} \neq x_{j}, \forall i \neq j$. Prove that:

\begin{equation} \sum_{k=1}^{K} \frac{\exp(x_{k})}{\prod_{i \neq k} (x_{k} - x_{i})} \ge 0. \end{equation}

If it is wrong, I would appreciate if a counter example is provided.

Background

I am reading the paper "The continuous categorical: a novel simplex-valued exponential family". In the paper, the authors proposed a continuous version for the categorical distribution. They then provided the normalizing constant. To calculate the log-likelihood, we need the log of that normalizing constant. However, I do not know how to prove that it is always positive, so that we can take a log. Here, I reformulate the expression of the normalizing constant for convenience. If one read the paper, we can analogize $\mathbf{x}$ as $\pmb{\eta}$. Note that although $\eta_{K} = 0$, we could simply consider it as any vector, and just shift $\eta_{i} - \eta_{K}$ to obtain $\pmb{\eta}$ as in the paper. The non-shifting version does not change the positivity nature of the expression of interest.

In addition, I tried the implementation provided by the authors, but it was unclear to me since the authors calculated the denominator $\propto (x_{k} - x_{i})$, not $(x_{i} - x_{k})$ as specified in the paper. Of course, it would lead to an additional factor $(-1)^{K - 1}$ in the denominator. This factor would then neutralizes the $(-1)^{K + 1}$. However, the implementation still have something with $(-1)^{K-1}$ (Note that the implementation denotes a variable $K$ as the value $(K - 1)$ in the paper.)

I did try the implementation by passing a vector as input to the provided function. However, it does not always return a value, but sometimes, not-a-number (NaN).

score 1 · Accepted Answer · answered Feb 18 '21 at 02:06

Good question! The simple answer is that the expression you wrote corresponds to the integral of a positive function, therefore it must be positive. To be more explicit, Gordon-Rodriguez et al. showed (in Section A of the supplement for the original paper): $$ \int_{\Delta^{K-1}} \exp\left( \sum_{k=1}^{K} \eta_k z_k \right) d\mu(\mathbf z) = (-1)^{K-1} \sum_{k=1}^K \frac { { \exp\left(\eta_k\right)} } { \prod_{i \ne k } \left(\eta_i - \eta_k\right)}, $$ where $\mu$ is the Lebesgue measure, $(\eta_1, \dots, \eta_{K})$ are real numbers such that $\eta_i \ne \eta_j$ whenever $i \ne j$, and the integral is taken over the simplex: $$ \Delta^{K-1} = \left\{ \mathbf z \in \mathbb R^{K} : z_k \ge 0, \sum_{k=1}^{K} z_k = 1 \right\}. $$ Now, as you correctly pointed out, the $(-1)^{K-1}$ factor corresponds to the changes in sign on the denominator, and so our two expressions are equivalent. Noting that the integrand $\exp\left( \sum_{k=1}^{K} \eta_k z_k \right)$ is always positive, we can conclude that the RHS is also positive, as required.

Let me add a couple of remarks:

The proof that our expression can indeed be written as such an integral is algebraically involved, but it is helpful to assume WLOG that $\eta_K = 0$ (as you already pointed out), and note that our integral is over a $K-1$ dimensional set and can be written as follows: $$ \int_{\Delta^{K-1}} \exp\left( \sum_{k=1}^{K} \eta_k z_k \right) d\mu(\mathbf z) = \int_0^1 \int_0^{1-z_1} \cdots \int_0^{1-z_1-\cdots-z_{K-2}} \exp\left(\sum_{k=1}^{K-1} \eta_k z_k\right) dz_{K-1} \cdots dz_2 dz_1. $$ You can then proceed by induction (details in the supplement).
As for your question about NaN values, you can take a look at section 3.4 from the original paper, as well as section 5 from this other paper. To summarize - computing the normalizing constant is numerically unstable, a problem which becomes worse as $K$ increases (and as a result, these papers only considered examples with $K \le 10$). In terms of the implementation, note that the product in this line is taken over axis=1, which does in fact index rows (not columns), since the 'zeroth' dimension indexes the minibatch samples.

Section 2 in the following may also be of interest: https://arxiv.org/abs/2204.13290 — egr95, Jun 14 '22 at 03:30

Prove that the normalizing constant of the continous categorical distribution is positive

Problem statement

Background

1 Answers1