Understanding mutual information

Question

Given random variables $\vec{x}, \vec{y} \in \mathbb{R}^n$, is it true that

$I(\vec{x}: \vec{y}) \geq \sum_i I(x_i : y_i)$

My interpretation is that several variables taken collectively should be able to predict another set of variables at least as well as they do individually. However, when I compute the quantities in this inequality from some data I have, I get the opposite, namely $I(\vec{x}: \vec{y}) < \sum_i I(x_i : y_i)$. Can one prove the above inequality? Does my code have a bug, or is my understanding wrong?

Edit: Following @Mini's answer and some reading, I think I can summarize the answer to the interpretation question. There are two competing processes at play:

Synergy: Several variables collectively share information with a target that is not shared by any of them separately. Synergy increases multivariate MI.
Redundancy: The "effective dimension" of the problem may be smaller than the actual dimension due to strong dependencies between the variables within each vector. Redundancy decreases multivariate MI relative to the sum of the component-wise MIs.

Thus, my interpretation is correct only in the case when there is no redundancy.

Bonus Question: Is there a redundancy-corrected version of MI, which would only measure presence or absence of synergy?

Mini · Accepted Answer · 2020-03-03T13:57:08.907

The written inequality is not true. A simple example is $\vec{X}=(X_1,X_2)$ and $\vec{Y}=(Y_1,Y_2)$ with $X_1=X_2$ and $Y_1=Y_2$. In this case $$I(\vec{X};\vec{Y})=I(X_1;Y_1)=\frac{1}{2}\sum \limits_i I(X_i;Y_i)\leq \sum \limits_i I(X_i;Y_i),$$ with strict inequality whenever $I(X_1;Y_1)>0$.

The other direction of the inequality is not true in general either. What does hold in general is the chain rule: writing $X_1^{i-1}=(X_1,\dots,X_{i-1})$ and $Y_1^{j-1}=(Y_1,\dots,Y_{j-1})$, we have \begin{align*} I(\vec{X};\vec{Y})&=\sum \limits_i I(X_i;\vec{Y}\big| X_1^{i-1})\\ &=\sum \limits_{i,j} I(X_i;Y_j\big| X_1^{i-1},Y_1^{j-1}). \end{align*} If the pairs $(X_i,Y_i)$ are mutually independent across indices $i$, then equality holds: $I(\vec{X};\vec{Y})=\sum_i I(X_i;Y_i)$.
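As a numerical sanity check of the two cases above, here is a minimal sketch that computes mutual information exactly from a small joint probability table. The helper `mutual_information`, the function `noisy_pair`, and the choice of a binary pair where $Y_i$ equals $X_i$ flipped with probability 0.1 are illustrative assumptions, not part of the question or of any library.

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits. `joint` maps outcome tuples to probabilities;
    a_idx and b_idx select the coordinates that make up A and B."""
    p_ab, p_a, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_ab[a, b] += p
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)


def noisy_pair(x, y, flip=0.1):
    """Assumed toy model: X ~ Bernoulli(1/2), Y is X flipped with probability `flip`."""
    return 0.5 * ((1 - flip) if x == y else flip)


# Outcomes are tuples (x1, x2, y1, y2).
# Case 1 (redundancy): X1 = X2 and Y1 = Y2.
redundant = {(x, x, y, y): noisy_pair(x, y)
             for x, y in itertools.product((0, 1), repeat=2)}

# Case 2: the pairs (X1, Y1) and (X2, Y2) are independent of each other.
independent = {(x1, x2, y1, y2): noisy_pair(x1, y1) * noisy_pair(x2, y2)
               for x1, x2, y1, y2 in itertools.product((0, 1), repeat=4)}


def vector_and_sum(joint):
    """Return (I(X;Y) for the full vectors, sum of component-wise I(X_i;Y_i))."""
    i_vec = mutual_information(joint, (0, 1), (2, 3))
    i_sum = (mutual_information(joint, (0,), (2,))
             + mutual_information(joint, (1,), (3,)))
    return i_vec, i_sum


red_vec, red_sum = vector_and_sum(redundant)
ind_vec, ind_sum = vector_and_sum(independent)
print(f"redundant:   I(X;Y) = {red_vec:.4f}, sum_i I(X_i;Y_i) = {red_sum:.4f}")
print(f"independent: I(X;Y) = {ind_vec:.4f}, sum_i I(X_i;Y_i) = {ind_sum:.4f}")
```

In the redundant case the vector MI should come out as half of the component-wise sum, and in the independent case the two numbers should agree up to floating-point error. Note that this works on an exact probability table; MI estimated from finite data carries estimator bias on top of these effects.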

To show that the other direction is not correct either: Suppose $U$ and $V$ are independent Bernoulli$(\frac{1}{2})$ random variables, and denote the XOR operation by $\oplus$. Since $U\oplus V$ is independent of $V$, and $V$ is independent of $U$, $$I(U\oplus V;V)=I(V;U)=0.$$ Moreover, because $(U\oplus V,V)$ determines $(U,V)$ and vice versa, $$I(U\oplus V,V;V,U)=I(U,V;V,U)=H(U,V)=2 \text{ bits}.$$ Now let $X_1=U\oplus V$, $X_2=V$, $Y_1=V$, and $Y_2=U$. We have \begin{align*} I(\vec{X};\vec{Y})=2 > 0=\sum \limits_i I(X_i;Y_i). \end{align*}

Your interpretation considers prediction, which is closer to the concept of conditional entropy. Here, however, you have mutual information, which is about "what they have in common".
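The XOR construction can be checked the same way. The following is a minimal sketch that enumerates the four equally likely values of $(U,V)$ and computes the mutual informations exactly; the helper name `mutual_information` and the tuple layout `(x1, x2, y1, y2)` are illustrative choices, not from any library.

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits. `joint` maps outcome tuples to probabilities;
    a_idx and b_idx select the coordinates that make up A and B."""
    p_ab, p_a, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        p_ab[a, b] += p
        p_a[a] += p
        p_b[b] += p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)


# U, V independent Bernoulli(1/2); outcomes are (x1, x2, y1, y2)
# with X1 = U xor V, X2 = V, Y1 = V, Y2 = U.
xor_joint = {(u ^ v, v, v, u): 0.25
             for u, v in itertools.product((0, 1), repeat=2)}

i_vec = mutual_information(xor_joint, (0, 1), (2, 3))  # I(X;Y) for the vectors
i_1 = mutual_information(xor_joint, (0,), (2,))        # I(X1;Y1) = I(U xor V; V)
i_2 = mutual_information(xor_joint, (1,), (3,))        # I(X2;Y2) = I(V; U)

print(f"I(X;Y) = {i_vec} bits, I(X1;Y1) = {i_1}, I(X2;Y2) = {i_2}")
```

Here the vector MI is 2 bits while both component-wise terms are 0, which is the purely synergistic case: all of the shared information is visible only when the components are considered jointly.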

Hey, thanks. A few points: (1) Can you prove that the inequality does not hold the other way? I have high suspicion that it is indeed the case (2) Can you explain why my intuition is wrong, with words? As in, why information shared by vectors is not greater than information shared by their parts? — Aleksejs Fomins, Mar 03 '20 at 11:48
I modified the answer and addressed your questions. If this answers your question, then please rate it and mark as accepted answer. — Mini, Mar 03 '20 at 13:03
