Given random variables $\vec{x}, \vec{y} \in \mathbb{R}^n$, is it true that
$I(\vec{x}: \vec{y}) \geq \sum_i I(x_i: y_i)$?
My interpretation is that several variables taken together should be able to predict another set of variables at least as well as they do individually. However, when I compute both sides of this inequality from some data I have, I get the opposite, namely $I(\vec{x}: \vec{y}) < \sum_i I(x_i: y_i)$. Can one prove the above inequality? Does my code have a bug, or is it my understanding that is wrong?
Edit: Following @Mini's answer and some reading, I think I can summarize the answer to the interpretation question. There are two competing processes at play:
- Synergy: Several variables collectively share information with a target that none of them shares individually. Synergy increases the multivariate MI.
- Redundancy: The "effective dimension" of the problem may be smaller than the actual dimension due to strong dependencies between the variables within each vector. Redundancy decreases the multivariate MI relative to the sum of the component-wise terms.
Thus, my interpretation is correct only when there is no redundancy.
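To convince myself that both effects are real, here is a minimal sketch with two toy examples of my own (not from the answer). It assumes discrete binary variables with $n = 2$ rather than real-valued data, and computes the MI exactly from a list of equally likely outcomes. The helper `mutual_information` and the outcome layout `(x1, x2, y1, y2)` are my own hypothetical choices.

```python
import itertools
import math
from collections import Counter

def mutual_information(outcomes, a_idx, b_idx):
    """Exact MI in bits between two groups of coordinates.

    `outcomes` is a list of equally likely tuples; `a_idx` and `b_idx`
    select which coordinates form the two (possibly multivariate) variables.
    """
    n = len(outcomes)
    count_a, count_b, count_ab = Counter(), Counter(), Counter()
    for outcome in outcomes:
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        count_a[a] += 1
        count_b[b] += 1
        count_ab[(a, b)] += 1
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written in terms of counts
    return sum(
        c / n * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

def joint_and_sum(outcomes):
    """Return (I(x:y), sum_i I(x_i:y_i)) for outcomes laid out as (x1, x2, y1, y2)."""
    joint = mutual_information(outcomes, (0, 1), (2, 3))
    pairwise = (mutual_information(outcomes, (0,), (2,))
                + mutual_information(outcomes, (1,), (3,)))
    return joint, pairwise

# Redundancy: all four coordinates are copies of one fair bit.
redundant = [(b, b, b, b) for b in (0, 1)]

# Synergy: y1 = x1 XOR x2 with independent fair bits x1, x2; y2 is constant.
synergistic = [(x1, x2, x1 ^ x2, 0) for x1, x2 in itertools.product((0, 1), repeat=2)]

red_joint, red_sum = joint_and_sum(redundant)
syn_joint, syn_sum = joint_and_sum(synergistic)
print(f"redundant:   I(x:y) = {red_joint:.3f}, sum_i I(x_i:y_i) = {red_sum:.3f}")
print(f"synergistic: I(x:y) = {syn_joint:.3f}, sum_i I(x_i:y_i) = {syn_sum:.3f}")
```

In the redundant case the joint MI is 1 bit while the sum is 2 bits, so the inequality fails. In the synergistic case the joint MI is 1 bit while the sum is 0. So the inequality cannot hold in general, and neither direction holds universally.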
Bonus Question: Is there a redundancy-corrected version of MI, which would only measure presence or absence of synergy?