4

Murphy, in his Machine Learning: A Probabilistic Perspective, defines two conditionally independent random variables essentially as follows: $X$ and $Y$ are conditionally independent given $Z$, i.e., $X \perp Y \mid Z$, if and only if $$p(X, Y \mid Z) = p(X \mid Z)p(Y \mid Z)\text{.}$$ where $p$ denotes the (conditional) pmfs/pdfs. (I'm not a fan of the notation, but let's put that aside for now.)

One theorem in here is:

$X \perp Y \mid Z$ iff there exist functions $g$ and $h$ such that $$p(x, y \mid z) = g(x, z)h(y,z)$$ for all $x, y, z$ such that $p(z) > 0$.

"$\Longrightarrow$" is easy to see from the definition.

The difficulty with "$\Longleftarrow$" is that it's hard to show that $g$ and $h$ are unique, and $g(x,z) = p(X \mid Z)$ and $h(y, z) = p(Y \mid Z)$. At least, I think this is what I need to show.

I've tried using $$p(X, Y \mid Z) = \dfrac{p(X, Y, Z)}{p(Z)}$$ but this does not lend itself to clean factoring, as desired in the theorem.

Clarinetist
  • 19,519

1 Answers1

1

An answer to this question can be found here. As a complete answer, for "$\Longleftarrow$," by assumption, there exist functions $g$, $h$ such that $$p(x, y \mid z) = g(x, z)h(y,z)\text{.}$$ Let $\mathcal{X}$, $\mathcal{Y}$ denote the supports of $X$ and $Y$ respectively. Integrating on both sides with respect to $x$ gives the equality $$p(y \mid z) = h(y,z)\int_{\mathcal{X}}g(x, z)\text{ d}x\text{.}$$ Denote $\alpha(z) = \int_{\mathcal{X}}g(x, z)\text{ d}x$. Similarly, integrating with respect to $y$, $$p(x \mid z) = g(x, z)\int_{\mathcal{Y}}h(y, z)\text{ d}y\text{.}$$ Denote $\beta(z) = \int_{\mathcal{Y}}h(y, z)\text{ d}y$. Assume $\alpha$, $\beta$ are non-zero. Thus, we have shown that $$p(x, y \mid z) = g(x, z)h(y, z) = \dfrac{p(x \mid z)}{\beta(z)}\cdot\dfrac{p(y \mid z)}{\alpha(z)}\text{.} $$ Integrating over $\mathcal{X} \times \mathcal{Y} \subset \mathbb{R}^2$, we have $$1 = \dfrac{1}{\alpha(z)\beta(z)}$$ so that $$p(x, y \mid z) = p(x \mid z)p(y \mid z)$$ as desired.

Clarinetist
  • 19,519