What am I writing when I write $\mathbf X \mid \mathbf Y$?

Question

Suppose $\mathbf X$ is a random variable and $A$ is an event in the same probability space $(\Omega, \mathcal F, \Pr)$. (Formally, $\mathbf X$ is a function on $\Omega$, say $\Omega \to \mathbb R$; $A$ is a subset of $\Omega$.)

I am comfortable writing $\mathbf X \mid A$ to condition $\mathbf X$ on $A$. This can be defined as another random variable on a different probability space: replace $\Omega$ by $A$, $\mathcal F$ by $\{S \cap A : S \in \mathcal F\}$, and measure $\Pr[\,{\bullet} \mid A]$. Then, just let $\mathbf X \mid A$ have the same value as $\mathbf X$ on every $\omega \in A$. With this definition, the conditional expectation $\mathbb E[\mathbf X \mid A]$ is just the ordinary expectation of this new random variable $\mathbf X \mid A$.
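As a sanity check, the construction can be carried out by hand on a small finite space. The sketch below is only an illustration: the fair die, the choice $\mathbf X(\omega) = \omega^2$, and the helper names `condition_on_event` and `expectation` are assumptions made for the example, not part of the question.

```python
from fractions import Fraction

# Hypothetical finite example: one fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
pr = {w: Fraction(1, 6) for w in omega}

def X(w):
    """A random variable on omega (an arbitrary choice for illustration)."""
    return w * w

# The event A = "the roll is even", as a subset of omega.
A = {w for w in omega if w % 2 == 0}

def condition_on_event(pr, A):
    """Return the measure Pr[. | A], defined on the restricted sample space A."""
    pr_A = sum(pr[w] for w in A)
    if pr_A == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {w: pr[w] / pr_A for w in A}

def expectation(f, measure):
    """Ordinary expectation of f with respect to a finite measure given as a dict."""
    return sum(f(w) * p for w, p in measure.items())

# X | A is X restricted to A, on the new space (A, Pr[. | A]).
pr_given_A = condition_on_event(pr, A)
e_cond = expectation(X, pr_given_A)

# The textbook formula E[X 1_A] / Pr[A], for comparison.
e_ratio = sum(X(w) * pr[w] for w in A) / sum(pr[w] for w in A)
```

Here `e_cond` is the ordinary expectation on the restricted space, and it agrees with the usual ratio formula `e_ratio`.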

I am less comfortable with a different notation, which is what this question is about:

Suppose $\mathbf X, \mathbf Y$ are two random variables in the same probability space. It might be convenient to describe their joint distribution as "choose $\mathbf Y$, then choose $\mathbf X$ in a way that depends on $\mathbf Y$". For example, we flip $10$ coins and let $\mathbf Y$ be the number of heads; we flip those $\mathbf Y$ coins again and let $\mathbf X$ be the number of heads. We can write this distribution as $$ \mathbf Y \sim \textit{Binomial}(10, \tfrac12) \qquad \mathbf X \mid \mathbf Y \sim \textit{Binomial}(\mathbf Y, \tfrac12). $$ This notation has some nice features. If $\mathbf Z \sim \textit{Binomial}(n,\frac12)$ for a constant $n$, then $\mathbb E[\mathbf Z] = \frac12 n$. Here, we can pretend that we're in the same boat and write $\mathbb E[\mathbf X \mid \mathbf Y] = \frac12\mathbf Y$, which is correct as a description of the random variable $\mathbb E[\mathbf X \mid \mathbf Y]$.
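For concreteness, the two-stage description can be turned into an explicit joint distribution and checked with exact arithmetic. This is a minimal sketch; the names `binomial_pmf`, `joint`, and `cond_mean_x` are made up for the illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k):
    """Pr[Binomial(n, 1/2) = k], as an exact fraction."""
    return Fraction(comb(n, k), 2 ** n)

N = 10

# joint[(y, x)] = Pr[Y = y] * Pr[X = x | Y = y], read off the two-stage description.
joint = {
    (y, x): binomial_pmf(N, y) * binomial_pmf(y, x)
    for y in range(N + 1)
    for x in range(y + 1)
}

def cond_mean_x(y):
    """E[X | Y = y], computed by conditioning on the event {Y = y}."""
    p_y = sum(p for (yy, _), p in joint.items() if yy == y)
    return sum(x * p for (yy, x), p in joint.items() if yy == y) / p_y

# The unconditional mean of X, computed directly from the joint distribution.
mean_x = sum(x * p for (_, x), p in joint.items())
```

In this sketch `cond_mean_x(y)` equals `Fraction(y, 2)` for every $y$ from $0$ to $10$, matching $\mathbb E[\mathbf X \mid \mathbf Y] = \frac12 \mathbf Y$, and `mean_x` equals $\frac12 \mathbb E[\mathbf Y] = \frac52$.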

But is $\mathbf X \mid \mathbf Y$ really any kind of random variable (or other object) on its own, or is this just abuse of notation?

Note: I will deal with $\int$ symbols if I must, but if I get an answer just for discrete random variables where these don't show up, that's fine by me.

In the discrete setting you can consider $X \mid Y$ to be the family of random variables $Z_y=X \mid (Y=y)$ where $y$ ranges over the support of $Y$, which then reduces things back to the case of $X \mid A$. I am less confident about the meaning of this when, say, $Y$ is continuous. — Ian, Apr 23 '21 at 15:34
https://math.stackexchange.com/questions/3586221/what-is-the-definition-of-xy-y?noredirect=1&lq=1 — , Apr 23 '21 at 15:38
@Ian That certainly carries the same information. It's possible that the answer to my question is "$\mathbf X \mid \mathbf Y$ doesn't mean anything, and to be more precise we should write $$(\mathbf X \mid \mathbf Y=y) \sim \textit{Binomial}(y,\frac12) \text{ for all }y \in R_{\mathbf Y}$$ instead of $\mathbf X \mid \mathbf Y \sim \textit{Binomial}(\mathbf Y, \tfrac12)$". But I'd like confirmation of this, if so. — Misha Lavrov, Apr 23 '21 at 15:39
@d.k.o. That's not the same thing; $\mathbf Y = y$ is an event, so $\mathbf X \mid \mathbf Y =y$ is just an instance of the first kind of conditioned random variable in my question. — Misha Lavrov, Apr 23 '21 at 15:40
Like the difference between $\mathsf{E}[X\mid Y]$ and $\mathsf{E}[X\mid Y=y]$? — , Apr 23 '21 at 15:43
Both (conditional) expectations describe the same object. Similarly, $X\mid Y\sim ...$ and $X\mid Y=y\sim ...$ describe the same conditional distribution — , Apr 23 '21 at 15:46
Maybe we could treat $X|Y$ as a random measure? Meaning $\eta = X|Y$ is a random variable such that $\eta(B) = X|\{Y \in B\}$. However, it will not be well defined on all spaces. For example, on Polish spaces (or spaces Borel isomorphic to $\mathbb R$) we have a kernel $\eta:\Omega \times \mathcal B(\mathbb R) \to [0,1]$ such that $\eta(\omega,\cdot)$ is a probability measure for every $\omega \in \Omega$ and $\eta(\cdot,B) = \mathbb P(X \in B | Y)$. Maybe we could write $X|Y := \eta$ (it is consistent when $Y$ is discrete: if $B=\{y\}$ where $y \in R_Y$, we arrive at $\eta(\omega,y)=X|\{Y=y\}(\omega)$). — Presage, Apr 23 '21 at 16:34
@DominikKutek Right, and the uniqueness of this random measure follows from Theorem 2.2 in Random Measures, Theory and Applications by Olav Kallenberg https://link-springer-com.dianus.libr.tue.nl/book/10.1007%2F978-3-319-41598-7 (sorry, I could not find a free link). Perhaps you can post your comment as an answer? Best — Suman Chakraborty, May 16 '21 at 13:18
Personally, I interpret a phrase such as $X \mid Y \sim \textit{Binomial}(Y, 1/2)$ to be a short way of saying "The conditional distribution of $X$ given that $Y = y$ is the binomial distribution with parameters $y$ and $1/2$." So, in my mind there is no object called $X \mid Y$. I've never been totally comfortable with this notation though, so I like this question and I'm curious to know how probability experts think about it. — littleO, May 17 '21 at 21:20

Presage · Accepted Answer · 2021-05-17T21:11:57.017

Let me post my comment here (with a few references added).

Firstly, some notation. If we have two measurable spaces $(E_1,\mathcal E_1),(E_2,\mathcal E_2)$ and I say that $f:E_1 \to E_2$ is a random variable (or measurable function) without stating precisely with respect to which sigma fields, then I assume that it is $\mathcal E_2 / \mathcal E_1$ measurable (meaning $f^{-1}[B] \in \mathcal E_1$ for any $B \in \mathcal E_2$).

Definition. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, and let $(E,\mathcal E)$ be a measurable space. Consider a random variable $X:\Omega \to E$ and some sigma field $\mathcal G \subset \mathcal F$. We say that $\eta:\Omega \times \mathcal E \to \mathbb R$ is a regular conditional distribution of $X$ with respect to $\mathcal G$ iff:

1) For all $\omega \in \Omega$, the function $\eta(\omega,\cdot):\mathcal E \to \mathbb R$ is a probability measure on $(E,\mathcal E)$.
2) For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is $\mathcal G$ measurable.
3) For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is (a.s.) equal to $\mathbb E[1_B(X) | \mathcal G]$.
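On a finite probability space the three conditions in the definition can be checked directly. The following is only a sketch under assumptions chosen for illustration: $\Omega$ is the set of outcomes of two fair dice, $Y$ is the first die, $X$ is the sum, and $\mathcal G = \sigma(Y)$. The last condition is checked through the defining property of conditional expectation, namely $\mathbb E[\eta(\cdot,B) 1_G] = \mathbb P(\{X \in B\} \cap G)$ for $G \in \sigma(Y)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite example: two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def Y(w):
    return w[0]           # first die

def X(w):
    return w[0] + w[1]    # sum of both dice

def eta(w, B):
    """eta(w, B) = P(X in B | Y = Y(w)), by elementary conditioning."""
    level = [v for v in omega if Y(v) == Y(w)]
    return sum(P[v] for v in level if X(v) in B) / sum(P[v] for v in level)

values_of_X = set(range(2, 13))
B = {7, 8}       # a test set of values of X
C = {1, 2, 3}    # G = {Y in C} is a typical event in sigma(Y)

# First condition: eta(w, .) has total mass 1 for every w.
mass_ok = all(eta(w, values_of_X) == 1 for w in omega)

# Second condition: eta(., B) depends on w only through Y(w),
# which on a finite space is what sigma(Y)-measurability means.
measurable_ok = all(
    eta(w, B) == eta(v, B) for w in omega for v in omega if Y(w) == Y(v)
)

# Third condition: E[eta(., B) 1_G] = P(X in B, G) for G = {Y in C}.
lhs = sum(eta(w, B) * P[w] for w in omega if Y(w) in C)
rhs = sum(P[w] for w in omega if Y(w) in C and X(w) in B)
```

With these choices both sides of the last identity come out to $\frac{5}{36}$.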

We're interested in the case $\mathcal G = \sigma(Y)$ for some random variable $Y:\Omega \to S$, where $(S,\mathcal S)$ is another measurable space. Note that in this case, $\eta$ is a good candidate to actually make sense of something like $X|Y$ (as far as I know, this is a rather uncommon notation). Indeed, we can then identify $\eta:\Omega \times \mathcal E \to \mathbb R$ with $\xi:S\times \mathcal E \to \mathbb R$ in such a manner that $\eta(\omega,B) = \xi(Y(\omega),B)$. In other words, $\xi(y,B) = \mathbb P(X \in B | Y=y)$ when $y = Y(\omega)$, provided we know how to make sense of the latter (see $(*)$ below).

But one may ask whether it always exists (we've just defined something in terms of $3$ conditions, so we cannot be sure that it even exists) and, if it does not, under what assumptions on our spaces and sigma fields we can actually prove the existence of a regular conditional distribution.

Here, I will state the theorem in a somewhat "weird" way, but this will make it easier to give references.

Theorem. Assume that $(\Omega,\mathcal F,\mathbb P)$ is a probability space, $(E,\mathcal E)$ is a measurable space, $X:\Omega \to E$ is a random variable and $\mathcal G \subset \mathcal F$ is any $\sigma$-field. If any of the conditions below holds:

1) $(E,\mathcal E) = (\mathbb R, \mathcal B(\mathbb R))$,
2) $E$ is a separable, complete metric space (Polish space) and $\mathcal E=\mathcal B(E)$ (the Borel sigma field),
3) $(E,\mathcal E)$ is Borel isomorphic/Borel equivalent to $(\mathbb R,\mathcal B(\mathbb R))$ (see $(**)$ below),

then a regular conditional distribution of $X$ with respect to $\mathcal G$ exists.

Obviously case $2$ implies case $1$, since $\mathbb R$ is a Polish space. In fact, case $3$ implies case $2$ (in my opinion this is really non-trivial), but let us first define what we mean by Borel isomorphism.

$(**)$ Definition. We say that a measurable space $(E,\mathcal E)$ is Borel isomorphic to $\mathbb R$ if there is an injective map $f:E \to \mathbb R$ such that

1) $f(E) \in \mathcal B(\mathbb R)$,
2) $f^{-1}[C] \in \mathcal E$ for any $C \in f(E) \cap \mathcal B(\mathbb R) := \{f(E) \cap B : B \in \mathcal B(\mathbb R)\}$,
3) $f(A) \in f(E) \cap \mathcal B(\mathbb R)$ for any $A \in \mathcal E$.
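A small worked example, which is my own illustration and not taken from the references: the discrete case always fits this definition. Let $E$ be countable (finite or countably infinite) with $\mathcal E = 2^E$, and fix an injective enumeration $$ f : E \to \mathbb N \subset \mathbb R. $$ Then $f(E)$ is a countable subset of $\mathbb R$, hence Borel. Every subset $C \subset f(E)$ is countable, so $f(E) \cap \mathcal B(\mathbb R) = 2^{f(E)}$, and $f^{-1}[C] \in 2^E = \mathcal E$ holds trivially. Finally, for any $A \subset E$ the image $f(A)$ is a countable subset of $f(E)$, so $f(A) \in f(E) \cap \mathcal B(\mathbb R)$. All three conditions hold, so a random variable with countably many values is covered by the theorem.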

Having said that, finally some references.

A proof of case 1) can be found (for example) in A.N. Shiryaev's book "Probability" (second edition), in chapter II, paragraph 7. The author shows how to prove 3) once 1) has been proved, and mentions that a Polish space with its Borel sigma field is actually Borel isomorphic to $(\mathbb R,\mathcal B(\mathbb R))$, so Shiryaev's book covers the whole theorem as I stated it.

It is worth mentioning that proofs (at least of case 1)) can be found in many books about Markov processes.

If anyone is interested in having a regular conditional distribution on a Polish space with its Borel sigma field (i.e. case 2)), but without this fact about Borel isomorphism, then R.M. Dudley in his book "Real analysis and probability", in chapter 10, section 10.2, proves case 2) without referring to Borel isomorphism.

$(*)$ I've written something like $\mathbb E[1_B(X)|Y=y]$, but what does it actually mean? Let's first look at $\mathbb E[1_B(X)|Y]$. There is the following fact.

Fact. If $Z$ is a random variable with values in $(\mathbb R,\mathcal B(\mathbb R))$ (or even in a Polish space $(E,\mathcal B(E))$) and $W$ is a random variable with values in any metric space $(S,\mathcal B(S))$, and moreover $Z$ is $\sigma(W)$ measurable, then there is some Borel function $h:S \to \mathbb R$ (respectively $h:S \to E$) such that $Z=h(W)$.

The proof (at least in the $\mathbb R$ case, which is sufficient for us) goes in the standard way. First assume $Z=1_A$ for some $A \in \sigma(W)$; then $A = W^{-1}[B]$ for some Borel set $B \subset S$, and we can take $h=1_B$, which is Borel because $B$ is. Then use linearity to pass to the case where $Z$ is a simple function, then a limiting procedure to pass to non-negative $Z$ (any non-negative random variable is the increasing limit of a sequence of simple random variables), and lastly write $Z=Z^+ - Z^-$.
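On a finite sample space the Fact is elementary: there, $Z$ being $\sigma(W)$ measurable just means that $Z$ is constant on each level set $\{W = s\}$, and $h$ can be read off those level sets. The sketch below illustrates this; the two-dice space, the variables `Z_good` and `Z_bad`, and the helper `factor_through` are all made up for the example.

```python
from itertools import product

# Hypothetical finite example: two fair dice.
omega = list(product(range(1, 7), repeat=2))

def W(w):
    return w[0] % 2   # parity of the first die

def factor_through(Z, W, omega):
    """Return a dict h with Z(w) == h[W(w)] for all w in omega,
    or None if Z is not constant on some level set of W."""
    h = {}
    for w in omega:
        s = W(w)
        if s in h and h[s] != Z(w):
            return None   # Z takes two values on {W = s}: no such h exists
        h[s] = Z(w)
    return h

def Z_good(w):
    # Depends on w only through the parity of the first die.
    return 10 if w[0] in (1, 3, 5) else -1

def Z_bad(w):
    # Depends on the second die, so it is not a function of W.
    return w[1]

h = factor_through(Z_good, W, omega)
```

Here `h` is the function with `Z_good(w) == h[W(w)]` on all of `omega`, while `factor_through(Z_bad, W, omega)` returns `None`.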

Using this with $Z=\mathbb E[1_B(X)|Y]$, $W=Y$, we see that $\mathbb E[1_B(X)|Y] = h_B(Y)$ for some Borel function $h_B$ (depending on the set $B$, of course), and the notation $\mathbb E[1_B(X)|Y=y]$ means exactly $h_B(y)$.

In other words, when such a regular conditional distribution exists, we can treat $X|Y$ as a random measure such that $(X|Y)(\omega)(B) = h_B(Y(\omega))$, where $h_B$ is a Borel function such that $h_B(Y) = \mathbb E[1_B(X)|Y]$.
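To tie this back to the coin example from the question, here is a minimal sketch that builds the random measure from an explicit sample space rather than from the two-stage description. The assumptions are mine: $3$ coins instead of $10$ to keep $\Omega$ small, $\Omega$ consisting of a first and a second round of flips for every coin, and the names `h`, `x_given_y`, and `binomial_pmf`.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # small hypothetical stand-in for the 10 coins in the question

# An outcome is (first round, second round); only coins that showed
# heads in the first round count in the second round.
flips = list(product((0, 1), repeat=N))
omega = list(product(flips, flips))
P = {w: Fraction(1, len(omega)) for w in omega}

def Y(w):
    first, _ = w
    return sum(first)

def X(w):
    first, second = w
    return sum(s for f, s in zip(first, second) if f == 1)

def h(B, y):
    """h_B(y) = E[1_B(X) | Y = y], by counting on the level set {Y = y}."""
    level = [w for w in omega if Y(w) == y]
    return sum(P[w] for w in level if X(w) in B) / sum(P[w] for w in level)

def x_given_y(w):
    """The random measure (X|Y)(w), represented as a pmf on {0, ..., N}."""
    return {k: h({k}, Y(w)) for k in range(N + 1)}

def binomial_pmf(n, k):
    """Pr[Binomial(n, 1/2) = k], exactly (zero outside 0..n)."""
    return Fraction(comb(n, k), 2 ** n) if 0 <= k <= n else Fraction(0)
```

For every $\omega$ in this sketch, `x_given_y(w)` coincides with the $\textit{Binomial}(Y(\omega), \tfrac12)$ pmf, which is one way to read the statement $X \mid Y \sim \textit{Binomial}(Y, \tfrac12)$.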
