What is the definition of $X|(Y=y)$?

Question

Suppose $S$ is a sample space (the set of all outcomes $\omega_i$) for an experiment. A random variable $X$ is defined as a real-valued function which maps elements from the sample space to real numbers, i.e. $X:S\to \mathbb R$.

Discrete Random variable:

The definition of the conditional probability mass function of $X$ given $Y=y$ is $$\mathbb P(X=x|Y=y)=\frac{\mathbb P(X=x, Y=y)}{\mathbb{P}(Y=y)},$$ provided $\mathbb P(Y=y)>0$.
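As a small sanity check of this definition, here is a minimal sketch that computes the conditional mass function from a joint table. The joint probabilities and the helper name `conditional_pmf` are made up for illustration; exact fractions are used so the ratios are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X=x, Y=y), keyed by (x, y); values chosen for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def conditional_pmf(joint, y):
    """Return the map x -> P(X=x | Y=y), assuming P(Y=y) > 0."""
    # Marginal P(Y=y): sum the joint pmf over all x.
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    if p_y == 0:
        raise ValueError("P(Y=y) = 0, so the ratio is undefined")
    # Divide each joint probability with this y by the marginal.
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

pmf_given_y1 = conditional_pmf(joint, 1)
print(pmf_given_y1)                 # conditional pmf of X given Y=1
print(sum(pmf_given_y1.values()))   # total mass of the conditional pmf
```

For each fixed $y$ the result is a mass function in $x$ that sums to 1, which is what lets us speak of "the distribution of $X$ given $Y=y$".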

Question: In lecture slides I have seen the notation, for example, that $X|(Y=y) \sim \text{Bin}(m, \lambda).$ What is the definition of $X|(Y=y)$? Is it a random variable itself with a restricted sample space? Maybe $X|(Y=y): \{\omega\in S: Y(\omega)=y \} \to \mathbb R$?

What would be the definition of $X|(Y=y)$ for $X$ and $Y$ being continuous random variables?

(Note: If it isn't a random variable, then how can we talk about its distribution and expected value?)

It denotes the conditional distribution of $X$ given $Y$. However, $X\mid Y=y\sim \text{Bin}(m,\lambda)$ implies that $X$ is independent of $Y$ unless $m$ and/or $\lambda$ depend on $y$. — , Mar 19 '20 at 01:04
@d.k.o. Yes I had just put random parameters there to explain the form of what I had seen. How should one interpret the idea of distribution without a random variable being described with it? Or maybe rather, is it possible to define a random variable $W$ which has this conditional distribution? In which case would appropriate notation for $W$ be $X|Y=y$? — user523384, Mar 19 '20 at 01:10
A (probability) distribution is just a function having certain properties. — , Mar 19 '20 at 01:12
Maybe to restate, if $X$ given $Y$ has a conditional distribution $\mathbb P(X=x|Y=y)$, does that mean $X|Y=y$ is itself a random variable? If not, why not (Is there something wrong with this interpretation?) — user523384, Mar 19 '20 at 01:13
Similar (as an analogy) to how in function notation we denote $(f\circ g)$ to be a function such that $(f\circ g)(x) = f(g(x))$, would it be appropriate to think of $X|(Y=y)$ being notation for a new random variable with distribution being the conditional distribution of $X$ given $Y=y$? — user523384, Mar 19 '20 at 01:16
No, $X\mid Y=y$ is not a random variable. I've seen similar questions here. Check, for example, this post. — , Mar 19 '20 at 01:21
@d.k.o. Yes actually I did see this post, but I honestly didn't quite understand what it meant. I've only learned a little bit of probability with measure theory so far. Is there a simple way to explain why it can't be? — user523384, Mar 19 '20 at 01:23
Also, I think this post was talking about how $(X|Y)$ was not a random variable. But does this also hold for $X|(Y=y)$? — user523384, Mar 19 '20 at 01:23
@user523384 Technically speaking, no new object denoted by $X|Y=y$ is ever defined in the theory of conditional probability. The new object introduced is the conditional distribution: conditional on the event $\{Y=y\}$ one can define a new and valid distribution (e.g. a mass function $P(X=x|Y=y)$) that obeys all the usual axioms of probability, for each fixed $y$. There simply are no “conditional random variables”, only random variables that, when we have some extra knowledge, may turn out to have a more useful distribution conditional on some event than their unconditional distribution. — Nap D. Lover, Mar 19 '20 at 01:26
@NapD.Lover Ohh wow I think that makes a lot of sense. So instead of defining a new random variable, this is more like "recalibrating" the probability distribution of $X$ given we know something about $Y$? This recalibration being our own definition of how we think we can best use the information to make a new probability assignment/distribution that more closely models the situation given the new information? — user523384, Mar 19 '20 at 01:29
Technically, for a given $y$ one may define a probability space and a random variable $X_y$ living on that space s.t. the distribution of $X_y$ is the distribution implied by your notation. — , Mar 19 '20 at 01:30
@user523384 Yes, I think that is a good intuitive interpretation. You may consider summarizing all the comments here into an answer of your own question (which is allowed and even encouraged here, by the way!) if you wish. — Nap D. Lover, Mar 19 '20 at 01:42
@d.k.o Yes, I think if we have a valid distribution, then we should be able to say some random variable has this distribution. I thought that would have been the "definition" of a "conditional random variable," if it existed. I'm still not 100% sure why we can't define it this way, but I think I'm okay to understand this as finding a new distribution to assign to $X$ — user523384, Mar 19 '20 at 01:42
@NapD.Lover (would you happen to know "why" we didn't define a conditional random variable in this way - were there obstacles in doing so?)
And yes, thank you for the suggestion! I think I will summarise these comments into an answer. Thanks so much for your help — user523384, Mar 19 '20 at 01:44
@user523384 Historically I cannot say. A. Kolmogorov gave the modern axiomatization of probability and, if I recall correctly, extended conditional probability beyond cases of simple finite or countable sample spaces, so you may find some answers in historical reviews or surveys of his work (Foundations of the Theory of Probability). But this is, of course, measure-theoretic. — Nap D. Lover, Mar 19 '20 at 01:59

score 4 · Accepted Answer · answered Mar 19 '20 at 02:00

Summarising the very helpful comments from @Nap D. Lover and @d.k.o. - In the original theory of conditional probability, there is no such definition of a "conditional random variable."

Before addressing the notation, a thought about the "requirement" of a conditional random variable

The purpose of a conditional distribution, $\mathbb P(X=x|Y=y)$, is to "recalibrate" the probability assignment/distribution for $X$, given that we have received information about $Y$. (Intuitively, compare the probability distribution of the temperature $X$, $\mathbb P(X=x)$, with the probability distribution of the temperature $X$ given that the humidity $Y$ was $y$, $\mathbb P(X=x|Y=y)$.) It is still a probability distribution designed for the random variable $X$, just "recalibrated" to better model the "true" probabilities for the given situation.
So I guess, in a way, a new "conditional random variable" is not really necessary. While it is possible to define a random variable $X_y$ living on a new restricted sample space, maybe that moves away from the idea of this distribution being a "rediagnosis" of what the probability distribution of $X$ should be, given the new "symptoms" ($Y=y$).
Hence it makes sense to need only conditional distributions, conditional expectation (the expected value of $X$, but weighted in a different way to account for the new information), etc., and not a new random variable itself.
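To make the restricted-sample-space construction of $X_y$ concrete, here is a sketch for the discrete case, assuming $\mathbb P(Y=y)>0$. The symbols $S_y$ and $\mathbb P_y$ are labels introduced here for illustration and do not come from the comments.

$$S_y=\{\omega\in S: Y(\omega)=y\},\qquad \mathbb P_y(A)=\frac{\mathbb P(A\cap S_y)}{\mathbb P(Y=y)}\ \text{ for events } A\subseteq S_y,\qquad X_y(\omega)=X(\omega)\ \text{ for } \omega\in S_y.$$

With these definitions,

$$\mathbb P_y(X_y=x)=\frac{\mathbb P(X=x, Y=y)}{\mathbb P(Y=y)}=\mathbb P(X=x|Y=y),\qquad \mathbb E[X|Y=y]=\sum_x x\,\mathbb P(X=x|Y=y).$$

So $X_y$ is a genuine random variable whose distribution is the conditional distribution, but it lives on the probability space $(S_y,\mathbb P_y)$ rather than on the original one. The ratio defining $\mathbb P_y$ is undefined when $\mathbb P(Y=y)=0$, which is the typical situation for continuous $Y$, so this particular construction does not carry over unchanged to that case.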

The notation: So the interpretation of the notation can be left as what @d.k.o. said in the very first comment: $X|(Y=y) \sim \text{Bin}(m, \lambda)$ is just shorthand for saying "The distribution of $X$, conditioned on $Y=y$, is (from the definition in the question) $\text{Bin}(m, \lambda)$."
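One way to see what the shorthand asserts is a small simulation. This is only a sketch under assumptions that are not in the question: $Y$ is uniform on $\{1,2,3,4\}$, and given $Y=y$ the variable $X$ counts successes in $y$ independent trials with success probability $p=0.3$, so that $X|(Y=y)\sim\text{Bin}(y, p)$. The function names are hypothetical.

```python
import random
from collections import Counter
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) mass at k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def simulate(n_samples, p=0.3, seed=0):
    """Draw (x, y) pairs from the assumed two-stage model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        y = rng.randint(1, 4)                        # Y uniform on {1, 2, 3, 4}
        x = sum(rng.random() < p for _ in range(y))  # y Bernoulli(p) trials
        pairs.append((x, y))
    return pairs

def empirical_conditional_pmf(pairs, y):
    """Keep only outcomes with Y=y and renormalise the frequencies of X."""
    xs = [x for x, yy in pairs if yy == y]
    counts = Counter(xs)
    return {x: c / len(xs) for x, c in sorted(counts.items())}

pairs = simulate(200_000)
est = empirical_conditional_pmf(pairs, 3)
for k in range(4):
    # Empirical conditional frequency next to the Bin(3, 0.3) mass.
    print(k, round(est.get(k, 0.0), 3), round(binom_pmf(k, 3, 0.3), 3))
```

Note that nothing new is sampled for the conditional statement: the same $X$ values are kept, and only the weighting changes once the outcomes with $Y\neq 3$ are discarded. That matches the "recalibration" reading of the notation.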

$X|(Y=y) \sim \text{Bin}(m, \lambda)$ is an abuse of notation, and you should avoid using it. — kludg, Mar 19 '20 at 04:34
Hi @user523384: I wrote the "other" question on this topic, and I have thought about it some since then. I agree with this summary. But I think that defining $X_y$ (in your notation) tends to actually emphasize the conditional nature of its value -- the condition is right there in its name! Another thought: without getting too far into measure theory, there are definitely technical issues with defining this idea in general (i.e., there isn't going to be a single expansive definition that covers all the cases). But the "intuition" is sound. Conditioning really is a kind of composition — nomen, Sep 07 '21 at 23:51
which is why our intuition was so strongly drawn to the idea of "picking a distribution at random" or "picking a density function at random" in this way. The best analogy I can come up with is as if we're doing naive set theory: It is correct on simple domains but incoherent on complicated enough domains (which is why we need measure theory in the first place). — nomen, Sep 07 '21 at 23:53
Also, I found some resources relevant to actually doing this project in its potential generality on the other math se: https://mathoverflow.net/questions/20740/is-there-an-introduction-to-probability-theory-from-a-structuralist-categorical. There is also a pretty famous paper that discusses the nature of the conditioning operator in terms of "disintegrations" (http://www.stat.yale.edu/~jtc5/papers/ConditioningAsDisintegration.pdf) — nomen, Sep 08 '21 at 16:41
