Let me post my comment here as an answer (with a few references added).
First, some notation. If we have two measurable spaces $(E_1,\mathcal E_1),(E_2,\mathcal E_2)$ and I say that $f:E_1 \to E_2$ is a random variable (or measurable function) without stating precisely with respect to which $\sigma$-fields, then I assume that it is $\mathcal E_2 / \mathcal E_1$ measurable (meaning $f^{-1}[B] \in \mathcal E_1$ for any $B \in \mathcal E_2$).
Definition. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and let $(E,\mathcal E)$ be a measurable space. Consider a random variable $X:\Omega \to E$ and a $\sigma$-field $\mathcal G \subset \mathcal F$. We say that $\eta:\Omega \times \mathcal E \to \mathbb R$ is a regular conditional distribution of $X$ with respect to $\mathcal G$ iff:
1. For all $\omega \in \Omega$, the function $\eta(\omega,\cdot):\mathcal E \to \mathbb R$ is a probability measure on $(E,\mathcal E)$.
2. For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is $\mathcal G$-measurable.
3. For all $B \in \mathcal E$, the function $\eta(\cdot,B):\Omega \to \mathbb R$ is almost surely equal to $\mathbb E[1_B(X) | \mathcal G]$.
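To see the three conditions in action, here is a small sketch on a finite probability space. Everything in it is a hypothetical example, not taken from the text: $\Omega=\{0,\dots,5\}$ with the weights listed, $X(\omega)=\omega \bmod 3$, $\mathcal G=\sigma(Y)$ with $Y(\omega)=\lfloor \omega/3\rfloor$, and the helper names `eta`, `subsets`, `is_probability_measure`, `is_G_measurable`, `has_defining_property`. On a finite space every atom $\{Y=y\}$ has positive probability here, so $\eta(\omega,B)$ can be taken as an elementary conditional probability, and condition 3 can be checked through the defining property $\mathbb E[1_B(X)1_A]=\mathbb E[\eta(\cdot,B)1_A]$ for all $A\in\sigma(Y)$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example: Omega = {0,...,5} with the weights below.
OMEGA = range(6)
P = {w: Fraction(p, 10) for w, p in zip(OMEGA, [1, 2, 3, 1, 1, 2])}
E = (0, 1, 2)

def X(w):
    return w % 3   # X : Omega -> E

def Y(w):
    return w // 3  # G = sigma(Y), Y : Omega -> {0, 1}

def subsets(xs):
    """All subsets of a finite set; the power set plays the role of the sigma field."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def eta(w, B):
    """P(X in B | Y = Y(w)), computed on the atom {Y = Y(w)} of sigma(Y)."""
    atom = [v for v in OMEGA if Y(v) == Y(w)]
    return sum(P[v] for v in atom if X(v) in B) / sum(P[v] for v in atom)

def is_probability_measure(w):
    # Condition 1: total mass 1 and finite additivity over disjoint sets.
    if eta(w, frozenset(E)) != 1:
        return False
    return all(eta(w, B1 | B2) == eta(w, B1) + eta(w, B2)
               for B1 in subsets(E) for B2 in subsets(E) if not B1 & B2)

def is_G_measurable(B):
    # Condition 2: on a finite space, sigma(Y)-measurable = constant on each {Y = y}.
    return all(eta(v, B) == eta(w, B) for v in OMEGA for w in OMEGA if Y(v) == Y(w))

def has_defining_property(B):
    # Condition 3: E[1_B(X) 1_A] = E[eta(., B) 1_A] for every A in sigma(Y).
    for C in subsets({Y(w) for w in OMEGA}):
        A = [w for w in OMEGA if Y(w) in C]
        if sum(P[w] for w in A if X(w) in B) != sum(P[w] * eta(w, B) for w in A):
            return False
    return True

print(all(is_probability_measure(w) for w in OMEGA))                             # True
print(all(is_G_measurable(B) and has_defining_property(B) for B in subsets(E)))  # True
```

Exact `Fraction` arithmetic is used so that the equalities are checked exactly rather than up to rounding.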
We're interested in the case $\mathcal G = \sigma(Y)$ for some random variable $Y:\Omega \to S$, where $(S,\mathcal S)$ is another measurable space. In that case $\eta$ is a good candidate for actually making sense of something like $X|Y$ (as far as I know, this notation is rather uncommon). Indeed, each $\eta(\cdot,B)$ is $\sigma(Y)$-measurable, hence a function of $Y$ (this is the Fact in (*) below), so we can identify $\eta:\Omega \times \mathcal E \to \mathbb R$ with a function $\xi:S\times \mathcal E \to \mathbb R$ such that $\eta(\omega,B) = \xi(Y(\omega),B)$. In other words, $\xi(y,B) = \mathbb P(X \in B | Y=y)$ for $y = Y(\omega)$, provided we know how to make sense of the right-hand side (see (*) below).
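The interesting case is when $\mathbb P(Y=y)=0$ for every $y$, so $\mathbb P(X\in B|Y=y)$ cannot be an elementary ratio. A standard example: if $(X,Y)$ is a standard bivariate normal pair with correlation $\rho$, then $\xi(y,\cdot)$ is the $N(\rho y,\,1-\rho^2)$ distribution. The sketch below assumes this setting with the hypothetical values `RHO = 0.6`, `b`, `c`, and checks by Monte Carlo the defining property $\mathbb E[1_B(X)1_A]=\mathbb E[\xi(Y,B)1_A]$ for the single choice $B=(-\infty,b]$, $A=\{Y\le c\}\in\sigma(Y)$. It is an illustration, not a proof; the function names are mine.

```python
import math
import random

RHO = 0.6  # hypothetical correlation of the standard bivariate normal pair (X, Y)

def normal_cdf(t):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def xi(y, b):
    """xi(y, (-inf, b]) for the N(RHO * y, 1 - RHO**2) law of X given Y = y."""
    return normal_cdf((b - RHO * y) / math.sqrt(1.0 - RHO**2))

def sample_pair(rng):
    """Draw (X, Y): both standard normal, with correlation RHO."""
    y = rng.gauss(0.0, 1.0)
    x = RHO * y + math.sqrt(1.0 - RHO**2) * rng.gauss(0.0, 1.0)
    return x, y

def check(b=0.5, c=-0.2, n=100_000, seed=1):
    """Monte Carlo estimates of E[1{X <= b} 1{Y <= c}] and E[xi(Y, (-inf, b]) 1{Y <= c}]."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n):
        x, y = sample_pair(rng)
        if y <= c:              # A = {Y <= c} is an event in sigma(Y)
            lhs += 1.0 if x <= b else 0.0
            rhs += xi(y, b)
    return lhs / n, rhs / n

print(check())  # two numbers that agree up to Monte Carlo error
```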
But one may ask whether such an $\eta$ always exists (we have just defined something by $3$ conditions, so we cannot be sure that it exists at all), or, if the answer is negative, which assumptions on our spaces/$\sigma$-fields let us prove the existence of a regular conditional distribution.
Here I will state the theorem in a somewhat "weird" way, but it will make the references easier to give.
Theorem. Assume that $(\Omega,\mathcal F,\mathbb P)$ is a probability space, $(E,\mathcal E)$ is a measurable space, $X:\Omega \to E$ is a random variable and $\mathcal G \subset \mathcal F$ is any $\sigma$-field. If any of the following conditions holds, then a regular conditional distribution of $X$ with respect to $\mathcal G$ exists:
1. $(E,\mathcal E) = (\mathbb R, \mathcal B(\mathbb R))$;
2. $E$ is a separable, complete metric space (a Polish space) and $\mathcal E=\mathcal B(E)$ (the Borel $\sigma$-field);
3. $(E,\mathcal E)$ is Borel isomorphic (Borel equivalent) to $(\mathbb R,\mathcal B(\mathbb R))$ (see (**) below).
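Obviously case 2 covers case 1, since $\mathbb R$ is a Polish space, so the theorem under 2 gives it under 1. In fact, the theorem under 3 gives it under 2, because every Polish space with its Borel $\sigma$-field is Borel isomorphic to a Borel subset of $\mathbb R$ in the sense of (**) (in my opinion this is really non-trivial). But let us first define what we mean by a Borel isomorphism.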
$(**)$ Definition. We say that a measurable space $(E,\mathcal E)$ is Borel isomorphic to $\mathbb R$ (more precisely, to a Borel subset of $\mathbb R$) if there is an injective map $f:E \to \mathbb R$ such that
1. $f(E) \in \mathcal B(\mathbb R)$;
2. $f^{-1}[C] \in \mathcal E$ for any $C \in f(E) \cap \mathcal B(\mathbb R) := \{f(E) \cap B : B \in \mathcal B(\mathbb R)\}$, i.e. $f$ is measurable;
3. $f(A) \in f(E) \cap \mathcal B(\mathbb R)$ for any $A \in \mathcal E$, i.e. the inverse $f^{-1}:f(E) \to E$ is measurable.
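A classical source of injective maps of this kind, for instance from $[0,1)^2$ into $[0,1)$, is interleaving binary digits. The sketch below only illustrates that idea on numbers with finitely many ($k$) binary digits, where the map and its inverse can be computed exactly; it does not deal with infinite expansions (e.g. those ending in all 1s) and does not by itself prove any Borel measurability. The function names `interleave` and `deinterleave` and the parameter `k` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def interleave(x, y, k):
    """Send (x, y) in [0,1)^2, both with k-bit binary expansions, to the number in
    [0,1) whose binary digits alternate: 1st digit of x, 1st of y, 2nd of x, ..."""
    xbits, ybits = int(x * 2**k), int(y * 2**k)   # the k digits as integers
    z = 0
    for i in range(k - 1, -1, -1):                 # most significant digit first
        z = (z << 2) | (((xbits >> i) & 1) << 1) | ((ybits >> i) & 1)
    return Fraction(z, 2**(2 * k))

def deinterleave(z, k):
    """Inverse of interleave: split the 2k digits of z back into x and y."""
    zbits = int(z * 2**(2 * k))
    xbits = ybits = 0
    for i in range(k - 1, -1, -1):
        pair = (zbits >> (2 * i)) & 3              # (digit of x, digit of y)
        xbits = (xbits << 1) | (pair >> 1)
        ybits = (ybits << 1) | (pair & 1)
    return Fraction(xbits, 2**k), Fraction(ybits, 2**k)

print(interleave(Fraction(3, 4), Fraction(1, 2), k=2))  # 7/8
print(deinterleave(Fraction(7, 8), k=2))                 # (Fraction(3, 4), Fraction(1, 2))
```

Here $3/4 = 0.11_2$ and $1/2 = 0.10_2$ interleave to $0.1110_2 = 7/8$; having an explicit inverse is what makes the map injective on these points.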
With that in place, here are some references.
A proof of case 1) can be found, for example, in A. N. Shiryaev's book "Probability" (second edition), Chapter II, §7. The author shows how to derive 3) from 1) and mentions that a Polish space with its Borel $\sigma$-field is Borel isomorphic to (a Borel subset of) $(\mathbb R,\mathcal B(\mathbb R))$, so in this sense Shiryaev's book covers the whole theorem as I stated it.
It is worth mentioning that proofs (at least of case 1) can be found in many books on Markov processes.
If anyone is interested in the existence of a regular conditional distribution on a Polish space with its Borel $\sigma$-field (i.e. case 2)) without using the Borel isomorphism fact, R. M. Dudley proves case 2) directly, without referring to Borel isomorphisms, in his book "Real Analysis and Probability", Chapter 10, Section 10.2.
$(*)$ I've written something like $\mathbb P(X \in B | Y=y) = \mathbb E[1_B(X)|Y=y]$, but what does it actually mean, especially when $\mathbb P(Y=y)=0$? Let's first look at $\mathbb E[1_B(X)|Y]$. We have the following fact.
Fact. If $Z$ is a random variable with values in $(\mathbb R,\mathcal B(\mathbb R))$ (or even in a Polish space $(E,\mathcal B(E))$), $W$ is a random variable with values in any measurable space $(S,\mathcal S)$ (for example a metric space with its Borel $\sigma$-field), and moreover $Z$ is $\sigma(W)$-measurable, then there is an $\mathcal S$-measurable function $h:S \to \mathbb R$ (respectively $h:S \to E$) such that $Z=h(W)$.
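The proof (at least in the $\mathbb R$ case, which is all we need) goes the standard way. First assume $Z=1_A$ for some $A \in \sigma(W)$. Since $\sigma(W) = \{W^{-1}[C] : C \in \mathcal S\}$, we have $A = W^{-1}[C]$ for some $C \in \mathcal S$, and $h = 1_C$ works because $1_{W^{-1}[C]} = 1_C \circ W$. By linearity the claim extends to simple $\sigma(W)$-measurable $Z$. For non-negative $Z$, take simple $\sigma(W)$-measurable random variables $Z_n \uparrow Z$, write $Z_n = h_n(W)$ and set $h = \limsup_n h_n$, which is measurable (possibly $[0,\infty]$-valued off the range of $W$); then $Z = \lim_n h_n(W) = h(W)$. Finally, for general $Z$ write $Z = Z^+ - Z^-$, get $h^+$ and $h^-$ from the previous step, and set $h = h^+ - h^-$ wherever both are finite and $h = 0$ elsewhere.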
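On a finite sample space the Fact becomes very concrete: $Z$ is $\sigma(W)$-measurable exactly when it is constant on each level set $\{W=w\}$, and then $h(w)$ is just that common value. The sketch below assumes a finite $\Omega$; the function `factor_through` and the example variables `W`, `Z` are hypothetical. It returns $h$ only on $W(\Omega)$, which mirrors the fact that $h$ is not determined outside the range of $W$.

```python
def factor_through(Z, W, omega):
    """On a finite sample space, return h (a dict on W(omega)) with Z = h(W),
    or None when Z is not sigma(W)-measurable, i.e. not constant on some {W = w}."""
    h = {}
    for o in omega:
        w = W(o)
        if w in h and h[w] != Z(o):
            return None        # Z takes two different values on the level set {W = w}
        h[w] = Z(o)
    return h

OMEGA = range(6)              # hypothetical finite sample space

def W(o):
    return o % 3

def Z(o):
    return (o % 3) ** 2 + 1   # a function of W(o), hence sigma(W)-measurable

print(factor_through(Z, W, OMEGA))            # {0: 1, 1: 2, 2: 5}
print(factor_through(lambda o: o, W, OMEGA))  # None
```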
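Applying this with $Z=\mathbb E[1_B(X)|Y]$ and $W=Y$, we see that $\mathbb E[1_B(X)|Y] = h_B(Y)$ for some $\mathcal S$-measurable function $h_B$ (depending on the set $B$, of course), and the notation $\mathbb E[1_B(X)|Y=y]$ means exactly $h_B(y)$. Since conditional expectation is only defined almost surely, $h_B$ is determined only up to a set of $\mathbb P_Y$-measure zero, where $\mathbb P_Y$ denotes the distribution of $Y$.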
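In other words, when a regular conditional distribution exists, we can treat $X|Y$ as a random measure with $X|Y(\omega)(B) = h_B(Y(\omega))$, where $h_B$ is an $\mathcal S$-measurable function with $h_B(Y) = \mathbb E[1_B(X)|Y]$; taking $h_B = \xi(\cdot,B)$ chooses all the $h_B$ simultaneously so that $B \mapsto h_B(Y(\omega))$ is a probability measure for every $\omega$.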
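One way to see the "random measure" picture is that averaging $X|Y(\omega)$ over $\omega$ gives back the law of $X$: taking expectations in condition 3 yields $\mathbb E[h_B(Y)] = \mathbb P(X\in B)$. The sketch below checks this exactly for a hypothetical discrete joint distribution `JOINT` of $(X,Y)$ (not from the text), where every $\mathbb P(Y=y)>0$, so $h_B(y)=\mathbb P(X\in B, Y=y)/\mathbb P(Y=y)$ is elementary. The names `law_of_Y`, `h`, `law_of_X` are mine.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y) on {0, 1, 2} x {"a", "b"}.
JOINT = {(0, "a"): Fraction(1, 10), (1, "a"): Fraction(2, 10), (2, "a"): Fraction(3, 10),
         (0, "b"): Fraction(1, 10), (1, "b"): Fraction(1, 10), (2, "b"): Fraction(2, 10)}

def law_of_Y():
    """The distribution P_Y as a dict y -> P(Y = y)."""
    py = {}
    for (_, y), p in JOINT.items():
        py[y] = py.get(y, 0) + p
    return py

def h(B, y):
    """h_B(y) = P(X in B | Y = y); elementary division is fine since P(Y = y) > 0."""
    num = sum(p for (x, yy), p in JOINT.items() if yy == y and x in B)
    return num / law_of_Y()[y]

def law_of_X(B):
    return sum(p for (x, _), p in JOINT.items() if x in B)

# Averaging the measures B -> h_B(y) over the law of Y gives back the law of X.
B = {1, 2}
print(sum(p * h(B, y) for y, p in law_of_Y().items()) == law_of_X(B))  # True
```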