Before stating von Neumann's theorem and asking my questions about its proof, I'll quote Birkhoff's ergodic theorem (from Walters, "An Introduction to Ergodic Theory", p. 34), which is used in the proof of von Neumann's theorem:
Birkhoff Ergodic Theorem. Suppose $T\colon (X,\mathfrak{B},m)\to (X,\mathfrak{B},m)$ is measure-preserving (where we allow $(X,\mathfrak{B},m)$ to be $\sigma$-finite) and $f\in L^1(m)$. Then $\frac{1}{n}\sum_{i=0}^{n-1}f(T^i(x))$ converges a.e. to a function $f^*\in L^1(m)$. Also $f^*\circ T=f^*$ a.e. and if $m(X)<\infty$, then $\int f^*\, dm=\int f\, dm$.
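To see the statement in action, here is a minimal Python sketch under assumptions that are not from Walters: $X=[0,1)$ with Lebesgue measure, $T(x)=x+\alpha \bmod 1$ with $\alpha=(\sqrt5-1)/2$, and $f(x)=x$. This rotation is ergodic, so the limit $f^*$ is the constant $\int f\,dm=1/2$. The helper names `birkhoff_average` and `rotate` are hypothetical.

```python
import math


def birkhoff_average(f, T, x0, n):
    """Return (1/n) * sum_{i=0}^{n-1} f(T^i(x0)) along the orbit of x0."""
    total = 0.0
    x = x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


# Hypothetical example system: rotation of X = [0, 1) by the golden-ratio
# angle ALPHA. It preserves Lebesgue measure m, and m(X) = 1 < infinity.
ALPHA = (math.sqrt(5) - 1) / 2


def rotate(x):
    return (x + ALPHA) % 1.0


def f(x):
    return x  # f is in L^1(m) with integral 1/2


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # The averages approach f* = 1/2, the integral of f.
        print(n, birkhoff_average(f, rotate, 0.1, n))
```

For an ergodic system the limit is constant, which is why $\int f^*\,dm=\int f\,dm$ reduces here to $f^*=1/2$.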
Now for von Neumann's theorem (Walters, p. 36):
$L^p$ Ergodic Theorem of von Neumann. Let $1\leq p<\infty$ and let $T$ be a measure-preserving transformation of the probability space $(X,\mathfrak{B},m)$. If $f\in L^p(m)$ there exists $f^*\in L^p(m)$ with $f^*\circ T=f^*$ a.e. and $\lVert (1/n)\sum_{i=0}^{n-1}f(T^ix)-f^*(x)\rVert_p\to 0$.
Here is the proof:
If $g$ is bounded and measurable then $g\in L^p$ and by the ergodic theorem we have that $$ \frac{1}{n}\sum_{i=0}^{n-1}g(T^ix)\to g^*(x)\text{ a.e.} $$ Clearly $g^*\in L^{\infty}(m)$ and hence $g^*\in L^p(m)$. Also $$ \lvert(1/n)\sum_{i=0}^{n-1}g(T^ix)-g^*(x)\rvert^p\to 0\text{ a.e.} $$ and by the bounded convergence theorem $$ \lVert (1/n)\sum_{i=0}^{n-1}g(T^ix)-g^*(x)\rVert_p\to 0. $$ If $\varepsilon > 0$ we can choose $N(\varepsilon,g)$ such that if $n>N(\varepsilon,g)$ and $k>0$ then $$ \left\lVert\frac{1}{n}\sum_{i=0}^{n-1}g(T^ix)-\frac{1}{n+k}\sum_{i=0}^{n+k-1}g(T^ix)\right\rVert_p <\varepsilon. $$
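The first paragraph of the proof can be illustrated on a finite model. This is a sketch under assumptions not taken from Walters: $X=\{0,\dots,N-1\}$ with the uniform probability measure and $T(x)=x+\text{STEP} \bmod N$, which is measure-preserving and, because $\gcd(\text{STEP},N)=1$, ergodic, so $g^*$ is the mean of $g$. It prints $\lVert S_ng-g^*\rVert_2$ for a few $n$ and also checks numerically that $|S_ng(x)|\le\max|g|$ in this example. The function names are hypothetical.

```python
import math

# Hypothetical finite model (not from Walters): X = {0, ..., N-1} with the
# uniform probability measure m({x}) = 1/N and T(x) = (x + STEP) mod N.
# T is a bijection, so it preserves m; gcd(STEP, N) = 1 makes it ergodic,
# so the Birkhoff limit g* is the constant (1/N) * sum(g).
N, STEP = 997, 389


def ergodic_average(g, n):
    """Return S_n g as a list: (S_n g)(x) = (1/n) * sum_{i<n} g(T^i x)."""
    return [sum(g[(x + i * STEP) % N] for i in range(n)) / n for x in range(N)]


def lp_norm(h, p):
    """L^p(m) norm for the uniform probability measure on X."""
    return (sum(abs(v) ** p for v in h) / N) ** (1 / p)


# An arbitrary bounded (hence L^p) function, chosen only for illustration.
g = [math.sin(x) + (x % 7) for x in range(N)]
g_star = sum(g) / N
sup_g = max(abs(v) for v in g)

if __name__ == "__main__":
    for n in (10, 100, N):
        S_n_g = ergodic_average(g, n)
        # Pointwise bound |S_n g(x)| <= max |g| in this example.
        assert max(abs(v) for v in S_n_g) <= sup_g + 1e-12
        print(n, lp_norm([v - g_star for v in S_n_g], p=2))
```

At $n=N$ every orbit visits all of $X$ exactly once, so the error drops to rounding level; this is special to the finite model.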
Let $f\in L^p(m)$ and $S_n(f)(x)=\frac{1}{n}\sum_{i=0}^{n-1}f(T^ix)$. We must show that $(S_n(f))_n$ is a Cauchy sequence in $L^p(m)$. Note that $\lVert S_n(f)\rVert_p\leq\lVert f\rVert_p$. Let $\varepsilon >0$ and choose $g\in L^{\infty}(m)$ such that $\lVert f-g\rVert_p < \varepsilon/4$. Then $$ \begin{aligned} \lVert S_nf-S_{n+k}f\rVert_p&\leq\lVert S_nf-S_ng\rVert_p + \lVert S_ng-S_{n+k}g\rVert_p + \lVert S_{n+k}g-S_{n+k}f\rVert_p\\ &\leq \varepsilon/4 + \varepsilon/2 + \varepsilon/4 = \varepsilon \end{aligned} $$ if $n> N(\varepsilon/2,g)$ and $k>0$. Therefore $(S_nf)_n$ is a Cauchy sequence in $L^p(m)$, and since $L^p(m)$ is complete, $\lVert S_nf-f^*\rVert_p\to 0$ for some $f^*\in L^p(m)$.
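The two estimates this paragraph relies on, the contraction $\lVert S_nf\rVert_p\le\lVert f\rVert_p$ and the three-term triangle inequality, can be checked numerically. This is a sketch under my own assumptions: $X=\mathbb{Z}/N$ with the uniform probability measure, $T(x)=x+\text{STEP} \bmod N$, a random heavy-tailed $f$, and the truncation $g=\max(-M,\min(f,M))$ as one hypothetical choice of bounded $g$. On a finite space every function is bounded, so this only illustrates the inequalities, not the existence of a suitable $g$.

```python
import random

N, STEP = 101, 37  # hypothetical finite model X = Z/N, uniform probability measure


def S(f, n):
    """S_n f(x) = (1/n) * sum_{i=0}^{n-1} f(T^i x), with T(x) = (x + STEP) mod N."""
    return [sum(f[(x + i * STEP) % N] for i in range(n)) / n for x in range(N)]


def lp_dist(u, v, p):
    """L^p(m) distance between two functions on X (uniform measure)."""
    return (sum(abs(a - b) ** p for a, b in zip(u, v)) / N) ** (1 / p)


def lp_norm(u, p):
    return lp_dist(u, [0.0] * N, p)


random.seed(0)
# A heavy-tailed f and its truncation g at level M; the truncation is an
# illustrative choice of bounded g, not necessarily the one Walters intends.
f = [random.paretovariate(1.5) * random.choice((-1, 1)) for _ in range(N)]
M = 5.0
g = [max(-M, min(v, M)) for v in f]

p, n, k = 2.0, 20, 15
Sn_f, Snk_f, Sn_g, Snk_g = S(f, n), S(f, n + k), S(g, n), S(g, n + k)

# Contraction ||S_n f||_p <= ||f||_p (T permutes the points of X).
assert lp_norm(Sn_f, p) <= lp_norm(f, p) + 1e-9
# The three-term triangle inequality used in the proof.
lhs = lp_dist(Sn_f, Snk_f, p)
rhs = lp_dist(Sn_f, Sn_g, p) + lp_dist(Sn_g, Snk_g, p) + lp_dist(Snk_g, Snk_f, p)
assert lhs <= rhs + 1e-9
# Each outer term is bounded by ||f - g||_p, by linearity and the contraction.
assert lp_dist(Sn_f, Sn_g, p) <= lp_dist(f, g, p) + 1e-9
```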
We have $f^*\circ T=f^*$ a.e. because $$ \left(\frac{n+1}{n}\right)(S_{n+1}f)(x)-(S_nf)(Tx)=\frac{f(x)}{n}. $$
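The identity itself is pure algebra: multiplying $(S_{n+1}f)(x)$ by $(n+1)/n$ gives $\frac1n\sum_{i=0}^{n}f(T^ix)$, while $(S_nf)(Tx)=\frac1n\sum_{i=1}^{n}f(T^ix)$, so only the $i=0$ term survives. Here is a short Python check with exact rational arithmetic, on a hypothetical finite model $X=\mathbb{Z}/11$ with $T(x)=x+4 \bmod 11$ and arbitrarily chosen values of $f$ (none of this is from Walters).

```python
from fractions import Fraction

N, STEP = 11, 4  # hypothetical finite model: T(x) = (x + STEP) mod N on Z/N


def T(x):
    return (x + STEP) % N


def S(f, n, x):
    """(S_n f)(x) = (1/n) * sum_{i=0}^{n-1} f(T^i x), computed exactly."""
    total = Fraction(0)
    y = x
    for _ in range(n):
        total += f[y]
        y = T(y)
    return total / n


# Arbitrary values of f on the 11 points of X.
f = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5)]


def identity_holds(n):
    """Check ((n+1)/n) S_{n+1}f(x) - S_n f(Tx) == f(x)/n for every x in X."""
    return all(
        Fraction(n + 1, n) * S(f, n + 1, x) - S(f, n, T(x)) == f[x] / n
        for x in range(N)
    )


print(all(identity_holds(n) for n in range(1, 30)))  # expected: True
```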
I have three questions concerning this proof.
1.) Why is it clear that $g^*\in L^{\infty}(m)$?
I think it is because the Birkhoff Ergodic Theorem gives $g^*\in L^1(m)$, i.e. $\int\lvert g^*\rvert\, dm<\infty$. From this it follows that $\lvert g^*\rvert < \infty$ a.e., and so $\operatorname{ess\,sup}_{x\in X}\lvert g^*(x)\rvert < \infty$.
2.) Why can we simply choose a function $g\in L^{\infty}(m)$ with $\lVert f-g\rVert_p < \varepsilon/4$? I understand that we do this in order to apply the first part of the proof, but not why such a function exists. Perhaps one can take a simple function (which is bounded and measurable, and therefore in $L^{\infty}$) that approximates $f$ well enough, but I am not sure.
3.) Why does the last identity show that $f^*\circ T=f^*$ a.e.? I do not see this; in particular, where does the "a.e." come from?
With greetings,
math12