
I am following the course CS 285 Deep Reinforcement Learning from UC Berkeley. In lecture 4, part 1 (around 17:00), the professor introduces the expected total reward of a trajectory under a policy. He first writes the probability of a trajectory under the policy parameters $\theta$ as follows.

\begin{aligned} \underbrace{p_{\theta}\left(\mathbf{s}_{1}, \mathbf{a}_{1}, \ldots, \mathbf{s}_{T}, \mathbf{a}_{T}\right)}_{p_{\theta}(\tau)}=p\left(\mathbf{s}_{1}\right) \prod_{t=1}^{T} \underbrace{\pi_{\theta}\left(\mathbf{a}_{t} \mid \mathbf{s}_{t}\right) p\left(\mathbf{s}_{t+1} \mid \mathbf{s}_{t}, \mathbf{a}_{t}\right)}_{\text {Markov chain on }(\mathbf{s}, \mathbf{a})} \end{aligned}

Then he says that we can use linearity of expectation at this point and writes the following equality.

\begin{aligned} \theta^{\star} &=\arg \max _{\theta} E_{\tau \sim p_{\theta}(\tau)}\left[\sum_{t} r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right] \\ &=\arg \max _{\theta} \sum_{t=1}^{T} E_{\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \sim p_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)}\left[r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right] \end{aligned}

He then says that $p_{\theta}(\mathbf{s}_t, \mathbf{a}_t)$ is the state-action marginal. I am trying to understand this. Even with the Markov assumption, $(\mathbf{s}_t, \mathbf{a}_t)$ still depends on $(\mathbf{s}_{t-1}, \mathbf{a}_{t-1})$, so the elements of the joint distribution $p_\theta(\tau)$ are not independent. Don't we need conditional probabilities to expand the joint probability this way in the second equation? Isn't this roughly like saying $E_{x,y \sim p(x,y)}[f(x,y)] = E_x[f(x,y)] + E_y[f(x,y)]$? Maybe this is doable when $f(x,y)$ is something like $x+y$, which I believe is similar to this example.

1 Answer


Linearity of expectation does not require independence: it holds whenever the marginal expectations are finite, even if there is dependence among the random variables.
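
To spell out the step the question asks about, here is a sketch of the marginalization for a single time step $t$. It assumes discrete states and actions (replace the sums with integrals otherwise), and the symbol $\tau_{\setminus t}$, meaning "every variable in $\tau$ other than $(\mathbf{s}_t, \mathbf{a}_t)$", is notation introduced here for illustration, not taken from the lecture.

\begin{aligned} E_{\tau \sim p_{\theta}(\tau)}\left[r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right] &=\sum_{\tau} p_{\theta}(\tau)\, r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \\ &=\sum_{\mathbf{s}_{t}, \mathbf{a}_{t}} r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \sum_{\tau_{\setminus t}} p_{\theta}(\tau) \\ &=\sum_{\mathbf{s}_{t}, \mathbf{a}_{t}} r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) p_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \\ &=E_{\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right) \sim p_{\theta}\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)}\left[r\left(\mathbf{s}_{t}, \mathbf{a}_{t}\right)\right] \end{aligned}

The only facts used are that $r(\mathbf{s}_t, \mathbf{a}_t)$ does not depend on the other variables, so it can be pulled out of the inner sum, and that the marginal $p_\theta(\mathbf{s}_t, \mathbf{a}_t)$ is by definition the joint summed over everything else. No factorization of the joint into a product of marginals is needed. Applying linearity of expectation to $\sum_t r(\mathbf{s}_t, \mathbf{a}_t)$ and then this identity to each term gives the second line of the equality in the question.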

heropup
  • Thanks. It turns out my question is more about the marginalization of the joint probability in the first expectation than about the linearity of expectation. I think we cannot write $p_\theta(s_t,a_t,s_{t-1},a_{t-1}) = p_\theta(s_t,a_t)p_\theta(s_{t-1},a_{t-1})$, and therefore we cannot write the second equality. Can we write the second expectation in that form easily even when $(s_t,a_t)$ depends on $(s_{t-1},a_{t-1})$? – iRestMyCaseYourHonor Mar 15 '21 at 11:06
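
The concern in this comment can also be checked numerically. The following is a minimal sketch, assuming a small random tabular MDP; the sizes `S`, `A`, `T` and the names `p0`, `pi`, `P`, `r` are hypothetical and chosen only for illustration. It computes the expected total reward once by enumerating every trajectory under the joint $p_\theta(\tau)$, and once as a sum of per-step expectations under the state-action marginals, without ever assuming that time steps are independent.

```python
import itertools
import random

random.seed(0)
S, A, T = 3, 2, 4  # hypothetical numbers of states, actions, and time steps


def random_dist(n):
    """Return a random probability vector of length n."""
    w = [random.random() for _ in range(n)]
    z = sum(w)
    return [x / z for x in w]


p0 = random_dist(S)                                          # p(s_1)
pi = [random_dist(A) for _ in range(S)]                      # pi(a | s)
P = [[random_dist(S) for _ in range(A)] for _ in range(S)]   # p(s' | s, a)
r = [[random.random() for _ in range(A)] for _ in range(S)]  # r(s, a)

# Left-hand side: E over whole trajectories of the summed reward.
# The final factor p(s_{T+1} | s_T, a_T) sums to 1 over s_{T+1}, so it is omitted.
lhs = 0.0
for traj in itertools.product(range(S * A), repeat=T):
    pairs = [divmod(x, A) for x in traj]  # decode each index into (s, a)
    prob = p0[pairs[0][0]]
    for t, (s, a) in enumerate(pairs):
        prob *= pi[s][a]
        if t + 1 < T:
            prob *= P[s][a][pairs[t + 1][0]]
    lhs += prob * sum(r[s][a] for s, a in pairs)

# Right-hand side: sum over t of E under the marginal p(s_t, a_t).
# The marginals are obtained by pushing the state distribution forward in time.
rhs = 0.0
state = p0[:]
for t in range(T):
    marg = [[state[s] * pi[s][a] for a in range(A)] for s in range(S)]
    rhs += sum(marg[s][a] * r[s][a] for s in range(S) for a in range(A))
    state = [
        sum(marg[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
        for s2 in range(S)
    ]

print(lhs, rhs)
```

The two printed values should agree up to floating-point error, even though consecutive state-action pairs in this chain are dependent. The dependence affects how the marginals are computed (each one is propagated from the previous step), not whether the expectation can be written in terms of them.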