
I'm not as well versed as I would like to be to confidently evaluate the following cost function, so any confirmation would be appreciated. Given an initial state $x_0$, the cost is

$$J_\pi(x_0) = \lim_{N \to \infty} \mathop{E}_{\substack{w_k \\ k = 0,1,\ldots}} \left\{ \sum_{k=0}^{N-1} \alpha^k g(x_k,\mu_k(x_k),w_k) \right\}$$ which is subject to the discrete-time system constraint

$$x_{k+1} = f(x_k, u_k, w_k), \qquad k = 0, 1, \ldots,$$

where $x_k$ is the state, $u_k$ is the control, and $w_k$ is the random disturbance. The objective is to find the best policy $\pi = \{\mu_{0},\mu_{1},\ldots\}$, where $\mu_k : S \to C$ and $u_k = \mu_k(x_k) \in C$ for all $x_k \in S$, so that the cost is minimized. I should also mention that $w_k$, like $x_k$ and $u_k$, takes values in its own space $D$, a countable set. The disturbance $w_k$ may depend on the current state $x_k$ and control $u_k$, but not on $k$, and previous disturbances have no effect on it. Accordingly, $w_k$ is characterized by the probability distributions $P(w_k \mid x_k, u_k)$. The scalar $\alpha$ is referred to as the discount factor, and $g: S \times C \times D \to \mathbb{R}$ is the cost per stage.
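To make the definitions concrete, here is a minimal Monte Carlo sketch of the cost for a truncated horizon. Everything specific in it is an assumption chosen for illustration and is not part of the problem above: the three-state space, the dynamics `f`, the cost `g`, the stationary policy `mu`, the disturbance distribution in `sample_w`, and the value of `ALPHA`.

```python
import random

ALPHA = 0.9  # discount factor (assumed value for this toy example)


def f(x, u, w):
    """Hypothetical dynamics on S = {0, 1, 2}: shift by u and w, then clip."""
    return max(0, min(2, x + u + w))


def g(x, u, w):
    """Hypothetical cost per stage: distance from state 1 plus a control cost."""
    return abs(x - 1) + 0.1 * abs(u)


def mu(x):
    """Hypothetical stationary policy: steer toward state 1 (returns -1, 0 or 1)."""
    return int(x < 1) - int(x > 1)


def sample_w(x, u, rng):
    """Draw w from P(w | x, u) on D = {-1, 0, 1}; here it depends only on u."""
    p_zero = 0.8 if u == 0 else 0.6
    r = rng.random()
    if r < p_zero:
        return 0
    return -1 if r < p_zero + (1 - p_zero) / 2 else 1


def discounted_cost(x0, n_stages, rng):
    """One sample of sum_{k=0}^{N-1} alpha^k g(x_k, mu(x_k), w_k)."""
    x, total = x0, 0.0
    for k in range(n_stages):
        u = mu(x)
        w = sample_w(x, u, rng)
        total += ALPHA ** k * g(x, u, w)
        x = f(x, u, w)
    return total


def estimate_J(x0, n_stages=100, n_paths=2000, seed=0):
    """Average over simulated paths: an estimate of the truncated expected cost."""
    rng = random.Random(seed)
    return sum(discounted_cost(x0, n_stages, rng) for _ in range(n_paths)) / n_paths
```

Calling `estimate_J(0)` and `estimate_J(1)` compares two initial states under the same assumed policy. The horizon is truncated at `n_stages`, so this only approximates the limit.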

My questions are:
1. What difference does it make to have the upper bound $N-1$ instead of $N$ when I take the limit as $N$ approaches infinity? Is $N-1$ written only because the index starts at $k = 0$ and we would like "the $N$th stage" to mean exactly that?
2. Is there any reason, other than $w_k$ appearing inside the summand (or making the expression look tidy), that the expected value is written outside the sum? Could it have been placed inside the sum instead, enclosing only the cost-per-stage function $g$?

Jack

1 Answer


I'm somewhat confused by the notation. You describe $w_k$ as an i.i.d. random variable conditional on the contemporaneous state and control variable, which seems odd: it suggests that the choice of $u_k$ affects the cost function both through the state equation for $x_{k+1}$ and by changing the support (and hence the mean) of the $w_k$ shock. But perhaps this is standard in your field...?

With that caveat in mind:

  1. Whether the sum runs to $N$ or to $N-1$ is immaterial in the limit, since $\infty = \infty-1$. (This isn't really kosher, since $\infty$ is not a number.) For $N < \infty$ the indices matter if you're, for example, reducing the equation to prove convergence (which here is guaranteed for $|\alpha| < 1$ when $g$ is bounded).
  2. Formally, since $E[\bullet]$ is a linear operator, it can be written either inside or outside the sum. This is because the expectations are all taken at $k=0$ (clearer notation would be $E_0[\bullet]$). If expectations were repeatedly updated, e.g. $\sum_k E_k[\bullet]$, then past shocks $w_t$, $t<k$, would be part of the information set. That would be a very different problem from the one posed, in which all shocks are unknown. Hence, for clarity's sake, it is preferable to put the expectation operator outside the sum.
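Both points can be checked numerically on a small example. The sketch below uses a hypothetical two-state system in which the dynamics `f`, cost `g`, policy `mu`, distribution `p_w` and discount `ALPHA` are all assumptions made for illustration. It computes the time-0 expectation of the finite sum by enumerating every disturbance sequence (expectation outside the sum), and separately as a sum of per-stage time-0 expectations (expectation inside the sum).

```python
from itertools import product

ALPHA = 0.5          # discount factor (assumed value)
STATES = (0, 1)      # hypothetical state space S
W_VALUES = (0, 1)    # hypothetical disturbance space D
G_MAX = 3.0          # max of |g| over the toy model below


def p_w(w, x, u):
    """Hypothetical P(w | x, u): the control shifts the disturbance distribution."""
    p_one = 0.25 if u == 0 else 0.75
    return p_one if w == 1 else 1.0 - p_one


def f(x, u, w):
    """Hypothetical dynamics: x_{k+1} = (x_k + u_k + w_k) mod 2."""
    return (x + u + w) % 2


def g(x, u, w):
    """Hypothetical cost per stage."""
    return x + 2 * w


def mu(x):
    """Hypothetical stationary policy with controls in C = {0, 1}."""
    return 1 - x


def expectation_outside(x0, n):
    """E_0 of the whole sum: enumerate every sequence (w_0, ..., w_{n-1})."""
    total = 0.0
    for ws in product(W_VALUES, repeat=n):
        x, prob, cost = x0, 1.0, 0.0
        for k, w in enumerate(ws):
            u = mu(x)
            prob *= p_w(w, x, u)
            cost += ALPHA ** k * g(x, u, w)
            x = f(x, u, w)
        total += prob * cost
    return total


def expectation_inside(x0, n):
    """Sum over k of alpha^k E_0[g_k], using the time-0 distribution of x_k."""
    dist = {x: 1.0 if x == x0 else 0.0 for x in STATES}
    total = 0.0
    for k in range(n):
        nxt = {x: 0.0 for x in STATES}
        for x, px in dist.items():
            u = mu(x)
            for w in W_VALUES:
                p = px * p_w(w, x, u)
                total += ALPHA ** k * p * g(x, u, w)
                nxt[f(x, u, w)] += p
        dist = nxt
    return total
```

For any horizon `n` the two functions should agree up to floating-point error, which is the linearity argument in point 2. For point 1, moving the upper limit from $N-1$ to $N$ adds only the single term $\alpha^N E_0[g_N]$, so `expectation_outside(x0, N + 1) - expectation_outside(x0, N)` is bounded in absolute value by `ALPHA**N * G_MAX` and vanishes as $N$ grows.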
  • This is late, but why does this work by only taking the expectation of $w_0$? – Jack Mar 30 '15 at 21:53
  • @Christopher - it's not that you take the expectation only of $w_0$; you take the expectation of $w_0, w_1, w_2, \ldots$ from time 0. So at any time $t=j$, $E_0[ \alpha^j g(x_j, \mu_j(x_j), w_j) ] = \alpha^j g(x_j, \mu_j(x_j), 0)$. Contemporaneous expectations give the mean of their shock, $E_1[w_1] = E_2[w_2] = \cdots = 0$, but applying them to the function at each $t$ requires you to take account of all previous shocks, i.e., $E_j[ \alpha^j g(x_j, \mu_j(x_j), w_j) ] = \alpha^j g( x_j(w_{j-1}, w_{j-2}, \ldots, w_0), \mu_j(x_j(w_{j-1}, w_{j-2}, \ldots, w_0)), 0)$. – Tony Beans Apr 06 '15 at 20:36