How can I derive Bellman's principle of optimality in discrete-time?

Question

In the context of discrete-time optimal control theory, Bellman's principle of optimality is useful for efficiently determining the control signal $\{u_k\}_{k=0}^{N-1}$ that minimizes the following objective function $$ J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) = \sum_{k=0}^{N-1} g_k (x_k,u_k) + g_N(x_N) $$ where $\{x_k\}_{k=0}^{N}$ is the trajectory of the state vector $x_k$, and $\{g_k\}_{k=0}^{N}$ are functions that are chosen such that $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$ has a unique global minimum with respect to $\{u_k\}_{k=0}^{N-1}$. Moreover, $x_k$ evolves according to the difference equation $$x_{k+1} = f(x_k,u_k)$$ Bellman's principle of optimality states:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

How can I relate this principle to the minimization of $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$ with respect to the control signal $\{u_k\}_{k=0}^{N-1}$?

You can check the books of Bertsekas on the topic as this iis a well known result. — KBS, Dec 29 '22 at 09:19
@KBS I already checked Bertsekas’ book on the topic, specifically chapter 1 of “Dynamic Programming and Optimal Control: Volume 1”, but I couldn’t find an explanation of the connection between the principle of optimality and the math. I provide a clear connection in my answer below, which is the reason I asked and answered this question in the first place. — mhdadk, Dec 29 '22 at 09:36
Why did you ask the question if you had the answer? There is everything you need to know about this result in published books. — KBS, Dec 29 '22 at 10:21
@KBS to make it more accessible to others. As I said, I couldn’t find this kind of derivation in the Bertsekas book or in Donald Kirk’s Optimal Control book. Also, according to this page: “If you have a question that you already know the answer to, and you would like to document that knowledge in public so that others (including yourself) can find it later, it's perfectly okay to ask and answer your own question on a Stack Exchange site.” — mhdadk, Dec 29 '22 at 10:40
@KBS I don’t understand why you downvoted my question and answer? What is wrong with them if I’m following SE guidelines? — mhdadk, Dec 29 '22 at 10:44

mhdadk · Accepted Answer · 2022-12-31T08:02:11.007

$\newcommand{bal}{\begin{align}}\newcommand{eal}{\end{align}}$Bellman's principle of optimality becomes apparent when the following optimization problem $$ \min_{u_0,u_1,\dots,u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) $$ is decomposed into smaller sub-problems as follows. First, because $$ \min_{u_0,u_1,\dots,u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) = \min_{u_0} \left[ \min_{u_1} \left[ \min_{u_2} \left[ \cdots \left[ \min_{u_{N-1}} J\left(x_0,\{u_k\}_{k=0}^{N-1}\right) \right]\right]\right]\right] \tag{1} \label{eq:decomp} $$ then we can re-write the optimization problem as \begin{align} \min_{u_0} \left[ \min_{u_1} \left[ \min_{u_2} \left[ \cdots \left[ \min_{u_{N-1}} \sum_{k=0}^{N-1} g_k (x_k,u_k) + g_N(x_N) \right]\right]\right]\right] \end{align} We can expand the inner summation to get $$ \min_{u_0} \left[\min_{u_1} \left[\min_{u_2} \left[\cdots \left[\min_{u_{N-1}} g_0 (x_0,u_0) + g_1 (x_1,u_1) + g_2 (x_2,u_2) + \cdots + g_{N-1} (x_{N-1},u_{N-1}) + g_N(x_N) \right]\right]\right]\right] $$ Because some of the terms in the objective function are constant with respect to the $\min$ operators, then we can take them out as follows (I've broken up the expression over multiple lines for clarity, as it gets too long to fit on one line) $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) \ + \\ &\Big[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) \ + \\ &\Big[g_N(x_N)\Big]\Big]\Big]\Big]\Big]\Big] \end{align} $$ We then let $J_N(x_N) = g_N(x_N)$, which represents the terminal cost of being in the final state $x_N$. Using this substitution for $g_N(x_N)$, the optimization problem becomes $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) \ + \\ &\Big[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N(x_N)\Big]\Big]\Big]\Big]\Big] \end{align} $$ Next, we let $$ \bal J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N(x_N) \eal $$ Because $x_N = f\left(x_{N-1},u_{N-1}\right)$, then $$ \bal J_{N-1}(x_{N-1}) &= \min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right) \eal $$ Note that $J_{N-1}(x_{N-1})$ represents the minimal cost with respect to $u_{N-1}$ of a one-stage process with initial condition $x_{N-1}$. Using this substitution, the optimization problem becomes $$ \begin{align} &\min_{u_0} g_0 (x_0,u_0) \ + \\ &\Big[\min_{u_1} g_1 (x_1,u_1) \ + \\ &\Big[\min_{u_2} g_2 (x_2,u_2) \ + \\ &\Big[\cdots \\ &\Big[\min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1})\Big]\Big]\Big]\Big] \end{align} \tag{2} \label{eq:opt_prob} $$ To see where Bellman's principle of optimality comes from, we just need to perform one last substitution: \begin{align} J_{N-2}(x_{N-2}) &= \min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1}) \\ &= \min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}\left(f\left(x_{N-2},u_{N-2}\right)\right) \end{align} Here, $J_{N-2}(x_{N-2})$ represents the minimal cost with respect to $u_{N-2}$ of a two-stage process with initial condition $x_{N-2}$. In this context, Bellman's principle of optimality states that, if $J_{N-2}(x_{N-2})$ is minimized with respect to $u_{N-2}$ and $u_{N-1}$, then, regardless of the initial condition $x_{N-2}$ and the chosen control decision $u_{N-2}$, $J_{N-1}(x_{N-1})$ must be minimized with respect to $u_{N-1}$. To see why this must be true, we expand the $J_{N-1}(x_{N-1})$ term in the expression above for $J_{N-2}(x_{N-2})$ to get \begin{align} J_{N-2}(x_{N-2}) &= \min_{u_{N-2}} g_{N-2}(x_{N-2},u_{N-2}) + J_{N-1}(x_{N-1}) \\ &= \min_{u_{N-2}} \left[ g_{N-2}(x_{N-2},u_{N-2}) + \left[\min_{u_{N-1}} g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right)\right]\right] \tag{3} \label{eq:bellman_part2} \\ &= \min_{u_{N-2},u_{N-1}} g_{N-2}(x_{N-2},u_{N-2}) + g_{N-1} (x_{N-1},u_{N-1}) + J_N\left(f\left(x_{N-1},u_{N-1}\right)\right) \tag{4} \label{eq:bellman_part1} \end{align} We can see from \eqref{eq:bellman_part1} that $J_{N-2}(x_{N-2})$ is indeed minimized with respect to $u_{N-2}$ and $u_{N-1}$. Additionally, from \eqref{eq:bellman_part2}, we see that $J_{N-1}(x_{N-1})$ is minimized with respect to $u_{N-1}$, independent of $x_{N-2}$ and $u_{N-2}$. In essence, Bellman's principle of optimality is a result of the property described in \eqref{eq:decomp} and the structure of the objective function $J\left(x_0,\{u_k\}_{k=0}^{N-1}\right)$.

One other important concept to be aware of is that $J_{N-m}(x_{N-m})$ represents the minimal cost of an $m$-stage process with initial condition $x_{N-m}$ and the minimal cost for the final $m$ stages of an $N$-stage process with state at at time-step $k = N-m$ of $x_{N-m}$. In other words, as long as $N \geq m$, then every $m$-stage process is embedded inside an $N$-stage process. This means that, if $\{u_k^*\}_{k=0}^{N-1}$ are the optimal controls for the $N$-stage process with initial condition $x_0$, then $\{u_k^*\}_{k=N-m}^{N-1}$ are the optimal controls for the $m$-stage process with initial condition $x_{N-m}$. This is another way of stating Bellman's principle of optimality.

Going back to the optimization problem in \eqref{eq:opt_prob}, if we proceed with the rest of the substitutions until $k=0$, we get the following recurrence relation $$J_{k}(x_k) = \min_{u_k} \left[g_k(x_k,u_k) + J_{k+1}(x_{k+1})\right]$$ for $k = 0,1,\dots,N-1$ and the boundary condition $J_N(x_N) = g_N(x_N)$, which can be solved recursively starting from the boundary condition.

How can I derive Bellman's principle of optimality in discrete-time?

1 Answers1