Optimal control and Value function

Question

Let's consider this optimal control problem:

Minimize $-x(1)$,

subject to $dx(t)/dt=x(t)u(t)$ for almost every $t \in [0,1]$,

$x(0)=0$

among all the admissible controls $u:[0,1] \to [0,1]$ such that $u$ is Lebesgue measurable.

How can I compute the value function $V:[0,1] \times \mathbb{R} \to \mathbb{R}$?

@effezeta If you are taking a problem from a source, you need to cite this source in your post for reference. The initial condition cannot be correct as the origin is not controllable. — KBS, Jan 02 '23 at 21:52
@KBS the article is "Maximum Principle, Dynamic Programming, and Their Connection in Deterministic Control" by X.Y.Zhou — effezeta, Jan 02 '23 at 22:05
By the way, the initial condition $x(0)=0$ is not necessary to compute the value function. $V(t_0,x_0)$ is the infimum of the terminal cost $-x(1)$ among the trajectories $x(\cdot)$ that satisfy the initial condition $x(t_0)=x_0$. — effezeta, Jan 03 '23 at 12:05

score 0 · Answer 1 · answered Jan 03 '23 at 13:10

This can be solved using Dynamic Programming. Let $V(t,x)$ be the value function, which must solve the Hamilton-Jacobi-Bellman equation

$$\min_{u\in[0,1]}\left\{\dfrac{\partial V(t,x)}{\partial t}+\dfrac{\partial V(t,x)}{\partial x}xu\right\}=0$$

together with the terminal condition $V(1,x)=-x$.

Let us consider now that $V(t,x)=p(t)x$, $p(1)=-1$. Then, we get that

$$\min_{u\in[0,1]}\left\{\dot{p}x+pxu\right\}=0.$$

The expression is affine in $u$, so the minimum is attained at $u=1$ if $px\le 0$ and at $u=0$ if $px>0$ (when $px=0$, every $u\in[0,1]$ is optimal).

Assume that $u=1$. Then $\dot p+p=0$, so $p(t)=e^{-(t-1)}p(1)=-e^{1-t}$. This shows that $V(t,x)=-xe^{1-t}$ when $p(t)x\le0$ or, equivalently (since $p(t)<0$), when $x\ge0$.

Assume that $u=0$. Then $\dot p=0$, so $p(t)=c$ for some $c\in\mathbb{R}$. Since $p(1)=-1$, we get $p(t)=-1$ and $V(t,x)=-x$ when $p(t)x>0$ or, equivalently, when $x<0$.

As a result, we have that

$$V(t,x)=\left\{\begin{array}{rl} -xe^{1-t},&\textrm{if } x\ge0,\\ -x,&\textrm{if } x<0\\ \end{array}\right.$$
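As a rough sanity check, the candidate $V$ can be plugged back into the HJB equation numerically. The sketch below is only an illustration: the helper names `value` and `hjb_residual`, the step size `h`, the grid over $u$, and the sample points are all assumptions chosen for this example. It approximates the partial derivatives by central differences at points away from $x=0$ (where $V$ has a kink) and checks that the minimum over $u\in[0,1]$ is close to zero.

```python
import math


def value(t, x):
    """Candidate value function from the answer: -x*e^(1-t) for x >= 0, -x for x < 0."""
    return -x * math.exp(1.0 - t) if x >= 0 else -x


def hjb_residual(t, x, h=1e-6, n_u=101):
    """Approximate min over u in [0, 1] of V_t + V_x * x * u with central differences."""
    v_t = (value(t + h, x) - value(t - h, x)) / (2 * h)
    v_x = (value(t, x + h) - value(t, x - h)) / (2 * h)
    # Uniform grid on [0, 1]; the expression is affine in u, so the endpoints suffice,
    # but the full grid makes the minimisation explicit.
    return min(v_t + v_x * x * (k / (n_u - 1)) for k in range(n_u))


# Hypothetical sample points, kept away from the kink at x = 0.
samples = [(0.0, 2.0), (0.3, 0.5), (0.9, 1.5), (0.0, -2.0), (0.5, -0.7)]
residuals = [hjb_residual(t, x) for t, x in samples]
max_residual = max(abs(r) for r in residuals)
print("largest |HJB residual| over the samples:", max_residual)
```

The terminal condition can be checked the same way, for example by comparing `value(1.0, x)` with `-x` for a few values of `x`. A small residual at sample points is consistent with the formula but is of course not a proof.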

I tried a different approach that doesn't use Dynamic Programming. Let's consider the problem $dx(t)/dt=x(t)u(t)$, $x(t_0)=x_0$. The trajectory is $x(t)=x_0 e^{\int_{t_0}^{t}u(s)ds}$. I want to minimize the cost $-x(1)$, that is, I want to maximize $x(1)$. If $x_0 \ge 0$ then I take $u \equiv 1$, so $x(1)=x_0 e^{1-t_0}$ and $V(t_0,x_0)=-x(1)=-x_0 e^{1-t_0}$. If $x_0<0$ then I take $u \equiv 0$, so $x(1)=x_0$ and $V(t_0,x_0)=-x(1)=-x_0$. Is my solution correct? — effezeta, Jan 03 '23 at 23:17
Yes, that looks fine, but you also need to argue that the sign of $x(t)$ remains constant, which implies that there is no switching in the control input. — KBS, Jan 04 '23 at 08:48
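
One way to spell out the sign argument, sketched here using only the explicit trajectory formula and the constraint $0\le u\le 1$: for any admissible $u$ and any initial condition $x(t_0)=x_0$,

$$x(t)=x_0\,e^{\int_{t_0}^{t}u(s)\,ds},\qquad e^{\int_{t_0}^{t}u(s)\,ds}>0,$$

so $x(t)$ has the same sign as $x_0$ for all $t\in[t_0,1]$ (and $x\equiv 0$ if $x_0=0$). Moreover, $0\le\int_{t_0}^{1}u(s)\,ds\le 1-t_0$, which gives

$$x_0\le x(1)\le x_0e^{1-t_0}\ \text{ if } x_0\ge 0,\qquad x_0e^{1-t_0}\le x(1)\le x_0\ \text{ if } x_0<0.$$

The upper bounds are attained by $u\equiv 1$ and $u\equiv 0$ respectively, so these constant controls maximize $x(1)$ among all admissible controls, not only among constant ones.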
