Understanding Sutton's definition of the Projected Bellman Error

Question

I am reading Sutton and Barto's textbook Reinforcement Learning, Section 11.4, and I am confused by the definition of the Projected Bellman Error.

The book defines a norm on value functions $v: S \to \mathbb{R}$, where $S$ is the set of states and $\mu(s) \ge 0$ is a weighting (distribution) over states, through $$\lVert v \rVert^2_{\mu} = \sum_{s \in S} \mu(s) v(s)^2$$
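A minimal NumPy sketch of this weighted norm, assuming a finite state set so that $v$ and $\mu$ can be stored as arrays indexed by state. The helper name `mu_norm_sq` is hypothetical. Note that the sum itself is the *squared* norm.

```python
import numpy as np

def mu_norm_sq(v, mu):
    """Squared mu-weighted norm: sum over states of mu(s) * v(s)**2."""
    v = np.asarray(v, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(mu * v**2))

# Two states with equal weight: 0.5 * 1**2 + 0.5 * 2**2 = 2.5
print(mu_norm_sq([1.0, 2.0], [0.5, 0.5]))  # 2.5
```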

Then, assuming linear value functions $v_w(s) = w^T x(s)$ with feature vectors $x(s) \in \mathbb{R}^d$ and a weight vector $w \in \mathbb{R}^d$, a projection operator is defined for an arbitrary value function $v$ as

$$ \Pi v = v_w \quad \text{where} \quad w = \operatorname{argmin}_{w} \lVert v - v_w \rVert^2_\mu $$
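For a finite state set this argmin is a weighted least-squares problem. The sketch below assumes the features are stacked into a matrix `X` whose row $s$ is $x(s)^T$ and that `X.T @ D @ X` is invertible (linearly independent features, positive weights); the function name `project` and the example numbers are hypothetical. It returns both the minimizing weights and the projected vector $\Pi v$, and checks that the residual is $\mu$-orthogonal to every feature column.

```python
import numpy as np

def project(v, X, mu):
    """Return (w, Pi v): weights minimising ||v - X w||_mu^2 and the projected vector X w."""
    D = np.diag(mu)
    # Normal equations of weighted least squares: (X^T D X) w = X^T D v
    w = np.linalg.solve(X.T @ D @ X, X.T @ D @ v)
    return w, X @ w

# Three states, two features; v is not in the span of the columns of X.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
mu = np.array([0.2, 0.3, 0.5])
v = np.array([1.0, 2.0, 0.0])
w, pv = project(v, X, mu)

# The residual v - Pi v is mu-orthogonal to every feature column.
print(np.allclose(X.T @ np.diag(mu) @ (v - pv), 0.0))  # True
```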

Further, the Bellman error at state $s$ for an approximate value function $v_w$ of $v_\pi$ (the value function of policy $\pi$) is defined as $$\overline{\delta_w}(s) = \left(\sum_{a} \pi(a|s) \sum_{s', r}p(s', r|s, a)[r+\gamma v_w(s')]\right) - v_w(s)$$ where $a$ ranges over actions. The mean squared Bellman error is defined as $\overline{BE}(w) = \lVert\overline{\delta_w}\rVert^2_{\mu}$ (the squared $\mu$-norm of the Bellman error vector).
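A sketch of the Bellman error vector for a small finite MDP. It assumes the dynamics are stored as arrays `p[s, a, s2]` $= p(s'|s,a)$ and `r[s, a, s2]` $=$ the expected reward on that transition, which is enough because $\sum_{s',r} p(s',r|s,a)\,r = \sum_{s'} p(s'|s,a)\,\mathbb{E}[r \mid s,a,s']$. The function name `bellman_error` and the two-state example are hypothetical.

```python
import numpy as np

def bellman_error(w, X, pi, p, r, gamma):
    """delta_w(s) = E[r + gamma * v_w(s') | s] - v_w(s) for the linear value v_w = X w.

    pi[s, a]    : policy probabilities pi(a|s)
    p[s, a, s2] : transition probabilities p(s2|s, a)
    r[s, a, s2] : expected reward on the transition (s, a) -> s2
    """
    v = X @ w
    # Average r + gamma * v(s') over a ~ pi(.|s) and s' ~ p(.|s, a)
    target = np.einsum("sa,sat,sat->s", pi, p, r + gamma * v[None, None, :])
    return target - v

# Two states, one action: 0 -> 1, and 1 is absorbing; reward 1 on every step.
pi = np.ones((2, 1))
p = np.zeros((2, 1, 2))
p[0, 0, 1] = 1.0
p[1, 0, 1] = 1.0
r = np.ones((2, 1, 2))
X = np.eye(2)  # tabular features, so v_w = w

print(bellman_error(np.array([0.0, 0.0]), X, pi, p, r, gamma=0.5))  # [1. 1.]
# With gamma = 0.5 the true values are v(0) = v(1) = 2, so the error vanishes.
print(bellman_error(np.array([2.0, 2.0]), X, pi, p, r, gamma=0.5))  # [0. 0.]
```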

Now I am trying to understand the definition of the Mean Squared Projected Bellman Error given by $$\overline{PBE}(w) = ||\Pi \overline{\delta_w}||^2_{\mu}$$
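To make the difference between $\overline{BE}$ and $\overline{PBE}$ concrete, here is a sketch that projects a given Bellman error vector onto the span of the features and measures what is left. The helper names `pbe` and `be` and the toy numbers are hypothetical. With a single constant feature over two equally weighted states, an error vector of $(1, -1)$ is $\mu$-orthogonal to the feature, so its projection, and hence the PBE, is zero even though the BE is not.

```python
import numpy as np

def pbe(delta, X, mu):
    """||Pi delta||_mu^2: squared mu-norm of the best linear fit to the Bellman error vector."""
    D = np.diag(mu)
    # u is the best mu-weighted fit of delta by X u (the w' in the question)
    u = np.linalg.solve(X.T @ D @ X, X.T @ D @ delta)
    projected = X @ u
    return float(projected @ D @ projected)

def be(delta, mu):
    """||delta||_mu^2, the unprojected mean squared Bellman error."""
    return float(np.sum(np.asarray(mu) * np.asarray(delta) ** 2))

X = np.array([[1.0], [1.0]])   # one constant feature over two states
mu = np.array([0.5, 0.5])
delta = np.array([1.0, -1.0])  # mu-orthogonal to the constant feature
print(be(delta, mu), pbe(delta, X, mu))  # 1.0 0.0
```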

Here I am not sure how to interpret this. Is this then something like

$$\Pi\overline{\delta_w} = v_{w'} \quad \text{where} \quad w' = \operatorname{argmin}_{w'} \lVert\overline{\delta_w} - v_{w'}\rVert^2_\mu$$ for some other value function parametrized by $w'$, where I guess you can interpret the Bellman error vector itself as a value function? I'm confused by this definition and am looking for a simple English interpretation.

Further, the book states that with linear function approximation there always exists an approximate value function (within the subspace of functions parametrized by $w$) with zero $\overline{PBE}$, namely the TD (temporal-difference) fixed point $w_{TD}$, and I am not sure why this is the case. Any insights much appreciated.
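One way to see the claim is numerically. This sketch assumes a small, randomly generated (hypothetical) finite problem already reduced to matrices: $\boldsymbol\rho$, the expected one-step reward under $\pi$; $\pmb P$, the state-transition matrix under $\pi$; and a feature matrix $\pmb X$ with fewer features than states. It takes $\mu$ to be the on-policy stationary distribution (an assumption; it guarantees the linear system below is solvable), solves $\pmb X^T\pmb D(\pmb I-\gamma\pmb P)\pmb X\pmb w = \pmb X^T\pmb D\boldsymbol\rho$ for $\pmb w_{TD}$, and checks that the projected Bellman error vanishes there while the ordinary Bellman error generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, d, gamma = 5, 2, 0.9

# Hypothetical problem: random transition matrix under pi (rows sum to 1),
# random expected rewards rho(s), and random features (row s is x(s)).
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
rho = rng.random(n_states)
X = rng.random((n_states, d))

# On-policy weighting: mu^T P = mu^T with sum(mu) = 1
A_stat = np.vstack([P.T - np.eye(n_states), np.ones(n_states)])
b_stat = np.concatenate([np.zeros(n_states), [1.0]])
mu = np.linalg.lstsq(A_stat, b_stat, rcond=None)[0]
D = np.diag(mu)

# TD fixed point: X^T D (rho + gamma P X w - X w) = 0
A = X.T @ D @ (np.eye(n_states) - gamma * P) @ X
b = X.T @ D @ rho
w_td = np.linalg.solve(A, b)

delta = rho + gamma * P @ X @ w_td - X @ w_td   # Bellman error vector at w_td
Pi = X @ np.linalg.solve(X.T @ D @ X, X.T @ D)  # mu-weighted projection matrix
pbe = float((Pi @ delta) @ D @ (Pi @ delta))
be = float(delta @ D @ delta)
print(f"PBE = {pbe:.2e}, BE = {be:.4f}")
```

With 5 states and only 2 features, the Bellman error itself typically stays nonzero; only its component inside the feature span is driven to zero.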

Ted Black · Accepted Answer · 2024-03-03T10:19:25.537

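$\DeclareMathOperator*{\argmin}{arg\,min}$ $\def\mnorm#1{\lVert #1 \rVert^2_\mu}$ $\def\calS{\mathcal{S}}$ $\def\calW{\mathcal{W}}$ $\def\qty#1{\left( #1 \right)}$ The projection operator $\Pi$ maps an arbitrary value function $v$ to its closest linear approximation in the $\mu$-norm: $$ \Pi v := v_w \text{ where } w = \argmin_w \mnorm{v - v_w} $$ The Bellman error vector $\overline{\delta}_w$ is itself just a function of the state, so $\Pi$ can be applied to it like to any other value function: $$ \Pi \overline{\delta}_w = v_u \text{ where } u = \argmin_u \mnorm{\overline{\delta}_w - v_u} $$ In words, $\overline{\mathrm{PBE}}(w) = \mnorm{\Pi \overline{\delta}_w}$ measures only the part of the Bellman error that the features can represent; any component of $\overline{\delta}_w$ that is $\mu$-orthogonal to all the features is discarded. Writing $B$ for the Bellman operator, so that $\overline{\delta}_w = Bv_w - v_w$, and using that $\Pi$ is linear and leaves representable functions unchanged ($\Pi v_w = v_w$), $$ \Pi \overline{\delta}_w = \Pi B v_w - v_w \quad \text{so} \quad \overline{\mathrm{PBE}}(w) = \mnorm{\Pi B v_w - v_w} $$ Hence $\overline{\mathrm{PBE}}(w) = 0$ exactly when $v_w = \Pi B v_w$. The true value function satisfies $v_\pi = Bv_\pi$, i.e. its Bellman error is zero, but $v_\pi$ is usually not of the form $v_w$, so the weaker projected condition is the one that can be met within the subspace.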

From the definition of $\Pi$, for a finite state set $\calS$ we can write a value function $v$ as a vector $\pmb{v}$ with $\dim \calS$ entries. The linear approximation $v_w$ is the image of the weight vector $\pmb{w} \in \calW = \mathbb{R}^d$ under a linear map $\xi: \calW \rightarrow \mathbb{R}^{\dim \calS}$, so we can represent $v_w$ as $\pmb{X} \pmb{w}$, where $\pmb{X}$ is the $\dim \calS \times \dim \calW$ matrix that represents $\xi$; its row $s$ is the feature vector $x(s)^T$. Then, $$ \lVert v - v_w \rVert^2_\mu = \qty{\pmb{v} - \pmb{X} \pmb{w}}^T \pmb{D} \qty{\pmb{v} - \pmb{X} \pmb{w}} $$ where $\pmb{D}$ is the $\dim \calS \times \dim \calS$ diagonal matrix whose diagonal holds the weights $\mu(s)$. This is a weighted least-squares problem; assuming $\pmb{X}^T\pmb{D}\pmb{X}$ is invertible (for example, linearly independent features and $\mu(s) > 0$ for every $s$), its solution is $$ \hat{\pmb{w}}=\qty{\pmb{X}^{T}\pmb{D}\pmb{X}}^{-1} \pmb{X}^T \pmb{D} \pmb{v} $$ It follows that the projection operator can be written as $$ \boldsymbol{\Pi} = \pmb{X}\qty{\pmb{X}^T\pmb{D}\pmb{X}}^{-1} \pmb{X}^T \pmb{D} $$ which is a $\dim \calS \times \dim \calS$ matrix: it maps value vectors to value vectors.
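A quick numerical check of the projection matrix, under the same assumptions as the formula (invertible $\pmb X^T\pmb D\pmb X$). The helper name `projection_matrix` and the example features and weights are hypothetical. It confirms that $\boldsymbol\Pi$ maps $|\mathcal S|$-vectors to $|\mathcal S|$-vectors, is idempotent, leaves vectors of the form $\pmb X\pmb w$ unchanged, and is self-adjoint with respect to the $\mu$-weighted inner product ($\pmb D\boldsymbol\Pi = \boldsymbol\Pi^T\pmb D$).

```python
import numpy as np

def projection_matrix(X, mu):
    """Pi = X (X^T D X)^{-1} X^T D, the mu-weighted projection onto span(X)."""
    D = np.diag(mu)
    return X @ np.linalg.solve(X.T @ D @ X, X.T @ D)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # 3 states, 2 features
mu = np.array([0.2, 0.3, 0.5])
Pi = projection_matrix(X, mu)
D = np.diag(mu)

print(Pi.shape)                       # (3, 3)
print(np.allclose(Pi @ Pi, Pi))       # True: projecting twice changes nothing
print(np.allclose(Pi @ X, X))         # True: X w is already in the subspace
print(np.allclose(D @ Pi, Pi.T @ D))  # True: self-adjoint in the mu inner product
```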

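Since $$ (Bv)(s) := \sum_a \pi(a|s) \sum_{s',r} p(s',r|s,a)[r + \gamma v(s')] $$ we can follow the same steps to express the transformation using matrices and vectors. First, $$ \rho(s)=\sum_a \pi(a|s) \sum_{s',r} p(s',r|s,a) r $$ can be converted into a $\dim \calS \times 1$ vector $\boldsymbol{\rho}$. Then we can convert $$ \sum_a \pi(a|s) \sum_{s',r} p(s',r|s,a) v(s') $$ into the matrix product $$ \pmb{P} \pmb{v} $$ where $\pmb{P}$ is the $\dim \calS \times \dim\calS$ state-transition matrix under $\pi$, with entries $P_{ss'} = \sum_a \pi(a|s) \sum_{r} p(s',r|s,a)$. So $B$ can be represented by the following affine transformation: $$ Bv \quad : \quad \boldsymbol{\rho} + \gamma \pmb{P} \pmb{v} $$ Since $v_w = \pmb{X} \pmb{w}$ we can write $Bv_{w} - v_{w}$ as $$ \boldsymbol{\rho} + \gamma \pmb{P} \pmb{X} \pmb{w} - \pmb{X} \pmb{w} = \boldsymbol{\rho} + \qty{ \gamma\pmb{P}-\pmb{I}} \pmb{X} \pmb{w} $$ Setting this to the zero vector would require $\qty{\pmb{I} - \gamma\pmb{P}} \pmb{X}\pmb{w} = \boldsymbol{\rho}$, but $\qty{\pmb{I} - \gamma\pmb{P}} \pmb{X}$ is a $\dim \calS \times \dim \calW$ matrix, usually with far more rows than columns, so in general no $\pmb{w}$ makes the Bellman error itself vanish. The $\overline{\mathrm{PBE}}$ only requires its projection to vanish, $\boldsymbol{\Pi}\qty{\boldsymbol{\rho} + \qty{\gamma\pmb{P}-\pmb{I}}\pmb{X}\pmb{w}} = \pmb{0}$. Because $\pmb{X}$ has linearly independent columns and $\qty{\pmb{X}^T\pmb{D}\pmb{X}}^{-1}$ is invertible, this is equivalent to $$ \pmb{X}^T\pmb{D}\qty{\boldsymbol{\rho} + \qty{\gamma\pmb{P}-\pmb{I}}\pmb{X}\pmb{w}} = \pmb{0} $$ and, whenever the inverse exists, its unique solution is $$ \pmb{w}_{TD} = \qty{\pmb{X}^T\pmb{D}\qty{\pmb{I}-\gamma\pmb{P}}\pmb{X}}^{-1}\pmb{X}^T\pmb{D}\boldsymbol{\rho} $$ The inverse exists, for example, when $\mu$ is the on-policy state distribution and the features are linearly independent, since $\pmb{X}^T\pmb{D}\qty{\pmb{I}-\gamma\pmb{P}}\pmb{X}$ is then positive definite. This $\pmb{w}_{TD}$ is the TD fixed point, and at it $\overline{\mathrm{PBE}}=0$.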
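The matrix construction can be sketched end to end, starting from $\pi$ and $p$. Assumptions: a small random (hypothetical) MDP stored as arrays `pi[s, a]`, `p[s, a, s2]` and `r[s, a, s2]` (expected reward per transition), and on-policy weights $\mu$. The helper names `matrices_from_mdp`, `stationary` and `td_fixed_point` are hypothetical. Because $(\pmb I-\gamma\pmb P)\pmb X$ is not square when there are fewer features than states, the sketch solves the projected condition $\pmb X^T\pmb D[\boldsymbol\rho + (\gamma\pmb P-\pmb I)\pmb X\pmb w] = \pmb 0$; in the tabular case $\pmb X = \pmb I$ the result coincides with $v_\pi = (\pmb I-\gamma\pmb P)^{-1}\boldsymbol\rho$.

```python
import numpy as np

def matrices_from_mdp(pi, p, r):
    # rho[s]   = sum_a pi(a|s) sum_s' p(s'|s,a) r(s,a,s')
    # P[s, s'] = sum_a pi(a|s) p(s'|s,a)
    rho = np.einsum("sa,sat,sat->s", pi, p, r)
    P = np.einsum("sa,sat->st", pi, p)
    return rho, P

def stationary(P):
    # mu with mu^T P = mu^T and sum(mu) = 1 (on-policy state weighting)
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def td_fixed_point(X, mu, rho, P, gamma):
    # Solve X^T D [rho + (gamma P - I) X w] = 0 for w
    D = np.diag(mu)
    A = X.T @ D @ (np.eye(len(rho)) - gamma * P) @ X
    return np.linalg.solve(A, X.T @ D @ rho)

rng = np.random.default_rng(1)
n_s, n_a, gamma = 4, 2, 0.9
pi = rng.random((n_s, n_a))
pi /= pi.sum(axis=1, keepdims=True)
p = rng.random((n_s, n_a, n_s))
p /= p.sum(axis=2, keepdims=True)
r = rng.random((n_s, n_a, n_s))

rho, P = matrices_from_mdp(pi, p, r)
mu = stationary(P)

# Tabular features: the TD fixed point equals v_pi = (I - gamma P)^{-1} rho
w_tab = td_fixed_point(np.eye(n_s), mu, rho, P, gamma)
v_pi = np.linalg.solve(np.eye(n_s) - gamma * P, rho)
print(np.allclose(w_tab, v_pi))  # True

# Two features for four states: (I - gamma P) X is 4 x 2 and has no inverse,
# but the projected condition still has a unique solution.
X = rng.random((n_s, 2))
w_td = td_fixed_point(X, mu, rho, P, gamma)
residual = rho + (gamma * P - np.eye(n_s)) @ X @ w_td
print(np.allclose(X.T @ np.diag(mu) @ residual, 0.0))  # True
```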

The notation in Sutton is the Projected Bellman Error (PBE), but you have written BPE; is this just a different term or a typo? — IntegrateThis, Mar 03 '24 at 02:20
