2

What does the following notation mean?

$$V^{(k)}(s) = \max_{a \in A} \sum_{s' \in S}T(s,a,s') [R(s,a,s') + \gamma\cdot V^{(k-1)}(s')]$$

Does it mean max the value of the function with a element of A. What will be the result of the Function a element of A? What do they mean with a element of A max?

Siong Thye Goh
  • 149,520
  • 20
  • 88
  • 149

2 Answers2

4

Good question! I'll try to simplify the situation for you. Suppose, $A = \{1,2,3,4\}$. Also suppose there is a function $f: A\to\mathbb R$, such that $$f(1) = 0, f(2) = 3, f(3) = -2, f(4) = 1$$ I shall explain what $$\max_{a\in A} f(a)$$ means! Well, it essentially asks you to iterate over set $A$, compute the value $f(a)$, and return back the maximum value you see in the process. In our case, $\max_{a\in A} f(a) = 3$. Another useful notion is that of $\text{argmax}$. $$\operatorname{argmax}\limits_{a\in A} f(a) = 2$$ $\text{argmax}_{a\in A}f(a)$ returns the element(s) from set $A$, for which $f(a) $ is maximum. On the other hand, as you saw, $\max_{a\in A}$ returns the maximum value itself! Hope that clarifies the notation.

P.S. Isn't that a Bellman equation in Reinforcement Learning? Super cool, have fun!

  • Thank you very much, but what do you mean with: on the other hadn, max returns the maximum value itself? i thought it returns only element a from set A. and not the value from function V(s)? Yes the value is computed, but V(s) has only an element a as the result? – Hans Mustermann Mar 20 '21 at 19:31
  • 1
    That's incorrect! V(s) is actually the maximum value itself, and not the element a corresponding to it. After all, you want V(s) to be the value function (or its estimate after the kth bellman backup) at state s, which cannot be an "action". Do you see what I'm saying? – stoic-santiago Mar 21 '21 at 02:45
  • I see what you are saying but i have another question to it. If i have state s0 with a1 to s1 and a2 to s2. Then the function makes an addtition over s' (that would be s1 and s2 in this state). Now im asking myself does the function maximize all parts of the function((T(s0,a1,s1) + [....]) + (T(s0,a2,s2) + [...])? Because he cannot find just on a which maxizmize the function if i have one state where i can chose two different actions for two different states like in this example. That max makes only sense if you have one state who has two actions which lead to the same state.right? – Hans Mustermann Mar 21 '21 at 13:30
  • The system looks like: $$s_1 \stackrel{a_1}{\longleftarrow} s_0 \stackrel{a_2}{\longrightarrow} s_2$$ right? The action space has only two possibilities $A = {a_1,a_2}$. First, you evaluate the expression for $a_1$, and then for $a_2$. Whichever is larger, is what the max function returns to you. --- In your particular example, the only state one can go to after taking action $a_1$ is $s_1$, so when you sum over all $s'\in S$, please note that all terms of the form $T(s_0,a_1,s')[...]$ except the $T(s_0,a_1,s_1)[....]$ term are zero. – stoic-santiago Mar 21 '21 at 13:47
  • In the (deterministic) system you described, the sum is of no use (since it evaluates only to a single term). It is only useful in cases when you've got yourself a system like $$s_1 \stackrel{a_1}{\longleftarrow} s_0 \stackrel{a_1}{\longrightarrow} s_2$$ i.e. taking one action can take you to two states (probabilistically). – stoic-santiago Mar 21 '21 at 13:51
  • Yes its true i can only go to s1. but V*(s) would like (T(s0,a1,s1) + [...]) + (T(s0,a2,s2))? So maximize a makes only sense in your second example when one action can bring me into two states? Because otherwise(like in the first example) i would go in both directions because the function sums over s'? And s' is defined as every state you can reach through an action in s0 or in generally in s. And not all Terms like T(s0,a1,s') expect T(s0, a1,s1) have to be zero because it depends on the reward. i mean you what this function tells you is to go recursivly through all states s' from s. – Hans Mustermann Mar 21 '21 at 14:30
1

$\max_{a ∈ A} f(a)$ is just the maximum of the set $\{f(a): a ∈ A\}$, i.e. the largest of the values $f(a)$ where $a$ ranges the set $A$. In other notation, it's $\max f[A]$.

user87690
  • 9,133