I'm looking at this paper (warning: the link downloads a PDF) and am having trouble parsing the notation at the top of page 11, steps 4.1 and 4.2.

$\forall i \leq t \in T$, $\forall x_i, a_i$: update all Q-values according to their eligibility traces

$Q_t^{k+1}(x_i, a_i) \leftarrow Q_i^{k}(x_i, a_i) + \alpha(x_i^k, a_i^k)\,\delta_t^k\,e_t^k(x_i, a_i)$

Specifically, I'm having trouble telling what the $i$ is all about in step 4.1. $i$ and $t$ seem to be used interchangeably, but I'm sure that's not actually what's going on. Any help would be greatly appreciated.

  • When I tried to follow your link to view the page you ask about, it seems to be a link to download something rather than a PDF or HTML version of the paper. This is really undesirable as a way of asking a question. Please put the formula you want to ask about in the body of your Question, preferably using MathJax/LaTeX but as an alternative an image could be posted so that Readers could help you out with editing in the $\LaTeX$ version of the formula(s). – hardmath Jul 03 '15 at 22:54
  • I tried to post an image, but it won't let me because of my rep. I'll see if I can figure out the LaTeX. – user3704120 Jul 03 '15 at 22:58

2 Answers

Read $\forall$ as "for all" or "for each" and $\in$ as "in" or "belongs to".

pshmath0

I think your confusion is because you have two loops:

foreach t ranging from 1 to m
    ...
    foreach i ranging from 1 to t
        update Q values, etc...
    ...

Your algorithm is using eligibility traces.

You sample a trajectory of at most $m$ steps, and $t$ is the current length of that trajectory. At every step $t$ you revisit every state visited so far, from step 1 to step $t$, and $i$ is the index that tracks which of those earlier steps is being updated.
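To make the two indices concrete, here is a minimal Python sketch of the nested loop. It is not the paper's algorithm: it assumes a generic tabular update with accumulating eligibility traces, a constant step size `alpha` (the paper's $\alpha$ depends on the state-action pair), and a greedy TD error. The function name `q_lambda_episode`, the parameters `gamma` and `lam`, and the trajectory format are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def q_lambda_episode(trajectory, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one pass over a trajectory of (x, a, reward, x_next) tuples.

    Illustrative sketch only (assumed accumulating traces, constant alpha).
    Returns the Q table and, for each step t, how many pairs were updated.
    """
    Q = defaultdict(float)  # Q[(x, a)]: current value estimate
    e = defaultdict(float)  # e[(x, a)]: eligibility trace
    actions = sorted({a for _, a, _, _ in trajectory})
    updates_per_step = []

    # Outer loop: t is the current step of the trajectory.
    for t, (x_t, a_t, r_t, x_next) in enumerate(trajectory, start=1):
        # The TD error is computed once per step, so it is indexed by t only.
        best_next = max(Q[(x_next, a)] for a in actions)
        delta_t = r_t + gamma * best_next - Q[(x_t, a_t)]
        e[(x_t, a_t)] += 1.0  # the pair just visited becomes more eligible

        # Inner loop: i ranges over the steps 1..t seen so far.
        # Repeated pairs are merged so each one is updated once per step.
        visited = list(dict.fromkeys((x, a) for x, a, _, _ in trajectory[:t]))
        for x_i, a_i in visited:
            Q[(x_i, a_i)] += alpha * delta_t * e[(x_i, a_i)]
            e[(x_i, a_i)] *= gamma * lam  # decay the trace for the next step
        updates_per_step.append(len(visited))

    return Q, updates_per_step
```

Calling it on a two-step trajectory such as `[("A", "go", 0.0, "B"), ("B", "go", 1.0, "C")]` updates one pair at $t = 1$ and two pairs at $t = 2$. That is the sense in which $i \leq t$: the error $\delta_t$ belongs to the current step, while $(x_i, a_i)$ ranges over everything visited up to that step.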

Juan Leni