Markov Decision Process for several players

Question

I just read a paper by the authors Kohlberg and Neyman stating that "A single-person stochastic game is known as a Markov Decision Process (MDP)." Does anyone know if the following extension to $n$ players might work?

Take, say, 2 players playing a game with actions $a\in A$ on a state space $S$. By inflating the state space to $S^2$ and the action space to $A^2$, both players' positions and actions can be accounted for. The result is again an MDP, only with different state and action spaces.

Does this constitute an MDP with 2 (or $n$) players, or am I overlooking something?

Thanks in advance,

Leon

Do the states/actions of each player depend on the other player's? If not, why not just separate the chains? — Garmekain, May 31 '18 at 12:37
Unfortunately, each player's behavior does depend on the other's in my case. — Leon, Jun 01 '18 at 10:35

score 1 · Answer 1 · answered Jun 15 '18 at 00:28

I am not sure why you have pairs of states. Just because you have more than one player, doesn't mean that you get more states. At each state, every agent observes the same state; it's just that only one of those players gets to decide on an action in a particular state.

So, I have come up with the following: An $l$-player MDP is a tuple $ ((S_i)_{i\in[l]},P,A,(R_i)_{i\in[l]},\gamma) $ whose components mean the following:

$S_i$ is the set of states where it's player $i$'s turn; the $S_i$ are pairwise disjoint, and $S=\bigcup_i S_i$ is the full state space
$P(s'|s,a)$ is the probability of moving from $s$ to $s'$ when action $a$ is taken
$A$ is the set of actions; we assume the same action set for each state
$R_i(s,a)$ is the real-valued reward player $i$ receives when action $a$ is taken in state $s$ (whoever's turn it is)
$0 \leq \gamma < 1$ is the discount factor

Now, let's define the value function $v_i$ for each player $i$, where $v_i(s)$ is the expected discounted sum of rewards for player $i$ from state $s$ onwards. If $s\in S_j$, then $$ v_i(s) = R_i(s,a(s)) + \sum_{s'} P(s'|s, a(s) )\cdot \gamma v_i(s'), $$ where $$a(s) = \arg\max_{a}\; \Big[ R_j(s,a) + \sum_{s'} P(s'|s,a )\cdot \gamma v_j(s') \Big]. $$

So basically, the value of state $s$ for player $i$ is player $i$'s immediate reward plus the discounted expected value of the next state, where the action is chosen by the player whose turn it is in state $s$ so as to maximize that player's own value. Note that if $i=j$, so if it's player $i$'s turn, then the above formula becomes: $$ v_i(s) = \max_{a} \Big[ R_i(s,a) + \sum_{s'} P(s'|s, a )\cdot \gamma v_i(s') \Big], $$ which is exactly as in a $1$-player MDP.
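To make the recursion concrete, here is a minimal sketch in Python/NumPy that simply iterates the two equations above. The function name, the array layout (`P[a, s, s']`, `R[i, s, a]`, `owner[s]`), and the tiny example game are my own assumptions for illustration, not anything from the paper. For a single player this is ordinary value iteration; with several players I would not count on the iteration converging in general, so the sketch just runs a fixed number of sweeps.

```python
import numpy as np

def turn_based_value_iteration(P, R, owner, gamma, sweeps=500):
    """Iterate the turn-based recursion for a fixed number of sweeps.

    Assumed (hypothetical) layout:
      P[a, s, t]  = probability of moving from state s to state t under action a
      R[i, s, a]  = reward to player i when action a is taken in state s
      owner[s]    = index j of the player whose turn it is in state s (s in S_j)
    """
    n_players, n_states, n_actions = R.shape
    states = np.arange(n_states)
    v = np.zeros((n_players, n_states))
    policy = np.zeros(n_states, dtype=int)
    for _ in range(sweeps):
        # q[i, s, a] = R_i(s, a) + gamma * sum_t P(t | s, a) * v_i(t)
        q = R + gamma * np.einsum("ast,it->isa", P, v)
        # In each state, the owner picks the action maximizing the owner's own q.
        policy = np.argmax(q[owner, states, :], axis=1)
        # Every player's value is evaluated under the owner's chosen action.
        v = q[:, states, policy]
    return v, policy

# Toy game (made up): two states, two players, actions 0 = stay, 1 = switch.
# Player 0 owns state 0 and earns 1 there; player 1 owns state 1 and earns 1 there.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: switch
R = np.array([[[1.0, 1.0], [0.0, 0.0]],   # rewards to player 0
              [[0.0, 0.0], [1.0, 1.0]]])  # rewards to player 1
owner = np.array([0, 1])

v, policy = turn_based_value_iteration(P, R, owner, gamma=0.5)
```

In this toy game each player prefers to stay in the state they own, so the computed policy is "stay" in both states, and each player's value is $1/(1-\gamma)$ in their own state and $0$ in the other.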
