0

I'm trying to tackle this question to understand MDP. Can someone explain how can you determine or calculate the number of distinct deterministic policies in the below MDP? Or resources where I can learn how to do this. I watched various videos and tried to google a way to calculate this but I seem to be going in circles. I have checked other examples as well but I don't see how it can be calculated.

The question is as follows,

Actions are represented by black dots. The pair of numbers beside each arrow indicates the transition probabilities (the left number) under action and the rewards (the right number) received when the agent goes to a state. Assume a discount factor of γ = 0.95.

MDP

Nick
  • 1

1 Answers1

0

Let $A$ be the set of actions, and $S$ be the set of states of an MDP. A deterministic policy of this MDP is the function $\pi:S \to A$. Therefore, the number of deterministic policies for this MDP is bounded by $|A|^{|S|}$, and this bound is exact if each action can be executed in every state. The idea here is that a deterministic policy selects one action for each state.

In your problem, there are 4 states, in 3 of which there are 2 actions to choose from, and in 1 state, there is 1 action. Thus, the number of deterministic policies for the MDP is $2 \cdot 2 \cdot 2 \cdot 1 = 8$.

Kuzja
  • 450
  • 4
  • 12