Value iteration

https://d2l.ai/chapter_reinforcement-learning/value-iter.html


Typo: the left-hand side of equation (17.2.9) should be pi*(s).

Great article for clarifying some of the basic ideas behind RL; it's really a good starting point for learning RL.
But there are a couple of small errors:

  1. In Eq. 17.2.2, the expectation should also cover the first term, $r(s_0, a_0)$;
  2. In Eq. 17.2.9, it should be max rather than argmax.
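For reference, the max/argmax distinction raised in several comments above can be written side by side. This is standard Bellman optimality notation, not quoted from the book:

```latex
% The value update takes a max over actions, while policy extraction
% takes an argmax; conflating the two is the typo being reported.
\[
  V^{*}(s) = \max_{a \in \mathcal{A}} Q^{*}(s, a),
  \qquad
  \pi^{*}(s) = \operatorname*{arg\,max}_{a \in \mathcal{A}} Q^{*}(s, a)
\]
```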

Seems like a typo in Equation 17.2.2.

Shouldn't the expectation over a_0 also include the first term?

Do these code blocks still work, or do you need to follow the order of the book for the code to run? I haven't been able to get the first chunk running for this chapter, and I'd really like to try out these reinforcement learning exercises.

There are two problems with this article.
The first is in Equation 17.2.9, where 'argmax' should be corrected to 'max'.
The second is in the code implementation of the value_iteration function: its own comment says "Calculate \sum_{s'} p(s'\mid s,a) [r + \gamma v_k(s')]", which, based on the preceding Equation 17.2.13, should be fixed to "Calculate [r + \sum_{s'} p(s'\mid s,a) \gamma v_k(s')]".

Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate])

should be fixed to:

Q[k,s,a] += (reward + pr * gamma * V[k - 1, nextstate])
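To make the update being debated concrete, here is a minimal self-contained sketch of value iteration on a made-up 2-state, 2-action MDP. The transition model `P`, reward tensor `r`, and all numbers are invented for illustration and are not from the book's code; only the `Q += pr * (reward + gamma * V[s'])` update mirrors the snippet discussed above.

```python
import numpy as np

# Hypothetical toy MDP: P[s, a, s'] are transition probabilities,
# r[s, a, s'] are rewards. Both are made up for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.0, 1.0]]])
gamma, num_iters = 0.9, 100
num_states, num_actions = 2, 2

V = np.zeros(num_states)
for _ in range(num_iters):
    Q = np.zeros((num_states, num_actions))
    for s in range(num_states):
        for a in range(num_actions):
            for s2 in range(num_states):
                pr = P[s, a, s2]
                # Probability-weighted one-step reward plus discounted
                # value of the successor state (the disputed update):
                Q[s, a] += pr * (r[s, a, s2] + gamma * V[s2])
    V = Q.max(axis=1)   # the value update uses max over actions ...
pi = Q.argmax(axis=1)   # ... while argmax recovers the greedy policy
print(np.round(V, 3), pi)
```

Note that when the reward depends on the successor state s', it has to sit inside the probability-weighted sum, which is why the two candidate comment strings above differ only when r is pulled outside the sum.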