Ritsuki_YAMADA
Typo: the left hand side of the equation (17.2.9) should be pi*(s).
Great article for clarifying some of the basic ideas behind RL. I think it's a really good starting point for learning RL.
But there are a couple of small errors:
There seems to be a typo in Equation 17.2.2.
Shouldn't the expectation over a_0 also include the first term?
Do these code blocks still work, or do you need to follow the order of the book for the code to run? I haven't been able to get the first chunk running, and I'd really like to try out these reinforcement learning exercises.
There are two problems with this article.
The first is in Equation 17.2.9, where 'argmax' should be corrected to 'max'.
The second is in the code implementation of the value_iteration function. As its own comment says, it computes "Calculate \sum_{s'} p(s' \mid s, a) [r + \gamma v_k(s')]", which should be fixed to "Calculate [r + \sum_{s'} p(s' \mid s, a) \gamma v_k(s')]", based on the earlier Equation 17.2.13.
Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate])
should be fixed to:
Q[k,s,a] += (reward + pr * gamma * V[k - 1, nextstate])
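For reference, the Bellman backup being discussed (Equation 17.2.13) can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the transition tensor P and reward matrix R below are randomly generated stand-ins, and reward is assumed to depend only on (s, a):

```python
import numpy as np

# Hypothetical small MDP: 3 states, 2 actions (stand-ins for the book's environment)
num_states, num_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((num_states, num_actions, num_states))
P /= P.sum(axis=2, keepdims=True)          # p(s' | s, a); each row sums to 1
R = rng.random((num_states, num_actions))  # r(s, a), depends only on (s, a)

V = np.zeros(num_states)
for _ in range(100):
    # Q(s, a) = r(s, a) + gamma * sum_{s'} p(s' | s, a) V(s')
    Q = R + gamma * (P @ V)
    V = Q.max(axis=1)  # value iteration takes max over actions (not argmax)
```

Note that when r depends only on (s, a), pulling the reward outside the sum over s' changes nothing, since sum_{s'} p(s' | s, a) = 1, so the two forms of the inner loop give the same Q values in that case.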