Value iteration


Typo: the left-hand side of Eq. (17.2.9) should be $\pi^*(s)$.

Great article for clarifying some of the basic ideas behind RL. I think it’s a really good starting point for learning RL.
But there are a few small errors:

  1. An expectation is also needed over $r(s_0, a_0)$ in Eq. 17.2.2;
  2. In Eq. 17.2.9, it should be $\max$ rather than $\arg\max$.
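
To make the $\max$ versus $\arg\max$ distinction concrete, here is a minimal sketch of value iteration on a tiny made-up MDP. The arrays `P`, `R`, the discount `gamma`, and the function name `value_iteration` are all hypothetical and not taken from the chapter; the point is only that taking the max over actions yields a value $V^*(s)$, while taking the argmax yields an action $\pi^*(s)$.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
# P[s, a, s_next] is the transition probability, R[s, a] is the reward.
# Action 0 always leads to state 0, action 1 always leads to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
R = np.array([
    [0.0, 1.0],
    [0.0, 2.0],
])
gamma = 0.5


def value_iteration(P, R, gamma, num_iters=100):
    """Repeat the Bellman optimality backup num_iters times, starting from V = 0."""
    V = np.zeros(P.shape[0])
    for _ in range(num_iters):
        # Q[s, a] = r(s, a) + gamma * sum over s_next of P[s, a, s_next] * V[s_next]
        Q = R + gamma * (P @ V)
        # max over actions gives a value for each state
        V = Q.max(axis=1)
    return V, Q


V_star, Q_star = value_iteration(P, R, gamma)
# argmax over actions gives an action index for each state (a greedy policy)
pi_star = Q_star.argmax(axis=1)

print("V*:", V_star)
print("pi*:", pi_star)
```

With these numbers the values should converge to roughly 3 and 4 for the two states, and the greedy policy picks action 1 in both.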

There seems to be a typo in Equation 17.2.2.

Shouldn’t the expectation over $a_0$ also include the first term?
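
For reference, a sketch of the form this question seems to be pointing at, assuming a stochastic policy $\pi$, a discount factor $\gamma$, and a transition function $P$ (the symbols are assumptions and may not match the chapter exactly). It is the standard Bellman expectation equation, with the reward placed inside the expectation over $a_0$:

$$V^\pi(s_0) = E_{a_0 \sim \pi(s_0)} \Big[ r(s_0, a_0) + \gamma \, E_{s_1 \sim P(s_1 \mid s_0, a_0)} \big[ V^\pi(s_1) \big] \Big]$$

If the policy is deterministic, $a_0$ is fixed by $s_0$ and the outer expectation is trivial, so the two ways of writing it would coincide.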