Define a feature as a one-hot representation of states , that is, .
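As a minimal sketch of why one-hot features behave like a table, assume a linear value estimate of the form w · x(s), with one weight per state. The names `one_hot`, `v_hat`, and `td0_update` are hypothetical and chosen for illustration only. The TD(0)-style update is likewise an assumed example, used to show that only the weight of the visited state changes.

```python
import numpy as np

def one_hot(state, num_states):
    """Return a vector with 1.0 at index `state` and 0.0 elsewhere."""
    x = np.zeros(num_states)
    x[state] = 1.0
    return x

def v_hat(state, w):
    """Linear value estimate w . x(s); with one-hot x(s) this is just w[state]."""
    return float(w @ one_hot(state, len(w)))

def td0_update(w, state, reward, next_state, alpha=0.1, gamma=1.0):
    """One semi-gradient TD(0) step (assumed update rule for illustration).

    The gradient of w . x(s) with respect to w is x(s), so with one-hot
    features only the entry w[state] is modified, as in a tabular method.
    """
    delta = reward + gamma * v_hat(next_state, w) - v_hat(state, w)
    return w + alpha * delta * one_hot(state, len(w))

w = np.array([0.0, 1.0, 2.0])
w_new = td0_update(w, state=0, reward=1.0, next_state=2, alpha=0.5, gamma=1.0)
```

Here `v_hat(1, w)` reads off `w[1]` directly, and `w_new` differs from `w` only in the entry for state 0.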
There are choices of () for .
. Hence there are features.
Use rectangular tiles rather than square ones.
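The effect of rectangular tiles can be sketched with a single tiling over a two-dimensional state. This is an illustrative sketch only: the function `tile_index` and the tile sizes are hypothetical, and a full tile coder would use several offset tilings. With tiles that are wide in x and narrow in y, states that differ only in x tend to share a tile (and so generalize to each other), while states that differ in y do not.

```python
import math

def tile_index(x, y, tile_width, tile_height):
    """Return the (column, row) of the tile containing the point (x, y).

    Assumes a single tiling anchored at the origin; tile_width and
    tile_height are the tile extents along x and y respectively.
    """
    col = math.floor(x / tile_width)
    row = math.floor(y / tile_height)
    return col, row

# Tiles that are wide along x and narrow along y (assumed sizes).
WIDTH, HEIGHT = 1.0, 0.1

base = tile_index(0.1, 0.05, WIDTH, HEIGHT)
shifted_x = tile_index(0.4, 0.05, WIDTH, HEIGHT)   # moved along x only
shifted_y = tile_index(0.1, 0.35, WIDTH, HEIGHT)   # moved along y only
```

In this sketch `base` and `shifted_x` fall in the same tile, whereas `shifted_y` lands in a different row, so generalization is broad along x and fine-grained along y.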
In this problem, a sophisticated policy is required to reach the terminal state. Hence, an episode may never end under an (arbitrarily chosen) initial policy.
Longer episodes give less reliable estimates of the true value.
In "Differential semi-gradient Sarsa" on p.251, replace the definition of with the following one: