めもめも

The content of this blog reflects my personal views and does not necessarily represent the positions, strategies, or opinions of my employer.

Reinforcement Learning 2nd Edition: Exercise Solutions (Chapter 2 - Chapter 5)

Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series)


Chapter 2



Sample code for the multi-armed bandits

github.com

Chapter 3





Chapter 4



Sample code for the Jack's car rental problem

github.com

Exercise 4.8

Since p < 0.5, every bet is unfavorable, so if you keep playing with a small constant bet, you will lose on average in the long run. At some point, therefore, you need to bet enough to win in a single flip. In this particular case, the player bets 50 when he/she has 50, and bets 25 when he/she has 75 (hoping to win with this bet). Similarly, the player bets 25 when he/she has 25 in order to reach 50 (hoping to win with the next bet).
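The reasoning above can be checked numerically. The following is a minimal sketch of value iteration for the gambler's problem, assuming p = 0.4 and a goal of 100 (the setting I believe the book's figure uses). The function names `gambler_value_iteration` and `action_values` are my own and do not come from the linked sample code.

```python
def action_values(V, s, p=0.4, goal=100):
    """Return (stakes, q) for capital s, where q[i] is the value of betting stakes[i]."""
    stakes = list(range(1, min(s, goal - s) + 1))
    q = [p * V[s + b] + (1 - p) * V[s - b] for b in stakes]
    return stakes, q


def gambler_value_iteration(p=0.4, goal=100, theta=1e-10):
    """In-place value iteration; V[s] estimates the probability of reaching the goal."""
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # encodes the reward of +1 for reaching the goal
    while True:
        delta = 0.0
        for s in range(1, goal):
            _, q = action_values(V, s, p, goal)
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V


V = gambler_value_iteration()
for s in (25, 50, 75):
    stakes, q = action_values(V, s)
    # List every stake whose value ties with the maximum (ties are common here).
    optimal = [b for b, v in zip(stakes, q) if abs(v - V[s]) < 1e-9]
    print(s, round(V[s], 4), optimal)
```

Because many stakes tie for the maximum, the shape of the plotted policy depends on how ties are broken. The sketch therefore lists all optimal stakes rather than picking one; betting 50 at capital 50 and 25 at capitals 25 and 75 should appear among them.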

Exercise 4.9

github.com

Exercise 4.10

\displaystyle q_{k+1}(s,a) = \sum_{s',r}p(s',r\mid s,a)\left\{r+\gamma \max_{a'}q_{k}(s',a')\right\}
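To make the update concrete, here is a minimal sketch of it applied to a tiny, hypothetical MDP. The `transitions` format (a dict mapping `(s, a)` to a list of `(probability, next_state, reward)` triples), the function name `q_value_iteration`, and the example MDP are all assumptions made for illustration. A state with no listed actions is treated as terminal with value 0.

```python
def q_value_iteration(transitions, gamma=0.9, theta=1e-10):
    """Iterate q_{k+1}(s,a) = sum_{s',r} p(s',r|s,a) * (r + gamma * max_a' q_k(s',a'))."""
    # Collect the actions available in each state.
    actions = {}
    for s, a in transitions:
        actions.setdefault(s, []).append(a)

    q = {sa: 0.0 for sa in transitions}
    while True:
        new_q = {}
        delta = 0.0
        for (s, a), outcomes in transitions.items():
            total = 0.0
            for prob, s_next, r in outcomes:
                # max over a' of q_k(s', a'); 0 for terminal states (no actions).
                best_next = max(
                    (q[(s_next, a2)] for a2 in actions.get(s_next, [])),
                    default=0.0,
                )
                total += prob * (r + gamma * best_next)
            new_q[(s, a)] = total
            delta = max(delta, abs(total - q[(s, a)]))
        q = new_q  # synchronous update: q_{k+1} is computed entirely from q_k
        if delta < theta:
            return q


# Hypothetical example: from "A", "go" ends the episode with reward 1,
# while "stay" returns to "A" with reward 0.
example = {
    ("A", "go"): [(1.0, "end", 1.0)],
    ("A", "stay"): [(1.0, "A", 0.0)],
}
q = q_value_iteration(example, gamma=0.9)
print(q)
```

In this example, q(A, go) converges to 1 and q(A, stay) to 0.9, since staying only delays the reward by one discounted step.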

Chapter 5

github.com

Exercise 5.12

github.com