Reinforcement Learning 2nd Edition: Exercise Solutions (Chapter 2 - Chapter 5)

Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series)

Authors: Richard S. Sutton, Andrew G. Barto
Publisher: A Bradford Book
Release date: 2018/11/13
Format: Hardcover

Chapter 2

Sample code for the multi-armed bandits

github.com

Chapter 3

Sample notebook for the Gridworld example

github.com

Chapter 4

Sample code for Jack's car rental problem

github.com

Exercise 4.8

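Since p_h < 0.5, every bet loses money in expectation, so a player who keeps making small constant bets will lose on average: the more flips it takes to reach the goal, the more likely the player is to go broke first. At some point, therefore, the player needs to bet enough to reach the goal in as few flips as possible. In this particular case, the player bets 50 when holding 50, winning outright with a single flip. When holding 75, the player bets 25, hoping to win with this bet; if it is lost, the capital drops to 50, where the player again bets everything. Similarly, when holding 25, the player bets 25 to reach 50, hoping to win with the next bet.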
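The argument above can be checked numerically. The sketch below is not the code from the linked repository. It is a plain value iteration for the gambler's problem, assuming p_h = 0.4 (the value used in the book's Figure 4.3) and a goal of 100. For comparison, it also computes the probability of reaching 100 when always betting 1, using the standard gambler's-ruin formula. The names `value_iteration` and `unit_bet_win_prob` are hypothetical helpers introduced only for this illustration.

```python
P_HEADS = 0.4  # assumption: p_h = 0.4, as in the book's Figure 4.3
GOAL = 100


def value_iteration(p_h=P_HEADS, goal=GOAL, theta=1e-10):
    """In-place value iteration for the gambler's problem.

    v[s] approximates the probability of reaching `goal` from capital s
    under an optimal policy (reward +1 only on reaching the goal).
    """
    v = [0.0] * (goal + 1)
    v[goal] = 1.0  # terminal state: goal reached
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Stakes range from 1 up to min(s, goal - s).
            best = max(
                p_h * v[s + a] + (1 - p_h) * v[s - a]
                for a in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v


def unit_bet_win_prob(s, p_h=P_HEADS, goal=GOAL):
    """Gambler's ruin: probability of reaching `goal` from s, betting 1 each flip."""
    r = (1 - p_h) / p_h
    return (r ** s - 1) / (r ** goal - 1)


if __name__ == "__main__":
    v = value_iteration()
    print(f"{v[25]:.4f} {v[50]:.4f} {v[75]:.4f}")  # 0.1600 0.4000 0.6400
    print(f"{unit_bet_win_prob(50):.1e}")          # 1.6e-09
```

The optimal values match the bold play described above: 0.4 from 50 (one flip), 0.4 + 0.6 × 0.4 = 0.64 from 75, and 0.4 × 0.4 = 0.16 from 25. Betting 1 at a time from 50 succeeds with probability of only about 1.6e-9.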