Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning series)
- Authors: Richard S. Sutton, Andrew G. Barto
- Publisher: A Bradford Book
- Release date: 2018/11/13
- Format: Hardcover
Sample code for multi-armed bandits
Sample code for Jack's car rental problem
Exercise 4.8
Since p < 0.5, every bet has a negative expected gain, so if you keep playing with a small constant bet, you will lose on average and almost surely go broke before reaching the goal. At some point, therefore, you need to bet enough to win in a single flip. In this particular case, the player bets 50 when he/she has 50, and bets 25 when he/she has 75 (hoping to win with this bet; a loss only drops the capital back to 50). Similarly, the player bets 25 when he/she has 25 to reach 50 (hoping to win with the next bet).
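The argument above can be checked numerically. The following is a minimal sketch, not the code from the book: it assumes p_h = 0.4 and a goal of 100 (the text only states p < 0.5), and the names `gambler_value_iteration` and `timid_win_probability` are made up for this illustration. The first function runs value iteration on the gambler's problem. The second uses the classical gambler's-ruin formula for a constant bet of 1.

```python
def gambler_value_iteration(p_heads=0.4, goal=100, theta=1e-9):
    """Return V, where V[s] is the probability of reaching `goal` from capital s.

    Assumptions for this sketch: p_heads=0.4, goal=100, reward +1 only on
    reaching the goal, no discounting.
    """
    # Terminal states are fixed: V[0] = 0 (ruin), V[goal] = 1 (win).
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            # A stake may not exceed the capital or overshoot the goal.
            max_stake = min(s, goal - s)
            best = max(
                p_heads * V[s + a] + (1 - p_heads) * V[s - a]
                for a in range(1, max_stake + 1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            return V


def timid_win_probability(capital, p_heads=0.4, goal=100):
    """Win probability when always betting 1 (gambler's ruin, p_heads != 0.5)."""
    ratio = (1 - p_heads) / p_heads
    return (1 - ratio ** capital) / (1 - ratio ** goal)
```

Under these assumptions, `gambler_value_iteration()` gives V[50] = 0.4 (a single all-in flip), V[25] = 0.16 (two wins in a row), and V[75] = 0.64 (win now, or lose and then win from 50). By contrast, `timid_win_probability(50)` is on the order of 1e-9, which is why a constant small bet is a poor choice when p < 0.5.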
Exercise 4.9
Exercise 4.10