Readings
Lesson 1: Episodic Sarsa with Function Approximation
Learning Objectives
Lesson 2: Exploration under Function Approximation
Learning Objectives
Lesson 3: Average Reward
Learning Objectives
Discussion prompt
What are the issues with extending some of the exploration methods we learned about in the bandit and Dyna settings to the full RL problem? How can we implement visitation counts or UCB with function approximation? (See the sketch after these prompts.)
A control agent with function approximation has to explore to find the best policy, learn a good state representation, and try to get a lot of reward, all at the same time. How might an agent balance these potentially conflicting goals?
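To make the first prompt concrete, here is a rough sketch, not from the course, of one way visitation counts might be approximated when states are represented by sparse binary features such as tile coding: count how often each feature has been active, treat the average count over a state's active features as a pseudo-count, and feed that into a UCB-style bonus. The names here (FeatureCountBonus, ucb_action, the c parameter) are made up for illustration and are not part of any library.

```python
import numpy as np

class FeatureCountBonus:
    """UCB-style exploration bonus from per-feature pseudo-counts (sketch)."""

    def __init__(self, num_features, num_actions, c=1.0):
        # One count per (action, feature) pair.
        self.counts = np.zeros((num_actions, num_features))
        self.c = c  # exploration scale, analogous to the UCB constant

    def bonus(self, active_features, action):
        # Pseudo-count: mean count of the features active in this state.
        # States that share features also share exploration credit.
        n = self.counts[action, active_features].mean()
        return self.c / np.sqrt(n + 1.0)

    def update(self, active_features, action):
        # Increment the counts of the features active when `action` was taken.
        self.counts[action, active_features] += 1.0


def ucb_action(q_values, bonus_fn, active_features):
    # Choose the action maximising approximate value plus exploration bonus.
    scored = [q + bonus_fn.bonus(active_features, a)
              for a, q in enumerate(q_values)]
    return int(np.argmax(scored))
```

A real agent would add this bonus to its approximate action values before acting and update the counts for the chosen action on every step; whether such pseudo-counts generalise sensibly depends entirely on how informative the features are, which is part of what the prompt is asking about.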
References
Sutton, R. S., and A. G. Barto. 2018. Reinforcement Learning, Second Edition: An Introduction. Adaptive Computation and Machine Learning Series. MIT Press. http://incompleteideas.net/book/RLbook2020.pdf.
Reuse
CC SA BY-NC-ND
Citation
BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Control with {Approximation}},
  date = {2024-04-03},
  url = {https://orenbochman.github.io/notes/RL/c3-w3.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Control with Approximation.” April 3, 2024. https://orenbochman.github.io/notes/RL/c3-w3.html.