Control with Approximation

Prediction and Control with Function Approximation

Coursera
notes
rl
reinforcement learning
the k-armed bandit problem
bandit algorithms
exploration
exploitation
epsilon greedy algorithm
sample average method
Author

Oren Bochman

Published

Wednesday, April 3, 2024

RL algorithms

Lesson 1: Episodic Sarsa with Function Approximation

Learning Objectives
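
The central algorithm of this lesson is episodic semi-gradient Sarsa. With a linear action-value estimate q(s, a, w) = w · x(s, a), the gradient of the estimate is just the feature vector, so each update moves the weights along the TD error times the active features. Below is a minimal sketch in Python; the environment interface (`reset()` and `step()` returning `(next_state, reward, done)`), the feature map `phi`, and the `actions` list are illustrative assumptions, not the course's own code.

```python
import numpy as np

def epsilon_greedy(q_hat, s, actions, eps, rng):
    """With probability eps pick a random action, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.choice(actions))
    return actions[int(np.argmax([q_hat(s, a) for a in actions]))]

def episodic_semi_gradient_sarsa(env, phi, n_features, actions,
                                 alpha=0.1, gamma=1.0, eps=0.1,
                                 episodes=100, seed=0):
    """Episodic semi-gradient Sarsa with a linear action-value function.

    q(s, a, w) = w @ phi(s, a), so the gradient of q is simply phi(s, a).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    q_hat = lambda s, a: w @ phi(s, a)

    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(q_hat, s, actions, eps, rng)
        done = False
        while not done:
            s_next, r, done = env.step(a)        # assumed env interface
            if done:
                delta = r - q_hat(s, a)          # no bootstrap at terminal
            else:
                a_next = epsilon_greedy(q_hat, s_next, actions, eps, rng)
                delta = r + gamma * q_hat(s_next, a_next) - q_hat(s, a)
            w += alpha * delta * phi(s, a)       # semi-gradient update
            if not done:
                s, a = s_next, a_next
    return w
```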

Lesson 2: Exploration under Function Approximation

Learning Objectives
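
One exploration method that does carry over to function approximation, at least with binary features, is optimistic initial values. With tile coding, a fixed number of tilings is active for any state-action pair, so we can choose initial weights that make every estimate start out optimistically high. A minimal sketch, with illustrative names and sizes:

```python
import numpy as np

def optimistic_weights(n_features, num_tilings, optimistic_value=1.0):
    """With tile coding, exactly num_tilings binary features are active
    for any (s, a), so the estimate is the sum of the active weights.
    Setting every weight to optimistic_value / num_tilings makes every
    initial action-value estimate equal optimistic_value."""
    return np.full(n_features, optimistic_value / num_tilings)

# e.g. 8 tilings: every estimate starts at 8 * (1.0 / 8) = 1.0, so
# under-explored actions look attractive until experience drives their
# estimates down toward the true values.
w = optimistic_weights(n_features=4096, num_tilings=8, optimistic_value=1.0)
```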

Lesson 3: Average Reward

Learning Objectives
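
In the average reward setting we drop discounting for continuing tasks: the agent learns differential action values together with a running estimate of the reward rate, and the TD error becomes delta = R - avg_reward + q(S', A', w) - q(S, A, w). A minimal sketch of differential semi-gradient Sarsa, under the same assumed linear setup as in Lesson 1 (`phi`, `actions`, and a continuing `env.step` returning `(next_state, reward)` are illustrative):

```python
import numpy as np

def differential_semi_gradient_sarsa(env, phi, n_features, actions,
                                     alpha=0.1, beta=0.01, eps=0.1,
                                     steps=10_000, seed=0):
    """Differential semi-gradient Sarsa for continuing tasks.

    There is no discount factor: instead the agent maintains avg_reward,
    an estimate of the long-run reward rate, and learns differential
    action values relative to it.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    avg_reward = 0.0

    def q_hat(s, a):
        return w @ phi(s, a)

    def policy(s):
        if rng.random() < eps:
            return int(rng.choice(actions))
        return actions[int(np.argmax([q_hat(s, a) for a in actions]))]

    s = env.reset()
    a = policy(s)
    for _ in range(steps):
        s_next, r = env.step(a)             # continuing task: no terminals
        a_next = policy(s_next)
        # Differential TD error: reward measured relative to the average.
        delta = r - avg_reward + q_hat(s_next, a_next) - q_hat(s, a)
        avg_reward += beta * delta          # update the reward-rate estimate
        w += alpha * delta * phi(s, a)      # linear q: gradient is phi(s, a)
        s, a = s_next, a_next
    return w, avg_reward
```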
Discussion prompt

What are the issues with extending some of the exploration methods we learned about for bandits and Dyna to the full RL problem? How can we do visitation counts or UCB with function approximation?

A control agent with function approximation has to explore to find the best policy, learn a good state representation, and try to get a lot of reward, all at the same time. How might an agent balance these potentially conflicting goals?

References

Sutton, R. S., and A. G. Barto. 2018. Reinforcement Learning: An Introduction. 2nd ed. Adaptive Computation and Machine Learning Series. MIT Press. http://incompleteideas.net/book/RLbook2020.pdf.

Reuse

CC BY-NC-ND

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Control with {Approximation}},
  date = {2024-04-03},
  url = {https://orenbochman.github.io/notes/RL/c3-w3.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Control with Approximation.” April 3, 2024. https://orenbochman.github.io/notes/RL/c3-w3.html.