Readings
Lesson 1: Learning Parameterized Policies
Learning Objectives
Lesson 2: Policy Gradient for Continuing Tasks
Learning Objectives
Lesson 3: Actor-Critic for Continuing Tasks
Learning Objectives
Lesson 4: Policy Parameterizations
Learning Objectives
Discussion prompt
Are tasks ever really continuing? Everything eventually breaks or dies. Clearly an individual cannot learn from its own death, and none of us lives forever. Why, then, might the continuing problem formulation still be a reasonable model for long-lived agents?
References
Sutton, R. S., and A. G. Barto. 2018. Reinforcement Learning, Second Edition: An Introduction. Adaptive Computation and Machine Learning Series. MIT Press. http://incompleteideas.net/book/RLbook2020.pdf.
Reuse
CC SA BY-NC-ND
Citation
BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Policy {Gradient}},
  date = {2024-04-04},
  url = {https://orenbochman.github.io/notes/RL/c3-w4.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Policy Gradient.” April 4, 2024. https://orenbochman.github.io/notes/RL/c3-w4.html.