Policy Gradient

Prediction and Control with Function Approximation

Categories: Coursera, notes, rl, reinforcement learning

Author: Oren Bochman

Published: Thursday, April 4, 2024

Keywords: reinforcement learning, neural networks, feature construction, deep networks, The Policy Gradient Theorem, Policy Gradient, Actor-Critic Algorithm, Gaussian Policies


Lesson 1: Learning Parameterized Policies

Learning Objectives
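A parameterized policy makes the policy itself the learned object: pi(a | s, theta) is a differentiable function of the parameters theta, rather than something derived from action values. For discrete actions the standard choice is a softmax over action preferences h(s, a, theta). Below is a minimal sketch in Python/NumPy, assuming linear action preferences over state-action features; the helper name, shapes, and toy features are illustrative assumptions, not course code.

import numpy as np

def softmax_policy(theta, x):
    """Softmax-in-action-preferences policy (illustrative sketch).

    theta : (d,) policy parameter vector
    x     : (num_actions, d) feature vectors x(s, a), one row per action
    returns (num_actions,) probabilities pi(a | s, theta)
    """
    h = x @ theta       # linear action preferences h(s, a, theta)
    h = h - h.max()     # shift preferences for numerical stability
    e = np.exp(h)
    return e / e.sum()

# toy usage: 3 actions, 4 features per state-action pair
rng = np.random.default_rng(0)
theta = rng.normal(size=4)
x = rng.normal(size=(3, 4))
probs = softmax_policy(theta, x)
action = rng.choice(3, p=probs)   # sample an action from the policy

Subtracting the maximum preference before exponentiating leaves the probabilities unchanged but avoids overflow. One advantage of this parameterization is that it can approach a deterministic policy as preferences separate, which epsilon-greedy action selection over learned values cannot do.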

Lesson 2: Policy Gradient for Continuing Tasks

Learning Objectives
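The centerpiece of this lesson is the policy gradient theorem. In the continuing setting the objective is the average reward r(pi), and the theorem (Sutton & Barto 2018, ch. 13) gives its gradient without requiring the gradient of the stationary state distribution mu:

\nabla_\theta \, r(\pi) \;=\; \sum_s \mu(s) \sum_a q_\pi(s, a)\, \nabla_\theta \pi(a \mid s, \theta)

Since \nabla \pi = \pi \nabla \ln \pi, this can be rewritten as an expectation over the states and actions the agent actually visits under pi,

\nabla_\theta \, r(\pi) \;=\; \mathbb{E}_{S \sim \mu,\, A \sim \pi}\big[\, q_\pi(S, A)\, \nabla_\theta \ln \pi(A \mid S, \theta) \,\big],

which is what makes sample-based stochastic gradient ascent on theta possible.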

Lesson 3: Actor-Critic for Continuing Tasks

Learning Objectives
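One-step actor-critic puts the pieces together: a critic learns a differential state-value function v-hat(s, w) by semi-gradient TD(0), and the actor ascends the policy gradient using the TD error delta in place of q_pi, which also acts as a baseline that reduces variance. The sketch below shows a single update in the average-reward setting, assuming a linear critic and the softmax actor from Lesson 1; the function name, argument layout, and step-size names are assumptions, not the course's code.

import numpy as np

def actor_critic_step(theta, w, avg_reward, s_feats, a_feats, action, reward,
                      next_s_feats, alpha_theta, alpha_w, alpha_rbar):
    """One average-reward actor-critic update
    (sketch after Sutton & Barto 2018, sec. 13.6).

    s_feats      : (d_v,) critic features x(S)
    next_s_feats : (d_v,) critic features x(S')
    a_feats      : (num_actions, d_pi) actor features x(S, a), one row per action
    """
    # TD error in the continuing setting: differential, not discounted
    delta = reward - avg_reward + next_s_feats @ w - s_feats @ w
    avg_reward = avg_reward + alpha_rbar * delta   # running average-reward estimate
    w = w + alpha_w * delta * s_feats              # critic: semi-gradient TD(0)
    # actor: for a softmax policy, grad ln pi(A|S) = x(S,A) - sum_b pi(b|S) x(S,b)
    h = a_feats @ theta
    pi = np.exp(h - h.max())
    pi = pi / pi.sum()
    grad_ln_pi = a_feats[action] - pi @ a_feats
    theta = theta + alpha_theta * delta * grad_ln_pi
    return theta, w, avg_reward

Note that the average-reward estimate is itself learned, so its step size is a third tunable parameter alongside the actor and critic step sizes.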

Lesson 4: Policy Parameterizations

Learning Objectives
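For continuous actions the softmax parameterization is replaced by a Gaussian policy: the actor outputs a mean mu(s) and a standard deviation sigma(s), and actions are sampled from N(mu(s), sigma(s)^2). A hedged sketch, assuming a linear mean and a log-linear standard deviation (the exponential keeps sigma positive); the function name is an illustrative assumption.

import numpy as np

def gaussian_policy_step(theta_mu, theta_sigma, x, rng):
    """Sample from a Gaussian policy and return score-function gradients
    (sketch after Sutton & Barto 2018, sec. 13.7).

    mu(s)    = theta_mu . x(s)           linear mean
    sigma(s) = exp(theta_sigma . x(s))   log-linear, so sigma > 0
    """
    mu = theta_mu @ x
    sigma = np.exp(theta_sigma @ x)
    a = rng.normal(mu, sigma)                          # sample a continuous action
    grad_mu = (a - mu) / sigma**2 * x                  # d ln pi / d theta_mu
    grad_sigma = ((a - mu)**2 / sigma**2 - 1.0) * x    # d ln pi / d theta_sigma
    return a, grad_mu, grad_sigma

Both gradients plug into the actor update of Lesson 3 in place of the softmax gradient, so the same average-reward actor-critic handles continuous action spaces.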
Discussion Prompt

Are tasks ever really continuing? Everything eventually breaks or dies: individual people do not live forever, and we certainly cannot learn from our own death. Why might the continuing problem formulation nevertheless be a reasonable model for long-lived agents?

References

Sutton, R. S., and A. G. Barto. 2018. Reinforcement Learning: An Introduction. Second edition. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press. http://incompleteideas.net/book/RLbook2020.pdf.

Reuse

CC BY-NC-ND

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Policy {Gradient}},
  date = {2024-04-04},
  url = {https://orenbochman.github.io/notes/RL/c3-w4.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Policy Gradient.” April 4, 2024. https://orenbochman.github.io/notes/RL/c3-w4.html.