RL

Published

Monday, January 13, 2025

Policy Gradient

Sometimes, the behavior codified in the policy is much simpler than the action value function. Thus, learning the policy directly can be more efficient. Learning policies is…

Thursday, April 4, 2024

Control with Approximation

In this video, Adam White discusses the algorithm for “Episodic SARSA with function approximation”. He explains how it can be used to solve reinforcement learning problems…

Wednesday, April 3, 2024

Constructing Features for Prediction

This is not a video lecture or notes for a learning goal. It is, however, my attempt to cover some material from the readings from chapter 9 of (Sutton and Barto 2018) menti…

Tuesday, April 2, 2024

Constructing Features for Prediction

We discussed methods for representing large, and possibly continuous, state spaces, and ways to construct features. A representation is an agent’s internal encoding of the state…

Tuesday, April 2, 2024

On-Policy Prediction with Approximation

Some of the notes I made in this course became a bit too long. Rather than break the flow of the lesson I decided to move them to a separate file. This is one of those notes.

Monday, April 1, 2024

On-Policy Prediction with Approximation

I did not find the derivation of the SGD algorithm particularly enlightening, and I have seen it several times. However, the online setting is the best motivation for the use of SGD…

Monday, April 1, 2024

Sample-based Learning Methods

In this module we cover model-based RL sampling. We start with the Dyna architecture. Then we consider the tabular Q-planning algorithm, tabular Dyna-Q and Dyna-Q+…

Monday, March 4, 2024

Temporal Difference Learning Methods for Control

This week, we will learn to use TD learning for control, as a generalized policy iteration strategy. We will see three different algorithms based on bootstrapping and…

Sunday, March 3, 2024

Temporal Difference Learning Methods for Prediction

In this unit we define some key terms like rewards, states, actions, value functions, and action-value functions. Then we consider the multi-armed bandit problem leading…

Saturday, March 2, 2024

Monte-Carlo Methods for Prediction & Control

In this module we learn about sample-based MC methods that allow learning from sampled episodes. We revise our initial algorithm to better handle exploration. In off policy…

Friday, March 1, 2024

Dynamic Programming

In week 4 we learn how to compute value functions and optimal policies, assuming you have the MDP model. You will implement dynamic programming to compute value functions…

Thursday, May 5, 2022

Value Functions & Bellman Equations

In week 3 we learn about Value Functions and Bellman Equations, which are the key technology behind all the algorithms we will learn. We learn the definition of policies and…

Wednesday, May 4, 2022

Markov Decision Processes

In week 2 we learn about Markov Decision Processes (MDP) and how to compute value functions and optimal policies, assuming you have the MDP model. We implement dynamic…

Tuesday, May 3, 2022

The K-Armed Bandit Problem

In week 1 we define some key concepts like rewards, states, actions, value functions, and action-value functions. We consider the multi-armed bandit problem, leading to…

Monday, May 2, 2022

Course Introduction

In week 1 we define some key concepts like rewards, states, actions, value functions, and action-value functions. We consider the multi-armed bandit problem, leading to…

Sunday, May 1, 2022