```mermaid
mindmap
  root((RL))
    Main Concepts
      [MDP]
        Continuing Tasks
        Episodic Tasks
        Markov Property
      [Reward]
      [Action Values]
      [Policy]
        {{maps states to likelihood of actions}}
        Deterministic
          {{one action per state}}
        Stochastic
          {{multiple actions per state}}
      Exploration
      Exploitation
      [Policy Evaluation - Prediction]
      [Control]
      Dynamic programming
        Synchronous
        Asynchronous
      Learning
        On Policy learning
          {{Agents learn from their own policy}}
        Off Policy learning
          {{Agents learn from another policy or data}}
        Online
        Offline
        Optimistic initial values
    Math
      Bellman Equations
        {{State-Value Function}}
        {{Action-Value Function}}
        {{State-Value Optimality Function}}
        {{Action-Value Optimality Function}}
      Policy Improvement Theorem
    Algorithms
      [Bandits]
        Epsilon greedy
        Thompson sampling
        Upper confidence bound
        Contextual
        Regret
        Follow the regularized leader
        Counterfactual regret
      Greedification
      [Policy Iteration]
      [Generalized Policy Iteration]
      [Value Iteration]
      Brute force search
      Monte Carlo
      Bootstrapping
      [Sample Based Methods]
        [Temporal Difference Learning]
          [SARSA]
          [Q-Learning]
      [Function Approximation Methods]
      Others
        Dyna
```
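Several branches of the map above (epsilon-greedy exploration, off-policy learning, TD bootstrapping, Q-Learning) come together in tabular Q-learning. Here is a minimal sketch on a toy five-state chain MDP; the environment, state count, and hyperparameters are assumptions for illustration, not part of the original map.

```python
import random

# A toy 5-state chain MDP (an illustrative assumption, not from the post):
# states 0..4, actions 0 = left, 1 = right; reaching state 4 yields reward 1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Deterministic transition: move one step left or right along the chain."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection: exploration vs. exploitation
            if random.random() < epsilon:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            # off-policy TD update: bootstrap from the greedy successor value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(greedy_policy)  # the learned greedy policy should move right toward the goal
```

Swapping the `max(Q[s2])` bootstrap target for the value of the action actually taken in `s2` would turn this into SARSA, the on-policy variant named in the same branch of the map.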
Citation
BibTeX citation:
@online{bochman2025,
  author = {Bochman, Oren},
  title = {RL {MindMap}},
  date = {2025-01-13},
  url = {https://orenbochman.github.io/posts/2024/2024-03-25-rl-maps/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2025. “RL MindMap.” January 13, 2025. https://orenbochman.github.io/posts/2024/2024-03-25-rl-maps/.