Engineering Reinforcement Learning Algorithms

Author

Oren Bochman

Published

Wednesday, January 10, 2024

Keywords

compositionality naive compositionality language emergence deep learning neural networks signaling systems emergent languages topographic similarity positional disentanglement bag-of-symbols disentanglement information gap disentanglement

As I progressed through the RL specialization I came to realize that the research is mostly about coming up with a new algorithm that works better. This is tricky, so in many cases people end up considering some narrow setting, better yet one that is new, so they can publish a paper claiming novelty. On the other hand, almost all algorithms revolve around a rather small set of techniques. One might call these techniques ‘magic tricks’, glossing over the fact that they are also grounded in computer science and mathematics. Some researchers show that their algorithms converge better.

Outside academia RL is a tool. We don’t have quick simulators, but we do have challenging examples. I notice that most people use a few established algorithms rather than coding their own from scratch. This makes a lot of sense if our problem is very similar to the problems the algorithm has already been tested on. But what if our problem is different? We can try a bunch of algorithms and see which one works best. But what if they are all mediocre?

Another approach, which I see as continuing the one espoused in The Algorithm Design Manual by Steven S. Skiena (citation needed?) and Algorithm Design by Jon Kleinberg and Éva Tardos (citation needed?), is to treat algorithms as assemblies of well-understood building blocks that can be engineered to fit the problem at hand.

I imagine a research algorithm that has all these tricks available in a modular way and that can adaptively reconfigure itself to the problem at hand.

And one more thing: it would expose its hyperparameters to be tuned by Optuna or some other hyperparameter optimization tool.

All this power suggests the one ring to rule them all, so perhaps this might be the master algorithm.
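As a toy sketch of what exposing such a modular configuration to a tuner might look like (the `make_agent` and `evaluate` helpers are hypothetical stand-ins for a real modular agent and its evaluation loop):

```python
import optuna  # hyperparameter optimization library

def objective(trial: optuna.Trial) -> float:
    # Each 'magic trick' becomes a configuration choice exposed to the tuner.
    config = {
        "exploration": trial.suggest_categorical(
            "exploration", ["epsilon_greedy", "ucb", "thompson"]),
        "td_target": trial.suggest_categorical(
            "td_target", ["q_learning", "sarsa", "expected_sarsa"]),
        "gamma": trial.suggest_float("gamma", 0.9, 0.999),
        "step_size": trial.suggest_float("step_size", 1e-4, 1e-1, log=True),
        "use_replay_buffer": trial.suggest_categorical(
            "use_replay_buffer", [True, False]),
    }
    agent = make_agent(config)            # hypothetical factory for the modular agent
    return evaluate(agent, episodes=100)  # hypothetical: mean return over 100 episodes

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```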

Magical Tricks:

From the domain of Bandits

  • epsilon greedy exploration

  • upper confidence bound exploration

  • Thompson sampling exploration

  • Gittins index exploration

  • ‘stateless’ setting

note: it appears we might want the setting to be a configuration parameter; two of these exploration rules are sketched below.
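A minimal sketch of epsilon-greedy and UCB exploration for a stateless multi-armed bandit (the class layout and names are my own):

```python
import math
import random

class Bandit:
    """Stateless bandit agent with pluggable exploration (epsilon-greedy or UCB)."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, c: float = 2.0):
        self.q = [0.0] * n_arms   # running estimate of each arm's value
        self.n = [0] * n_arms     # pull counts
        self.epsilon, self.c = epsilon, c

    def select_epsilon_greedy(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))                 # explore uniformly
        return max(range(len(self.q)), key=lambda a: self.q[a])  # exploit

    def select_ucb(self) -> int:
        # Pull each arm once, then pick the arm with the highest upper confidence bound.
        for a, count in enumerate(self.n):
            if count == 0:
                return a
        t = sum(self.n)
        return max(range(len(self.q)),
                   key=lambda a: self.q[a] + self.c * math.sqrt(math.log(t) / self.n[a]))

    def update(self, arm: int, reward: float) -> None:
        # Incremental sample-average update of the pulled arm's value estimate.
        self.n[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.n[arm]
```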

MDPs

  • Bellman Equation for V Value Function (the discounted forms are sketched as equations after this list)

  • Bellman Equation for Q Action Value Function

  • Bellman Optimality Equation for V Value Function

  • Bellman Optimality Equation for Q Action Value Function

  • Bellman Equation for Average Reward

  • Bellman Optimality Equation for Average Reward

  • discounted rewards gamma trick

  • average reward formulation
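For reference, a sketch of the discounted Bellman equations and their optimality counterparts in the usual Sutton & Barto notation:

$$
\begin{aligned}
v_\pi(s) &= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr] \\
q_\pi(s, a) &= \sum_{s', r} p(s', r \mid s, a)\,\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')\Bigr] \\
v_*(s) &= \max_a \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr] \\
q_*(s, a) &= \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma \max_{a'} q_*(s', a')\bigr]
\end{aligned}
$$

The average-reward versions replace the discounted return with differential values, subtracting the average reward per step from each immediate reward.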

Exploration using Optimistic Initialisation

  • algorithmic tricks
    • value iteration (a tabular sketch follows this list)
    • policy iteration
    • generalized value iteration
    • generalized policy iteration
  • tabular setting
  • limited horizon setting
  • infinite horizon setting
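A minimal tabular value iteration sketch, assuming a known model `P[s][a]` given as a list of `(prob, next_state, reward)` tuples (this data layout is my own choice, not from the original post):

```python
def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Tabular value iteration for a known MDP.

    P[s][a] is assumed to be a list of (prob, next_state, reward) tuples.
    Returns the converged value function V and a greedy policy.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: max over actions of the expected one-step return.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract a deterministic greedy policy from the converged values.
    policy = [
        max(range(n_actions),
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy
```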

Monte Carlo Methods

  • First-visit Monte Carlo (a prediction sketch follows this list)

  • Every-visit Monte Carlo

  • Monte Carlo target

  • episodic setting

  • in the continuing (non-episodic) setting, Monte Carlo methods require eligibility traces, a replay buffer, or a time limit, i.e. some form of temporal abstraction
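A minimal first-visit Monte Carlo prediction sketch for episodic tasks; here an episode is assumed to be a list of `(state, reward)` pairs, where the reward is the one received after leaving that state (this representation is my own):

```python
from collections import defaultdict

def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging the return following the first visit to each state."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)
    for episode in episodes:
        G = 0.0
        first_visit_return = {}
        # Walk the episode backwards, accumulating the discounted return;
        # later overwrites keep the return from the *earliest* visit to each state.
        for state, reward in reversed(episode):
            G = gamma * G + reward
            first_visit_return[state] = G
        for state, G_first in first_visit_return.items():
            returns_sum[state] += G_first
            returns_count[state] += 1
            V[state] = returns_sum[state] / returns_count[state]
    return V
```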

Temporal Difference Learning

  • TD targets for V Value Function

  • TD targets for Q Action Value Function

  • bootstrapping

  • SARSA (the three control updates are sketched after this list)

  • Q-Learning

  • Expected SARSA
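A sketch of the three tabular control updates for a single transition (s, a, r, s'); `Q` is a dict-of-dicts of action values, `policy_probs` is a hypothetical helper returning the policy's action probabilities, and terminal-state handling is omitted:

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action a2 actually taken in the next state.
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in the next state.
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])

def expected_sarsa_update(Q, s, a, r, s2, policy_probs, alpha=0.1, gamma=0.99):
    # Bootstrap from the expectation of Q under the policy's next-action distribution.
    expected = sum(policy_probs(s2, a2) * Q[s2][a2] for a2 in Q[s2])
    target = r + gamma * expected
    Q[s][a] += alpha * (target - Q[s][a])
```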

Model Based RL

  • replay buffer

  • prioritised replay buffer

  • prioritised sweeping

  • Dyna-Q (a minimal planning loop is sketched after this list)

  • Dyna-Q+
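A minimal Dyna-Q sketch with a deterministic learned model, assuming a Gymnasium-style discrete environment (`env.reset()` returning `(obs, info)` and `env.step()` returning a 5-tuple); the layout is my own:

```python
import random
from collections import defaultdict

def dyna_q(env, n_episodes=100, n_planning=10, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Dyna-Q: learn from real steps, then replay simulated steps from a learned model."""
    Q = defaultdict(lambda: defaultdict(float))
    model = {}  # (s, a) -> (r, s2, terminal): a deterministic one-step model
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon or not Q[s]:
                a = env.action_space.sample()
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Direct RL: one Q-learning update from real experience.
            target = r + (0.0 if terminated else gamma * max(Q[s2].values(), default=0.0))
            Q[s][a] += alpha * (target - Q[s][a])
            # Model learning: remember the observed transition.
            model[(s, a)] = (r, s2, terminated)
            # Planning: n_planning extra updates from transitions replayed out of the model.
            for _ in range(n_planning):
                (ps, pa), (pr, ps2, pterm) = random.choice(list(model.items()))
                ptarget = pr + (0.0 if pterm else gamma * max(Q[ps2].values(), default=0.0))
                Q[ps][pa] += alpha * (ptarget - Q[ps][pa])
            s = s2
    return Q
```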

Temporal Abstraction

  • SMDPs (semi-Markov decision processes)

  • options (temporal abstraction; a minimal option skeleton is sketched after this list)

  • GVFs (general value functions)

  • meta-gradients - learning rewards over multiple agent lifetimes

  • ‘lifelong’ setting
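A minimal skeleton of an option in the sense of the options framework (initiation set, internal policy, termination condition); the class layout and the Gymnasium-style `env.step()` call are assumptions of this sketch:

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Set

@dataclass
class Option:
    """A temporally extended action: where it can start, how it acts, when it stops."""
    initiation_set: Set[Any]             # states in which the option may be invoked
    policy: Callable[[Any], Any]         # maps a state to a primitive action
    termination: Callable[[Any], float]  # probability of terminating in a state

def run_option(env, option, state):
    """Execute an option until it terminates; returns cumulative reward, final state, done flag."""
    total_reward, done = 0.0, False
    while not done and random.random() >= option.termination(state):
        state, r, terminated, truncated, _ = env.step(option.policy(state))
        total_reward += r
        done = terminated or truncated
    return total_reward, state, done
```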

Partial Observability

  • POMDPs (the belief-state update is sketched below)
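In a POMDP the agent cannot observe the state directly, so it maintains a belief $b(s)$ over hidden states. After taking action $a$ and receiving observation $o$, the belief is updated by the standard Bayes filter (a sketch in the usual notation, with transition model $T$ and observation model $O$):

$$
b'(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)},
\qquad
\Pr(o \mid b, a) = \sum_{s'} O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)
$$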

Multi-Agent RL

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Engineering {Reinforcement} {Learning} {Algorithms}},
  date = {2024-01-10},
  url = {https://orenbochman.github.io/posts/2025/2025-01-10-Engineering-RL-Algs/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Engineering Reinforcement Learning Algorithms.” January 10, 2024. https://orenbochman.github.io/posts/2025/2025-01-10-Engineering-RL-Algs/.