Replay Buffer
- for continuous environments we should think about coverage of the state space.
- given a parametrization of the value function, a chosen level of generalization/discrimination induces a set of features. Is some set of experiences sufficient to do prediction or control?
- if we have an estimate of the coverage, can we use it to place a bound on the error of the value function?
- can we do better if we also have an estimate $\mu(s)$ of the importance/long-run probability of the states? (see the weighted error objective sketched after this list)
- Traces present a highly correlated view of the state space.
- How much do we need to worry about this?
- does a replay buffer violate the Markov property of the state?
- according to Shirli Di-Castro Shashua
- Analysis of Stochastic Processes through Replay Buffers
- Sim and Real: Better Together
- the storage operation preserves the Markov property
- the sampling operation preserves the Markov property
- the mean operation on the replay buffer violates the Markov property
- can the replay buffer reduce correlation between samples?
- can we be more strategic about what we keep in the replay buffer? (a sketch of this follows after the list)
- say we key entries with a hash of a [$\delta$(state), action] neighbourhood, i.e. a discretized state paired with the action
- we can use the key to decide whether to insert the current transition or replace an existing one
- we can use it to decide what to discard
- we can use the buffer to estimate $\mu(s)$
- we might also keep more information, such as states we did not insert or have since deleted.
- if we also have $\mu(s)$, the state importance, we can use it to decide what to keep
- do we prefer complete recent traces or many partial traces?
- Can we use options/skills to organize the buffer more effectively?
- we should aim to keep full option traces in the buffer
- keep traces into and out of options, i.e. before and after the options.
- Think of the four-room environment: there are different options to get from one room to another, and they are composable. Once we have good coverage, entry into the op…
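One standard way to make the coverage and $\mu(s)$ questions above concrete is the state-weighted value error from Sutton and Barto, brought in here as a reference point rather than something established in these notes:

$$
\overline{\mathrm{VE}}(\mathbf{w}) \;=\; \sum_{s}\mu(s)\,\bigl[v_\pi(s) - \hat{v}(s,\mathbf{w})\bigr]^2
$$

States the buffer covers poorly contribute to this sum in proportion to $\mu(s)$, so any bound on achievable error derived from a coverage estimate would naturally be expressed with respect to $\mu$.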
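Below is a minimal sketch, in Python, of the keyed-buffer idea from the list above. The discretization function, the per-key capacity, and the uniform-over-keys sampling are assumptions made for illustration, not anything taken from the cited papers.

```python
import random
from collections import defaultdict, deque


class KeyedReplayBuffer:
    """Replay buffer keyed by a hash of (discretized state, action)."""

    def __init__(self, discretize, per_key_capacity=4):
        self.discretize = discretize            # user-supplied coarse coding of the state
        self.slots = defaultdict(lambda: deque(maxlen=per_key_capacity))
        self.counts = defaultdict(int)          # visits per key, whether inserted or not

    def key(self, state, action):
        # the [delta(state), action] neighbourhood key from the notes
        return hash((self.discretize(state), action))

    def add(self, state, action, reward, next_state, done):
        k = self.key(state, action)
        self.counts[k] += 1
        # deque(maxlen=...) evicts the oldest transition for this key,
        # so heavily visited neighbourhoods cannot crowd out rare ones
        self.slots[k].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # sample keys uniformly, then one transition within each key,
        # which flattens the correlation that raw traces would induce
        keys = random.choices(list(self.slots), k=batch_size)
        return [random.choice(self.slots[k]) for k in keys]

    def mu_estimate(self):
        # empirical visitation frequency per key, a crude proxy for mu over neighbourhoods
        total = sum(self.counts.values())
        return {k: c / total for k, c in self.counts.items()} if total else {}
```

Sampling keys uniformly rather than raw transitions is one way to act on the coverage question; weighting the key choice by an external importance estimate $\mu(s)$ would be a one-line change in `sample`.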
Ergodicity
- if the environment is a maze and there is a one-way door dividing the left part from the right part of the maze, is this environment ergodic?
- If not, how come we can still learn the optimal policy? (see the toy check below)
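As a toy check, here is an assumed four-state chain with two "left" states that can only be exited through a one-way door into two "right" states. The long-run distribution puts zero mass on the left states, so the chain is not ergodic; with episodic resets (or exploring starts) the left states are still visited, which is why their values and the optimal policy can still be learned.

```python
import numpy as np

# States 0, 1 form the left side; states 2, 3 form the right side.
# The only left-to-right passage is the one-way door 1 -> 2; there is no way back.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],   # 0: wander inside the left side
    [0.0, 0.5, 0.5, 0.0],   # 1: half the time, step through the one-way door
    [0.0, 0.0, 0.5, 0.5],   # 2: wander inside the right side
    [0.0, 0.0, 0.5, 0.5],   # 3: wander inside the right side
])

mu = np.full(4, 0.25)       # start uniformly over all states
for _ in range(1000):       # iterate the chain to its long-run distribution
    mu = mu @ P
print(mu.round(3))          # [0. 0. 0.5 0.5] -- the left side gets zero long-run mass
```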
Dotan Di Castro - sim to real
Replay buffers -
- storing sequences of states
- as (state, action, state) transitions
POMDPs