Replay Buffer
- for continuous environments we should think about coverage of the state space.
- given a parametrization of the value function, a chosen level of generalization/discrimination induces a set of features. Is some set of experiences sufficient to do prediction or control?
- if we have an estimate of the coverage, can we use it to place a bound on the error of the value function?
- can we do better if we also have an estimate $\mu(s)$ of the importance/long-run probability of the states? (see the weighted error objective sketched after this list)
- Traces present a highly correlated view of the state space.
- How much do we need to worry about this?
- does a replay buffer violate the Markov property of the state?
- according to Shirli Di-Castro Shashua
- Analysis of Stochastic Processes through Replay Buffers
- Sim and Real: Better Together
- the storage operation preserves the Markov property
- the sampling operation preserves the Markov property
- the mean operation on the replay buffer violates the Markov property
- can the replay buffer reduce correlation between samples?
- can we be more strategic about what we keep in the replay buffer? (a sketch of this follows after the list)
- say we key entries with a hash of a [$\delta$(state), action] neighbourhood, i.e. a discretized state paired with the action
- we can use the key to decide whether to insert the current transition or replace an existing one
- we can use it to decide what to discard
- we can use the buffer to estimate $\mu(s)$
- we might also keep more information, such as states we did not insert or have since deleted.
- if we also have $\mu(s)$, the state importance, we can use it to decide what to keep
- do we prefer complete recent traces or many partial traces?
- Can we use options/skills to organize the buffer more effectively?
- we should aim to keep full option traces in the buffer
- keep traces into and out of options, i.e. before and after the options.
- Think of the four-room environment: there are different options to get from one room to another, and they are composable. Once we have good coverage, entry into the op…
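One standard way to make the coverage and $\mu(s)$ questions above concrete is the state-weighted value error from Sutton and Barto, brought in here as a reference point rather than something established in these notes:

$$
\overline{\mathrm{VE}}(\mathbf{w}) \;=\; \sum_{s}\mu(s)\,\bigl[v_\pi(s) - \hat{v}(s,\mathbf{w})\bigr]^2
$$

States the buffer covers poorly contribute to this sum in proportion to $\mu(s)$, so any bound on achievable error derived from a coverage estimate would naturally be expressed with respect to $\mu$.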
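Below is a minimal sketch, in Python, of the keyed-buffer idea from the list above. The discretization function, the per-key capacity, and the uniform-over-keys sampling are assumptions made for illustration, not anything taken from the cited papers.

```python
import random
from collections import defaultdict, deque


class KeyedReplayBuffer:
    """Replay buffer keyed by a hash of (discretized state, action)."""

    def __init__(self, discretize, per_key_capacity=4):
        self.discretize = discretize            # user-supplied coarse coding of the state
        self.slots = defaultdict(lambda: deque(maxlen=per_key_capacity))
        self.counts = defaultdict(int)          # visits per key, whether inserted or not

    def key(self, state, action):
        # the [delta(state), action] neighbourhood key from the notes
        return hash((self.discretize(state), action))

    def add(self, state, action, reward, next_state, done):
        k = self.key(state, action)
        self.counts[k] += 1
        # deque(maxlen=...) evicts the oldest transition for this key,
        # so heavily visited neighbourhoods cannot crowd out rare ones
        self.slots[k].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # sample keys uniformly, then one transition within each key,
        # which flattens the correlation that raw traces would induce
        keys = random.choices(list(self.slots), k=batch_size)
        return [random.choice(self.slots[k]) for k in keys]

    def mu_estimate(self):
        # empirical visitation frequency per key, a crude proxy for mu over neighbourhoods
        total = sum(self.counts.values())
        return {k: c / total for k, c in self.counts.items()} if total else {}
```

Sampling keys uniformly rather than raw transitions is one way to act on the coverage question; weighting the key choice by an external importance estimate $\mu(s)$ would be a one-line change in `sample`.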
Ergodicity
- if the environment is a maze and there is a one-way door dividing the left part from the right part of the maze, is this environment ergodic?
- If not, how come we can still learn the optimal policy? (see the toy check below)
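As a toy check, here is an assumed four-state chain with two "left" states that can only be exited through a one-way door into two "right" states. The long-run distribution puts zero mass on the left states, so the chain is not ergodic; with episodic resets (or exploring starts) the left states are still visited, which is why their values and the optimal policy can still be learned.

```python
import numpy as np

# States 0, 1 form the left side; states 2, 3 form the right side.
# The only left-to-right passage is the one-way door 1 -> 2; there is no way back.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],   # 0: wander inside the left side
    [0.0, 0.5, 0.5, 0.0],   # 1: half the time, step through the one-way door
    [0.0, 0.0, 0.5, 0.5],   # 2: wander inside the right side
    [0.0, 0.0, 0.5, 0.5],   # 3: wander inside the right side
])

mu = np.full(4, 0.25)       # start uniformly over all states
for _ in range(1000):       # iterate the chain to its long-run distribution
    mu = mu @ P
print(mu.round(3))          # [0. 0. 0.5 0.5] -- the left side gets zero long-run mass
```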
Dotan Di Castro - sim to real
Replay buffers -
- storing sequences of states
- as (state, action, state) transitions
POMDPs