Author

Oren Bochman

Published

Tuesday, September 17, 2024

Replay Buffer

  1. For continuous environments we should think about coverage.
  • Given a parametrization of the value function, a chosen level of generalization/discrimination induces a set of features. Is a given set of experiences sufficient for prediction or control?
  • If we have an estimate of the coverage, can we use it to place a bound on the error of the value function?
  • Can we do better if we also have an estimate \mu(s) of the importance/long-term probability of the states?
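One way to make the coverage question concrete is to discretize the state space and count which cells the stored experience touches. The sketch below is only an illustration under stated assumptions: a bounded, box-shaped state space, a uniform grid as the stand-in for the "level of generalization", and the hypothetical helper names `cell_of` and `coverage_and_mu`. It returns the fraction of cells visited and an empirical estimate of \mu(s) per visited cell.

```python
from collections import Counter


def cell_of(state, low, high, bins):
    """Map a continuous state (tuple of floats) to a grid-cell index tuple.

    Assumption: the state space is the box [low, high] in each dimension.
    """
    idx = []
    for x, lo, hi in zip(state, low, high):
        frac = (x - lo) / (hi - lo)
        # Clip so that x == hi (or slightly out-of-range values) stays in the grid.
        idx.append(min(bins - 1, max(0, int(frac * bins))))
    return tuple(idx)


def coverage_and_mu(states, low, high, bins):
    """Return (fraction of grid cells visited, empirical visit distribution)."""
    counts = Counter(cell_of(s, low, high, bins) for s in states)
    total_cells = bins ** len(low)
    coverage = len(counts) / total_cells
    n = sum(counts.values())
    mu_hat = {cell: c / n for cell, c in counts.items()}
    return coverage, mu_hat


# Three visited states on a 2 x 2 grid over the unit square:
# two fall in cell (0, 0) and one in cell (1, 1), so half the cells are covered.
visited = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9)]
coverage, mu_hat = coverage_and_mu(visited, (0.0, 0.0), (1.0, 1.0), bins=2)
```

A finer grid corresponds to more discrimination and will usually report lower coverage for the same experience, which is the trade-off the questions above are probing.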
  1. Traces present a highly correlated view of the state space.
  • How much do we need to worry about this?
  1. Does a replay buffer violate the Markov property?
  1. Can we reduce the correlation between samples?
  2. Can we be more strategic about what we keep in the replay buffer (RB)?
  • Say we have a neighbourhood key of the form hash[\delta(state), action].
    • We can use the key to decide whether to insert into, or replace an entry in, the current buffer.
    • We can use it to decide what to discard.
  • We can use the buffer to estimate \mu(s).
    • We might also have more information, such as the states we did not insert or the ones we deleted.
    • If we also have \mu(s), the state importance, we can use it to decide what to keep.
  • Do we prefer complete recent traces or many partial traces?
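The keyed-buffer idea can be sketched as follows. Everything here is an assumption made for illustration, not a settled design: \delta is taken to be a uniform grid of width `resolution`, each key holds only its most recent transition, the stored key that has been seen most often is the one evicted when the buffer is full, and the class name `KeyedReplayBuffer` is hypothetical. Visit counts are kept even for keys that were evicted, so the estimate of \mu(s) also reflects experience that is no longer stored.

```python
from collections import Counter


class KeyedReplayBuffer:
    """Replay buffer with one slot per (discretized state, action) key."""

    def __init__(self, capacity, resolution):
        self.capacity = capacity
        self.resolution = resolution  # grid width used as the neighbourhood map
        self.slots = {}               # key -> most recent transition for that key
        self.visits = Counter()       # key -> times seen, kept after eviction

    def key(self, state, action):
        # Assumed delta(state): floor each coordinate onto a uniform grid.
        cell = tuple(int(x // self.resolution) for x in state)
        return (cell, action)

    def add(self, state, action, reward, next_state):
        k = self.key(state, action)
        self.visits[k] += 1
        if k not in self.slots and len(self.slots) >= self.capacity:
            # Assumed eviction rule: drop the stored key seen most often,
            # on the guess that it is the easiest one to observe again.
            victim = max(self.slots, key=lambda j: self.visits[j])
            del self.slots[victim]
        self.slots[k] = (state, action, reward, next_state)

    def mu_hat(self):
        """Empirical visit distribution over grid cells, summed over actions."""
        per_cell = Counter()
        for (cell, _action), count in self.visits.items():
            per_cell[cell] += count
        n = sum(per_cell.values())
        return {cell: c / n for cell, c in per_cell.items()}
```

Other eviction rules, for example dropping the key with the lowest estimated importance, fit the same interface by changing only the choice of `victim`.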
  1. Can we use options/skills to organize the buffer more effectively?
  • We should aim to keep full option traces in the buffer.

  • Keep traces into and out of options.

  • Before and after the options.

Think of the four-room environment: there are different options for getting from one room to another, and they are composable. Once we have good coverage entry into the op

Ergodicity

  1. Suppose the environment is a maze with a one-way door dividing the left side from the right side. Is this environment ergodic?
  2. If not, how come we can still learn the optimal policy?
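A small sketch can help pin down the first question. It treats the maze as a directed graph and checks whether every state can reach every other state, which is the reachability condition usually required of an ergodic (or at least communicating) environment. The four-state maze, the state names, and the helper names `reachable` and `is_communicating` are all hypothetical. The second graph adds an edge modelling an episodic reset back to the start, which is one possible reason learning still works in practice; that reading is an assumption, not a claim from the notes.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of states reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_communicating(graph):
    """True if every state can reach every other state."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    return all(reachable(graph, n) == nodes for n in nodes)


# Left side {L1, L2}, right side {R1, R2}; the one-way door is the edge L2 -> R1.
maze = {"L1": ["L2"], "L2": ["L1", "R1"], "R1": ["R2"], "R2": ["R1"]}

# Same maze, plus a hypothetical reset from R2 back to the start state L1.
maze_with_reset = {**maze, "R2": ["R1", "L1"]}

print(is_communicating(maze))             # False: the right side cannot reach the left
print(is_communicating(maze_with_reset))  # True: the reset restores reachability
```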

interchip dotan castro - sim to real

Replay buffers: what to store

  • sequences of states
  • (state, action, next state) transitions
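For reference, a minimal transition-based buffer might look like the sketch below. It stores (state, action, reward, next state) tuples, where including the reward is an assumption beyond the list above, drops the oldest transition once full, and samples uniformly at random. The class name `ReplayBuffer` and its parameters are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)  # oldest entries fall off automatically
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Never ask for more transitions than are stored.
        k = min(batch_size, len(self.data))
        return self.rng.sample(list(self.data), k)

    def __len__(self):
        return len(self.data)
```

Sampling uniformly across the buffer mixes transitions from different times and episodes, which is the usual argument for why a replay buffer reduces the correlation found within a single trace. Storing whole sequences of states instead would keep that temporal structure at the cost of more correlated batches.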

PMDPs

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Replay Buffer Questions},
  date = {2024-09-17},
  url = {https://orenbochman.github.io/posts/2024/2024-07-01-generalization-in-ML/2024-07-01-replay-buffer-questions.html},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Replay Buffer Questions.” September 17, 2024. https://orenbochman.github.io/posts/2024/2024-07-01-generalization-in-ML/2024-07-01-replay-buffer-questions.html.