Keywords: compositionality, naive compositionality, language emergence, deep learning, neural networks, signaling systems, emergent languages, topographic similarity, positional disentanglement, bag-of-symbols disentanglement, information gap disentanglement
```
Starting episode 1, New State: 2
Starting episode 2, New State: 0
Starting episode 3, New State: 0
Starting episode 4, New State: 0
Starting episode 5, New State: 2
Starting episode 6, New State: 2
Starting episode 7, New State: 1
Starting episode 8, New State: 1
Starting episode 9, New State: 0
Starting episode 10, New State: 2
Mean rewards over 10 episodes:
Sender: 0.2
Receiver: 0.2
```
The output above comes from a basic version of the Lewis Signaling Game implemented in PettingZoo. The game consists of a sender and one or more receivers.
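To make the mechanics concrete without the PettingZoo scaffolding, here is a minimal, self-contained sketch of one game loop. The three-way state/signal/action spaces and the 0/1 common-interest payoff are assumptions of the sketch, not necessarily the exact settings of the environment above; with uniform random agents the expected reward is 1/3, in the same ballpark as the 0.2 sample above.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES = N_SIGNALS = N_ACTIONS = 3     # assumed sizes, not the env's actual config

def play_round(sender_policy, receiver_policy):
    """One round: nature draws a state, the sender emits a signal,
    the receiver acts; both earn 1 if the action matches the state."""
    state = rng.integers(N_STATES)
    signal = rng.choice(N_SIGNALS, p=sender_policy[state])       # state  -> signal
    action = rng.choice(N_ACTIONS, p=receiver_policy[signal])    # signal -> action
    reward = 1.0 if action == state else 0.0                     # common interest
    return state, signal, action, reward

# Uniform random agents as a baseline.
sender = np.full((N_STATES, N_SIGNALS), 1 / N_SIGNALS)
receiver = np.full((N_SIGNALS, N_ACTIONS), 1 / N_ACTIONS)
rewards = [play_round(sender, receiver)[-1] for _ in range(10)]
print("Mean reward over 10 episodes:", np.mean(rewards))
```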
What would be nice is to:

- have agents that learn via various algorithms:
  - Herrnstein reinforcement
  - Roth–Erev reinforcement (has a Goldilocks property); similar to a softmax policy with a linear preference (sketched after this list):
    \begin{align}
    h'(a) & \leftarrow \alpha\, h(a) + \mathbb{1}_{a\ \text{taken}}\, r \\
    \pi(a) & \leftarrow \frac{e^{h(a)/\tau}}{\sum_{a'} e^{h(a')/\tau}}
    \end{align}

    Note: I re-interpreted the attraction $A$ of the original update as the preference $h$, and the forgetting/recency parameter $\psi$ as a learning rate $\alpha$, since they play the same role as the preference that feeds the softmax in policy-gradient methods.
  - Bush–Mosteller reinforcement; similar to policy gradient with a linear reward function (sketched after this list):

    $$\pi'(a) \leftarrow \pi(a) + \alpha\left[\mathbb{1}_{a\ \text{taken}}\, R - \pi(a)\right]$$
  - Bochman fastest coordination
  - Bochman belief-based coordination
  - Bochman adaptive Huffman coding coordination
  - Bochman adaptive arithmetic coding coordination
  - Tabular Monte Carlo RL
  - Policy Gradient or Gradient Bandit
- expected return metrics for the signaling system
- entropy metrics for the signaling system
- topographic similarity metrics for the signaling system (sketched after this list)
- positional disentanglement metrics for the signaling system (sketched after this list)
- bag-of-symbols disentanglement metrics for the signaling system
- learning rate per cycle
- learning rate per state-space size
- state space generators + distributions for the states:
  - simple
  - structured - group action for feature morphology
  - structured and simple - generate atomic states, then combinations (sketched after this list)
  - trees - atoms and trees of atoms based on a one-rule grammar
  - problem space - states and actions from an MDP
- have multiple receivers that share information to speed up learning
- support for injecting errors into communication
- support for injecting risks into communication
- support for different signal aggregation functions (sketched after this list):
  - bag of symbols
  - sequence of symbols
  - symbol parse trees??
  - DAGs????
  - custom - user defined
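Here is a minimal sketch of the Roth–Erev rule exactly as re-interpreted above: decay the preferences $h$ by $\alpha$, add the reward to the chosen action's preference, and sample actions through a softmax with temperature $\tau$. The class name and default parameters are mine, not part of any existing package.

```python
import numpy as np

class RothErevLearner:
    """Roth-Erev reinforcement as re-interpreted above: decay the preferences,
    credit the chosen action with the reward, act through a softmax."""

    def __init__(self, n_actions, alpha=0.95, tau=1.0, rng=None):
        self.h = np.zeros(n_actions)          # preferences ("attractions")
        self.alpha = alpha                    # forgetting/recency factor
        self.tau = tau                        # softmax temperature
        self.rng = rng or np.random.default_rng()

    def policy(self):
        z = np.exp((self.h - self.h.max()) / self.tau)   # numerically stable softmax
        return z / z.sum()

    def act(self):
        return self.rng.choice(len(self.h), p=self.policy())

    def update(self, action, reward):
        self.h *= self.alpha                  # h'(a) <- alpha * h(a)
        self.h[action] += reward              #          + 1{a taken} * r
```

In the game loop sketched earlier, the sender would hold one such learner per observed state and the receiver one per received signal.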
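A matching sketch of the Bush–Mosteller update: nudge the whole policy toward the reward-weighted indicator of the action taken. The final renormalisation is my addition to keep $\pi$ a probability distribution; with a 0/1 reward it leaves the policy unchanged on failure and shifts mass toward the chosen action on success.

```python
import numpy as np

class BushMostellerLearner:
    """Bush-Mosteller reinforcement: pi'(a) <- pi(a) + alpha * [1{a taken} * R - pi(a)]."""

    def __init__(self, n_actions, alpha=0.1, rng=None):
        self.pi = np.full(n_actions, 1.0 / n_actions)   # start from the uniform policy
        self.alpha = alpha
        self.rng = rng or np.random.default_rng()

    def act(self):
        return self.rng.choice(len(self.pi), p=self.pi)

    def update(self, action, reward):
        target = np.zeros_like(self.pi)
        target[action] = reward                  # 1{a taken} * R
        self.pi += self.alpha * (target - self.pi)
        self.pi /= self.pi.sum()                 # renormalise (my addition, see above)
```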
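Topographic similarity is usually computed as the Spearman correlation between pairwise distances in meaning space and pairwise distances in message space. The sketch below assumes fixed-length attribute vectors and fixed-length messages and uses Hamming distance on both sides; those distance choices are a common convention, not something fixed by this post.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def topographic_similarity(meanings, messages):
    """Spearman correlation between pairwise meaning distances and
    pairwise message distances (Hamming on both sides)."""
    d_meaning = pdist(np.asarray(meanings), metric="hamming")
    d_message = pdist(np.asarray(messages), metric="hamming")
    rho, _ = spearmanr(d_meaning, d_message)
    return rho

# Toy check: a perfectly compositional code (one symbol per attribute) scores 1.0.
meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
messages = [(3, 5), (3, 6), (4, 5), (4, 6)]
print(topographic_similarity(meanings, messages))
```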
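Positional disentanglement, following the definition of Chaabouni et al. (2020), scores each message position by the gap between its two largest mutual informations with the attributes, normalised by that position's symbol entropy; bag-of-symbols disentanglement is the same idea with per-symbol counts in place of positions. A sketch of the positional variant, assuming at least two attributes:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def positional_disentanglement(messages, attributes):
    """Mean over positions of (MI gap between the two most informative
    attributes) / (symbol entropy at that position), skipping constant positions."""
    messages = np.asarray(messages)
    attributes = np.asarray(attributes)
    scores = []
    for j in range(messages.shape[1]):
        symbols = messages[:, j]
        mis = sorted(
            (mutual_info_score(symbols, attributes[:, k]) for k in range(attributes.shape[1])),
            reverse=True,
        )
        _, counts = np.unique(symbols, return_counts=True)
        h = entropy(counts / counts.sum())    # symbol entropy at position j (nats)
        if h > 0:                             # skip positions with a constant symbol
            scores.append((mis[0] - mis[1]) / h)
    return float(np.mean(scores)) if scores else 0.0

# The compositional toy language from the topographic similarity sketch scores ~1.0.
print(positional_disentanglement([(3, 5), (3, 6), (4, 5), (4, 6)],
                                 [(0, 0), (0, 1), (1, 0), (1, 1)]))
```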
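One possible reading of the "structured and simple" generator (my interpretation, not a specification from this post): keep a small inventory of attributes, treat individual attribute values as atomic states, and take the Cartesian product as the structured combinations. The attribute inventory below is purely illustrative.

```python
from itertools import product

# Hypothetical attribute inventory; the real generator's features may differ.
ATTRIBUTES = {
    "color": ["red", "green", "blue"],
    "shape": ["circle", "square"],
}

def atomic_states():
    """Each attribute value on its own is an atomic state."""
    return [(name, value) for name, values in ATTRIBUTES.items() for value in values]

def combined_states():
    """Structured states: one value per attribute (the Cartesian product)."""
    return list(product(*ATTRIBUTES.values()))

print(atomic_states())     # 5 atoms
print(combined_states())   # 6 structured combinations
```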
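The signal aggregation options mostly differ in how much structure the receiver gets to exploit. A tiny sketch of the two simplest ones (function names are mine): a sequence preserves order, a bag keeps only symbol counts.

```python
from collections import Counter

def as_sequence(symbols):
    """Order-preserving aggregation: the receiver observes the full sequence."""
    return tuple(symbols)

def as_bag(symbols):
    """Order-free aggregation: the receiver only observes symbol counts."""
    return frozenset(Counter(symbols).items())

message = [2, 7, 2, 5]
print(as_sequence(message))   # (2, 7, 2, 5)
print(as_bag(message))        # {(2, 2), (7, 1), (5, 1)} as a hashable bag
```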
Citation
BibTeX citation:
```bibtex
@online{bochman2025,
  author = {Bochman, Oren},
  title = {Lewis {Signaling} {Game} for {PettingZoo}},
  date = {2025-01-01},
  url = {https://orenbochman.github.io/posts/2024/2024-10-10-marco-baoni-composionality/lewis-signaling-game-petting-zoo.html},
  langid = {en}
}
```