I have been thinking about Lewis Signaling games recently, and I had come up with a couple of questions that I wanted to answer.
Better Initialization
First has to do with initilizing the algorithm in some optimal way. Like the battle of the sexes there is no easy way to initialize the algorithm unless the agents can coordinate on a single equilibrium. If the state are unevenly distributed, or if they can listen to some prior signal, then they can coordinate on a permutation ordered by frequency for the signals and its inverse for the actions. Otherwise the agents will have to learn the equilibrium through trial and error which is the essence of the game.
However the idea of a prior remained and the complexity of specifing it kept bugging me since I had failed to find a way to do it.
Accelarating Learning using Multiple Agents
A second question that I had was not covered in the literature. I wanted to know if the multiple agents were signaling to each other, in a visible way, would the agents be able to coordinate on a single equilibrium significantly faster just a pair of agents.
One obvious point is that move by nature would slow down the process is agents are unlucky. For optimal signaling the same state would be remain until agents could coordinate and would not reoccur until the agents had coordinate on all the other states. So for multiple agents some agents would be closer to this optimum and may learn faster then the others. Secondly since matching siganl action pairs are rare, (1/k^2) for a k state game, having between k to k^2 should significantly increase Expectation of a matching signal-action pair. So this could speed things up. But this also raises the issue of differential signaling systems arising if by chance some two or more pairs learned different signal/action pairs. The learning process would need to break such ties (Skryms might call it spontaneous symmetry breaking) But it could slow down the learning process.
Actually such a state of affairs could lead to a partial pooling equilibrium, where all the agents had learned a synonym. This would be a suboptimal equilibrium, but it will provide a maximal payoff for all the agents if there are no homonyms.
Some ideas on how to break the symmetry would be: 1. the group might defer to seniorirty i.e. the sender with the lowest id. - (takes no extra time). 1. agents could vote at random for a signal. (would take just one more step if we ignore one draw if the votes are tied) 2. ask the other agents to vote who likes signal a and who likes signal b. if the sender or reciever match the sender/reciever they like it so there would be 0 1 or 2 votes for each signal. the might be draws too and each agent would need to pick a new permutation and vote again. - (would take a few more steps) 3. the senders might pick a pair of at random untill they both pick the same one. - (would take a few more steps)
Any way you look at it there are many advantages to consider learning by multiple senders. They seem necessary for complex signaling as well. However I was pretty certain that the analysis would keep getting more complex as we considered more options like learning grammar, contexts or a noisy environment….
Bayesian Perspective
I had already implemented learning using different algorithms and to explote the Gittin’s index from (Sutton and Barto 2018) I had already implemented a Beta-Bernulli contextual bandit with Gittin’s index and with Thompson sampling.
I was already thinking how to improve it but I did not have a very good idea regarding the prior. I had a fairly efficient algorithm for the learning but I wanted a better way to model the updating and the right prior. My idea of using a Multinomial-Dirichlet conjugate pair had not worked and would probably take a while to trouble shoot and fix, and it was not really the full solution I was looking for.
More so I was coming to terms that I could likely come up with Bayesian updating schemes that were novel and I would quickly find myself deep in uncharted territory. This had some attraction - it was not the first time I came a cross a problem that did not seem to have a conjugate prior pair to fit with prior knowledge I wanted to bring to bear in the model, but Baysian updating is just one aspect of Bayesian methodology and I was worried of getting to a dead end because of working with a new type of distributions.
The Model
At a fundamental level the Lewis signaling game of coorrdination. Sender and reciever are trying to learn a mapping between states and signals. The mappings need to be inverse of one another and to have a maximal reward the mappings need to preserve the messages - synonyms are ok by homonyms are not. And if thes number of states and signals and actions are the same then the mappings need to be one to one and onto.
So in such a case synonyms are not allowed and the mappings need to be not just permutation but rather cycles of length k. This is something I had understood intuitively but I had ot been very clear about.
I was now thinking about distribution over groups - somthing I had not considered before. However it dawned on me that the two other aspects of the complex signaling game being grammar and context might be modeled additional group structures. And if we could learn cycles efficiently then we might generalize to more complex signaling systems in a reductionist way intimated in chapter 12 of (skyrms2010signals?).
The point is that cycles are not the simplest structure in this problem either. What we are looking at each state of Nature is a pair of transpositions that cancel each other out. A transposition is a very simple structure but it is also a base element of a permutation. The Cayley theorem tells us that any group is isomorphic to a group of permutations. If we can define our prior using transpositions then we can define a prior over permutations or general on any group.
Another point in favor of transpositions is that they have one operation, their composition just a product and since probabilities are multiplicative too the two seem to be a good fit.
So I had three point to consider.
- constructing the prior for cycles based on transpositions.
- updating the prior using based on moves in the Lewis signaling game.
- implement it as an rl/Bayesian model say using Thompson sampling.
Citation
@online{bochman2024,
author = {Bochman, Oren},
title = {Lewis {Game} from a {Bayesian} {Perspective}},
date = {2024-02-12},
url = {https://orenbochman.github.io/posts/2024/2024-05-01-signals/baysian-game-2.html},
langid = {en}
}