There are many ways for signaling systems to emerge. Gold's theorem even suggests that, under a few assumptions, there are scenarios in which emergent languages become a garden with infinitely bifurcating paths, each leading to a different language, so that the sender and receiver may never be sure they are on the same wavelength. While I consider these scenarios unrealistic, I do believe that the garden of forking paths is more than a metaphor for complex signaling systems.
At the start of "Some Dynamics of Signaling Games" (Huttegger et al. 2014), Skyrms and his coauthors suggest that there are many games and mechanisms in which signaling systems arise.
The origin stories
Like so much of the work on signaling systems, most of the work and interpretation originated with Brian Skyrms. The more progress I made on this working paper, the more I discovered that I was increasingly going over material published or reviewed by Skyrms in (Skyrms 2010), which I had read a number of times.
The real headache was that I was more interested in documenting my own ideas, but I had to give credit to the many other researchers who had worked on signaling systems before me. In reality, though, most of these papers dive much deeper into the mathematics of signaling systems than I have. That is at least one advantage of tracking the origin stories of the different dynamics.
Motivations
Initially I just wanted to document ideas that came up while replicating some of the classic papers.
However, as I got deeper into complex signaling, I realized that the assumptions and intuitions for the classic Lewis game often seem to reverse for the complex game. Thus ideas that I had considered trivial dynamics turned into stepping stones across the moat that separates the simple from the complex.
Also, as I tried to develop increasingly efficient RL algorithms for learning signaling systems, I realized that I was not the only one working on this problem. It was again a good idea to track the different ideas that seem to work and to check whether others had already discovered them.
TL;DR
Emergent Languages in
One fascinating aspect of the Lewis signaling game (Lewis 1969) is that although there are many theoretical equilibria, the agents will initially fail to coordinate; they can only reach an optimal signaling system after some iterations of the game, in which they either evolve or use reinforcement learning to converge on a common signaling strategy. In the prisoner's dilemma agents can learn to cooperate if the game is iterated. In the Lewis signaling game agents can learn to coordinate on a signaling system if the game is iterated.
Generally, finding a good signaling system requires some kind of algorithm and somewhere between N and N^2 steps, as well as some number of iterations. I don't recall seeing a discussion of the minimum or the expected number of iterations required to reach a signaling system under different algorithms. In other words, few researchers have considered the complexity of coordination in signaling systems, even though this is a fairly simple problem to solve in the most common settings.
Lewis, David Kellogg. 1969. Convention: A Philosophical Study. Cambridge, MA, USA: Wiley-Blackwell.
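A minimal simulation can put rough numbers on these iteration counts. The sketch below is my own construction, not taken from the cited literature: Roth-Erev style urn reinforcement in an n-state Lewis game, where the unit reinforcement, the 500-episode window, and the 0.95 success threshold are arbitrary assumptions.

```python
import random

def lewis_roth_erev(n, episodes=20000, seed=0):
    """Roth-Erev (urn) reinforcement in an n-state Lewis game.

    Returns the first episode at which the rolling success rate over the
    last 500 episodes exceeds 0.95, or None if that never happens."""
    rng = random.Random(seed)
    sender = [[1.0] * n for _ in range(n)]    # state -> signal weights
    receiver = [[1.0] * n for _ in range(n)]  # signal -> action weights
    window = []
    for t in range(episodes):
        state = rng.randrange(n)
        signal = rng.choices(range(n), weights=sender[state])[0]
        action = rng.choices(range(n), weights=receiver[signal])[0]
        success = action == state
        if success:  # reinforce both choices on success
            sender[state][signal] += 1.0
            receiver[signal][action] += 1.0
        window.append(success)
        if len(window) > 500:
            window.pop(0)
            if sum(window) / 500 > 0.95:
                return t
    return None

print(lewis_roth_erev(3))
```

Runs may also fail to reach the threshold and return None, e.g. when the learners get stuck near a partial pooling equilibrium.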
Two other points, primarily addressed by the evolutionary game theory community who view evolution in terms of replicator dynamics, are the stability of equilibria and the notion of evolutionarily stable strategies.
The first has to do with convergence of learning to an optimal signaling system.
The second has to do with the ability of an equilibrium to resist invasion by a mutant strategy.
Enumerating the different types of signaling systems and the other types of equilibria
A related issue is that of enumerating the different types of equilibria in larger games. For basic Lewis signaling games this is not very difficult, as there are N! signaling systems in games with N signals and N states.
For a complex signaling system with N states and M signals we can enumerate the signals as the first N base-M numbers. Once again we deal with N! permutations. However, the sender may choose any set of N base-M numbers. This creates a potentially unbounded number of signaling systems. This is perhaps a reflection of Wilhelm von Humboldt's characterization of language as making "infinite use of finite means", i.e. a system in which a finite number of symbols can be combined in an unbounded number of ways, c.f. (Humboldt, Losonsky, and Heath 1999).
Humboldt, W. von, M. Losonsky, and P. Heath. 1999. Humboldt: ’On Language’: On the Diversity of Human Language Construction and Its Influence on the Mental Development of the Human Species. Cambridge Texts in the History of Philosophy. Cambridge University Press.
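The counting argument can be sanity-checked directly. Here I assume signals are non-empty strings of length at most L over an M-symbol alphabet; the length cap L is my addition to keep the count finite.

```python
from math import factorial, perm

N = 4  # states
M = 2  # alphabet size
L = 3  # maximum signal length (an illustrative assumption)

# Basic Lewis game: one atomic signal per state -> N! separating systems.
print(factorial(N))  # 24

# Complex game: signals are non-empty sequences of at most L symbols from an
# M-letter alphabet, so there are M + M^2 + ... + M^L candidate signals,
num_signals = sum(M**k for k in range(1, L + 1))
# and perm(num_signals, N) injective assignments of signals to states.
print(perm(num_signals, N))  # 14 * 13 * 12 * 11 = 24024
```

Letting L grow without bound makes the number of candidate signals, and hence of signaling systems, unbounded, which is the point made above.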
This perhaps makes the complex signaling game special as a game-theoretic problem, at least in the sense of what we consider bounded rationality. It is not at all clear what solution concept could be used to create an optimal signaling system; it might require deep insights into group theory, topology, information theory, category theory, and probability theory. Also, though I consider the problem of equilibria in terms of an enumeration of states via numeric signals, it does not at all follow that this is the best way to consider the problem. If we use an alphabet of M phonemes we may, for instance, run into phonotactic constraints that are not at all present in the numeric representation. Thus another source of complexity may arise in terms of the actual realization of the signaling system. This is perhaps why this needs to be a working paper, in which new ideas can be added as they come to me.
For complex signaling systems we need to consider
Are the infinite number of signaling systems equivalent up to isomorphism? I believe that the answer is yes, by the following rationale. Any signaling system can be viewed as a permutation of signals to states. And according to Cayley's theorem any group can be represented as a permutation group. Hence we can view any complex signaling system as some group! And groups are equivalent if they are related by a group isomorphism. However, there are still a couple of conundrums to consider. If the prelinguistic objects we call the states have a group structure and this structure is preserved, this seems like a signaling system that is a faithful representation. But it is also possible that there is a mismatch: that some of the structure is lost, or that some additional structure is added by the language that is not in the original prelinguistic objects. I think that some of these might be viewed as happy accidents while others may be failures in terms of signaling systems.
Can the Lewis signaling game, together with the pre-linguistic objects, imbue the language with semantics?
At what point can we view signaling systems as universal in terms of the Sapir-Whorf hypothesis? I.e., when do the semantics of the signaling system become capable of representing the semantics of any natural language?
It is entirely possible to
Another point of interest to me is to consider the emergence of grammar and of morphology. In (Nowak and Krakauer 1999) the authors give a result for the emergence of grammar in a signaling system. This is that there are many more
Nowak, Martin A, and David C Krakauer. 1999. “The Evolution of Language.”Proceedings of the National Academy of Sciences 96 (14): 8028–33.
I think it worthwhile to list them in this space, particularly as I believe that signaling systems are a key to transfer learning in reinforcement learning, which, together with learning to represent complex states, may be the key to AGI.
Introduction
Listing a number of different scenarios for how signaling systems can arise in the Lewis signaling game.
I will start with a story.
Next I'll add some details, such as some variants, and look at some basic analysis.
Finally I'll try to place it into the context of MARL. Note that we will be dealing with partially observed multi-agent RL, but each scenario can have a different setting.
Lewis signaling game
In the book Signals (Skyrms 2010) the author, Skyrms, discusses how Lewis challenged the skepticism of his advisor Quine by showing that meaning and convention may arise via an arbitrary mechanism like symmetry breaking.
When I considered solving some additional issues surrounding the fundamentals of signaling systems I realized that I had a few different scenarios in mind, and that writing them down with some semblance of formalism might be helpful. Indeed, this turned out to be a stepping stone towards developing optimal algorithms for learning signaling systems in different RL settings.
Let's face it: under different settings the task of acquiring a signaling system can be easier or harder. In (Skyrms 2010) the author points out that at symmetry breaking all the different signaling systems that could be learned are equivalent. However, if there is an asymmetry, in the form of a non-uniform distribution of states or different signaling risks, then we might prefer some signaling systems over others, and there might even be a unique optimal signaling system. Furthermore, as in reality, one would expect that with time the distribution of states might change, and the optimal signaling system might change as well.
This is the dynamic of the game that is most relevant to evolutionary game theory. Under certain conditions, results for replicator dynamics can be replicated by, and are therefore equivalent to, those for reinforcement learning; c.f. the chapter on learning in (Skyrms 2010).
I think that since some of the smartest people to work on signaling games have been in the field of evolutionary game theory, it’s worth considering the replicator dynamics of signaling games if nothing else then for the rigor of the analysis and methodology. A second aspect is that numerous examples of signaling games are found in nature where replicator dynamics are often the most appropriate model.
Perhaps even more importantly, languages do undergo changes over time. It is quite interesting to consider the dynamics of such changes and whether there are traces that may allow us to infer the earlier forms of a language from its many descendants.
Elsewhere I have enumerated a list of desiderata for signaling systems, and I am fascinated by whether these may emerge by simply applying the respective selection pressures to a population of agents in the Lewis signaling game. This idea may also be considered in the context of RL, but it seems more natural to initially consider it in the context of evolutionary game theory.1
1 Many key results in RL are based on the steady state distribution of the Markov chain for the MDP’s dynamics. If the game evolves over time then we may not be able to use these results. As such researchers in RL are reluctant to consider MDPs whose dynamics change over time.
Huttegger, Simon, Brian Skyrms, Pierre Tarres, and Elliott Wagner. 2014. “Some Dynamics of Signaling Games.”Proceedings of the National Academy of Sciences 111 (supplement_3): 10873–80.
Describes the replicator dynamics as the fundamental model of evolutionary game theory.

> The replicator dynamics is the fundamental dynamical model of evolutionary game theory.

Presents both one-population and two-population replicator dynamics.

> The two most common varieties of replicator dynamics are the one-population and the two-population replicator dynamics.

Notes that the replicator dynamics in signaling games is often not structurally stable, making it important to study the effects of perturbations such as mutation.

> Both the two-population and the one-population replicator dynamics are driven by the difference between a strategy's fitness and the average fitness in its population. This captures the mean field effects of natural selection, but it disregards other factors such as mutation or drift. In many games these factors will only play a minor role compared with selection. However, as we shall see, the evolutionary dynamics of signaling games often crucially depends on these other factors. The reason is that the replicator dynamics of signaling games is generally not structurally stable (10). This means that small changes in the underlying dynamics can lead to qualitative changes in the solution trajectories.

Discusses the selection mutation dynamics as a plausible perturbation to the replicator dynamics.

> One plausible deterministic perturbation that has been studied is the selection mutation dynamics (11). We shall consider this dynamics in the context of two population models.
| Story: Name | Dynamics | Pooling | Partial Pooling | Separating Population |
| --- | --- | --- | --- | --- |
|  | Stable | Stable | Dynamically unstable | Structurally Stable |
|  | Stable | Dynamically unstable | Structurally Stable | Unstable |
The evolution of signaling systems
In this section I want to address some of the questions that drive my research on signaling systems.
When do we expect signaling systems to evolve?
When agents' fitness is increasingly predicated on coordination or communication, they will benefit from evolving signaling systems. I.e., an evolutionary pressure to communicate will lead to the evolution of signaling systems.
What are the main desiderata for signaling systems?
Here are some of the main desiderata for signaling systems:
Efficiency - the signaling system should be as short as possible.
Salience - the signaling system should be most salient for the distribution of states.
Cost - the signaling system should be as cheap as possible to learn and use.
Robustness - the signaling system should be robust to noise and deception.
Adaptability - the signaling system should be able to adapt to changes in the distribution of states.
Compositionality - the signaling system should be able to be combined with other RL activities to form more complex signaling systems and more complex policies.
This is most clearly illustrated in:
The predation scenario, where:
Agents' short-term survival is predicated on their ability to respond to signals indicating the presence of predators by taking the appropriate precautions. Of course, signals need a source.
Agents can send a signal for the state they perceive or stay mute.
Agents can repeat signals they receive or stay mute.
As predation increases, selection pressure may induce signaling systems to evolve.
The Dowry/Courtship scenario, where:
The game can be cooperative or competitive.
In the competitive case only the fittest agents get a mate.
In the cooperative case all agents get to mate but some will mate more often, or with more desirable mates.
Agents must collect resources (e.g. a bill of goods for a dowry) from a changing landscape before they can reproduce.
Only the top n dowries will generate an offspring (bills of goods slowly perish, but the size and diversity of the dowry are important).
Alternatively, only the agent that is the best at courtship n times can generate an offspring (this time there are smaller bills of goods that quickly perish).
Resources are plentiful but evanescent.
Agents that can signal will be able to collect a dowry faster and increase their fitness.
As competition increases the benefits of signaling, signaling systems should evolve.
This is interesting, as the exploration/exploitation dilemma caps the rate at which agents can reproduce. Yet signaling will allow agents to overcome this cap.
This is also a case where agents may benefit from sending false signals when the receiver is a serious contender, so that the receiver will waste time and resources.
The agents must learn to discriminate honest signals from deceptive ones. To handle deception, agents may also develop a model of the mind of the sender to predict the likelihood of deception. They may also want to tally whether the sender has been deceptive in the past.
Or
The Knights & Knaves scenario where:
Agents need to:
Classify agents by type (knight, knave, monkey, insane, etc.) in order to interpret the semantics of their signals.
Assemble the state from messages with different semantics to recover the state of the world.
This scenario assumes the agents have an underlying motivation to learn to signal.
Now add a selection pressure for the evolution of basic logic and semantics.
Agents that communicate can spend less time exploring and more time exploiting. In this case the agents will evolve a signaling system that is most salient for the distribution of states. This is the most likely scenario for the evolution of signaling systems: the reason agents might want to learn a signaling system is to maximize their fitness.
What are the main parameters that affect the learning of signaling systems?
the state distribution (these are the states of the world; signaling is used to share these states with others to maximize fitness, i.e. the expected progeny)
the saliency distribution (weights ranking the states by their risk)
the veracity of senders
the cost of signaling (risk of predation)
What are the different settings for learning signaling systems?
Some other questions within these contexts might be:
What is the number of signaling systems for a given number of states and actions?
What is the number of pooling equilibria for a given number of states and actions?
Let's break these down by the degeneracy of the pooling equilibrium. This might suggest the minimal number of signals needed in an experiment to learn the signaling system. It might also suggest the success thresholds for optimal signaling systems in different settings.
Can we estimate the regret for different RL algorithms?
What is the expected signaling success for each of the above?
What is the expected number of steps to acquire a signaling system for a given number of states and actions under different settings?
How does having more senders or receivers affect the above?
What is the complexity for n agents to come up with a common signaling system?
under full communication
under partial communication
How does locality affect the time to reach a universal signaling system?
if there is full observability
if communications are one-to-one
if communications are restricted to different neighborhoods: von Neumann, Moore, hexagonal, other lattices, chains, rings, random graphs (need to use optimal dynamics)
Another question, something like a lemma, concerns the time needed for an agent to become experienced enough to set up an optimal signaling system.
Given a distribution S over k states, with the rarest state s' having probability p(s') = \alpha, what is the expected number of observations needed for agents to approximate the distribution of states to within some credible interval \epsilon<\alpha?
Note that while there is no lower bound on \alpha, the upper bound is \alpha = 1/k, attained by the uniform distribution of states. I think this is the Bayesian version of an empirical distribution. This would be a waiting time for becoming experienced.
After this waiting time a steady state distribution should be known to all agents.
Under partial observability the agents need to cooperate to learn the signaling system in a distributed manner. If the agents are on a grid or on a graph, what are the bounds on coordination time for learning the signaling system:
- using a gossip protocol - each agent can only communicate with its neighbors
- using a broadcast protocol - each agent can communicate with all other agents
- using a random walk protocol - each agent can communicate with a random agent
- using a central coordinator - each agent can communicate with a central coordinator
- using an ambassador - each agent can communicate with an ambassador who can communicate with many other agents, per Ramsey's theorem
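The gossip case can be caricatured as a voter model on a ring. This sketch, with illustrative parameters of my own choosing, counts single-agent updates until all agents hold the same candidate system:

```python
import random

def gossip_consensus(num_agents=20, num_systems=5, seed=42, max_rounds=200000):
    """Voter-model sketch: each agent on a ring holds one of `num_systems`
    candidate signaling systems; each round one random agent copies a random
    ring neighbor. Returns the number of rounds until consensus."""
    rng = random.Random(seed)
    lexicon = [rng.randrange(num_systems) for _ in range(num_agents)]
    for round_no in range(1, max_rounds + 1):
        i = rng.randrange(num_agents)
        j = rng.choice([(i - 1) % num_agents, (i + 1) % num_agents])
        lexicon[i] = lexicon[j]  # agent i adopts its neighbor's system
        if len(set(lexicon)) == 1:
            return round_no
    return None  # did not converge within max_rounds

print(gossip_consensus())
```

This only models the spread of an already-formed system, not the trial-and-error learning itself, but it gives a feel for how topology slows consensus.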
While reviewing a paper on this subject I realized that there are a number of hypothetical scenarios in which signaling systems arise.
In RL we have different settings for learning optimal strategies. Some of these different scenarios can be framed in this form.
I wanted to list them here so I can reference them later.
But thinking as I list these, I notice that some provide easy solutions to problems that others don't.
One point of interest: if the agents are concerned with picking the right action for each state, they should collapse any states which share the same optimal action into a single signal. This will reduce the number of signals that must be learned and reduce the overall message length and cost of signaling. So in reality we should not be overly concerned with the number of states exceeding the number of actions.
When there are not enough signals agent need to learn to aggregate signals.
learning by evolution:
replicator dynamics where:
agents are assigned random signaling systems, and the systems with the highest payoffs are selected through population dynamics.
children learn their parents' signaling matrix via sampling:
one parent (perfect and imperfect transmission)
two parents
pidgins via shared dictionaries
creoles via shared grammars and dictionaries
adding some mutation - adding mutations to the children's signaling systems.
based on the paper by (Nowak and Krakauer 1999)
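Parent-to-child transmission with mutation might be sketched as follows; the majority-vote learning rule and all parameter values are illustrative assumptions of mine, not the actual Nowak-Krakauer model:

```python
import random

def inherit(parent, mutation_rate=0.05, samples=20, seed=0):
    """A child learns its parent's sender strategy by sampling.

    `parent` maps each state (index) to a signal. For each state the child
    observes `samples` uses of the strategy; with probability `mutation_rate`
    an observation is corrupted into a random signal. The child adopts the
    majority signal per state."""
    rng = random.Random(seed)
    n = len(parent)
    child = []
    for state in range(n):
        counts = [0] * n
        for _ in range(samples):
            if rng.random() < mutation_rate:
                counts[rng.randrange(n)] += 1  # imperfect transmission
            else:
                counts[parent[state]] += 1     # faithful observation
        child.append(max(range(n), key=counts.__getitem__))
    return child

parent = [0, 1, 2, 3]  # a separating system: state i -> signal i
print(inherit(parent))
```

With perfect transmission (mutation_rate = 0) the child reproduces the parent exactly; raising the mutation rate lets variant systems enter the population.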
learning via reinforcement learning
spontaneous symmetry breaking scenarios vs planning
If there are N signals, states, and actions, is there an advantage to planning a signaling system versus letting it evolve, in terms of the number of steps needed to learn the signaling system?
random signaling means that each step is an independent trial:
the sender can send any of N signals and
the receiver can guess any of N actions,
so there are N^2 combinations per turn.
Only the combinations where the action matches the state (A = T) get a reward, and there are N such combinations. So there is an N/N^2 = 1/N chance of getting a reward per turn, and the expected number of steps needed to learn to signal the state T is N.
planning means that the sender picks one signal and sticks to it. In this case the receiver gets to systematically eliminate an action every turn:
the sender has 1 signal and
the receiver can guess among N actions on the first turn, N-1 on the second, and N-k+1 on the k-th turn.
The correct action (A = T) is equally likely to be found on any of the N turns, so the expected number of steps is (N+1)/2.
Thus planning is faster than random signaling.
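A quick Monte Carlo check of the two expectations (N steps for fully random play, (N+1)/2 for planned play); the parameters are illustrative:

```python
import random
import statistics

def random_play(n, rng):
    """Both agents act uniformly at random each turn; success iff the
    receiver's action matches the state, which happens with probability 1/n."""
    t = 0
    while True:
        t += 1
        if rng.randrange(n) == 0:
            return t

def planned_play(n, rng):
    """The sender fixes one signal; the receiver tries a different action
    each turn, so the correct one is found on a uniformly random turn."""
    order = list(range(n))
    rng.shuffle(order)
    return order.index(0) + 1

rng = random.Random(1)
n, trials = 10, 20000
print(round(statistics.mean(random_play(n, rng) for _ in range(trials)), 1))   # close to n = 10
print(round(statistics.mean(planned_play(n, rng) for _ in range(trials)), 1))  # close to (n+1)/2 = 5.5
```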
If agents use positive reinforcement only, then
are there conditions where the sender/receiver gets to determine the signaling system?
If the sender sends random signals from L-{coordinated}, R must guess the state from L-{coordinated}.
If S wants to switch X and Y and does so, R gets 0. If R is epsilon-greedy he will eventually find the new semantics.
A meta-protocol would require a code-switching signal, e.g. "Swap X Y".
Source coding scenario: errors in encoding & decoding - based on the paper by (Nowak and Krakauer 1999)
errors in the transmission channel - based on the paper by (Nowak and Krakauer 1999)
risks - there are signals with monotonically increasing risk.
payoffs for signals are symmetric
costs associated with the risky signals are borne by the sender
if receivers can respond correctly after getting a partial message they get a bonus.
we can also consider sharing costs and rewards symmetrically, creating a complex system with compositionality using self-play
costly signaling
what else is this called
| Story: Name | Dynamics | Pooling | Partial Pooling | Separating Population |
| --- | --- | --- | --- | --- |
|  | Stable | Stable | Dynamically unstable | Structurally Stable |
|  | Stable | Dynamically unstable | Structurally Stable | Unstable |
Replicator dynamics
what else is this called
| Story: Name | Dynamics | Pooling | Partial Pooling | Separating Population |
| --- | --- | --- | --- | --- |
|  | Stable | Stable | Dynamically unstable | Structurally Stable |
|  | Stable | Dynamically unstable | Structurally Stable | Unstable |
The Oracle of Saliency
In many cases the arduous O(n^2) task of coordination that makes signaling hard might be avoided. If both agents share some mechanism for enumerating the pre-linguistic objects they encounter, they can systematically enumerate them before the game starts. The enumeration is a lexicon and can be used as a signaling system.
Since the oracle's cryptic answers are shared, the agents can use the resulting binary numbers canonically and avoid some or all of the cost of coordination.
Story: The Oracle of Saliency
The sender, in an ex-ante step, consults an oracle, perhaps the I Ching, asking what to do for each prelinguistic object. Each consultation returns a cryptic message, not unlike a cryptographic hash. The sender then sorts the objects in increasing order of their cryptic messages, assigning successive binary numbers to them.
The receiver also uses the same process with his copy of the oracle.
Since both the sender and receiver now have access to the same lexicon, they can use it as a signaling system.
Even if the oracle does not provide unique enumerations, it will reduce the task from O(N^2) to O(N), as once the oracle is consulted the sender and receiver can proceed by trial and error to further coordinate on the ambiguous signals, e.g. by assigning a sub-index to each group of prelinguistic objects that share a signal. This will now look more like a prefix coding scheme.
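The oracle trick can be sketched in a few lines, with SHA-256 standing in arbitrarily for the cryptic oracle and made-up state names:

```python
import hashlib

def oracle_lexicon(objects):
    """Sort objects by a shared 'cryptic oracle' (SHA-256 here, an arbitrary
    stand-in) and assign fixed-width binary code words by rank."""
    ranked = sorted(objects, key=lambda o: hashlib.sha256(o.encode()).hexdigest())
    width = max(1, (len(objects) - 1).bit_length())
    return {obj: format(i, f'0{width}b') for i, obj in enumerate(ranked)}

# Hypothetical prelinguistic states; both agents compute the same lexicon
# independently, in any order, without ever exchanging a message.
states = ['eagle', 'snake', 'leopard', 'fire']
sender_lexicon = oracle_lexicon(states)
receiver_lexicon = oracle_lexicon(list(reversed(states)))
assert sender_lexicon == receiver_lexicon  # no coordination round was needed
print(sender_lexicon)
```

Because the ordering depends only on the shared oracle, agreement comes for free; only hash collisions (shared signals) would need the trial-and-error sub-indexing described above.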
In the Lewis game errors cost nothing, but in RL we often encode a sense of urgency by adding a penalty to time-wasting moves. If there is a penalty, e.g. -1, for wasting time on miscoordination, we might call the game the Lewis game with urgency. Agents that get to play an infinite number of times would see infinite rewards past at most O(n^2) penalties, and so would want to play. In this case, even if consulting the oracle has a cost C, as long as the expected cost of trial and error is greater than C, agents would have an incentive to consult the oracle.
In the previous points we ignored the reality in which states are not uniformly distributed and agents may not be able to pay the oracle upfront. If the agents know the most likely states, they can use this knowledge to set up a self-financing scheme to raise funds for consultations of the oracle, as well as to reduce their cost of coordination. However, it is worth pointing out that our next scenario considers how knowing the distribution of states is just another type of oracle.
Three cases come to mind.
They have both been observing the state space long enough to infer the distribution of states to a high degree of confidence.
They can listen to a third party who knows the distribution and learn to signal from them.
They can access a state classifier and send it random noise thus deriving an empirical distribution of states in the classifier (not nature) and use it to learn the signaling system.
Once a distribution of states is known, it can be used to create Huffman codes over 0 and 1. These signals are then ranked.
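A minimal Huffman construction over the binary alphabet; the state names and probabilities below are made up for illustration:

```python
import heapq

def huffman(dist):
    """Binary Huffman code for a {state: probability} distribution."""
    # Heap entries: (probability, unique tiebreaker, {state: partial code word})
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(dist.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree gets prefix '0'
        p1, i, c1 = heapq.heappop(heap)  # next least probable gets prefix '1'
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, i, merged))
    return heap[0][2]

# A made-up non-uniform saliency distribution over prelinguistic states.
codes = huffman({'eagle': 0.5, 'snake': 0.25, 'leopard': 0.15, 'fire': 0.10})
print(codes)  # the most salient state gets the shortest code word
```

The resulting code is prefix-free, so signals can be concatenated and still decoded unambiguously, which is exactly the ranking-by-saliency idea above.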
There is a distribution of the states of the world known to all players.
In the easiest case each state has a different probability of occurring. It is easiest because all players can infer a canonical signaling system from such a distribution of states.
They order states and corresponding actions in decreasing expected value. The canonical system is the resulting one-to-one mapping between the states and the actions.
Thus the salience distribution breaks the symmetry of all viable signaling systems and leaves just one option.2
In each subsequently harder case there are two or more states with an equal probability of occurring. The probabilistic symmetry between these states cannot be broken as before, and requires coordination. The players can break the symmetry by trial and error when such a state arises. Once all the symmetries have been coordinated, the players can infer the rest via the canonical signaling system from the distribution of states.
In the worst case all states have an equal probability of occurring. This is the hardest case because after each state-signal pair the problem is still maximally symmetric. The players need to solve this by trial and error.
2 This notion of a most salient mapping acts as an optimal policy for agents who need to quickly avoid the long-run costs of a non-salient signaling system.
MARL Formulation
In terms of the MARL formulation:
A PMDP has states S and actions A.
States are observed by agents of type S whose actions are signals
Actions are performed by agents of type R.
Rewards are assigned symmetrically to all senders and receivers when the receiver's action matches the sender's observed state.
States can be uniformly distributed or be drawn from a distribution.
We like to call such a distribution the saliency distribution, after Schelling's notion of a focal point (AKA a Schelling point) in his book The Strategy of Conflict. In a Lewis signaling game there are n! signaling systems if there are n states, signals, and actions. If the states are uniformly distributed then all signaling systems are equivalent. But if the state probabilities are strictly monotonically distributed then there is a unique optimal signaling system, which is precisely the Schelling point.
Since saliency
Caution
It is also worth noting that many algorithms for MARL use shared parameters, the same critic, and so on. If the agents can access this shared system, the oracle of saliency provides a shortcut to a canonical emergent language for all agents, as well as a general-purpose coordination mechanism they might use to coordinate on other tasks. Thus such oracles should be treated with care if we also wish to study the emergence of a universal language.
2. Learning the Saliency distribution.
In this case agents are in an MDP or a PMDP. They are observing the states of prelinguistic objects and we need to assume that this distribution is the same for all agents. I.e. they are learning the distribution of states by sampling. Should they engage in developing a signaling system or wait until they have learned the distribution of states? What if signaling cost is fixed like above in the Lewis game with urgency?
Story: Creation of the Oracle of Bayes
In another tribe, the agents are too busy observing and recording the states of the world to coordinate on a signaling system. However, it is inevitable that sooner or later, as they compare notes, they will notice that they have recorded the full empirical distribution of states and that all their records are in agreement, up to an acceptable margin of error.
The agents can actually use this distribution as a Bayesian oracle.
This time each prelinguistic object is assigned a probability of occurrence. The agents order the states by decreasing probability.
Again, if states share probabilities, they will have to be assigned a word-sense index to distinguish them, using trial and error.
However, doing nothing might give these agents a lot of time on their hands, and they might also notice how their empirical distribution is evolving over time, with a slowly increasing number of states (the most frequent ones) getting the same share of the probability mass….
This suggests to the Bayesian-minded agents that they should estimate the Bayesian credible interval for the signals and use the implied signaling systems to communicate about the states that are common knowledge.
Whenever a state's probability emerges into "significance" it should be recorded in the lexicon. If a term has entered the lexicon by chance, it can be dropped once it is no longer within the Bayesian credible interval.
The main reason I like thinking about this story has to do with its relation to corpus linguistics and to language modeling. We know that language modeling is at the heart of large language models, and this may be a kind of thought experiment about how long, in terms of a clock that ticks in samples collected, an RL language modeler would need to make good inferences about increasingly rare states. Look at sufficiently long n-grams and almost all are sparse. And for a fixed n, even with a uniform distribution, the probability of most n-grams in an empirical distribution will be very low, unless the corpus is allowed to grow combinatorially.
We can also use this to think beyond the lexicon. One cause of hallucinations in LLMs is out-of-distribution queries. This is when there isn't data corresponding to the query and the model tries to construct a response based on a mostly random approximation. We often get hallucinations also for queries the LLM has been trained on, even when it can give a good answer to a better prompt. I like to think of these as signals that do not have separating equilibria.
Complex signaling systems are built on top of an alphabet of primitive signals. These, like the letters of our alphabet, might be without meaning, or they may be used in a Huffman code and assigned to the most frequent states.
Another point to consider is that if agents just observe states long enough they should eventually learn to approximate the state distribution. How long would this take?
Here is a back of the napkin calculation.
If the least common state has probability \alpha and the agents want to estimate each state's probability to within \alpha with confidence 1-\epsilon, then, according to Hoeffding's inequality, they would need
K\ge\frac{\log(2/\epsilon)}{2\alpha^2} \qquad \text{(samples to learn S)}
Also recall that although there is no lower bound on \alpha in general, when S\sim \text{Uniform}[N] we have \alpha = 1/N, so
K\ge\frac{N^2\log(2/\epsilon)}{2} \qquad \text{(samples to learn uniform S)}
Code
```python
import math

K = 8           # states
epsilon = 0.34  # confidence

# Time to learn the saliency distribution:
# N >= (K^2 * log(2 / epsilon)) / 2
N = (K**2 * math.log(2 / epsilon)) / 2
print(f'Expected time {int(N)} to learn a {K} state distribution with confidence {epsilon}')

# Expected time to learn a signaling system with K states
T = K * math.log(K)
print(f'Expected time {int(T)} to learn a {K} signaling system ')
```
Expected time 56 to learn a 8 state distribution with confidence 0.34
Expected time 16 to learn a 8 signaling system
So learning a signaling system is easier than learning the distribution of states. Once the agents know how to signal states, it is easy to use the system to communicate the distribution to all the receivers.
We have not put a cost on learning the signaling system. But if there were a cost associated with learning, we could use it to model when agents would prefer to learn the signaling system outright or just wait until they can infer the distribution of states and derive the system from that.
A third point is that if the agents are Bayesian they could start to infer the signaling system after viewing a few states, and update their system as they update their beliefs about the distribution of states.
Ship of Fools
Story: Ship of Fools
Senders and receivers lack all prior knowledge. They follow an optimal strategy for a related game, the battle of the sexes. If a state is uncoordinated, the sender will randomly pick a signal and the receiver will randomly pick an action until they get a reward, then exclude that signal-action pair from exploration.
This strategy is not the best one for senders, but it is easier to analyze.
If the state is T and there are N states, signals and actions, then there are N\times N choices for sender and receiver, of which the ones with action A=T get a reward. So there is a 1/N chance of getting a reward.
The expected reward is 1/N, but since the sender is randomizing, each turn is independent. Can they do better?
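A small Monte Carlo sketch of the story (the target state, seed and number of runs are arbitrary) confirms that the first reward arrives after about N turns on average:

```python
import random

def turns_to_first_reward(n, target, rng):
    """Both sides randomize anew each turn (a sketch of the story):
    the signal carries no information, so only the receiver's random
    action matters and success has probability 1/n per turn."""
    turns = 0
    while True:
        turns += 1
        _signal = rng.randrange(n)   # sender: fresh random signal, ignored
        action = rng.randrange(n)    # receiver: fresh random action
        if action == target:
            return turns

rng = random.Random(0)
N = 8
runs = [turns_to_first_reward(N, target=3, rng=rng) for _ in range(10_000)]
mean = sum(runs) / len(runs)
print(f'mean turns to first reward {mean:.2f} (theory {N})')
```

Because nothing is remembered between turns, the wait is geometric with mean N and the agents never improve.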
The steady navigator
Indeed they can do better. If the sender picks a signal and sticks with it the receiver can eliminate an action each turn. This is the optimal strategy for this, the most common setting of the Lewis signaling game.
Story: The Steady navigator
Senders and receivers lack all prior knowledge. For each new state the sender picks a signal at random, but if the state is the same as the last state the sender sticks to the same signal. The receiver must explore actions at random, but if the signal is the same as a previously seen signal the receiver will explore an untested action for that signal until they get a reward.
Let's estimate the expected rewards under this strategy for a state T and N states, signals and actions.
Since the sender sticks with the same signal, the receiver can eliminate an action each turn.
The receiver has N choices initially with 1 correct choice, so he has an expected chance of 1/N of getting a reward.
Next he can eliminate his first choice and has N-1 choices with 1 correct choice, so an expected chance of 1/(N-1) of getting a reward.
And after k tries he has N-k+1 choices with 1 correct choice, so an expected chance of 1/(N-k+1) of getting a reward.
In the worst case he will have to try all N actions.
The expected number of steps is
\begin{aligned}
\mathbb{E}[\text{steps}] &= \sum_{k=1}^{N} k \times P_{\text{success at } k} \times P_{\text{failure before } k} \newline
&= \sum_{k=1}^{N} k \times \frac{1}{{N-(k-1)}} \underbrace{\times \prod_{i=1}^{k-1} \frac{N-i}{N-i+1}}_{\text{telescopic product}} \newline
&= \sum_{k=1}^{N} k \times \frac{1}{\cancel{{N-(k-1)}}} \times \frac{\cancel{{N-(k-1)}}}{N} \newline
&= \sum_{k=1}^{N} \frac{k}{N} = \frac{N+1}{2}
\end{aligned}
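The telescoping product above can be checked numerically with exact rationals:

```python
from fractions import Fraction

def p_first_success(n, k):
    """P(first reward on trial k): survive k-1 failed eliminations,
    then hit the one correct action among the n-(k-1) remaining."""
    p = Fraction(1, n - (k - 1))
    for i in range(1, k):
        p *= Fraction(n - i, n - i + 1)
    return p

N = 8
probs = [p_first_success(N, k) for k in range(1, N + 1)]
assert all(p == Fraction(1, N) for p in probs)   # telescoping: uniform 1/N
expected = sum(k * p for k, p in zip(range(1, N + 1), probs))
print(expected)  # (N + 1) / 2 = 9/2
```

The trial of first success is uniform on 1..N, so the expectation is (N+1)/2 with worst case N.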
MARL Formulation
This is basically an optimistic initialization strategy. The sender does not explore. The receiver initializes all signal-action pairs optimistically with a value of 0.5. This way he will keep exploring until he gets a reward of 1.0, at which point exploration ends.
So we can expect that the number of steps needed to learn to signal the state T is at most N. They should pick a signal for a state and stick with it.
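A minimal sketch of this optimistic-initialization receiver; the 0.5 initial value follows the text, while the seed and run counts are illustrative:

```python
import random

def steps_to_learn(n, target, rng):
    """Optimistic-initialization receiver (a sketch): every action for
    the sender's sticky signal starts at value 0.5, observed returns
    overwrite it, and the greedy receiver explores until it sees 1.0."""
    q = [0.5] * n
    for step in range(1, n + 1):
        best = max(q)
        action = rng.choice([a for a in range(n) if q[a] == best])
        q[action] = 1.0 if action == target else 0.0
        if q[action] == 1.0:
            return step
    return n

rng = random.Random(1)
N = 8
steps = [steps_to_learn(N, rng.randrange(N), rng) for _ in range(2000)]
mean = sum(steps) / len(steps)
print(f'mean steps {mean:.2f} (theory {(N + 1) / 2}), worst {max(steps)}')
```

Each failed try sets that action's value to 0.0 and removes it from future greedy choices, which is exactly the elimination dynamic analyzed above.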
The Guru’s Prior
The sender is a privileged elder who knows the distribution of the states and the associated risk and cost of signaling to sender and receiver, and figures out the optimal signaling system. As such he selects a specific signaling system, which means the students need to coordinate on it.
This means that whenever the state s_i arises we will get the signal sig_i=Send(s_i) rather than some random signal. It also means that for a mistake the receiver can apply negative reinforcement to <sig_i, action_j> when the return is 0. This should allow the receiver to narrow down the actions chosen the next time he gets that signal.
This is the second hardest learning scenario but also the most realistic. We don't want to have to learn a new language for every person we meet.
What could happen: the distribution of states could evolve over time.
The prophet’s prior
The sender knows the distribution of the states and how it evolves over time. He chooses the currently optimal signaling system. The receivers must learn the signaling system, but once a change in the state distribution is observed they will switch to the new optimal signaling system.
Imagine a world with many predators troubling the signaler. To avoid becoming prey, agents must send risky signals to their neighbors. They should use the signaling system with the least expected cost, which combines each predator's risk and its frequency. Signals can be 1 or 0; 1 is risky and 0 is safe. As the frequency of the predators changes, the optimal signaling system will change as well.
The Gurus’ Posterior
Here there are multiple gurus with knowledge of different distributions. Can they coordinate on the most salient signaling system with respect to their common knowledge?
This should be the signaling system that is most salient for a mixture distribution with weight w_i for each guru.
Let's assume that N is very large and that there is a cutoff probability \epsilon below which the gurus won't bother to include rare states.
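One way to sketch this mixture-and-cutoff idea; the weights, distributions and cutoff are all made up:

```python
def mixture_support(distributions, weights, eps):
    """Mix the gurus' state distributions with weights w_i and drop
    states whose mixed probability falls below the cutoff eps."""
    mix = {}
    for dist, w in zip(distributions, weights):
        for state, p in dist.items():
            mix[state] = mix.get(state, 0.0) + w * p
    return {s: p for s, p in mix.items() if p >= eps}

guru_a = {'wolf': 0.7, 'eagle': 0.29, 'comet': 0.01}
guru_b = {'wolf': 0.2, 'snake': 0.8}
support = mixture_support([guru_a, guru_b], weights=[0.5, 0.5], eps=0.05)
print(sorted(support))  # 'comet' falls below the cutoff
```

The common signaling system is then built over the surviving support, ordered by mixed probability as before.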
In the second setting, two or more students must come up with some signaling system as fast as possible.
Babylon Consensus
Multiple senders and receivers take shelter in common ground and need to arrive at a common signaling system.
They want to learn the least costly signaling system in terms of learning.
They want to learn the most salient signaling system in terms of the distribution of states.
There is an agent who knows the current distribution of states and the optimal signaling system.
There isn’t such an agent but the senders want to use a
Cost of learning a second dialect
For each agent and for each signal that is different from the target signaling system, add a cost of 1.
C = \sum_{i=1}^{N} \sum_{j=1}^{M} \delta_{ij}
\tag{1}
where \delta_{ij} is 1 if the signal j is different from the target signal for state i and 0 otherwise.
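Equation (1) in code, with hypothetical states and signals:

```python
def dialect_cost(system, target):
    """Equation (1): a unit cost for each state whose signal differs
    from the target signaling system."""
    return sum(1 for state, sig in system.items()
               if target.get(state) != sig)

target = {'s1': 'a', 's2': 'b', 's3': 'c'}
dialect = {'s1': 'a', 's2': 'c', 's3': 'b'}
cost = dialect_cost(dialect, target)
print(cost)  # 2
```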
POMDP
In this setting one or more senders observe only a partial state.
Again we consider a hypothetical case where the state describes predators and can be partitioned into disjoint parts like <type, proximity> or <type, proximity, number> or <type, proximity, number, direction>. This partitioning is also at the basis of compositionality in signaling systems.
Skyrms first considers three different settings.
observation of one of the mutually exclusive partitions: each sender views one part of the partitioned state.
observation of all the mutually exclusive partitions: the senders see all the parts of the state but don't have a mechanism in place to coordinate who sends which part.
observation of all the mutually exclusive partitions with coordination: one sender sees all the parts of the state but lacks symbols to send the full state and needs to send each part. He must send the parts one at a time, resulting in a sequence of signals.
In the first setting the receiver somehow knows that he should first aggregate the signals using a logical AND and then decode the state.
In the second setting the agents again observe the full state but don't have a coordination mechanism for picking different parts of the message.
They send a partial signal to the receiver, who must infer the state and take the appropriate action. The receiver must:
aggregate the messages
infer the state
take the appropriate action
note:
In the first case, so long as each part of the state has a unique signal, the state can be inferred by the receiver using conjunction. The second case is more problematic and shows us a new way that some signaling systems can be better than others.
From a single part alone the agent can't infer the state better than chance. However, through reinforcement of random partitions the senders can learn a decorrelated partition for each state and send different parts of the state. The issue is whether the semantics are composable.
An issue here is that there is no guarantee that the senders will send the same part of the state at each turn. If the aggregation rule is conjunction, i.e. logical AND, then the receiver will be able to decode the state so long as he gets all the pieces.
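The conjunction rule can be sketched as set intersection; the predator state space and signals are illustrative:

```python
def decode_by_conjunction(signals, meanings):
    """Aggregate partial signals with logical AND: each signal names
    the set of states consistent with it, and the true state lies
    in their intersection."""
    candidates = None
    for sig in signals:
        states = meanings[sig]
        candidates = set(states) if candidates is None else candidates & states
    return candidates

# states are <type, proximity> pairs; each sender reports one part
meanings = {
    'wolf!': {('wolf', 'near'), ('wolf', 'far')},
    'near!': {('wolf', 'near'), ('eagle', 'near')},
}
state = decode_by_conjunction(['wolf!', 'near!'], meanings)
print(state)  # {('wolf', 'near')}
```

As long as every part arrives, the intersection collapses to the single true state; missing parts leave the receiver with a set of candidates.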
Knight & Knaves – Bayesian Adversarial Signaling
I had this idea a while back regarding how to address two different problems.
How might agents in non-cooperative games evolve signaling systems that are robust to adversarial agents? I.e. could they learn in the positive domain, by inferring that certain actions are more cooperative or at least not prone to deception? Could they learn in the negative domain, i.e. learn to infer an opponent's type using Bayesian updating in cases where actions are predominantly adversarial?
Can agents use counterfactual reasoning to accelerate learning to signal? This is a system where agents infer the private information of other agents (e.g. their type, their lexicon and/or grammar rules) based on observing the other agents' behavior, which is more costly to use for deception than, say, cheap talk. This uses the idea of measuring counterfactual influence.
Deception by Knight & Knaves
There are multiple senders, and each state is known to all of them.
Each sender has a veracity parameter \nu \in [0,1]; this is the probability that they send a faithful signal.
For a complex game this can be interpreted as the probability that the sender will make an error when sending a signal.
More generally we might have three parameters \nu_s, \nu_c, \nu_r, which are respectively the veracity of the sender, the channel and the receiver.
This idea is similar to the approach used by (Nowak2006Evolutionary?) on errors in signaling. However, in this case we are more interested in settings where adversarial agents are trying to deceive the receivers, and in the receivers' problem of learning a signaling system that is robust to such agents.
At the extreme the agents have types (like knights and knaves) and the receivers must learn to classify the agents by type and then learn both to signal and to reason.
One idea based on knights and knaves might be to allow receivers to request a response to a query, i.e. "if the state were X, would you send the signal Y?"
A novel idea here is to learn a Bayesian hierarchical cognitive model that captures:
the veracity of the senders V_i \sim \text{Beta}(\alpha, \beta)
hypotheses H_i on different states of the world, e.g. on the veracity of the senders
how each and all statements made by the senders are consistent with each hypothesis H_i
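The veracity component reduces to a conjugate Beta-Bernoulli update. A minimal sketch, assuming the receiver can later check whether each signal was faithful; the prior is an assumption:

```python
from fractions import Fraction

def update_veracity(alpha, beta, observations):
    """Conjugate Beta-Bernoulli update of a sender's veracity:
    each observation is True when the signal later checked out
    against the evidence."""
    for truthful in observations:
        if truthful:
            alpha += 1
        else:
            beta += 1
    return alpha, beta, Fraction(alpha, alpha + beta)  # posterior mean

a, b, mean = update_veracity(1, 1, [True, True, False, True])
print(f'posterior Beta({a}, {b}), mean veracity {mean}')
```

A knight's posterior mean drifts toward 1 and a knave's toward 0, which is what lets the receiver classify agents by type before trusting their signals.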
Babbling Bayesian Babies
This is a simple model of babbling in language development.
Babies in the babbling stage of language development are learning to signal. They send all possible phonemes, and their parents either respond or talk to each other. The babies collect the feedback, reinforcing both positively and negatively, until they only use the phonemes that are in the language of their parents.
Typically, they start with over 300 phonemes and end up with 40-50.
In this scenario the sender operates at random. Both the sender and the receiver must observe the rewards and reinforce state-signal-action triplets.
Different cognitive dynamics to model via Bayesian learning:
Just learning the phonemes by passively listening to the parents.
Positive vs. negative examples, i.e. are the babies asked to repeat and get rewards, or do they just observe states passively?
Active teaching (Parents also present a prelinguistic object for the phoneme)
This should particularly boost the rate of learning more phonemes.
Positive vs. negative examples
Adding a syllable structure to the language: the babies should learn not just the phonemes but the syllables. This adds another layer to the hierarchical model.
Adding picking the syllables from a number of options
Positive vs. negative examples
Adding phonotactic constraints: the babies should learn the constraints.
Positive vs. negative examples
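The basic positive/negative reinforcement dynamic can be sketched with a Beta-Bernoulli model per phoneme; all numbers, including the 300-to-40 reduction, are illustrative:

```python
import random

def babble(phonemes, rewarded, n_rounds, rng, threshold=0.5):
    """Beta-Bernoulli babbling sketch: the baby tries phonemes at
    random, only those in the parents' language (`rewarded`) earn
    positive feedback, and a phoneme is kept if its posterior mean
    reward ends up above `threshold`."""
    counts = {p: [1, 1] for p in phonemes}        # Beta(1, 1) priors
    for _ in range(n_rounds):
        p = rng.choice(phonemes)
        if p in rewarded:
            counts[p][0] += 1                     # positive example
        else:
            counts[p][1] += 1                     # negative example
    return {p for p, (a, b) in counts.items() if a / (a + b) > threshold}

rng = random.Random(0)
phonemes = [f'ph{i}' for i in range(300)]
parents = set(phonemes[:40])                      # parents' inventory
kept = babble(phonemes, parents, n_rounds=30_000, rng=rng)
print(len(kept))
```

The extensions in the list above (syllables, phonotactics) would each add another level to this hierarchy rather than change the basic update.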
The Bayesian view of the signaling systems
In this section let's consider a view of the Lewis signaling game in terms of Bayesian game theory. This is a perspective I used in an article on planning in the complex signaling game, and it helped me think more precisely about how a signaling system might evolve in a formal setting.
The game has n states and n signals. Internally the agents will learn a permutation of the states and signals and its inverse, so there are n! signaling systems that can be learned. In the world of Bayesian agents each such permutation characterizes an agent type. The game starts with nature picking a type for the sender. I put it this way because the sender needs to define a strategy, which is a response for each state. The same is true for the receiver, whose strategy is to pick an action for each signal. After that it can use Bayesian updating to update its belief about the type of the sender. These probabilities can guide it in the process of learning the signaling system. As the pair make progress, the receiver updates its belief about the sender's type, discarding options that are inconsistent with the signals it has received. Once it has found n-1 signals it can be certain about the sender's type, and it will have an expected payoff of 1.
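The receiver's elimination of sender types can be sketched by brute force over permutations (feasible only for tiny n; the observations are made up):

```python
from itertools import permutations

def consistent_types(n, observations):
    """Each sender type is a permutation mapping states to signals;
    keep only the types consistent with the (state, signal) pairs
    seen so far."""
    types = list(permutations(range(n)))
    for state, signal in observations:
        types = [t for t in types if t[state] == signal]
    return types

# after n-1 informative observations a single type survives
obs = [(0, 2), (1, 0)]           # state 0 -> signal 2, state 1 -> signal 0
survivors = consistent_types(3, obs)
print(survivors)  # [(2, 0, 1)]
```

With n=3, two informative observations pin down the last assignment for free, matching the n-1 claim above.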
Framing games and evolution of domain specific languages
In this scenario we consider whether learning a shared language could be a game changer in some strategic interaction like a social dilemma. We may be able not only to interpret different signaling systems as embodying different semantics derived from the framing game, but also to consider whether these create a linguistic relativism, where the agents' language shapes their perception of the framing game by allowing them to develop new strategies. We can also consider the beginnings of ethics in such a system by asking whether the introduction of language increases or decreases the overall welfare of the agents.
Story: Framing Games
Agents are tasked with maximizing a reward signal under conditions of strategic interaction. We call this the framing game, and it might be as simple as a 2x2 matrix game like the battle of the sexes, a social dilemma like the iterated prisoner's dilemma, or as complex as a Sugarscape simulation. At some point, perhaps at the start, send and receive actions are introduced into the agents' action space. They can now assume the roles of the sender and receiver in a Lewis signaling game.
If the incentives are right, e.g. the framing game is cooperative, they could learn to signal to each other. Note though that it is conceivable that agents could learn to signal in a competitive game if they are sufficiently driven to explore the send and receive actions. However, the resulting equilibrium might not be a perfectly separating one if the agents are not suitably incentivized to use the language to coordinate.
Furthermore, these signals may then be incorporated into planning and allow the agents to coordinate on the framing game.
One expects that the language that arises under such circumstances would be limited to the domain of the framing game and that its semantics would be inherited from the framing game. However, larger framing games with many generations of agents might lead to dynamics from which a more general language emerges.
This kind of scenario actually contains a rich set of paths to the emergence of many different languages. For agents in the lifelong setting, the emergent language might gain additional strata of semantics from multiple domains and then evolve into a more general language.
A question then arises: what simple framing games can lead agents to develop languages imbued with sufficiently rich semantics that the language has the Sapir-Whorf property of being able to express semantics from any other language?
Another idea I have been using liberally in my thinking is that of a framing game for the Lewis signaling game. This idea comes from the field of multi-agent reinforcement learning, but it should also be valid in terms of game theory.
Simply put, an agent may be tasked with some general problem like playing chess or solving a maze. In the past I worked on wizards that configure servers or home networks for telecoms.
I could envision an RL agent learning to do these jobs from experience. However, if it can play a Lewis game and learn a signaling system that is a subset of English approximating its domain, then it can chat with people rather than rely on some user interface.
For a home network it might need a smaller subset of English, and for a server configuration a larger one. The point is that an agent tasked with some external task might be able to learn a signaling system whose semantics are inherited from the task. If that task is a strategic interaction we may view it as a game, and together we can view the framing game and the Lewis signaling game as a single iterated game in which the agent learns to play a new variant of the framing game with access to a coordination mechanism that is a domain specific language.
I think that if we naively combine a game like the battle of the sexes, a pure coordination problem, with the Lewis signaling game, the agents will learn a language like 'football', 'opera'. These can arise within three iterations and allow the agents to then coordinate on the battle of the sexes so as to score the highest payoffs. This could happen whether the agents alternate signaling or one always gets to signal first and always picks opera.
On the other hand, with the iterated prisoner's dilemma, signaling might not make a difference, as the language may not be able to change the payoffs sufficiently to make the agents act any differently. In this case it is entirely possible that a signaling system will not arise at all: regardless of what the agents say, they will act in their own best interest. This leads to a completely pooling equilibrium.
So the questions that come to mind are these:
1. Can we set up the signaling game so that the agents will always learn a signaling system if coordination is a benefit?
2. How can we encode the signaling system so that its prelinguistic objects will be the states of the world and the actions of the agents? I.e. we want them to be able to talk about the outcomes of the framing game in the signaling game.
3. We want the costs and benefits of signaling to be decoupled from the framing game, i.e. we may deduct the payoff for signaling success once the signaling system is learned.
4. We do want the agents in the framing game to be aware of the outcome of the signaling game.
5. Finally, we want to identify if there are strategies in which coordination increases or decreases overall welfare.
E.g. in the battle of the sexes we should expect perfect rewards. In a three-way traffic junction game we might expect the agents to signal their intentions to turn left or right or to go straight, which would allow them to avoid accidents; one such mechanism might be a game of rock paper scissors to determine the priority of the agents. In Braess's paradox, establishing a highway through a city, we might end up increasing the traffic jams.
Story: Co-adaption
Family members, best friends and members of closely knit societies tend to develop a language that is unique to them. It can start with in-jokes, invented words and phrases, and co-opting existing words to mean something else. This is a form of co-adaption where the language and the society co-evolve. If allowed to evolve, the language can drift so far that a stranger would be hard pressed to understand what the speakers are saying; this is called semantic drift.
For RL agents it is possible to develop a language that is unique to them, as suggested above. It is also possible that as conditions change, e.g. the framing game is switched from the battle of the sexes to the prisoner's dilemma, the language will remain a 4-state, 4-signal language but the meaning of the signals will drift.
This story is more about something we might want to avoid.
Having more agents should reduce co-adaption.
Semantic drift is inherent in the evolution of language. However, we may want to allow the language to evolve while certain aspects remain fixed. This is one of the desiderata for emergent languages: we would prefer the grammar and much of the lexicon to be stable over multiple generations, so that great-grandfathers can still communicate with their great-grandchildren. Why is this a problem? In agentic systems we design language emergence to be fast. In most cases every agent needs to learn the language from scratch, while enjoying the benefits of perfect recall and a noiseless channel. This also means that languages might change very quickly, and that we as researchers will have a tough time understanding the agents over the course of their simulation. Natural languages face a similar situation: for language to work for large populations, and for records to make sense for thousands of years, we want much of the language to be stable, with possibilities for evolution at the fringes.
How do we ensure semantics persist over time? We call this the idea of grounding. Imagine all the most important ideas were written down in a book and that book was passed down from generation to generation. Everyone might need to learn the book a little after they learned basic language skills in school. Soon the book becomes canon and no one may change it. Over time, though, it might be permitted to add bits as new concepts were discovered and proved important enough to preserve.
Short of starting a religion for our agents, we may want good mechanisms that keep the language grounded, so that co-adaption and semantic drift are kept in check.
Another point here is that if our agents are aware that the framing game has been swapped from the battle of the sexes to the prisoner's dilemma, they may want to keep their semantics for the battle of the sexes intact and use them as a template or prior for the prisoner's dilemma. Since the prisoner's dilemma is non-cooperative, there may not even be a perfectly separating equilibrium for the framing game, so assigning a language from a template prior might actually be of benefit.
Note
Work in progress
Another way to look at the coordination AKA best response
Here is a little paradox. In the simple signaling game the goal of the agents is to find an equilibrium that is perfectly separating, yet in most complex signaling games there seem to be many partially pooling equilibria that are arbitrarily close to such a perfectly separating equilibrium. In many MDPs a subset of states is more important than the others, e.g. bottlenecks in a maze or the central squares in chess; this is often formalized in terms of the average time an agent is likely to spend in each state. Likewise, in a language with a Zipfian property most words have a low frequency of use. This means we can learn an approximate signaling system that comes arbitrarily close to the optimal one with much higher probability than learning the optimal signaling system itself. This makes even more sense if the language makes infinite use of finite resources, which may be the case if the language has a recursive grammar.
Now it is worthwhile to set down a few definitions.
Complex state - a structured state that might be described by a sentence or a paragraph. Since we can rephrase any paragraph as a single sentence, we can assume that the state can be captured by an arbitrarily long sentence. In many cases, though, there will be a data structure or an image. It is worth noting that if the states have a rich structure we may be able to replicate this within the language.
A simple example: to represent a tree and trees we might use one prefix code to represent the singular and another the plural form. We might reuse this prefix to represent the plural in both nouns and verbs.
A related example is that we might learn a prefix for two trees, and we could generalize this to any number of trees.
We could make use of recursion to represent a number system for any number of trees using a few symbols.
This recursive rule would expand our signaling system to capture an infinite number of states using a finite number of signals. What is more interesting, we would be able to learn it from a small set of examples.
The point here is that if the states have such rich structures, a language that preserves them can potentially be organised in such a way that it can be learned more efficiently than in the tabular case. The idea is that instead of a table we could use a rule, perhaps recursive, to encode this part of the state. But only if it has such a structure.
Finally, note that even if there are many such functions, we may be able to compose them so that they become a single function. This is the idea of compositionality in language, and it is likely the key to learning to represent arbitrarily complex signaling systems.
Signal - In the complex signaling game the signal can be viewed as
as a string of arbitrary length made using a limited alphabet of size |L|
as a number N in base |L| i.e. N_L
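The second view is just positional notation. A sketch, assuming the convention that the first alphabet symbol plays the role of digit 0:

```python
def signal_to_number(signal, alphabet):
    """Read a signal string as a number in base |L|, treating each
    alphabet symbol as a digit (an assumed convention)."""
    base = len(alphabet)
    digit = {ch: i for i, ch in enumerate(alphabet)}
    n = 0
    for ch in signal:
        n = n * base + digit[ch]
    return n

n1 = signal_to_number('ba', 'ab')    # digits 1,0 in base 2 -> 2
n2 = signal_to_number('bab', 'ab')   # digits 1,0,1 in base 2 -> 5
print(n1, n2)
```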
Prelinguistic object
I noticed that people use this term as a synonym for the state. I want to use it a little differently: I want it to correspond to a sub-state that may be interpreted as a unit of meaning. There may be multiple prelinguistic objects in a state. We may consider these as parts of a picture, for example, where each part may need several words to describe it. Or we may reference a bit in a binary vector.
As noted above, the states may have a rich structure, e.g. nouns, verbs, inflection, a recursively defined number system, or a recursively defined system of clauses. The last might even generalise the number system.
The prelinguistic objects may be in a list, a tree, a grid, a graph or some other data structure. However, we might gravitate towards trees, as they are the most common representation for parsing natural languages and, more importantly, they can be defined using a simple recursive rule.
Encoder - a function that the agent learns to convert the state into signals. It needs to:
serialize the prelinguistic objects of the state into a sequence
convert each prelinguistic object into a sequence in the alphabet L.
possibly use some kind of symbol as a delimiter (a prefix code precludes the need for this)
Decoder - a function that the agent learns to convert the signals into actions. It is the inverse function of the encoder, with the same steps in inverse order.
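A minimal encoder/decoder pair over a hypothetical prefix-free code (the codewords and objects are made up; being prefix-free, no delimiter is needed, as noted above):

```python
def encode(objects, code):
    """Encoder sketch: serialize the prelinguistic objects, then map
    each to its codeword and concatenate."""
    return ''.join(code[o] for o in objects)

def decode(signal, code):
    """Decoder: the inverse function, greedily matching codewords;
    a prefix-free code makes the greedy match unambiguous."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ''
    for ch in signal:
        buf += ch
        if buf in inverse:
            out.append(inverse[buf])
            buf = ''
    return out

# hypothetical prefix-free code over the alphabet {0, 1}
code = {'wolf': '0', 'eagle': '10', 'near': '110', 'far': '111'}
msg = ['wolf', 'near']
encoded = encode(msg, code)
print(encoded, decode(encoded, code))  # 0110 ['wolf', 'near']
```

The round trip decode(encode(x)) == x is exactly the inverse relationship between the two functions.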
Tabular language - when we learn a tabular representation of the signaling system we can assume that the encoder and decoder are perfect and that the signals are perfectly separating. However, in this case the language is unlikely to generalize, i.e. we need to learn all the signals to be able to understand which states they refer to. Since in the best case we need to learn |S| signals for |S| states, we need to test |S|^2/s signals in the worst case. This is the main shortcoming of the tabular approach. Note that the encoder and decoder are functions, but in this case they are acting as a lookup table in step 2 of the encoder.
Functional language - a language learned by fitting parameters in a model that approximates the tabular language. Since linear function approximation can exactly replicate the tabular language, with one parameter per state we can perfectly replicate it. The advantage of a functional language is that it can generalize to new states. This means that we might learn the grammar and morphology using a much smaller subset |MS|, and then need to learn only a fraction of the lexicon consisting of the base forms of the words, |MS| + |BL|. The learning time would be (|MS| + |BL|)^2/s, and it would cover all the states. The more inflections and derived forms we have, the greater the speaker's ability to generalise, and the syntax would then allow these words to be combined into sentences. The language might be suitable to handle a potentially infinite number of states with the finite learning time stated above. However, as I pointed out just now, replicating a tabular language will not endow it with this generalization ability, not unless by some lucky coincidence the encoder is able to capture and preserve the full structure of the state space.
Now it is worth making a couple of observations.
We can learn a large table of states and their associated signals by enumerating them using base-|L| numbers. This is one baseline. It has a lexicon of size |S|.
We may train an encoder to encode some natural language, say English sentences, into binary signals; then we could use English to encode any state. This is a second baseline. It has an alphabet of size 27 and a lexicon whose size is the number of words in the English language, |E| + |S|. English is a very general purpose language; the downside is that the language will be large and require lots of resources for each new agent to learn, about (|E|+|S|)^2/s steps in the worst case for an optimal algorithm. Note that using English would also require learning rules of grammar and syntax, and we would need to avoid ambiguity in the sentences our encoder uses; we might just use English and specify the details that let the receiver resolve any ambiguity. However, the size of the lexicon is impractical.
We could do better by giving each word a previously unused prefix to indicate things like the word sense, the part of speech, or clues as to what it is referencing. This would make the lexicon only a little larger but could eliminate all ambiguity.
English has a complicated grammar. We might simplify it as well, which would be easier if we made liberal use of the prefixes in item #3. This would convert the language into a more morphologically rich one and allow us to radically simplify the grammar. For example, we could use word bases plus prefixes to get a highly predictable morphology and drastically reduce the lexicon.
With all the parts of speech organized to be regular and unambiguous, we might also be able to discard things like agreement, and our grammar could become much simpler: a single recursive rule that lets us parse a sentence into a tree.
If the states |S| are structured and not particularly complicated, we might need only a rather small subset of English to encode them, giving a learning time of (|ME|+|S|)^2/s. This minimalist English should be much easier to learn and use, yet expressive enough to encode the states. This is the third baseline. It has an alphabet of size 27 and a lexicon of size |ME| + |S|, where |ME| is the number of words in the minimalist English.
We might also consider that in most cases we don't even need that much expressive ability. A domain-specific version of minimalist English gives the fourth baseline: an alphabet of size 27 and a lexicon of size |DME| + |S|, where |DME| is the number of words in the domain-specific minimalist English.
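The baselines above can be compared with a small sketch. The base-|L| enumeration implements baseline 1, and the (lexicon)^2/s learning-time formula is the one used in the text with s = 1; every size below is an invented illustrative number:

```python
# Baseline 1: name state k by its base-|L| digit string.
def enumerate_signal(state_index, n_letters, length):
    digits = []
    for _ in range(length):
        digits.append(state_index % n_letters)
        state_index //= n_letters
    return list(reversed(digits))

# Worst-case learning time (lexicon size)^2 / s from the text.
def worst_case_steps(lexicon, s=1.0):
    return lexicon ** 2 / s

# Made-up illustrative sizes -- every number here is an assumption.
S, E, ME, DME = 10_000, 170_000, 2_000, 500
baselines = {
    "1. enumeration":        S,
    "2. full English":       E + S,
    "3. minimalist English": ME + S,
    "4. domain-specific ME": DME + S,
}
for name, lexicon in baselines.items():
    print(f"{name:24s} lexicon={lexicon:7d}  steps={worst_case_steps(lexicon):.2e}")
```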
Today's signaling system is tomorrow's partial pooling equilibrium. As new states and their associated prelinguistic objects manifest, agents will need to extend their state and action spaces to handle these new states and objects.
Complex signaling systems have three main facets that differ from simple signaling systems:
A limited signal alphabet.
A longer message length.
Complex state spaces.
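The first two facets trade off against each other: with a signal alphabet of size |L| and messages of length n, agents can distinguish at most |L|^n states, so the minimum fixed message length grows only logarithmically in |S|. A tiny sketch (the function name is my own):

```python
import math

def min_message_length(n_states, n_signals):
    """Shortest fixed message length n such that n_signals**n >= n_states."""
    return max(1, math.ceil(math.log(n_states, n_signals)))

assert min_message_length(1000, 2) == 10   # 2**10 = 1024 >= 1000
assert min_message_length(1000, 27) == 3   # 27**3 = 19683 >= 1000
```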
In terms of signaling systems, here is my best idea:
Inductive Learning via Hierarchical Bayesian Frameworks
This is an idea I had been thinking about for a while. It is based on the notion that we can model learning as a process of generalization. Giving Lewis agents different sets of states to coordinate on leads them to learn a new lexicon for each scenario; unfortunately, they must learn everything from scratch each time. The main issue is the inability to generalize to new states, and its consequence is the inability to transfer learning from one scenario to another.
If they were able to learn rules, they might be able to use them to assemble a more complex language. This might happen by nesting rules, or by replicating them and then specializing the copies for new tasks.
So this leads to a chicken-and-egg problem: how to represent the states in a way that allows agents to see that different tasks are similar, so that they may bring previously learned skills to bear on these new tasks.
I wanted to use RL with temporal abstraction. The lifelong learners would develop as children do: they would learn basic representations, then derive rules over them. To handle the poverty of stimulus they might use a Bayesian model to infer the rules by induction from a few examples. As they become exposed to more tasks and learn more complex skills, they might learn new ways to represent different paradigms of knowledge.
Agents should learn to generalize from a limited set of examples. There are a number of increasingly complex tasks that agents might need to handle. It would help if they could train on a curriculum of tasks that increase in complexity, and better still if they could transfer learning from one task to another. It seems possible that, given a minimally rich curriculum, they could learn a representation powerful enough to match arbitrarily complicated states.
The main ideas here are:
Agents need to make hypotheses about states, e.g. about the structure of the states, and then pick the hypothesis best supported by the data. This is the idea of inductive learning.
Learn simple models first, using easy examples.
Infer the prior that best supports the inductive bias for each subtask.
Assemble these into deeper hierarchies, combining earlier learning into more complex structures.
Check whether these nested models can be used to learn.
A hypothetical Bayesian curriculum:
Agents need to coordinate on one of two hypotheses (which coin, C_1 or C_2, generates a sequence of states S = [H|T]+).
Agents need to coordinate on one of an infinite set of hypotheses (which coin C_\theta, \theta \in [0,1], generates a sequence of states S = [H|T]+).
Agents need to coordinate on one of two hypotheses of different complexity, similar to the previous scenario but with
H_0: \theta \in [0,1]
H_1: \theta \in [0.45, 0.55]
which requires penalizing the more flexible model for its complexity!
Agents should learn semantic hierarchies for the language (a fixed number of categories corresponds to a Dirichlet prior over the states).
Agents should learn morphological categories for words.
Agents should learn to describe properties of objects, say animals or plants, encoded in a feature vector, using nouns and adjectives (a two-word state).
Agents should learn propositional logic over some arbitrary number of binary features.
Agents should be able to learn to parse and evaluate simple, possibly nested, arithmetic expressions.
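The complexity penalty in the two-hypothesis coin scenario above can be illustrated with a small sketch of Bayesian model comparison (the function name and flip counts are my own illustration): the marginal likelihood of the flexible H_0 (\theta uniform on [0,1]) spreads its prior mass over parameter values that fit balanced data poorly, so the narrower H_1 wins automatically.

```python
# Sketch: Bayesian comparison of the two coin hypotheses from the
# curriculum above. H0: theta uniform on [0, 1]; H1: theta uniform
# on [0.45, 0.55].
def marginal_likelihood(heads, tails, lo, hi, grid=10_000):
    """Average the likelihood theta**h * (1-theta)**t over a uniform prior."""
    total = 0.0
    for i in range(grid):
        theta = lo + (i + 0.5) * (hi - lo) / grid
        total += theta ** heads * (1 - theta) ** tails
    return total / grid  # midpoint-rule average == integral against the prior

heads, tails = 5, 5  # a balanced sample of 10 flips
p_h0 = marginal_likelihood(heads, tails, 0.0, 1.0)
p_h1 = marginal_likelihood(heads, tails, 0.45, 0.55)

# The flexible H0 is penalized for its complexity: Bayesian Occam's razor.
assert p_h1 > p_h0
```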
using the ‘blessing of abstraction’ together with probabilistic learning and induction, versus the poverty of stimulus and the curse of dimensionality.
Resources:
In https://videolectures.net/videos/icml07_tenenbaum_bmhi/ Tenenbaum talks about inductive learning using Bayesian models. These hierarchical Bayesian models can learn some simple but non-trivial distributions that allow us to model how young children learn certain tasks. This is in line with the notion that learning might be accelerated by an inductive bias tuned for specific tasks, and that learning a language should be broken down into a curriculum based on mastering semantics for simpler prelinguistic objects before tackling the full complexity of states.
“Bayesian models of cognition” chapter in Handbook of Computational Psychology
https://cocosci.princeton.edu/tom/bayes.html
Imitation Learning
What else is this called?
Story: Imitation
In nature, one species, say a primate, can gain fitness benefits if it can learn and imitate the signals of other species, say birds.
If they can understand but cannot reproduce the signals (e.g. bird calls), they may resort to using a different set of signals (e.g. hand signals) to communicate with each other.
Dynamics
We do not require that the original signals come from one species; there may be a number of species providing signals useful to the primates, and these might change seasonally or with migration to new habitats.
Once a rudimentary signaling system is established, it may be natural to extend it to a more sophisticated system that can communicate additional states.
The ability to signal may well lead the tribe to become more cohesive and more efficient at hunting, gathering, and avoiding predators. Some examples:
It now becomes more useful to assign idle or less productive members to the roles of sentry or lookout, broadcasting their warnings to the others.
This also creates a new benefit in the form of social welfare: derelict members of the watch, or those who raise false alarms, will be subject to punishment by their peers. This is a form of costly signaling used to maintain the integrity of the tribe.
Less capable foragers will be able to learn the locations of food sources more quickly by sharing in the signals of more experienced foragers whenever they return to the safety of the tribe. This may also allow the tribe to better exploit the food sources available over a far larger area. Thus the carrying capacity of the tribe is increased by the sharing of information, and the tribe may become more localized and less nomadic, choosing to migrate to new areas only when food sources are depleted across a larger neighborhood.
Stag hunts - hunting large game requires the coordination of multiple hunters; the ability to signal and coordinate the hunt using the new signaling system may allow the tribe to take game it could not have hunted before, and to specialize in hunting.
These and numerous other social benefits may accrue to the tribe that is able to learn by imitation augmented by invention.
In natural language
Imitation is prevalent in natural languages as well. Loan words and idioms are often borrowed from other languages and may retain similar sound structures and semantic meanings. Loan words frequently require some adaptation to the phonetic structure of the borrowing language. Some languages, like English, tend to preserve the original phonetic structure of a loan word, while others, like Japanese, adapt it to their own phonology.
In many cases, if a word is used frequently but fits the phonetic structure of the borrowing language poorly, it will be replaced by a better-fitting word and the original will be forgotten.
This mechanism is likely best understood as part of Hamming Moat Formation, which keeps languages from merging.
Sources
A number of sources for this phenomenon are available in [@]