TL;DR: Desiderata for Emergent Languages
In this space I want to collect desiderata for emergent languages. Following (Skyrms2010signals?) and others, my view of emergent languages is through the lens of minimal extensions to the Lewis signaling game. The Lewis game in particular is not well understood once we extend it to compound signals. Thus my desiderata for emergent languages remain minimal, shaped in part by my intuitions about the simple and compound variants. I also do not see the need for deep RL in this space.
(Skyrms2010signals?) and others have shown that we can also add a lot of structure to the signaling system. In his book Skyrms considers learning to reason, learning to form efficient communication networks, and learning to use better communication protocols. Others have considered inducing communication protocols from a meta-protocol. [] And this has been coupled with the idea of learning to represent the structure of the state space.
I think that the coordination problem is the easy part, and I have devoted some time to solving it in a number of settings. The next challenge is to learn a reductionist concept for signal aggregation: something that allows bags of words, ordered sequences, and recursive structures to be learned in much the same way. I think we have not been asking the right questions here yet, since aggregation is a hard problem. However, at this point a starting point exists, and the next problem is to learn the structure of the state space. Here we can apply some existing algorithms that may be a good fit for Lewis signaling games, and it is quite important, because the structure of the state space is what determines how optimal the language is for a particular domain. A second facet is that we should think of a minimal state space that captures the essence of the real world - a model we can consider sufficiently powerful for real-world agents. In terms of the emergent signaling system, I don't think it will be qualitatively or quantitatively all that different from other systems, except that it could act as a good interlingua for transferring between natural languages as well as between tasks in RL. This is therefore the holy grail of emergent languages at this point.
Regarding deep learning - I think the best work in deep learning is done by researchers who have a clear view of the problem and use the DNN to approximate the function they cannot extend or scale beyond their simple model. In the case of emergent languages, people have built kits and adapted architectures from other settings in ways that are overkill, instead of focusing on the real problem at hand. So I think most of the work in that space is flawed, even if some of the results are interesting.
Most of the desiderata for emergent languages are sourced from natural languages. However, when we look at natural languages we see that the desiderata are not a feature of such languages but rather an idealization. More alarming is that these desiderata are rarely provided with baseline metrics measured across different natural languages.
A second point well worth remembering is that natural languages are not optimal with respect to any of these desiderata:
- Natural languages are not easy to learn.
- Natural languages combine regularity with much irregularity, and this happens at all levels of the language, from phonology and orthography through morphology and syntax.
- Learners, particularly children, are prone to misgeneralize and need to be corrected with the right forms. In most languages, instead of learning one base form of a word, you have to learn a number of additional forms.
- Natural languages contain numerous homophones (different words that sound the same) and homonyms (single words with multiple senses).
- Written languages often require punctuation to make the semantics precise. (The spoken version may often be subject to misinterpretation.)
- Natural languages contain a lot of redundancy that is not particularly useful for better communication and makes them harder to learn.
- Natural languages are rife with ambiguity. One can make the case that we can usually disambiguate from context, and this holds when speakers want to communicate. It does not hold when people want to dissemble: listen to any politician or lawyer put on the spot and you will discover they are using a lot of words to say very little. We are told that, given a number of parse trees for an ambiguous sentence, we can usually pick the right one. However, more sentences in a language have many, many valid parse trees than just one or two.
- Ideally, each item in the desiderata should come with a metric that can be used to formalize it and to rank different signaling systems.
- Also it would be nice if there were examples.
My Desiderata
So the desiderata for emergent languages are:
Important:
Easy to learn
My experience with the Lewis signaling game is that it is easy to learn, and that natural languages are not.
Hypothesis: Complex signaling systems that fulfill enough desiderata may suffer from reduced learnability. I have already written an article on some of the different ways that signaling systems can arise.
Questions: How can we evaluate the learnability of a signaling system? What metrics can we use to evaluate it?
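One crude but concrete metric is trials-to-convergence for a simple reinforcement learner. Below is a minimal sketch using Roth-Erev style urn updates in an n-state Lewis game; the moving-average convergence test and all the constants are my own choices, not an established benchmark.

```python
import numpy as np

rng = np.random.default_rng(1)

def trials_to_learn(n, threshold=0.95, max_t=100_000):
    """Roth-Erev style urn reinforcement in an n-state Lewis game.
    A crude learnability metric: plays needed before a moving-average
    success rate crosses `threshold` (returns max_t on failure)."""
    send = np.ones((n, n))  # sender urns: state -> signal weights
    recv = np.ones((n, n))  # receiver urns: signal -> action weights
    ema = 0.0
    for t in range(1, max_t + 1):
        s = rng.integers(n)
        m = rng.choice(n, p=send[s] / send[s].sum())
        a = rng.choice(n, p=recv[m] / recv[m].sum())
        if a == s:  # reinforce both urns only on success
            send[s, m] += 1.0
            recv[m, a] += 1.0
        ema = 0.999 * ema + 0.001 * float(a == s)
        if ema > threshold:
            return t
    return max_t

print(trials_to_learn(2), trials_to_learn(4))
```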
Minimum description length (MDL) - the number of bits needed to describe the signaling system is what agents need to coordinate between them to learn a shared communication system.
We would like to consider two cases:1
1 I have already written an article on some of the different ways that signaling systems can arise.
- there is a sender/teacher with a good signaling system and a receiver/student learning it.
- there is no sender/teacher and the agents have to construct such a signaling system from scratch.
The above notion of MDL is a good metric for the first case but not for the second. In the second case we need to consider the complexity of the state space as well as the algorithmic complexity of arriving at a common communication system. The coordination cost captured by MDL is subsumed by the cost of constructing an optimal signaling system that faithfully represents the structure of the state space.
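As a toy illustration of the first (teacher/student) case, here is a minimal sketch of the MDL counting argument, assuming deterministic sender strategies over discrete states and signals; the function name is mine.

```python
import math

def mdl_bits(n_states: int, n_signals: int) -> float:
    """Bits a teacher must convey to single out one deterministic
    sender strategy among all n_signals ** n_states mappings."""
    return n_states * math.log2(n_signals)

# An 8-state, 8-signal Lewis game: 8 * log2(8) = 24 bits to coordinate.
print(mdl_bits(8, 8))  # 24.0
```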
Two more points:
- Learning a partial system should still benefit agents more than learning nothing.
- Learning as a group should be easier and quicker than learning individually.
- e.g., learning rules (grammar/morphology) should amplify the speaker's learning and generalization with respect to the structure of the state space.
Optimal for Communication
Agents should be able to communicate with a high success rate. (This is a doorway to information-theoretic formulations.)
Emergent communication should have an expected success rate of almost 1.
Many systems with an expected success rate below 1 are acceptable; in practice, however, we tend to see agents reach success rates close to 1.
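To make the success rate concrete, here is a minimal sketch of the standard expected-success computation for a one-shot Lewis game; the array shapes and names are my own conventions.

```python
import numpy as np

def expected_success(p_state, sender, receiver):
    """P(receiver recovers the state) in a one-shot Lewis game.

    p_state:  (S,)   prior over states
    sender:   (S, M) P(signal | state)
    receiver: (M, S) P(guessed state | signal)
    """
    # P(success) = sum_s p(s) sum_m sender[s, m] receiver[m, s]
    return float(np.einsum("s,sm,ms->", p_state, sender, receiver))

# A 3-state separating equilibrium scores 1.0 under a uniform prior.
eye = np.eye(3)
print(expected_success(np.full(3, 1 / 3), eye, eye))  # 1.0
```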
Resilience to Errors
Signaling systems should be resilient to errors. As we inject errors into the signaling system we should see a number of features from natural languages emerge.
- Sender errors - e.g., the sender produces the wrong signal for the state.
- Receiver errors - e.g., the receiver misreads the signal or picks the wrong action.
- Channel errors - noise corrupts signals in transit; this motivates channel coding (see the sketch after this list).
- Complex signaling systems should be easy to decode.
- Complex signaling systems should be:
  - salient with respect to the distribution of states;
  - risk-minimizing with respect to the risks associated with signaling - particularly risks affecting the agents' fitness!
  - cost-minimizing with respect to the overhead of signaling (in RL there should be a cost associated with each marginal bit sent across the channel). This may be why the most common states get the shortest signals - using the unmarked case as the default; this is a form of source coding. (Perhaps this item is more fundamental than risk and salience.) It may also be why we have vowel harmony in some languages and other types of redundant agreement in others.
- A theorem: if a (natural) language arising via evolution has redundancy that can be removed without loss of information, or recovered from context, then it will be compressed, eroded, or eliminated given time. Thus redundant features that exist and remain stable should have measurable benefits in terms of communication.
- The signaling system should be able to generalize to new states
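To illustrate channel errors and the simplest channel-coding remedy, here is a minimal sketch of a one-bit Lewis game over a binary symmetric channel with a repetition code; the flip probability and trial counts are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def success_rate(p_flip, repeats, trials=20_000):
    """One-bit Lewis game over a binary symmetric channel.
    The sender repeats the bit `repeats` times and the receiver
    majority-votes - the simplest form of channel coding."""
    states = rng.integers(0, 2, trials)
    signals = np.repeat(states[:, None], repeats, axis=1)
    received = signals ^ (rng.random(signals.shape) < p_flip)
    decoded = (received.sum(axis=1) * 2 > repeats).astype(int)
    return float((decoded == states).mean())

for r in (1, 3, 5):  # redundancy recovers success lost to noise
    print(r, success_rate(0.1, r))
```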
Nice to have:
Signaling systems should be able to faithfully encode spatio-temporal and hierarchical structures in the state space.
Distributional Semantics2 & Distributed Representations
- Signaling systems should be alignable with the ~2000 discourse atoms, cf. (Arora et al. 2018), or a subset if they come from a much simpler state structure.
- In fact, a major point of research on emergent languages is to see whether they manifest distributional semantics. I hypothesize that this will happen if the state space has a semantic basis - i.e., the state space is a vector space whose dimensions are semantically orthogonal.
Generalization - every time an agent learns another part of the system, it should gain more than that part alone. My solution here leans on using group actions to structure the state space: either one big group action, as for Hebrew, or a number of smaller ones, as for English.
Morphosyntax should be stable over time and be composable with the abstract morphological structure of the state space.
Compositionality - if the state has structure, it should be preserved/mirrored by the language, yielding a significant increase in what the language can express and understand. I lean towards adding a topology that captures semantic hierarchies. The different signaling systems are associated with a lattice of topological groups, with the complete pooling equilibrium at the top and the unstructured separating equilibrium at the bottom. In between are partial pooling equilibria and the various structured separating equilibria. For compositionality we want to pick certain structured pooling equilibria over the structured separating ones.
Disentanglement - the language should be able to encode multiple concepts, each in its own part of the signal (this is a form of compositionality but also not what we see in …).
Entanglement - when a language encodes two or more semantic features in a single signal, e.g., 'they' encodes (third) person and (plural) number as one signal. It is a pronoun, but it is not inflected and is not made of two bound morphemes; it is a single morpheme.
2 a word is characterized by the company it keeps
I want to come up with information-theoretic notions driving entanglement and disentanglement:
1. I think they are based on the mutual information between signals and states, and on relative entropy.
2. If the number of sub-states in a structure is high, it is best encoded as a group action, i.e., a rule.
3. If the sub-states are few, they are best encoded as a dictionary.
4. If, like a pronoun, a complex signal is high-frequency and high-entropy, there is greater fitness in compressing it into a single signal. We might also want to reduce errors by intentionally boosting its phonemic contrast.
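As a starting point for point 1, here is a minimal sketch of the mutual information between states and signals computed from an explicit joint distribution; the fused-signal example is my own.

```python
import numpy as np

def mutual_information(joint):
    """I(state; signal) in bits from a joint distribution over pairs."""
    p_s = joint.sum(axis=1, keepdims=True)  # marginal over states
    p_m = joint.sum(axis=0, keepdims=True)  # marginal over signals
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (p_s @ p_m)[nz])).sum())

# Four equiprobable states, each fused into its own single signal
# (like a pronoun fusing person and number): states fully recoverable.
fused = np.eye(4) / 4
print(mutual_information(fused))  # 2.0 bits
```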
In reality natural languages are not optimal in any of these desiderata. They are the result of a long evolutionary process that has been shaped by many factors. However I think that the desiderata are a good starting point for designing a language that is optimal for a particular task.
- Stability of regularity and irregularity (resilience to errors and to evolution): consider a language that generates entangled structures to compress and reduce mistakes for something like its pronouns; these should be stable over time and not be replaced by a more regular system that is less efficient, i.e., the loss from having such pronouns should be less than the gain from having a more regular system.
- Zipfian distributions
An evolving list of desiderata for emergent languages
mappings between states and signals
- morphosyntax mappings preserve partial states (homomorphisms of normal subgroups)
- mappings preserve semantic topologies (if a is close to b then f(a) should be close to f(b))
Stability
- Regularity is stable (mappings used in syntax and morphology are stable over time)
- Irregularity is stable (mappings used in irregular verbs and nouns are also stable over time). In English we maintain many irregular borrowings from other languages and thus preserve their regularity - making such exceptions easier to learn too.
Compositionality
Brevity (source coding)
Self-correcting (channel coding to detect errors and correct them through redundancies like agreement, vowel harmony, etc.)
Learnability - how many things need to be coordinated; the complexity of the structures; the Hoeffding bound on learning rates when there are errors; and the Bonferroni correction for multiple learners.3 (See the sketch after this list.)
Stable irregularities
Zipfian distributions - the most common signals should be the shortest and the least common the longest. This is a form of source coding and would arise naturally from Huffman coding, except that this is not practical for several reasons. It could also arise out of laziness in the sender.
Faithfulness
Distributional stability
Decidability - it should be easy to disambiguate ambiguous signals from their context
Expressivity - the ability to express a wide range of concepts
Generalization - learning the language is possible from just a few signal-state pairs.
3 Multiple learners follow a logic similar to a multiple-hypothesis-testing problem, with each learner postulating a different signaling system after each failure or success in a Lewis game - more so when learners get to observe each other's actions and rewards.
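Here is a minimal sketch of how the Hoeffding bound and the Bonferroni correction from footnote 3 might combine into a sample-complexity estimate for learnability; the function and its parameters are my own illustration, not a result from the literature.

```python
import math

def hoeffding_samples(epsilon, delta, n_learners=1):
    """Plays needed so an empirical success-rate estimate lands within
    epsilon of its mean with probability 1 - delta (Hoeffding bound),
    Bonferroni-corrected when n_learners test hypotheses in parallel."""
    delta = delta / n_learners  # union bound over learners
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

print(hoeffding_samples(0.05, 0.05))                 # single learner
print(hoeffding_samples(0.05, 0.05, n_learners=10))  # group of learners
```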
Some metrics
- Compositionality
- Topographic similarity (see the sketch after this list)
- Source coding
- Compression ratio
- Entropy
- Mutual information
- Error detection and correction
- Error rate
- Hamming distance
- Learnability
- Number of examples to learn a new word
- Number of examples to learn a new rule
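Since topographic similarity appears in the list above, here is a minimal sketch of its usual formulation - the Spearman correlation between pairwise distances in state space and in signal space; the toy states, signals, and the use of Hamming distance on both sides are assumptions of mine.

```python
from itertools import combinations
from scipy.stats import spearmanr

def topographic_similarity(state_dists, signal_dists):
    """Spearman correlation between pairwise state distances and
    pairwise signal distances, computed over the same index pairs."""
    return spearmanr(state_dists, signal_dists).correlation

states = [(0, 0), (0, 1), (1, 0), (1, 1)]
signals = ["aa", "ab", "ba", "bb"]  # a perfectly compositional naming
ham = lambda x, y: sum(u != v for u, v in zip(x, y))
pairs = list(combinations(range(len(states)), 2))
sd = [ham(states[i], states[j]) for i, j in pairs]
gd = [ham(signals[i], signals[j]) for i, j in pairs]
print(topographic_similarity(sd, gd))  # 1.0 for this perfect mapping
```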
Another random thought or two:
VO and OV via symmetry breaking
If we use a Huffman-coding-like process to organize the order of the morphological and syntactic elements (effectively fixing the on-average most surprising partial signals before the next most surprising ones), we should get emergent languages that are rather similar and fairly easy to learn, like Turkish and Japanese. However, at the start there is the question of how to apply the aggregations. If the action comes first we get VO languages; if it comes second we get OV languages. I think that V carries more entropy in predation and resource-gathering games, so VO should be more prevalent. However, once this decision is made, most practical algorithms will not be able to reverse it.
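A minimal sketch of this symmetry-breaking intuition: order the template slots by their average surprisal (entropy), with ties broken arbitrarily. The slot marginals below are invented for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Invented slot marginals for a predation/resource-gathering game.
slots = {
    "V": [0.5, 0.3, 0.2],         # actions: high average surprisal
    "O": [0.7, 0.2, 0.05, 0.05],  # objects: one object dominates
}
# Fix the highest-entropy slot first; a tie would be broken by
# spontaneous symmetry breaking into a VO vs. OV convention.
order = sorted(slots, key=lambda k: entropy(slots[k]), reverse=True)
print(order, {k: round(entropy(v), 2) for k, v in slots.items()})
# ['V', 'O'] -> a VO language under these marginals
```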
Vowel Harmony
If agents backpropagate with topographic similarity in mind, and the basic signals (phonemes) are endowed with a similarity structure, they may end up with systems exhibiting vowel harmony and consonant alternation that capture sets of normal subgroups with greater similarity.
If these regular configurations also lead to better channel coding, the benefits should persist.
Compositionality in Lewis signaling games
So here is a sketch of an algorithm for learning a compositional language in a Lewis game.
We need a language designer. This can be the sender, the receiver, or implicit. Without loss of generality we can assume that the sender is the language designer.
The language designer needs: 1. a 'semantic' metric to decide when two states are close or distant; 2. a way to deconstruct states into atomic orthogonal/independent parts - I am thinking of normal subgroups.
Note that we can define the metric on the parts and aggregate them to get the metric on the whole. This is a form of compositionality.
More abstractly, we can just say that the state space is an algebraic topological group.
So the language designer can use a template with n parts (one for each of the subgroups), ideally ordered by decreasing size, to prefix-code the sub-states. If there are duplicate sizes, this will yield multiple equilibria to be selected via spontaneous symmetry breaking.
The designer can now allocate the states to points in the topology. By picking the system with the lowest overall distances we get a regular compositional language.
Since there are many ways to do this, the designer needs to coordinate with the receiver. However, since there is great regularity, they only need to coordinate a minimal set of examples in which each atomic sub-state appears once.
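Here is a minimal sketch of the designer's template idea, assuming states factor into independent parts. The toy factors, single-symbol alphabets, and concatenation scheme are my own simplifications (a real version would derive the parts from normal subgroups and might prefix-code variable-length sub-states).

```python
from itertools import product

def design_language(factors, alphabets):
    """Template coding: each state is a tuple of sub-states, one per
    factor; each slot gets its own alphabet, slots ordered by
    decreasing factor size; signals are slot-wise concatenations."""
    order = sorted(range(len(factors)), key=lambda i: -len(factors[i]))
    return {
        state: "".join(alphabets[i][factors[i].index(state[i])]
                       for i in order)
        for state in product(*factors)
    }

# Toy example: 3 actions x 2 objects, one symbol per sub-state.
factors = [("eat", "flee", "groom"), ("food", "threat")]
alphabets = [("a", "b", "c"), ("x", "y")]
for state, signal in design_language(factors, alphabets).items():
    print(state, "->", signal)
# To coordinate, the receiver needs only a minimal set of pairs in
# which each atomic sub-state appears at least once.
```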
Citation
@online{bochman2024,
author = {Bochman, Oren},
title = {Emergent {Languages} - {A} {Desiderata}},
date = {2024-10-14},
url = {https://orenbochman.github.io/posts/2024/2024-10-18-Desiderata-For-Emergent-Languages/desiderata-for-emerent-languges.html},
langid = {en}
}