import heapq


class Node:
    def __init__(self, char, freq):
        self.char = char
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.freq < other.freq


def build_huffman_tree(chars_freq):
    """
    Builds the Huffman tree for given character frequencies.

    Args:
        chars_freq: A dictionary of characters and their frequencies.

    Returns:
        The root of the Huffman tree.
    """
    nodes = []
    for char, freq in chars_freq.items():
        heapq.heappush(nodes, Node(char, freq))
    while len(nodes) > 1:
        left = heapq.heappop(nodes)
        right = heapq.heappop(nodes)
        parent = Node(None, left.freq + right.freq)
        parent.left = left
        parent.right = right
        heapq.heappush(nodes, parent)
    return nodes[0]


def update_huffman_tree(root, updated_freqs):
    """
    Updates the Huffman tree with new character frequencies.

    Args:
        root: The root of the current Huffman tree.
        updated_freqs: A dictionary of characters and their updated frequencies.

    Returns:
        The root of the updated Huffman tree.
    """
    # 1. Extract leaf nodes and their frequencies
    leaf_nodes = []

    def get_leaf_nodes(node):
        if node is None:
            return
        if node.char is not None:
            leaf_nodes.append((node, node.freq))
        get_leaf_nodes(node.left)
        get_leaf_nodes(node.right)

    get_leaf_nodes(root)

    # 2. Update frequencies of leaf nodes
    for node, old_freq in leaf_nodes:
        new_freq = updated_freqs.get(node.char, old_freq)  # Use old freq if not updated
        node.freq = new_freq

    # 3. Rebuild the Huffman tree
    return build_huffman_tree({node.char: node.freq for node, _ in leaf_nodes})


def encode_char(root, char, code=''):
    """
    Encodes a character using Huffman codes.

    Args:
        root: The root of the Huffman tree.
        char: The character to encode.
        code: The current code (initially empty).

    Returns:
        The Huffman code for the character.
    """
    if root is None:
        return ''
    if root.char == char:
        return code
    left_code = encode_char(root.left, char, code + '0')
    if left_code != '':
        return left_code
    right_code = encode_char(root.right, char, code + '1')
    return right_code


def decode_char(root, code):
    """
    Decodes a Huffman code to get the character.

    Args:
        root: The root of the Huffman tree.
        code: The Huffman code to decode.

    Returns:
        The decoded character.
    """
    current = root
    for bit in code:
        if bit == '0':
            current = current.left
        else:
            current = current.right
        if current.char is not None:
            return current.char


def encode_message(root, message):
    """
    Encodes a message using Huffman codes.

    Args:
        root: The root of the Huffman tree.
        message: The message to encode.

    Returns:
        The encoded message.
    """
    encoded_message = ''
    for char in message:
        encoded_message += encode_char(root, char)
    return encoded_message


def decode_message(root, encoded_message):
    """
    Decodes a Huffman-encoded message.

    Args:
        root: The root of the Huffman tree.
        encoded_message: The encoded message.

    Returns:
        The decoded message.
    """
    decoded_message = ''
    current = root
    for bit in encoded_message:
        if bit == '0':
            current = current.left
        else:
            current = current.right
        if current.char is not None:
            decoded_message += current.char
            current = root
    return decoded_message


def print_tree(node, level=0):
    """
    Prints the Huffman tree in a visually appealing format.
    """
    if node is None:
        return
    print_tree(node.right, level + 1)
    print(" " * 4 * level + f"[{node.char or ''}: {node.freq}]")
    print_tree(node.left, level + 1)
Review of “Compositionality and Generalization in Emergent Languages”
Very exciting - this is a paper with a lot of interesting ideas. It comes with a lot of code in the form of a library called EGG, as well as many Jupyter notebooks. There is also a video of the talk at NeurIPS 2020.
In (Chaabouni et al. 2020) the authors take ideas from representation learning and apply them to emergent languages in deep networks. They arrive at a number of results.
Abstract
Natural language allows us to refer to novel composite concepts by combining expressions denoting their parts according to systematic rules, a property known as compositionality. In this paper, we study whether the language emerging in deep multi-agent simulations possesses a similar ability to refer to novel primitive combinations, and whether it accomplishes this feat by strategies akin to human-language compositionality. Equipped with new ways to measure compositionality in emergent languages inspired by disentanglement in representation learning, we establish three main results.
First, given sufficiently large input spaces, the emergent language will naturally develop the ability to refer to novel composite concepts.
Second, there is no correlation between the degree of compositionality of an emergent language and its ability to generalize.
Third, while compositionality is not necessary for generalization, it provides an advantage in terms of language transmission: The more compositional a language is, the more easily it will be picked up by new learners, even when the latter differ in architecture from the original agents. We conclude that compositionality does not arise from simple generalization pressure, but if an emergent language does chance upon it, it will be more likely to survive and thrive.
Outline
Here is the outline of the paper:
Introduction
- Describes a variant of Lewis signaling game used to study the emergence of reference to composite concepts in deep multi-agent simulations.
- Discusses two specific and intuitive compositionality strategies that capture common compositional structures in natural languages.
- Introduces two new compositionality measures, positional disentanglement (posdis) and bag-of-symbols disentanglement (bosdis), inspired by work on disentanglement in representation learning.
Measurements
- Describes the commonly used topographic similarity (topsim) metric.
- Introduces and defines two new measures of compositionality:
- posdis - positional disentanglement and
- bosdis - bag-of-symbols disentanglement.
- Explains how the new measures are similar to the Information Gap disentanglement measure used in representation learning.
- Illustrates the behavior of the three compositionality metrics on three miniature languages in the Appendix.
Given these two lists, the topographic similarity is defined as their negative Spearman ρ correlation (since we are correlating distances with similarities, negative values of correlation indicate topographic similarity of the two spaces). Intuitively, if similar objects share much of the message structure (e.g., common prefixes or suffixes), and dissimilar objects have little common structure in their respective messages, then the topographic similarity should be high, the highest possible value being 1 – (Lazaridou et al. 2018)
\mathit{topsim}=\rho\left(\left\{d\left(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}\right), d\left(\mathbf{m}^{(i)}, \mathbf{m}^{(j)}\right)\right\}_{i, j=1}^{n}\right)
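As a rough sketch (not the EGG implementation), topsim can be estimated by correlating the pairwise distances of the two spaces. Here Hamming distance is assumed on both the attribute vectors and the messages, so the Spearman correlation is taken directly rather than negated, and scipy is assumed to be available; the toy language at the bottom is my own example.

# Rough sketch of topsim: Spearman correlation between pairwise input distances
# and pairwise message distances, assuming Hamming distance in both spaces.
from itertools import combinations
from scipy.stats import spearmanr

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def topsim(inputs, messages):
    pairs = list(combinations(range(len(inputs)), 2))
    d_x = [hamming(inputs[i], inputs[j]) for i, j in pairs]
    d_m = [hamming(messages[i], messages[j]) for i, j in pairs]
    rho, _ = spearmanr(d_x, d_m)
    return rho

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
messages = [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
print(topsim(inputs, messages))   # 1.0 for this perfectly compositional toy language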
positional disentanglement (posdis) metric measures whether symbols in specific positions tend to univocally refer to the values of a specific attribute. This order-dependent strategy is commonly encountered in natural language structures (and it is a pre-condition for sophisticated syntactic structures to emerge) – (Chaabouni et al. 2020)
\mathit{posdis}=\frac{1}{c_{len}} \sum_{j=1}^{c_{len}} \frac{\mathcal{I}(s_j,a^j_1)-\mathcal{I}(s_j,a^j_2)}{\mathcal{H}(s_j)}
- s_j: the j-th symbol of a message
- a^j_1: the attribute with the highest mutual information with s_j, a^j_1 = \arg\max_a \mathcal{I}(s_j; a)
- a^j_2: the attribute with the second-highest mutual information with s_j, a^j_2 = \arg\max_{a \neq a^j_1} \mathcal{I}(s_j; a)
- \mathcal{H}(s_j): the entropy of the j-th position (used as a normalizing term)
positions with zero entropy are ignored in the computation.
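To make the formula concrete, here is a small sketch of how posdis could be estimated from data. It is my own simplification, not the authors' code: messages are assumed to be tuples of symbols, inputs are tuples of attribute values, and the score is averaged over the positions that are not ignored.

# Sketch of posdis for a toy language: each message is a tuple of symbols,
# each input a tuple of attribute values.
from collections import Counter
from math import log2

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from co-occurrence counts
    return (entropy(Counter(xs)) + entropy(Counter(ys))
            - entropy(Counter(zip(xs, ys))))

def posdis(messages, inputs):
    """messages: list of symbol tuples; inputs: list of attribute tuples."""
    score, positions = 0.0, 0
    for j in range(len(messages[0])):
        s_j = [m[j] for m in messages]
        h_sj = entropy(Counter(s_j))
        if h_sj == 0:            # positions with zero entropy are ignored
            continue
        mis = sorted((mutual_information(s_j, [x[a] for x in inputs])
                      for a in range(len(inputs[0]))), reverse=True)
        score += (mis[0] - mis[1]) / h_sj
        positions += 1
    return score / positions if positions else 0.0

# A perfectly positional toy language: position 0 encodes attribute 0, position 1 attribute 1
inputs = [(a, b) for a in range(3) for b in range(3)]
messages = [(a, b) for a, b in inputs]
print(posdis(messages, inputs))   # 1.0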
Posdis assumes that a language uses positional information to disambiguate symbols. However, we can easily imagine a language where symbols univocally refer to distinct input elements independently of where they occur, making order irrelevant. Hence, we also introduce bag-of-symbols disentanglement (bosdis). The latter maintains the requirement for symbols to univocally refer to distinct meanings, but captures the intuition of a permutation-invariant language, where only symbol counts are informative – (Chaabouni et al. 2020)
\mathit{bosdis}=\frac{1}{c_{voc}} \sum_{j=1}^{c_{voc}} \frac{\mathcal{I}(n_j,a^j_1)-\mathcal{I}(n_j,a^j_2)}{\mathcal{H}(n_j)}
- n_j a counter of the j-th symbol in a message
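Reusing entropy and mutual_information from the posdis sketch above, bosdis can be estimated the same way: instead of the symbol at position j, we use the count of symbol j in each message, so order is thrown away. Again this is my own simplification of the formula, with the average taken over symbols whose counts have non-zero entropy.

# Sketch of bosdis under the same assumptions as the posdis sketch above.
def bosdis(messages, inputs, vocab):
    """messages: list of symbol tuples; inputs: list of attribute tuples; vocab: iterable of symbols."""
    score, used = 0.0, 0
    for sym in vocab:
        n_j = [m.count(sym) for m in messages]   # symbol counts, order discarded
        h_nj = entropy(Counter(n_j))
        if h_nj == 0:
            continue
        mis = sorted((mutual_information(n_j, [x[a] for x in inputs])
                      for a in range(len(inputs[0]))), reverse=True)
        score += (mis[0] - mis[1]) / h_nj
        used += 1
    return score / used if used else 0.0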
Generalization Emerges “Naturally” if the Input Space is Large
- Presents an experiment showing that emergent languages are able to generalize to unseen combinations as long as input size is sufficiently large.
- Discusses how the results challenge claims in the recent literature that deep networks fail to generalize.
- Notes that the minimum channel capacity required for the emergence of a generalizing language is significantly larger than the minimum channel capacity required for a perfectly compositional language.
- Presents additional experiments in the Appendix analyzing the effects of agent capacity and input density on generalization.
Generalization Does Not Require Compositionality
- Presents results showing that there is no correlation between compositionality and generalization ability.
- Analyzes the language of a specific run with near-perfect generalization accuracy and medium posdis score.
- Discusses how the analyzed language uses a “leaky disentanglement” strategy where two positions largely specialize as predictors of two attributes, respectively, but a third more entangled position is still necessary for perfect communication.
- Briefly analyzes in the Appendix a language with near-perfect generalization accuracy and very low posdis score.
Compositionality and Ease of Transmission
- Discusses the hypothesis that compositional languages are easier to decode and transmit to new learners.
- Presents an experiment where new Receivers are trained on frozen Senders that achieved a high level of generalization accuracy.
- Finds that learning speed and generalization accuracy of new Receivers are strongly positively correlated with degree of compositionality.
- Mentions further experiments in the Appendix that replicate the ease-of-transmission analysis across various channel capacities.
Discussion
- Summarizes the main findings of the paper, highlighting the results that challenge common assumptions in the emergent language literature.
- Relates the findings to the ongoing debate on the origins of compositionality in natural language.
- Discusses the potential benefits of compositionality for developing languages that are quickly usable by wide communities of artificial agents.
- Highlights the connection between compositionality and disentanglement in representation learning.
My Thoughts
We have three main results:
- Generalization Emerges “Naturally” if the Input Space is Large
- No Correlation Between Compositionality and Generalization
- Compositionality increases Ease of Transmission
This follows from a simple analysis of the Lewis signaling game. The cost of coordination for a non-compositional language is exponential in the number of states. For a compositional language the cost can be reduced to an exponential in the size of the lexicon (the number of atomic signals), plus some constant factor for learning the grammar, provided the grammar is a small set of aggregation rules. (Usually one rule is sufficient to support compositionality.)
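A back-of-the-envelope illustration of this cost argument, with toy numbers of my own (not taken from the paper): count the candidate conventions the agents would have to search over.

# Toy illustration of the coordination-cost argument above.
from math import factorial

n_att, n_val = 2, 4
n_states = n_val ** n_att                             # 16 composite states

holistic_candidates = factorial(n_states)              # bijections states -> holistic signals
compositional_candidates = factorial(n_val) ** n_att   # per-attribute value -> atom maps

print(f"{holistic_candidates:.3e}")    # ~2.092e+13
print(compositional_candidates)         # 576 (plus one concatenation rule)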
The Lewis signaling game is more or less guaranteed to converge to a signaling system¹ with a suitable algorithm. Without a good algorithm the game is more likely to converge to a partial pooling equilibrium, where payoffs are less than 1 because one side or both fail to separate different states.
¹ One that has an expected payoff of 1.
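To make the convergence claim concrete, here is a minimal sketch of Herrnstein/Roth-Erev reinforcement in a 2-state, 2-signal Lewis game. This is my own toy implementation, not the paper's setup; in this smallest game it typically settles on one of the two signaling systems.

# Roth-Erev reinforcement in a minimal Lewis signaling game.
import random

n = 2
sender_w = [[1.0] * n for _ in range(n)]    # accumulated rewards: state -> signal
receiver_w = [[1.0] * n for _ in range(n)]  # accumulated rewards: signal -> act

def draw(weights):
    return random.choices(range(n), weights=weights)[0]

for _ in range(20000):
    state = random.randrange(n)
    signal = draw(sender_w[state])
    act = draw(receiver_w[signal])
    if act == state:                         # payoff 1 on success, 0 otherwise
        sender_w[state][signal] += 1
        receiver_w[signal][act] += 1

print(sender_w)    # one signal per state should clearly dominate in most runs
print(receiver_w)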
What this means is that, all things being equal, the compositional language will emerge much sooner than the non-compositional language, given the opportunity to do so. So why don't we see this?
- When the number of states isn't larger than the number of basic signals it is easier to learn a non-compositional language. So we should put a cap on the number of basic signals.
- The multiplier for learning grammar may be big, especially if we use a neural network, and more so if the grammar is complicated. So if we don't see compositionality, perhaps we need to make the grammar simpler or the state space bigger.
- This is theoretical - perhaps since we use a neural network rather than a tabular RL solution we need lots of data to learn anything. As we go through the epochs there may be enough rounds in the Lewis game for us to establish a convention without the need for compositionality.
- OK, let's say we have a fast learning algorithm and a big but manageable input space. Do we get compositionality? The answer is not necessarily.
Let's dive deeper into this last point:
Despite what ChatGPT will tell you if you ask, the Lewis signaling game only outputs simple languages: no complex signals, no grammar, no recursion and no compositionality. It has many possible equilibria, and none correspond to a language with complex signals or grammar.
For grammar and complex signals to emerge you need to tweak the Lewis signaling game. (Skyrms 2010) reports on a couple of papers which produce complex signals. Most of the interesting work came out after the book. Regardless, in those papers the agents were given a simple aggregation rule to follow: conjunction leads to a bag of words, concatenation leads to sequences. But what they don't seem to stress is that for complex signals we want a state that decomposes in a way we can match with our aggregation rule. Think group homomorphism. And there may be multiple decompositions, so think normal subgroups.
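A toy illustration of what such an aggregation rule looks like (my own example, not from Skyrms): the state decomposes into attributes, and the sender concatenates one atomic signal per attribute, so the receiver can invert each part independently.

# A concatenation-based aggregation rule over a factored state.
color_code = {'red': 'a', 'green': 'b', 'blue': 'c'}
shape_code = {'circle': 'x', 'square': 'y', 'triangle': 'z'}

def compositional_sender(state):
    color, shape = state                               # the state decomposes
    return color_code[color] + shape_code[shape]       # concatenation rule

def compositional_receiver(signal):
    inv_color = {v: k for k, v in color_code.items()}
    inv_shape = {v: k for k, v in shape_code.items()}
    return (inv_color[signal[0]], inv_shape[signal[1]])

print(compositional_sender(('green', 'triangle')))     # 'bz'
print(compositional_receiver('bz'))                    # ('green', 'triangle')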
There isn’t a how to guide to get the agents to use some arbitrary grammar. (Not AFAIK). There are a bunch of books and many papers but they don’t seem to have a the kind of recepies that are needed here. In my view most of these books look for the answers based on what they know rather what they need to know. They may have fascinating points but lead to greater confusion rather then more clarity.
One abstraction I came across is that a grammar is essentially a decision tree mapping the signals back into the state. Decision trees sound simple enough, but this tree is not given; it needs to be learned by trial and error, and the reward signals from successes are sparse. There might be a setting in which the sender can construct the tree and the receiver then just needs to learn it, but that requires the sender to have access to the distribution of states and sub-states. This distribution can be used to come up with a tree that is optimal for a given set of signals. If the sender and receiver don't have access to this distribution they can learn it, but my work indicates that learning the distribution to a high degree of accuracy requires more turns than my Lewis signaling algorithm needs, at least for simple signaling systems.
For a simple signaling system I developed two algorithms. The first learned a signaling system. The second first enumerated all signaling systems of a certain size and then selected one from those it believed were optimal; each new state would reduce the belief until the sender and receiver had a common belief. This may not scale, but it can adapt to new distributions seamlessly. I thought about a similar approach for working with complex grammars, but in this case I did not have an efficient way to enumerate all possible grammars. However, there seems to be a way to do this: instead of considering all decision trees, we can consider just Huffman trees. This means that the sender and receiver use the Lewis signaling game to learn a shared Huffman tree, and the outcome is that the tree should compress the state space. The only problem is that such a grammar is not likely to be compositional and would be very difficult for humans to learn.
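A minimal sketch of that idea, reusing the Huffman helpers listed at the top of this post; the state names and frequencies are assumed toy values, not anything from my experiments.

# Sender and receiver share a Huffman tree built from an (assumed) state
# distribution; the signal for a state is its path in the tree.
state_freqs = {'rain': 50, 'sun': 30, 'wind': 15, 'snow': 5}   # toy distribution
shared_tree = build_huffman_tree(state_freqs)

def sender(state):
    return encode_char(shared_tree, state)       # bit-string signal

def receiver(signal):
    return decode_char(shared_tree, signal)

signal = sender('wind')
print(signal, receiver(signal))                  # payoff 1 when the decoded state matches 'wind'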
So what we need is for the agents to learn the tree interactively. Three approaches come to mind: 1. Huffman coding, which builds the tree but doesn't update it to account for distributional shifts; 2. Vitter's algorithm for adaptive Huffman coding, which updates the tree as new states are seen; and 3. adaptive arithmetic coding, which is a generalization of adaptive Huffman coding.
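Continuing the sketch above: the update_huffman_tree helper in this post takes the simplest route and rebuilds the tree from new frequencies rather than adapting it in place as Vitter's algorithm would. The shifted frequencies below are again assumed toy values.

# Rebuild-on-shift (what the helpers in this post implement), as opposed to the
# in-place updates of adaptive Huffman coding.
shifted_freqs = {'rain': 10, 'sun': 30, 'wind': 15, 'snow': 60}   # assumed shift
shared_tree = update_huffman_tree(shared_tree, shifted_freqs)
print(sender('snow'))   # 'snow' now gets a short code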
One point to consider is that such a grammar is likely to provide a good compression of the state space, since these algorithms are also compression algorithms.
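This can be made precise with the standard bound on Huffman codes: if states are drawn with probabilities p(x) and \ell(x) is the length of the code assigned to state x, the expected signal length L sits within one bit of the entropy of the state distribution:

\mathcal{H}(X) \leq L = \sum_{x} p(x)\,\ell(x) < \mathcal{H}(X) + 1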
I would imagine the output of such a grammar to be a binary sequence. This suggests it would lead to an entangled representation with no discernible compositional structure.
Now there are reputedly many languages with very simple grammars. But the ones we are familiar with are not simple. They also have large lexicons. We need to put that aside and look for ways to work with simple grammars. It is quite possible to come up with two or three rules that can generate both morphology and a recursive syntax. It might be possible with one rule.
OK, let's briefly consider the other two points.
The paper, a video of the talk, the code, and a Colab notebook are available online.
# Example usage
chars_freq = {'a': 45, 'b': 13, 'c': 12, 'd': 10, 'e': 9, 'f': 5, 'g': 2, 'h': 1}
root = build_huffman_tree(chars_freq)
updated_freqs = {'a': 45, 'b': 55, 'c': 12, 'd': 10, 'e': 9, 'f': 5, 'g': 2, 'h': 1}

#message = "abcdef"

def test(root, freqs=None):
    if freqs is not None:
        root = update_huffman_tree(root, freqs)
    print_tree(root)
    for message in ["a", "ab", "aba", "abc", "abcd"]:
        print("Original message", message)
        encoded_message = encode_message(root, message)
        print("Encoded message:", encoded_message)
        decoded_message = decode_message(root, encoded_message)
        print("Decoded message:", decoded_message)

test(root)
test(root, updated_freqs)
                [e: 9]
            [: 17]
                    [f: 5]
                [: 8]
                        [g: 2]
                    [: 3]
                        [h: 1]
        [: 30]
            [b: 13]
    [: 52]
            [c: 12]
        [: 22]
            [d: 10]
[: 97]
    [a: 45]
Original message a
Encoded message: 0
Decoded message: a
Original message ab
Encoded message: 0110
Decoded message: ab
Original message aba
Encoded message: 01100
Decoded message: aba
Original message abc
Encoded message: 0110101
Decoded message: abc
Original message abcd
Encoded message: 0110101100
Decoded message: abcd
        [a: 45]
    [: 84]
                [c: 12]
            [: 22]
                [d: 10]
        [: 39]
                [e: 9]
            [: 17]
                    [f: 5]
                [: 8]
                        [g: 2]
                    [: 3]
                        [h: 1]
[: 139]
    [b: 55]
Original message a
Encoded message: 11
Decoded message: a
Original message ab
Encoded message: 110
Decoded message: ab
Original message aba
Encoded message: 11011
Decoded message: aba
Original message abc
Encoded message: 1101011
Decoded message: abc
Original message abcd
Encoded message: 11010111010
Decoded message: abcd
Citation
@online{bochman2025,
author = {Bochman, Oren},
title = {Compositionality and {Generalization} in {Emergent}
{Languages}},
date = {2025-01-01},
url = {https://orenbochman.github.io/posts/2024/2024-10-10-marco-baoni-composionality/paper2.html},
langid = {en}
}