Complex Lewis Signaling - The Research Questions

Clarifying my research questions

A list of clarified research questions regarding emergent languages, planning, and grounding in multi-agent systems.

Categories: research, emergent languages, planning, grounding

Author: Oren Bochman

Published: Wednesday, April 2, 2025

Modified: Monday, September 22, 2025

While talking with some academics at BIU, it dawned on me that I need to clarify my research questions if I want to write a thesis or a paper, or even just to solve them.

I took a time out today to update the research questions for the Lewis signaling game and emergent languages in multi-agent systems. Several issues prompted this:

  1. I have made lots of progress recently, but I have not been able to articulate it well.
  2. It’s not clear what direction my research is going, as there are a number of related questions I am trying to answer.
  3. My research questions are not sufficiently clear.
  4. A number of them are such that it’s unlikely anyone will be able to say “I am an expert in this area”. This is an issue if I want to write this up as a thesis or a paper with someone else as an advisor.

Posing research questions is a skill I can’t say I have had much experience with. The closest thing I have looked into is how to write a story premise for a screenplay. If an author can write a good premise, they are well on their way to writing a good story; in fact, Michael Hauge suggests putting it on top of your typewriter or screen and using it as a focus for everything you write. With research questions, things are rarely that simple: you can’t derive good research from a solid premise alone. On the other hand, problem-solving mavens like Pólya and others have suggested that a well-posed problem is half solved.

I wrote this post in a funny way. I started with a list of broader questions I had been thinking about and then refined them into more specific research questions. These underwent further refinement. At a certain point I began to see that some of the questions were too broad and would need to be excluded from the current thrust of my research.

I went further and wrote one clear research question, and then one that places it in the context of Bayesian Non-Parametrics (BNP), which seems to be the most promising approach to solving the problem.

Research Question

There should be one clear research question that I am trying to answer, drawn from the research questions below, and it should be solved in at least one setting.

Research Questions for the Complex Lewis Signaling Game

Find an algorithm, with suitable models, that allows the Sender agent to quickly and efficiently plan and teach a language (possibly using a grammar) for communicating states over a channel using sequences of symbols drawn from a restricted alphabet. This algorithm will then be evaluated in MARL experiments over different state spaces.

Modifying the Lewis Signaling Game to facilitate Curriculum Learning

To facilitate this curriculum learning approach, we will assume that the agents get reward signals for a reduced challenge comprising a subset of the full state space.

If they succeed, they can use the reward to buy a more complex challenge with a larger sub-state space. (The default is to keep retrying the challenge until they succeed.) This allows the teacher to teach the most common sub-states first and then move on to aggregation rules for deriving more complex states from simpler ones.

There is, of course, a built-in penalty for not dealing with the full state: agents only get a partial reward for solving a sub-state. There is also a small cost for buying a more complex challenge, so that solving multiple partial challenges provides higher payoffs, but not as much as solving the full challenge!
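As a concreteness check, here is a minimal sketch of this mechanism in Python. Everything here (the `Challenge` class, `buy_cost`, the `solve` callback standing in for a learning episode) is an illustrative assumption, not a fixed design.

```python
import random

# A minimal sketch of the curriculum mechanism described above. All names
# (Challenge, buy_cost, solve) are illustrative, not an existing API.

class Challenge:
    """A sub-game over a subset of the full state space."""
    def __init__(self, states, reward, buy_cost):
        self.states = states      # the sub-state space for this level
        self.reward = reward      # partial reward, less than the full-game reward
        self.buy_cost = buy_cost  # small price of unlocking this level

def run_curriculum(challenges, solve, max_attempts=100):
    """Retry each challenge until solved, then spend reward to buy the next."""
    bank = 0.0
    for level, ch in enumerate(challenges):
        if level > 0:
            bank -= ch.buy_cost           # small cost for the harder challenge
        for _ in range(max_attempts):     # default: keep trying until success
            if solve(ch.states):
                bank += ch.reward         # partial reward for the sub-state
                break
    return bank

# Toy usage: a "solver" that succeeds with probability 1/|states|.
curriculum = [Challenge(states=list(range(k)), reward=k / 10, buy_cost=0.05)
              for k in (2, 4, 8)]
payoff = run_curriculum(curriculum, lambda s: random.random() < 1 / len(s))
```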

BNP Version of the Research Question

Find an inference algorithm, based on separate BNP models for sender and receiver, that facilitates a Sender agent to quickly and efficiently plan and teach a language (possibly using a grammar) for communicating states over a channel using sequences of symbols drawn from a restricted alphabet. This algorithm will then be evaluated in MARL experiments over different state spaces.

Alternatively, the agent might not need to plan a full language but could develop it online as states stream in. This is more like a POMDP setting, where the agent only knows the states it has seen and their distribution. However, this is more complex and may be better suited for future work.
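To make the streaming idea concrete, here is a toy sketch in which the sender tracks the unknown state distribution with a Chinese Restaurant Process prior. The `alpha` concentration parameter and the helper function are assumptions for illustration, not the inference algorithm the question asks for.

```python
from collections import Counter

# Toy sketch of the streaming variant: the sender models the unknown state
# distribution with a CRP prior, so it can act online without knowing the
# state space in advance.

def crp_predictive(counts: Counter, alpha: float):
    """Return (probability of a novel state, per-state predictive probabilities)."""
    n = sum(counts.values())
    p_new = alpha / (n + alpha)
    p_seen = {s: c / (n + alpha) for s, c in counts.items()}
    return p_new, p_seen

counts = Counter()
for state in ["rain", "rain", "sun", "rain"]:     # states streaming in
    p_new, p_seen = crp_predictive(counts, alpha=1.0)
    # A sender might coin a new signal while p_new is still high, and shorten
    # the signals of states whose predictive mass grows.
    counts[state] += 1
```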

Research Questions

These are more detailed research questions and their interrelations.

Question 1: Communication and the Lossy Channel

  1. How can agents learn to coordinate on a shared language to communicate complex states of the world using sequences of symbols from a fixed alphabet? Problem settings (a minimal sketch follows this list):
     1. Under ideal conditions. (Has prohibitive coordination costs.)
     2. How can agents handle mistakes in communication, e.g. a lossy channel? (Selection for signal robustness.)
     3. Under risky conditions where each signal (symbol) presents a risk to the sender and/or receiver. (Selection for shorter messages.)
     4. Risks and a noisy channel together. (Selection for both robustness and shorter messages, i.e. error-correcting codes.)
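A minimal sketch of setting 2 (the lossy channel), assuming a one-symbol game, dictionary policies, and a 0/1 cooperative reward; all names are illustrative:

```python
import random

# One-symbol Lewis game over a lossy channel: a dropped symbol forces the
# receiver to guess, which is the pressure toward redundant (robust) messages.

def play_round(sender_policy, receiver_policy, n_states, drop_prob=0.1):
    state = random.randrange(n_states)
    signal = sender_policy[state]                  # sender encodes the state
    if random.random() < drop_prob:
        signal = None                              # lossy channel: symbol dropped
    guess = receiver_policy.get(signal, random.randrange(n_states))
    return 1.0 if guess == state else 0.0          # cooperative reward

# With an identity code, the average reward falls to roughly (1 - drop_prob),
# plus a small term for lucky guesses.
n = 4
sender = {s: s for s in range(n)}
receiver = {s: s for s in range(n)}
mean_reward = sum(play_round(sender, receiver, n) for _ in range(10_000)) / 10_000
```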

Planning of optimal languages

We might assume that agents are rational and can plan their communication to optimize some reward. Rather than relying on spontaneous symmetry breaking for a language to emerge, how can agents plan, or improve, their initial language for communication under different settings?

  1. Under the following settings, how can the sender plan an optimal language for the receiver or a group of receivers?
     1. **Long/short horizon**: receivers spawn offspring and become their senders; original senders may die off after some time.
     2. **Full information, fixed world**: the sender knows the full distribution of states and the receiver only the possible states. How can the sender plan an optimal language that is easy for the receiver, or a group of receivers, to learn? (See the source-coding sketch below.)
     3. **POMDP**: the sender only knows the states it has seen and their distribution.
     4. **POMDP and dynamic world**: the states and their distribution change over time (we might have a drift of the current state and changes in the dynamics themselves, as in the NDLM). This corresponds to a lifelong learning setting where the agents must learn to adapt to changing environments and tasks.

In the dynamic setting, where the states evolve over time according to some dynamics, how can the sender and receivers adapt the language to maintain its optimality? Can we derive some sort of regret bounds for the loss of optimality as the states evolve?
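One way to make “optimal” concrete in the full-information setting: if optimality is measured purely by expected message length, planning the language reduces to classic source coding, and a Huffman code over the known state distribution sketches the sender’s plan. This equivalence is an assumption for illustration; fitness in the actual game may also reward learnability and robustness.

```python
import heapq

# Sketch: the sender assigns signal sequences over a binary alphabet via a
# Huffman code, minimizing expected message length for a known distribution.

def huffman_code(state_probs: dict) -> dict:
    """Binary Huffman code: state -> string over the alphabet {'0', '1'}."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(state_probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

code = huffman_code({"rain": 0.5, "sun": 0.25, "snow": 0.125, "hail": 0.125})
# Frequent states get shorter signals, e.g. code["rain"] == "0".
```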

  • What are the determinants of optimality in this language?
    • Optimality in the pure Lewis game generally refers to highest fitness.
    • At a deeper level, though, fitness may be evaluated as the agents’ ability to achieve diverse goals in a given environment where coordination is an asset.

Out of Scope for now - these are questions that are beyond the complex Lewis signaling game

Reducing the coordination problem: Curriculum learning and causal structure for survival

In reality we don’t learn a language in one go. We learn it using a curriculum based on some causal structure of the language, possibly derived from the causal structure of the environment. This lets us learn the most useful parts of the language first and then build on them to learn more complex parts. I.e., we want to get to fitness > 0.5 as fast as possible, and then refine the language and other skills to reach fitness greater than other agents, or 1.0 if we are cooperating.

Secondly, we may assume some problem-solving strategies that reduce the coordination problem into smaller problems. If the agents only need to coordinate on two states, they can learn a signaling system very quickly. But if they have to deal with 1000 states, they face on the order of n^2 signal-state pairs to test. A small simulation sketch follows.
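A quick way to see this scaling pressure is to run Roth-Erev (urn model) learners, a standard baseline for Lewis games, at increasing n; convergence time grows rapidly with n, motivating the curriculum above. All parameters below are illustrative.

```python
import random

# Roth-Erev (urn model) learners on an n-state, n-signal Lewis game:
# successful rounds reinforce the sender's and receiver's urn weights.

def rounds_to_coordinate(n, target=0.95, max_rounds=200_000):
    send = [[1.0] * n for _ in range(n)]   # sender urn weights [state][signal]
    recv = [[1.0] * n for _ in range(n)]   # receiver urn weights [signal][state]
    wins, total = 0, 0
    for t in range(1, max_rounds + 1):
        state = random.randrange(n)
        sig = random.choices(range(n), weights=send[state])[0]
        act = random.choices(range(n), weights=recv[sig])[0]
        if act == state:                   # success reinforces both urns
            send[state][sig] += 1.0
            recv[sig][act] += 1.0
            wins += 1
        total += 1
        if t % 1000 == 0:                  # check a rolling success window
            if wins / total > target:
                return t
            wins, total = 0, 0
    return max_rounds

# e.g. rounds_to_coordinate(2) is typically far smaller than rounds_to_coordinate(8)
```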

  1. How can the sender develop a teaching curriculum for a language it plans, so that the receiver or a group of receivers learns faster and maximizes their individual and collective fitness?

This seems to be a meta-learning problem, which is again beyond the scope of the Lewis signaling game. It may however lead to a “higher solution concept” in the Lewis signaling game, in the sense that such agents will reach a signaling system faster than agents that do not use a curriculum. In a resource-constrained environment such agents may outcompete other agents.

This is worth considering, as developing the curriculum may actually be easier than planning the full language. There may even be a trade-off: learn a perfect language very slowly, or learn a less-than-perfect language but become fluent much faster. There is also anecdotal evidence that larger languages select for greater generalization.

A new twist on an old idea: a general purpose language confers more benefit than a domain specific language, but is harder to develop and to learn.

Grounding

If the agents have some algorithm with expected fitness > 0.5, can they repurpose their signaling faculties to handle changing environments or tasks?

  1. If agents are equipped with algorithms that can solve the above problems, how can they efficiently repurpose their signaling faculties for use in
     1. a new environment, or
     2. the lifelong learning setting?

Grounding is beyond the scope of the Lewis signaling game but may be conducive to developing transfer learning in RL. Deep learning allows agents not only to learn a classifier or regressor but to learn low-level features.

If our agents are equipped with a model that, like the Indian Buffet Process, allows feature representations to be learned in a non-parametric way, agents might be able to learn an interface that lets them reuse parts of a language in a new environment. Here again, things may happen much faster if they get some signal from their mistakes as well as their successes.
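For concreteness, here is a minimal sample from the Indian Buffet Process generative story: each agent (row) adopts a potentially unbounded set of latent features (columns). The `alpha` concentration parameter and the usage are illustrative assumptions.

```python
import numpy as np

# Standard IBP generative process: customer i samples each existing dish with
# probability (times served)/i, then tries Poisson(alpha/i) brand-new dishes.

def sample_ibp(n_agents, alpha=2.0, rng=None):
    rng = rng or np.random.default_rng()
    dish_counts = []                       # how many agents share each feature
    rows = []
    for i in range(1, n_agents + 1):
        row = [1 if rng.random() < c / i else 0 for c in dish_counts]
        for j, z in enumerate(row):
            dish_counts[j] += z
        new = rng.poisson(alpha / i)       # brand-new features for this agent
        row += [1] * new
        dish_counts += [1] * new
        rows.append(row)
    k = len(dish_counts)
    return np.array([r + [0] * (k - len(r)) for r in rows])

Z = sample_ibp(5)   # binary agent-by-feature matrix, random number of columns
```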

Before these questions were posed I had developed less precise and wider-ranging questions. I have now refined them to be more precise and more focused on the Lewis signaling game and its extensions. Here are the broader questions that I had before.

Broader Questions

  1. How can agents use a language to facilitate transfer learning across curricula of multi-agent environments?

  2. What are some (efficient) ways for a language to emerge in an MGP with a number of interacting agents?

  3. What are the desirable properties of such emergent languages?

  4. Given that a language can emerge by spontaneous symmetry breaking, how can an agent plan a better language for such a collective? If the overarching concern is learnability, is there a clear way to measure the learnability of a language and to optimize for it?

  5. Paradigms for language emergence?

    • Can we consider language emergence as independent of the agents’ other behavior?

      • If we so wish, what assumptions must we make?
      • E.g., if we have an MGP, i.e. a MARL environment, with a Lewis-style asymmetrically viewed state, send/receive actions, and a cooperative reward signal for the receiver who decodes the state correctly, then wouldn’t agents with an exploration strategy eventually learn a signaling system even if the greedy action is a zero-sum action?
    • Is this a game-theoretic question?

    • Should we treat language emergence as a Lewis step in an extensive-form game where payoffs and strategies are intimately entangled?

    • Should we treat language emergence as an iterated sequence of games in which each step’s decision is made independently of the previous step?

  6. If we consider wide classes of complex Lewis signaling games, can we characterize the types of equilibria that are more likely to emerge? Are some equilibria more stable, or stronger attractors, than others?

  7. Grounding:

    • The goals of grounding are specific, if rather broad, but should be treated as an MVP: the top goals are to be realized first, the others as future work.
      1. Map a GPL (general purpose language) encompassing many different MDPs, and the experiences gained within them, onto the current MDP using a subset of the GPL that is isomorphic to the current DSL.
    • The grounding is a set of symbols that can be used to communicate about the state of the world and the actions of the agent. The DSL, or Domain Specific Language, is a set of symbols and an aggregation rule that allow agents to communicate, plus some additional symbols that let the agent communicate about its models with an LLM or some other agents. The LLM here might be a gateway to a RAG system with access to its past experiences, dynamic models, values, policies, options, general value functions, etc.
    • There seems to be a rather trivial way to do grounding for an MDP/MGP: assign a unique verb symbol to each action in the MDP/MGP, and a unique noun symbol to each state (see the sketch after this list).
    • To handle the case where the state is structured, e.g. in two parts, we can use a noun phrase with two nouns, e.g. “cat” and “dog”.
    • We thus need a set of primitive symbols to capture the state, plus a syntax to combine them into a phrase.
    • This would allow us to describe even an image as a list of pixels...
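Here is a sketch of that trivial grounding, with hypothetical `noun_i`/`verb_i` symbols and concatenation as the aggregation rule:

```python
# Assign a unique verb symbol to each MDP action and a unique noun symbol to
# each state part; structured states become noun phrases by concatenation.

def ground_mdp(states, actions):
    nouns = {s: f"noun_{i}" for i, s in enumerate(states)}
    verbs = {a: f"verb_{i}" for i, a in enumerate(actions)}
    return nouns, verbs

def describe(transition, nouns, verbs):
    """Structured states become noun phrases: ('cat', 'dog') -> 'noun_0 noun_1'."""
    state, action = transition
    parts = state if isinstance(state, tuple) else (state,)
    noun_phrase = " ".join(nouns[p] for p in parts)
    return f"{noun_phrase} {verbs[action]}"

nouns, verbs = ground_mdp(["cat", "dog"], ["chase", "sleep"])
print(describe((("cat", "dog"), "chase"), nouns, verbs))
# -> "noun_0 noun_1 verb_0"
```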

Some other questions

  1. How much overhead does a natural language carry vs. a synthetic language? What are good metrics for this?
  2. Is this still true when resources are as severely restricted as in the case of a human agent?

Citation

BibTeX citation:
@online{bochman2025,
  author = {Bochman, Oren},
  title = {Complex {Lewis} {Signaling} - {The} {Research} {Questions}},
  date = {2025-04-02},
  url = {https://orenbochman.github.io/posts/2025/2025-04-02-research-questions/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2025. “Complex Lewis Signaling - The Research Questions.” April 2, 2025. https://orenbochman.github.io/posts/2025/2025-04-02-research-questions/.