Multi-agent Reinforcement Learning in Sequential Social Dilemmas

paper review

paper review
multi-agent reinforcement learning
sequential social dilemmas
cooperation
Markov games
agent-based social simulation
non-cooperative games
Author

Oren Bochman

Published

Monday, June 10, 2024

Matrix games like Prisoner’s Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Qnetwork, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.” – (Leibo2017marl?) abstract

Warning

Initially at least it seems to me that the “sequential social dilemma” is just an analog of an iterated prisoner’s dilemma game. This game considered the basis of all other social dilemmas has been treated at great length in (Axelrod1980?) and (Axelrod1997?) with the famous Axelrod tournaments. Some interesting results include the fact that the best strategy in the tournament was the simplest. Population dynamics are such that in population of agents with one strategy, the introduction of new agents with a different strategy can gain a foothold and cause the dominant strategy in the population to switch to the new strategy. This is known as the “evolution of cooperation”.

This work led to a growing interest in the study of cooperation in non-cooperative games with about 300 new papers published on the topic in the 10 years following the publication.

Later work looked at the robustness of the winning strategies in the Axelrod tournaments, see (Nowak2004?).

Introduction

In (Leibo2017marl?), the authors introduce a new class of social dilemmas, called sequential social dilemmas, which are inspired by the classic matrix game social dilemmas like the Prisoner’s Dilemma.

Citation

BibTeX citation:
@online{bochman2024,
  author = {Bochman, Oren},
  title = {Multi-Agent {Reinforcement} {Learning} in {Sequential}
    {Social} {Dilemmas}},
  date = {2024-06-10},
  url = {https://orenbochman.github.io/posts/2024/2024-06-10-review-MARL-in-sequential-social-dilemmas/},
  langid = {en}
}
For attribution, please cite this work as:
Bochman, Oren. 2024. “Multi-Agent Reinforcement Learning in Sequential Social Dilemmas.” June 10, 2024. https://orenbochman.github.io/posts/2024/2024-06-10-review-MARL-in-sequential-social-dilemmas/.