## Two problems
I have been thinking about two problems, and it seems that they may be related in interesting ways.
### Demand from Preferences
The first involves developing a demand model based on a non-parametric model of preferences.
### From ABM to MDP by learning a reward function
I, along with over a million other students, took an online course called Model Thinking; you can see my notes at the link. Later, when I took the courses in the Reinforcement Learning specialization, one of the points made by Scott Page kept resonating again and again: there is the following hierarchy of behavioral models:
- Rule-based models - agents follow simple, predetermined rules, yet we often see how even one rule can lead to complex emergent behavior. Sugarscape is a classic example.
- Formal models - these are not explained as well in the course, but they are clearly a step up from rule-based models: the modeler has formalized the behavior using mathematical equations. The predator-prey model, governed by the Lotka-Volterra equations, is a classic example (see the sketch after this list). It can be shown that the predator-prey model is a special case of the more general replicator dynamics from evolutionary game theory.
- Game-theoretic models - agents have clearly defined actions and payoffs. The prisoner's dilemma is such a game. One advantage of a game-theoretic model is that we can use game theory to find the equilibria and use them to study the different possible strategies.
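To make the formal-model rung concrete, here is a minimal Euler-integration sketch of the Lotka-Volterra predator-prey dynamics; the parameter values are assumptions chosen for illustration, not from the course.

```python
# Minimal Euler-integration sketch of the Lotka-Volterra predator-prey
# model; all parameter values here are illustrative assumptions.
def lotka_volterra_step(x, y, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5, dt=0.01):
    """One Euler step: x = prey population, y = predator population."""
    dx = alpha * x - beta * x * y    # prey reproduce and get eaten
    dy = delta * x * y - gamma * y   # predators grow by eating, then die off
    return x + dt * dx, y + dt * dy

x, y = 10.0, 5.0
for _ in range(5000):
    x, y = lotka_volterra_step(x, y)
print(f"prey={x:.2f}, predators={y:.2f}")  # populations cycle rather than settle
```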
After learning about RL I came to believe that game-theoretic models often require infinite sets to define, while some RL algorithms can work their magic with a very limited sample from such a game. In retrospect this is an important point, though not a very precise one: RL requires an MDP, which raises the same issues as game-theoretic models as far as a formal specification is concerned. However, RL algorithms can find an optimal policy, at times with very limited resources. RL, though, only considered single-agent scenarios, and I was just as interested in the strategic case of multiple agents.
The second problem is how to convert an agent-based model (ABM) into a Markov decision process (MDP). If we can do that, we can use an RL algorithm like Q-learning to find an optimal policy for an agent, and perhaps even consider how it relates to the global optima we might find by optimizing a welfare function over many agents.
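As a sketch of what that would buy us, here is a minimal tabular Q-learning update over transitions sampled from ABM runs; how states and actions are encoded from the ABM is an assumption, not something the model hands us for free.

```python
# Minimal tabular Q-learning over (state, action, reward, next_state)
# samples; the state/action extraction from the ABM is assumed.
from collections import defaultdict

def q_learning(samples, alpha=0.1, gamma=0.95):
    """samples: list of (s, a, r, s2) tuples harvested from ABM runs."""
    Q = defaultdict(float)
    actions = {a for _, a, _, _ in samples}
    for s, a, r, s2 in samples:
        best_next = max((Q[(s2, a2)] for a2 in actions), default=0.0)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```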
Some of the guest lectures in the RL courses I took discussed the issue of learning rewards. I later realized that both the state transitions and the rewards are often not known in advance. So this is a real problem, and we may often be able to learn approximations of both given sufficient samples of actual behavior.
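A tabular maximum-likelihood estimate is the simplest version of this idea: count transitions and average rewards. A minimal sketch, assuming we can log (state, action, reward, next state) tuples:

```python
# Estimate transition probabilities and mean rewards from sampled
# behavior; a tabular maximum-likelihood sketch, nothing more.
from collections import defaultdict

def fit_mdp(samples):
    """samples: list of (s, a, r, s2); returns (P, R) estimates."""
    visits = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for s, a, r, s2 in samples:
        visits[(s, a)] += 1
        reward_sum[(s, a)] += r
        next_counts[(s, a)][s2] += 1
    R = {sa: reward_sum[sa] / n for sa, n in visits.items()}
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()}
         for sa, nxt in next_counts.items()}
    return P, R
```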
More recently I learned about the Bayesian view of state space models. Here one can use algorithms like the Kalman filter and dynamic linear models (DLMs) to predict future and past states from noisy trajectories.
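For intuition, a one-dimensional Kalman filter with a random-walk state model fits in a few lines; the noise variances below are assumptions.

```python
# Minimal 1-D Kalman filter sketch: recover a latent state from a
# noisy trajectory; q and r are assumed noise variances.
def kalman_1d(observations, q=1e-3, r=0.5):
    """q: process-noise variance, r: observation-noise variance."""
    x, p = observations[0], 1.0      # initial state estimate and variance
    estimates = []
    for z in observations:
        p = p + q                    # predict under a random-walk model
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # correct with the new observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```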
## Models of demand
In this post I will develop a Polya-Rubinstein microeconomic model of consumer demand. The model focuses on the aggregation of individual preferences.
In the next posts I will consider how to make the model more useful, so that we can develop notions like substitutes and complements. For this we may want to add a clustering structure over the products, perhaps extending the Polya urn to the closely related Chinese Restaurant Process, to add another level of interest.
Although I have studied microeconomics, including supply and demand curves, elasticity, and so on, it seems that a view of demand based on a variance-covariance matrix of demand functions, or even as a precursor to an elasticity and cross-elasticity matrix, can lead to inconsistency.
The … model of demand discusses their model, which is supposedly consistent.
I recall, however, that Ariel Rubinstein's book on microeconomic theory starts with the concept of preferences and then builds it up into utility. So perhaps if we can model preferences we will be on the way to a more consistent model of demand.
We will see that the Polya urn model is consistent with Rubinstein's axioms of preferences, which make the preferences a rational ordering, i.e.:

- completeness: for any two products the consumer can state a preference for one over the other, or indifference;
- no order effect: the answer does not depend on which product is presented first;
- transitivity.
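A toy check of the first two properties, under the assumption that preferences are read off by comparing ball counts (transitivity then follows because counts are totally ordered numbers):

```python
# Ball counts assumed purely for illustration.
urn = {"sku1": 4, "sku2": 3, "sku3": 2}

def prefers(a: str, b: str) -> str:
    """Return the weakly preferred product, or 'indifferent'."""
    if urn[a] == urn[b]:
        return "indifferent"
    return a if urn[a] > urn[b] else b

assert prefers("sku1", "sku2") == "sku1"  # completeness: always an answer
assert prefers("sku2", "sku1") == "sku1"  # no order effect
```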
P.S. One issue with preferences is how they map to prices. I guess we will need to deal with that too.
- There are K (say 10) products in the market, each with a price and an id.
- We have a square grid of side L populated with agents.
- Agents start with an endowment and some preference over the products.
- Preferences are based on the numbers of colored balls in an agent's Polya urn. Initially there is just one ball per product.
- Each turn the agent draws a random colored ball from the urn and returns two balls of that color. This reinforces the agent's preference for the product associated with that ball.
OK, so we have a process that creates heterogeneous preferences that reinforce over time.
That is a good start, but we will see that when we aggregate we get a very steady, uniform demand. The sketch below shows why.
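Each urn's expected shares stay at 1/K (the shares form a martingale), so even though individual agents lock onto random favourites, the aggregate looks flat. The population and step sizes below are assumptions:

```python
# Simulate many independent Polya urns and aggregate their contents:
# individual urns concentrate, but the aggregate shares stay near 1/K.
import random
from collections import Counter

K, n_agents, steps = 5, 1000, 200
urns = [{k: 1 for k in range(K)} for _ in range(n_agents)]
for urn in urns:
    for _ in range(steps):
        k = random.choices(list(urn), weights=list(urn.values()))[0]
        urn[k] += 1                          # Polya reinforcement

agg = Counter()
for urn in urns:
    agg.update(urn)
total = sum(agg.values())
print({k: round(v / total, 3) for k, v in agg.items()})  # roughly 1/K each
```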
We may also want to look at the welfare of the agents: how often can they satisfy their preferences?
To understand behavior we need to map preferences to a budget allocation. We have imposed a budget constraint on each agent by setting an individual endowment. Now we want to see how they will allocate that budget based on their preferences.
One idea is that agents should attempt to maximize their utility given their budget constraint, but so far I have only talked about utility informally. We may instead assume that agents will want to buy products in proportion to their preferences. We may also suppose that for these agents more is better than less, so they will want to maximize their consumption.
So here we see how the urn meets the market. Suppose an agent's urn holds the following:
| sku | count | price |
|-----|-------|-------|
| 1   | 4     | 30    |
| 2   | 3     | 20    |
| 3   | 2     | 2     |
With preference weights of 3, 2, and 1 (the counts net of the single initial ball per product), they would want to buy the largest affordable multiple of the market basket

$$3\,\mathrm{sku}_1 + 2\,\mathrm{sku}_2 + 1\,\mathrm{sku}_3,$$

i.e.

$$k^* = \max \left\{ k \in \mathbb{N} \;:\; k \sum_i c_i\, p_i \le B \right\}, \qquad Q_i = k^* c_i,$$

where $c_i$ is the preference weight for product $i$, $p_i$ its price, and $B$ the budget.
One important edge case is what agents do when they want more than they can afford. In that case preferences will be truncated to what they can afford.
Say they have a budget of 30. Should they buy one unit of their #1 preference (costing 30), or one unit of their #2 preference plus five units of their #3 preference (20 + 5 × 2 = 30)? This is really a question about how preferences map to demand: if I like chicken twice as much as rice and I have some budget, how much should I allocate to each product?
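One possible truncation rule (my reading, the post has not settled on one) walks down the preference order greedily, buying as many units of each product as the remaining budget allows:

```python
# Greedy truncation sketch: spend the budget down the preference order.
def truncated_basket(counts, prices, budget):
    """counts/prices keyed by sku; highest count = most preferred."""
    basket, remaining = {}, budget
    for sku in sorted(counts, key=counts.get, reverse=True):
        qty = min(counts[sku], int(remaining // prices[sku]))
        if qty > 0:
            basket[sku] = qty
            remaining -= qty * prices[sku]
    return basket

print(truncated_basket({1: 3, 2: 2, 3: 1}, {1: 30.0, 2: 20.0, 3: 2.0}, 30.0))
# {1: 1} -- the whole budget of 30 goes to the top preference
```

Note that this rule picks the first option above; a rule that maximizes the number of items bought would pick the second.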
- Agents then buy their top preferred products and gain a happiness score: whether they can afford their full top-k basket (top-4 in the demo below).
- Prices are fixed per product (in the demo below they increase linearly with the SKU index).
#| standalone: true
#| viewerHeight: 600
#| components: [viewer] #[editor, viewer]
# app.py — ShinyLive-ready ABM Polya-Urn demo (Mesa-lite core) + Altair via shinywidgets.
# Paste into shinylive.io/py/examples as app.py and Run.
from __future__ import annotations
from dataclasses import dataclass
from typing import Dict, List, Tuple
import random
from collections import Counter
import pandas as pd
import altair as alt
from shiny import App, Inputs, Outputs, Session, reactive, render, ui
from shinywidgets import render_altair, output_widget
# ---------- helpers ----------
def top_n(d: Dict[int, int], n: int) -> List[Tuple[int, int]]:
return sorted(d.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
def weighted_choice(keys: List[int], weights: List[int]) -> int:
return random.choices(keys, weights=weights, k=1)[0]
# ---------- domain ----------
@dataclass(frozen=True)
class Product:
idx: int
sku: str
letter: str
price: float
@dataclass
class Agent:
aid: int
wealth: float
urn: Dict[int, int] # sku_idx -> bead count
counts: Dict[int, int] # cumulative draws per sku
top4: List[int]
happy: bool
def step(self, products: List[Product], l: int = 1) -> int:
last = -1
for _ in range(l):
ks = list(self.urn.keys())
ws = [self.urn[k] for k in ks]
k = weighted_choice(ks, ws)
self.urn[k] += 1 # Polya reinforcement
self.counts[k] = self.counts.get(k, 0) + 1
last = k
self.top4 = [k for k, _ in top_n(self.counts, 4)]
total_price = sum(products[k].price for k in self.top4) if self.top4 else 0.0
self.happy = (total_price <= self.wealth)
return last
class Market:
def __init__(self, K:int=8, n_agents:int=100, seed:int=42, base_wealth:float=10.0, wealth_jitter:float=2.0):
random.seed(seed)
self.products = make_products(K)
self.agents = [
Agent(
aid=i+1,
wealth=max(0.0, random.gauss(base_wealth, wealth_jitter)),
urn={j:1 for j in range(K)}, # one bead per SKU
counts={},
top4=[],
happy=False,
)
for i in range(n_agents)
]
self.t = 0
self._demand_rows: List[Dict] = []
def step(self, l:int=1):
self.t += 1
drawn = [a.step(self.products, l=l) for a in self.agents]
tick_counts = Counter(drawn)
for idx, cnt in sorted(tick_counts.items()):
p = self.products[idx]
self._demand_rows.append({"t": self.t, "sku_idx": idx, "sku": p.sku, "letter": p.letter, "count": cnt})
def grid_df(self) -> pd.DataFrame:
ids = list(range(1, len(self.agents)+1))
return pd.DataFrame({
"id": ids,
"happy": [a.happy for a in self.agents],
})
def top4_df(self) -> pd.DataFrame:
recs = []
for a in self.agents:
letters = [self.products[k].letter for k in a.top4]
price_sum = round(sum(self.products[k].price for k in a.top4), 2)
recs.append({"agent": a.aid, "top4": "".join(letters), "sum_price": price_sum, "happy": a.happy})
return pd.DataFrame(recs)
@property
def demand_ts(self) -> pd.DataFrame:
return pd.DataFrame(self._demand_rows) if self._demand_rows else pd.DataFrame(columns=["t","sku_idx","sku","letter","count"])
# ---------- factories ----------
def make_products(K:int=8) -> List[Product]:
letters = [chr(ord("A")+i) for i in range(K)]
prices = [round(1.0 + 0.5*i, 2) for i in range(K)]
return [Product(i, f"SKU{i+1:02d}", letters[i], prices[i]) for i in range(K)]
def reset_market(K:int, seed:int, wealth_mu:float, wealth_sd:float) -> Market:
return Market(K=K, n_agents=100, seed=seed, base_wealth=wealth_mu, wealth_jitter=wealth_sd)
# ---------- UI ----------
PERSON_PATH = (
"M1.7 -1.7h-0.8c0.3 -0.2 0.6 -0.5 0.6 -0.9c0 -0.6 "
"-0.4 -1 -1 -1c-0.6 0 -1 0.4 -1 1c0 0.4 0.2 0.7 0.6 "
"0.9h-0.8c-0.4 0 -0.7 0.3 -0.7 0.6v1.9c0 0.3 0.3 0.6 "
"0.6 0.6h0.2c0 0 0 0.1 0 0.1v1.9c0 0.3 0.2 0.6 0.3 "
"0.6h1.3c0.2 0 0.3 -0.3 0.3 -0.6v-1.8c0 0 0 -0.1 0 "
"-0.1h0.2c0.3 0 0.6 -0.3 0.6 -0.6v-2c0.2 -0.3 -0.1 "
"-0.6 -0.4 -0.6z"
)
app_ui = ui.page_fluid(
ui.h2("ABM Polya-Urn Demand (Mesa-lite)"),
ui.row(
ui.column(3,
ui.input_numeric("seed", "Seed", 7),
ui.input_slider("K", "Number of products K", min=4, max=12, value=8),
ui.input_slider("wealth_mu", "Wealth mean", min=2, max=30, value=10),
ui.input_slider("wealth_sd", "Wealth sd", min=0, max=10, value=2),
ui.input_numeric("l_draws", "Draws per tick (l)", 1, min=1, max=4),
ui.input_action_button("btn_reset", "Reset"),
ui.input_action_button("btn_step", "Step"),
ui.input_action_button("btn_10", "Run 10"),
ui.hr(),
ui.output_table("price_table"),
ui.hr(),
ui.output_table("last_tick_demand"),
),
ui.column(5,
ui.card(
ui.card_header("Agents on 10×10 grid (blue=happy, red=unhappy)"),
output_widget("grid_plot"),
),
ui.card(
ui.card_header("Demand time series by SKU (lines)"),
output_widget("demand_plot"),
),
),
ui.column(4,
ui.card(
ui.card_header("Top-4 per agent (after last tick)"),
ui.output_table("top4_table"),
),
)
),
)
# ---------- server ----------
def server(input: Inputs, output: Outputs, session: Session):
    market = reactive.value(reset_market(K=8, seed=7, wealth_mu=10, wealth_sd=2))
    tick = reactive.value(0)  # bumped after in-place mutations so dependent outputs re-render
@reactive.effect
@reactive.event(input.btn_reset)
    def _reset():
        market.set(reset_market(K=int(input.K()), seed=int(input.seed()),
                                wealth_mu=float(input.wealth_mu()), wealth_sd=float(input.wealth_sd())))
        tick.set(0)
@reactive.effect
@reactive.event(input.btn_step)
    def _step_once():
        market().step(l=int(input.l_draws()))
        tick.set(tick() + 1)  # Market was mutated in place; bump to invalidate outputs
@reactive.effect
@reactive.event(input.btn_10)
    def _step_10():
        for _ in range(10):
            market().step(l=int(input.l_draws()))
        tick.set(tick() + 1)  # single bump after the batch of in-place steps
@output
@render.table
def price_table():
prods = market().products
return pd.DataFrame({
"idx":[p.idx for p in prods],
"sku":[p.sku for p in prods],
"letter":[p.letter for p in prods],
"price":[p.price for p in prods],
})
@output
@render.table
    def last_tick_demand():
        tick()  # depend on the step counter so the table refreshes
        ts = market().demand_ts
if ts.empty:
return pd.DataFrame(columns=["t","sku","letter","count"])
last_t = ts["t"].max()
return ts.loc[ts["t"]==last_t, ["t","sku","letter","count"]].sort_values(["letter"])
@output
@render.table
    def top4_table():
        tick()  # depend on the step counter so the table refreshes
        return market().top4_df().sort_values(["happy", "sum_price", "agent"], ascending=[False, True, True])
@output
@render_altair
    def grid_plot():
        tick()  # depend on the step counter so the plot refreshes
        df = market().grid_df().assign(
status=lambda d: d["happy"].map({True: "Happy", False: "Unhappy"})
)
return (
alt.Chart(df)
.transform_calculate(row="ceil(datum.id/10)")
.transform_calculate(col="(datum.id - 1) % 10 + 1")
.mark_point(filled=True, size=60)
.encode(
alt.X("col:O").axis(None),
alt.Y("row:O").axis(None),
alt.ShapeValue(PERSON_PATH),
color=alt.Color(
"status:N",
scale=alt.Scale(domain=["Happy", "Unhappy"], range=["#3A86FF", "#E63946"]),
),
tooltip=["id", "status"],
)
.properties(width=420, height=420)
.configure_view(strokeWidth=0)
)
@output
@render_altair
    def demand_plot():
        tick()  # depend on the step counter so the plot refreshes
        ts = market().demand_ts
df = ts if not ts.empty else pd.DataFrame({"t": [], "count": [], "letter": []})
return (
alt.Chart(df)
.mark_line(interpolate="monotone")
.encode(
x="t:Q",
y="count:Q",
color="letter:N",
tooltip=["t", "letter", "count"],
)
.properties(width=460, height=300)
)
app = App(app_ui, server)
## file: requirements.txt
altair
anywidget
jsonschema
pandas