## Two problems
I have been thinking about two problems, and it seems that they may be related in interesting ways.
### Demand from Preferences
The first involves developing a demand model based on a non-parametric model for preferences.
### From ABM to MDP by learning a reward function
I, along with over a million other students, took an online course called Model Thinking; you can see my notes in the link. Later, when I took the courses in the Reinforcement Learning specialization, one of the points made by Scott kept resonating again and again: that there is the following hierarchy of behavioral models:
- Rule-based models - agents follow simple predetermined rules, yet we often see how even one rule can lead to complex emergent behavior. Sugarscape is a classic example.
- Formal models - these are not explained as well in the course, but they are clearly a step up from rule-based models: the modeler has formalized the dynamics using mathematical equations. The predator-prey model, governed by the Lotka-Volterra equations (shown after this list), is a classic example, and it can be shown to be a special case of the more general replicator dynamics from evolutionary game theory.
- Game-theoretic models - agents have clearly defined actions and payoffs. The prisoner's dilemma is such a game. One advantage of a game-theoretic model is that we can solve for its equilibria and use them to study the different possible strategies.
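For reference, the Lotka-Volterra equations, with prey population $x$, predator population $y$, and positive rate parameters $\alpha, \beta, \delta, \gamma$:

$$
\frac{dx}{dt} = \alpha x - \beta x y,
\qquad
\frac{dy}{dt} = \delta x y - \gamma y
$$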
After learning about RL I came to believe that game-theoretic models often require infinite sets to define, while some RL algorithms can do their magic with a very limited sample from such a game. In retrospect this is an important point, though not a very precise one: RL requires an MDP, which raises the same issues as game-theoretic models as far as a formal specification is concerned. However, RL algorithms can find an optimal policy, at times with very limited resources. Still, RL only considered single-agent scenarios, and I was just as interested in the strategic case of multiple agents.
The second problem is about how to convert an agent-based model (ABM) into a Markov decision process (MDP). If we can do that, we can use an RL algorithm like Q-learning to find an optimal policy for an agent, and perhaps even consider how it relates to global optima that we might find by optimizing a global welfare function over many agents. A minimal tabular Q-learning sketch follows.
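As a concrete reference point, here is a minimal tabular Q-learning sketch. The environment interface `env_step(state, action) -> (next_state, reward)` is a hypothetical stand-in for whatever the ABM-to-MDP conversion would produce, not an API from Mesa or any other library.

```python
import random
from collections import defaultdict

def q_learning(env_step, actions, start_state, episodes=1000,
               horizon=100, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning against an assumed env_step(s, a) -> (s_next, r)."""
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0
    for _ in range(episodes):
        s = start_state
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r = env_step(s, a)
            # TD(0) update toward the Bellman optimality target
            best_next = max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```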
Some of the guest lectures in the RL courses I took discussed the issue of learning rewards. I later realized that both the state transitions and the rewards are often not known in advance. So this is a real problem, and we may often be able to learn to approximate them given sufficient samples of actual behavior.
More recently I learned about the Bayesian view of state space models. Here one can use algorithms like the Kalman filter and dynamic linear models (DLMs) to predict future states and smooth past states based on noisy trajectories.
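To make that concrete, here is a one-dimensional Kalman filter sketch for a local-level model (a scalar random walk observed with noise); the noise variances `q` and `r` are illustrative assumptions:

```python
def kalman_1d(observations, q=0.1, r=1.0, x0=0.0, p0=1.0):
    """Filter a scalar random walk: x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for y in observations:
        p = p + q            # predict: the drift grows the uncertainty
        k = p / (p + r)      # Kalman gain: how much to trust the observation
        x = x + k * (y - x)  # update: blend prediction and observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates
```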
## Models of demand
In this post I will develop a Polya-Rubinstein microeconomic model of consumer demand. The model focuses on the aggregation of individual preferences.
In the next posts I will consider how to make the model more useful, so that we can develop notions like substitutes and complements. For this we may want to add a clustering structure to the products, perhaps extending the Polya urn to the closely related Chinese Restaurant Process, which would add another level of interest.
Although I have studied microeconomics, including supply and demand curves, elasticity, and so on, it seems that a view of demand based on a variance-covariance matrix of demand functions, or even as a precursor to a matrix of own- and cross-price elasticities, can lead to inconsistency.
The … model of demand is supposedly consistent.
I recall, however, that Ariel Rubinstein's book on microeconomic theory starts with the concept of preferences and then builds it into utility. So perhaps if we can model preferences we could be on the way to a more consistent model of demand.
We will see that the Polya urn model is consistent with Rubinstein's axioms of preferences, which make the preferences a rational ordering, i.e.:

- completeness: for any two products the consumer can state a preference for one over the other, or indifference;
- no order effect: the answer does not depend on the order in which the pair is presented;
- transitivity (formalized below).
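Writing $x \succsim y$ for "$x$ is at least as good as $y$", the first and third axioms are:

$$
\text{completeness: } \forall x, y:\; x \succsim y \,\lor\, y \succsim x
\qquad
\text{transitivity: } x \succsim y \,\land\, y \succsim z \;\Rightarrow\; x \succsim z
$$

The urn satisfies these almost trivially: each product has a numerical ball count, and comparing counts yields a complete, transitive ordering (with ties read as indifference); since the counts do not depend on which product is asked about first, there is no order effect either.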
P.S. One issue with preferences is how they map to prices. I guess we will need to deal with that too.
- There are k (say 10) products in the market, each with a price and an id.
- We have a square grid of side L populated with agents.
- Agents start with an endowment and some preference over products.
- Preferences are based on the numbers of colored balls in an agent's Polya urn. Initially there is just one ball per product.
- Each turn the agent draws a random colored ball from the urn but returns two balls of that color. This reinforces the agent's preference for the product associated with the ball.
So we have a process that creates heterogeneous preferences that reinforce over time. That is a good start, but we will see that if we aggregate, we get a very steady, near-uniform demand; the sketch below illustrates both effects.
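A minimal sketch, independent of the Mesa app below, illustrating both effects: each individual urn concentrates on a few products (rich-get-richer), while the average over many agents stays close to uniform (by exchangeability, the expected share of each product never moves from $1/k$):

```python
import random

def polya_shares(num_products=10, steps=200, rng=None):
    """Run one Polya urn and return each product's final share of balls."""
    rng = rng or random.Random()
    urn = [1] * num_products  # one ball per product
    for _ in range(steps):
        drawn = rng.choices(range(num_products), weights=urn, k=1)[0]
        urn[drawn] += 1  # draw one ball, return two of that color
    total = sum(urn)
    return [c / total for c in urn]

rng = random.Random(42)
agents = [polya_shares(rng=rng) for _ in range(500)]
avg_top = sum(max(a) for a in agents) / len(agents)
agg = [sum(a[i] for a in agents) / len(agents) for i in range(10)]
print(f"average top-product share per agent: {avg_top:.2f}")  # well above 1/10
print("aggregate shares:", [round(s, 3) for s in agg])        # each close to 0.10
```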
We may also want to track the welfare of the agents: how often can they satisfy their preferences?
To understand behavior we need to map preferences to a budget allocation. We have imposed a budget constraint on each agent by setting an individual endowment. Now we want to see how they will allocate their budget based on their preferences.
One idea is that agents should attempt to maximize their utility given their budget constraint, but so far I have only talked about utility informally. We may, however, assume that the agents will want to buy products in proportion to their preferences. We may also suppose that for these agents having more is better than having less, so they will want to maximize their consumption.
So here we see that the urn meets the market.
| sku | count | price |
|-----|-------|-------|
| 1   | 4     | 30    |
| 2   | 3     | 20    |
| 3   | 2     | 2     |
They would want to buy the largest integer multiple of this preference basket that fits within their budget, e.g. a basket like

$$3 \, \text{sku}_1 + 2 \, \text{sku}_2 + 1 \, \text{sku}_3$$

Formally, with $c_i$ the ball count for product $i$, $p_i$ its price, and $B$ the budget:

$$k^{*} = \max \Big\{ k \in \mathbb{N} \;:\; k \sum_i c_i \, p_i \le B \Big\}, \qquad Q_i = k^{*} \, c_i$$
One important edge case: what will they do if even one whole basket costs more than they can afford ($k^{*} = 0$)? In this case preferences will be truncated to what they can afford. Say they have a budget of 30: should they buy one unit of their #1 preference (costing 30), or one unit of their #2 preference plus five units of their #3 preference ($20 + 5 \times 2 = 30$)? This is the heart of mapping preferences to demand: if I like chicken twice as much as rice and I have some budget, how much would I allocate to each product? A sketch of one such allocation rule follows.
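Here is a sketch of one such rule, using the table above: buy the largest affordable multiple of the preference basket, then spend the remainder greedily in preference order. The greedy top-up is my assumption, one of several reasonable tie-breaking rules:

```python
def allocate_basket(counts, prices, budget):
    """Largest integer multiple of the preference basket within budget,
    then a greedy top-up in preference order (an assumed tie-break rule)."""
    basket_cost = sum(c * p for c, p in zip(counts, prices))
    k = int(budget // basket_cost) if basket_cost > 0 else 0
    quantities = [k * c for c in counts]
    remainder = budget - k * basket_cost
    for i in sorted(range(len(counts)), key=lambda i: -counts[i]):
        while prices[i] <= remainder:  # most-preferred products first
            quantities[i] += 1
            remainder -= prices[i]
    return quantities, remainder

counts, prices = [4, 3, 2], [30, 20, 2]  # from the table above
print(allocate_basket(counts, prices, budget=30))   # -> ([1, 0, 0], 0)
print(allocate_basket(counts, prices, budget=200))  # -> ([4, 3, 10], 0)
```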
- Agents then buy their top preferred products and gain a happiness score based on:
- the number of their top-k products they can afford
- Prices for
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 600
#| components: [viewer] #[editor, viewer]
from __future__ import annotations
_='''{.markdown}
## Tasks:
'''
# app.py — ShinyLive-ready ABM Polya-Urn demo (Mesa-lite core) + Altair
from dataclasses import dataclass
from typing import Dict, List, Tuple
import random
from collections import Counter
import pandas as pd
import altair as alt
from shiny import reactive
from shinywidgets import render_altair, output_widget
import mesa
### Two NonParametric Helper Models (Polya Urn and Hoppe Urn)
class PolyaUrn:
"""A Polya urn model for generating reinforcing preferences based on a Dirichlet process.
Note:
- Since products are generated by a CRP process (Hoppe Urn), we need to support adding new products via a draw_new_product() method.
    - We may consider using Moran steps to model a drift from many products to a few, i.e. we draw a ball and replace it with a ball of another color based on the urn's Dirichlet distribution,
      or we may drift according to some exogenous process, e.g. advertising or social influence. We may do this once or until
we converge to some top-k products.
Attributes:
alpha (float): The reinforcement parameter.
num_products (int): The number of products (colors).
urn (Dict[int, int]): A dictionary representing the urn with product indices as keys and bead counts as values.
"""
    def __init__(self, alpha: float = 1.0, num_products: int = 1):
self.alpha = alpha
self.num_products = num_products
self.urn = {i: 1 for i in range(num_products)} # one bead per product
def draw(self) -> int:
"""Draw a ball from the urn and reinforce."""
total = sum(self.urn.values())
probs = [self.urn[i] / total for i in range(self.num_products)]
drawn = random.choices(list(self.urn.keys()), weights=probs, k=1)[0]
self.urn[drawn] += 1 # Reinforce
return drawn
def draw_new_product(self) -> int:
"""Draw a new product (color) from the urn."""
new_product_idx = self.num_products
self.urn[new_product_idx] = 1 # Add new product with one bead
self.num_products += 1
return new_product_idx
class HoppeUrn:
"""A Hoppe urn model for generating new products.
Note:
The current model assumes all products are independent.
We will later extend this to support the addition of product categories and affinity between products, allowing us to model substitutes and complements.
We may also want to assign products to Maslow's hierarchy of needs.
Ideally, though, the structure of the product space should be easily accessible from this Model.
"""
    def __init__(self, initial_products: int = 2, innovation: float = 1.0):
        # Ball 0 is the Hoppe urn's mutator ball; its mass is the innovation
        # rate theta, so a higher theta makes new products more likely.
        self.urn = {i: 1 for i in range(initial_products)}
        self.urn[0] = innovation
        self.innovation = innovation
        print(f"initial hoppe urn {self.urn}")
def draw(self) -> int:
""" Draw a ball from the urn and reinforce."""
total = sum(self.urn.values())
probs = [self.urn[i] / total for i in self.urn.keys()]
drawn = random.choices(list(self.urn.keys()), weights=probs, k=1)[0]
        if drawn == 0:  # the mutator ball: a new product enters the market
            new_product_idx = max(self.urn.keys()) + 1
            self.urn[new_product_idx] = 1  # add the new product with one bead
        else:
            self.urn[drawn] += 1  # reinforce an existing product
return drawn
### Mesa Agent code - Dirichlet Process for preferences
class DemandAgent(mesa.Agent):
"""An agent with fixed initial wealth and a Polya urn for its preferences."""
def __init__(self, model):
# Pass the parameters to the parent class.
super().__init__(model)
# Create the agent's attribute and set the initial values.
self.endowment = random.uniform(5, 15) # Random initial wealth
self.wealth = self.endowment
self.urn = PolyaUrn()
    def say_hi(self):
        # If the market (Hoppe) urn drew its mutator ball this tick, a new product
        # has entered the market, so add a matching color to this agent's urn.
        if self.model.draw == 0:
            self.urn.draw_new_product()
        drawn = self.urn.draw()
        print(f"Agent {self.unique_id!s:2} drew: {drawn} | preferences: {self.urn.urn.values()}")
### Server code
class DemandModel(mesa.Model):
"""A model with some number of agents."""
    def __init__(self, n, counter, innovation: float = 1.0, seed=None):
        super().__init__(seed=seed)
        self.num_agents = n
        self.counter = counter
        self.urn = HoppeUrn(innovation=innovation)
        # Create n agents
        DemandAgent.create_agents(model=self, n=n)
        self.ticks = 0
        self.draw = None
        # Track unique products over time (note: len includes the mutator entry 0)
        self.num_products_history = [len(self.urn.urn)]
    def step(self):
        """Advance the model by one step."""
        # The market urn draws first: a 0 means a new product enters the market.
        self.draw = self.urn.draw()
        self.ticks += 1
        print(f"drew: {self.draw} | products : {self.urn.urn.values()}")
        # shuffle_do pseudo-randomly reorders the agents, then calls say_hi on each
        self.agents.shuffle_do("say_hi")
        self.num_products_history.append(len(self.urn.urn))  # record after each step
### Mesa entry points
### Shiny UI
### Step button
### Shiny APP
from shiny.express import input, render, ui
## UI
ui.input_slider("n", "Total Agants", 1, 100, 5)
ui.input_slider("theta", "Innovation", 0, 100, 1)
ui.input_action_button("btnStep", "Step")
ui.input_action_button("btnReset", "Reset")
ui.tags.br()
model_val = reactive.Value(None)
counter_val = reactive.Value(0)
def build_model():
# Build a fresh model using the current UI and counter
return DemandModel(n=input.n(), counter=counter_val.get(), innovation=input.theta()/100.0)
@reactive.calc
def current_model():
return model_val.get()
@reactive.effect
@reactive.event(input.btnReset)
def reset_sim():
print("resetting")
# Bump a counter if you want to track resets
counter_val.set(counter_val.get() + 1)
# Create and store a fresh model
model_val.set(build_model())
starter_model=model_val.get()
print(f"model version {starter_model.counter}")
# Do an initial step if desired
@reactive.effect
@reactive.event(input.btnStep)
def sim_step():
if model_val.get() is None:
model_val.set(build_model())
starter_model=model_val.get()
print(f"model version {starter_model.counter}")
model_val.get().step()
# --- helpers ---------------------------------------------------------------
def prefs_df(model) -> pd.DataFrame:
"""Rows: agent, product, count."""
if model is None:
return pd.DataFrame(columns=["agent", "product", "count"])
rows = []
for ag in model.agents:
for prod, cnt in ag.urn.urn.items():
rows.append({"agent": str(ag.unique_id), "product": str(prod), "count": cnt})
return pd.DataFrame(rows)
# --- UI slot ---------------------------------------------------------------
ui.h3("Agent preference shares")
#output_widget("pref_chart")
# --- Server render ---------------------------------------------------------
@render_altair
@reactive.event(input.btnStep)
def pref_chart():
m = current_model()
df = prefs_df(m)
if df.empty:
return alt.Chart(pd.DataFrame({"x": []})).mark_text().encode() # harmless placeholder
# normalized, stacked bars like the barley example
return (
alt.Chart(df)
.mark_bar()
.encode(
x=alt.X("sum(count):Q", stack="normalize", title="Preference share"),
y=alt.Y("agent:N", sort="-x", title="Agent"),
color=alt.Color("product:N", title="Product"),
tooltip=["agent:N", "product:N", "count:Q"]
)
.properties(height=400)
)
# --- helpers ---------------------------------------------------------------
import numpy as np
def kn_live_df(model) -> pd.DataFrame:
if model is None or not getattr(model, "num_products_history", None):
return pd.DataFrame(columns=["t", "K"])
hist = model.num_products_history
return pd.DataFrame({"t": np.arange(len(hist)), "K": hist})
def kn_theory_df(nmax: int, alphas=(0.1, 0.3, 1.0, 3.0, 10.0)) -> pd.DataFrame:
    """Theoretical curves: under a CRP with concentration alpha, the expected
    number of unique products E[K_t] grows roughly like alpha * log(t)."""
    t = np.arange(1, max(2, nmax) + 1)
    parts = []
    for a in alphas:
        parts.append(pd.DataFrame({"t": t, "alpha": str(a), "E[K_t]": a * np.log(t)}))
    return pd.concat(parts, ignore_index=True)
# --- UI slot ---------------------------------------------------------------
ui.h3("Unique products over time: theory vs. live")
#output_widget("kn_chart")
# --- Server render ---------------------------------------------------------
@render_altair
@reactive.event(input.btnStep)
def kn_chart():
m = current_model()
nmax = (m.ticks if m else 200) or 200
df_theory = kn_theory_df(nmax)
base = (
alt.Chart(df_theory)
.mark_line()
.encode(
x=alt.X("t:Q", title="Steps (t)"),
y=alt.Y("E[K_t]:Q", title="Unique products K_t"),
color=alt.Color("alpha:N", title="α (innovation)")
)
)
df_live = kn_live_df(m)
live_layer = (
alt.Chart(df_live)
.mark_line(point=True)
.encode(
x="t:Q",
y="K:Q",
tooltip=["t:Q", "K:Q"]
)
.properties(title="Live K_t")
)
return (base + live_layer).properties(height=350)
## file: requirements.txt
altair
anywidget
jsonschema
mesa