Demand from Preferences Part 1 – Oren Bochman’s Blog

In this series of posts I will explore an approach to modeling consumer demand I had some years back after working on pricing.

A couple of reasons made me come back to this problem after a long hiatus. First, I have gotten interested in Bayesian non-parametric models like the Dirichlet process, stick breaking, and the Chinese Restaurant Process. I also learned a lot about Agent Based Models (ABM). These are simulations in which we can easily capture heterogeneity and local interactions that are not easy to capture in more traditional equation-based models.

Putting these two tools together lets you think of models that are not only more flexible and realistic but probably quite difficult to think about without these tools.

Challenges of Multi-armed Bandit problems

In a bandit problem you only see the response to the arm you pull, and the way the problem is set up you are much more likely to pull the wrong arms until you have solved the problem. If you could see the response to every arm you would be able to learn much faster. So if we want to evaluate different predictive models, pricing strategies, or taxation schemes (in other words, different hypotheses), we would be better placed to evaluate them in a closed world where we could see the response to all actions, not just the ones we take.

Forecasting demand can be challenging. It is similar to a bandit problem. You only get to see the reaction to the prices you set. Also the market is reacting to everything else that is going on in the world so you don’t really know if the response you see is due to your actions or something else.
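To make the gap between the two feedback regimes concrete, here is a minimal sketch of an epsilon-greedy learner run once with bandit feedback (only the pulled arm is observed) and once in a "closed world" where every arm's response is observed. The arm means, the noise level, and the function name `run` are all illustrative assumptions, not taken from any real pricing data.

```python
import random

def run(full_feedback, true_means, steps=500, eps=0.1, seed=0):
    """Epsilon-greedy on Gaussian arms.

    Returns (estimated means, number of observations per arm).
    With full_feedback=True we observe every arm at every step,
    which is only possible in a simulated, closed world.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    est = [0.0] * k
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(k)                      # explore
        else:
            arm = max(range(k), key=lambda a: est[a])   # exploit
        observed = range(k) if full_feedback else [arm]
        for a in observed:
            reward = rng.gauss(true_means[a], 1.0)
            counts[a] += 1
            est[a] += (reward - est[a]) / counts[a]     # running mean
    return est, counts

true_means = [1.0, 1.5, 2.0]   # hypothetical expected revenue per price point
est_bandit, n_bandit = run(False, true_means)
est_full, n_full = run(True, true_means)
print("bandit observations per arm:", n_bandit)
print("closed-world observations per arm:", n_full)
```

In the bandit run the 500 observations are split unevenly across arms, so some estimates rest on very few samples; in the closed-world run every arm gets all 500.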

There is lots of theory on demand, of course, but when it comes to practice even the best models are not very good at keeping up with what the theory prescribes. The theory in economics came from practitioners in other fields, like biology, and seems rather odd to me. I have often wondered if there are simpler and better ways to think about demand. However, getting back to this problem, what we often need to do is test how our methods work on a simulated market, i.e. one in which we know the ground truth and not just the response to our actions.

Simulations: a closed-world approach to ground truth

The problem with developing a simulation is that you have to bake in some assumptions about how you simplify reality, dropping unimportant details while keeping the important ones. The problem is, who can say what is truly important and what is not? I was challenged more than once by a colleague who asked me “how do you know your model is right?”. This was a rude awakening for me. My answer was along the lines of:

“When an army goes to battle they are only as good as their practice and training. If they have bad intelligence they will struggle, but if they couldn’t hit a target or reach a goal in training it is too much to expect them to do so in a real battle”.

The models suffered from noisy data and inflated zeros due to competition, stock-outs, and scare stories in the media such as bird flu or hoof-and-mouth disease rumors. There were frequent periods that break with the most basic of theoretical assumptions, like downward-sloping demand curves, e.g. rollup candy that was trending on social media and got 1000x demand, or a consumer embargo on cottage cheese due to price hikes.

Sparsity in price changes for most products, because most retailers only have the capacity to relabel a few products each month.

Scaling challenges and causality issues (not being able to control for confounding factors), e.g. when price and quantity are both affected by a confounding variable like seasonality. Oh, and seasonality and changing trends.

Some products like refrigerators are consumed once in a decade. Others like chicken are consumed daily but if there is a big enough promotion you may see people stuff their freezer full of chicken and not buy any for months.

And we haven’t covered issues like income effects, cannibalization, competition, advertising, recommendation sites, social media and influencers, geographic factors like location, macroeconomic factors like inflation, marketing, behavioral economics issues, and the list goes on and on.

Put all of these in the model and it won’t fit for more than a handful of products. Leave one out and you will get crazy results in many cases if you are predicting demand at scale.

Ideally one should have a model that is robust to many of these issues and can handle different cases, but more significantly one that makes the fewest assumptions about each case yet supports growing complexity if it exists in the data. Safety rails in the form of uncertainty estimates and regularizing priors are also useful.

Game Theory

Getting back to the problem at hand. One of the modern approaches to microeconomics has been placing it on a firm theoretical foundation based on game theory. It is still worth recalling that, compared with economics, game theory is relatively new. In fact it was only in the 1940s that John von Neumann and Oskar Morgenstern published their book “Theory of Games and Economic Behavior”, which laid the foundations for game theory as a mathematical discipline. So while it got started with economic behavior, lots of aspects of game theory are still being worked out and don’t always make sense or make very good predictions. This is unfortunate because it is perhaps the best tool we have to make economics more rigorous. Anyhow, my takeaways are:

Go for simple models; they also tend to have far wider applications.
Take the results with a grain of salt and use them as a guide, not a bible.

RL

In practice I learned game theory first, but then I was introduced to reinforcement learning (RL). RL generally addresses the simpler setting of decision theory, i.e. single-agent scenarios. I found RL to be more intuitive and practical. It also has a lot of algorithms that can be used to solve problems. And while game theory introduces many challenging strategic scenarios, RL can find optimal policies for an agent in environments so complex that we don’t see game theorists tackling them.

I mean, chess is a classic game considered by any introductory game theory book, yet they have very little to say about solving it. But RL gave us AlphaZero, which can learn to play chess at a superhuman level in a few hours. Backgammon is another classic game, in which RL algorithms like TD-Gammon reached the level of the top human players.

Two related problems

When it comes to RL, and when we want to understand demand based on preferences we want to understand how preferences map to actions and more importantly how individual actions aggregate to market level demand curves with the properties we observe in the real world.

But for the sake of this post I think we want to consider the most parsimonious model that can capture the essence of demand from preferences. Why start with preferences? Because lots of economic theory assumptions arise from how we aggregate individual preferences, so we need to start before that step and make aggregation depend on parameters we can control.

How does this tie into RL though?

The problem in converting an ABM to an MDP is that we need to transition from a rule-based system to one that has:

1. a state space,
2. actions with probabilities of transitions,
3. reward signals.

In an ABM we usually embody item 2 (actions and transitions) using rules. The state space is often implicit and is represented at two levels: the environment and the agents.
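One way to picture the conversion is to log what a rule-based agent does and then estimate the MDP ingredients from that log. The sketch below is only an illustration under simplifying assumptions: the states, actions, and rewards in `log` are made up, and `estimate_mdp` is a hypothetical helper that computes empirical transition probabilities and mean rewards from `(state, action, reward, next_state)` tuples.

```python
from collections import Counter, defaultdict

def estimate_mdp(transitions):
    """Estimate P(s' | s, a) and mean reward R(s, a) from logged tuples."""
    next_counts = defaultdict(Counter)   # (s, a) -> Counter of next states
    reward_sum = defaultdict(float)      # (s, a) -> total reward seen
    for s, a, r, s2 in transitions:
        next_counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nxt in next_counts.items():
        n = sum(nxt.values())
        P[sa] = {s2: c / n for s2, c in nxt.items()}
        R[sa] = reward_sum[sa] / n
    return P, R

# Hypothetical log of a household agent following simple shopping rules.
log = [
    ("low_stock", "buy", 1.0, "stocked"),
    ("low_stock", "buy", 1.0, "stocked"),
    ("low_stock", "buy", 0.0, "low_stock"),   # e.g. a stock-out at the shop
    ("low_stock", "wait", 0.0, "low_stock"),
    ("stocked", "wait", 0.5, "low_stock"),
]
P, R = estimate_mdp(log)
print(P[("low_stock", "buy")])
print(R[("low_stock", "buy")])
```

The reward column is the part we usually do not get for free from an ABM, which is why the reward function is treated separately later on.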

So while it takes some work, we can extract a state space with actions and transitions. There is no guarantee that we get the Markov property, i.e. that future states depend on the past only through the present state. But for microeconomics we should be fine. Also, we can use a family of related models to learn much about the state space if we can collect data in the form of a replay buffer of past states and actions. I am referring to the trifecta of HMM algorithms consisting of:

Viterbi, to find the most likely sequence of hidden states given the observations,
Baum-Welch, to estimate transition and emission probabilities, and the
Forward–backward algorithm, which computes the posterior marginal probabilities of all hidden state variables given a sequence of observations/emissions, a.k.a. smoothing.
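As a small illustration of the first of these, here is a sketch of Viterbi decoding in log space. The two hidden states (whether a household is stocked up or running low), the three purchase-size observations, and all the probabilities are made-up values chosen only to make the example runnable.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []   # back[t][s] = best predecessor of state s at time t + 1
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans_p[p][s]))
            row[s] = V[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

states = ("stocked_up", "running_low")
start_p = {"stocked_up": 0.6, "running_low": 0.4}
trans_p = {
    "stocked_up": {"stocked_up": 0.7, "running_low": 0.3},
    "running_low": {"stocked_up": 0.4, "running_low": 0.6},
}
emit_p = {
    "stocked_up": {"none": 0.5, "small": 0.4, "large": 0.1},
    "running_low": {"none": 0.1, "small": 0.3, "large": 0.6},
}
path = viterbi(["none", "small", "large"], states, start_p, trans_p, emit_p)
print(path)  # ['stocked_up', 'stocked_up', 'running_low']
```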

If one has access to the dynamics, one could use a Kalman filter, a Bayesian filter, or a dynamic linear model (DLM).
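For intuition, the simplest DLM is the local level model, where a latent level follows a random walk and we observe it with noise. A minimal sketch of the corresponding one-dimensional Kalman filter follows; the noise variances `q` and `r`, the prior `m0`, `p0`, and the constant test series are assumptions chosen for illustration.

```python
def kalman_local_level(ys, q=0.1, r=1.0, m0=0.0, p0=10.0):
    """Filtered means for the local level model.

    state:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    """
    m, p = m0, p0
    means = []
    for y in ys:
        p = p + q              # predict: uncertainty grows by the state noise
        k = p / (p + r)        # Kalman gain
        m = m + k * (y - m)    # update the mean toward the observation
        p = (1.0 - k) * p      # update the variance
        means.append(m)
    return means

# With a constant signal the filtered mean should settle on that constant.
filtered = kalman_local_level([5.0] * 50)
print(round(filtered[0], 3), round(filtered[-1], 3))
```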

So the real challenge is the reward function. In RL we often assume that the reward function is given. But in practice it is challenging to define a reward function.

And this is where preferences come in. If we can model preferences we can use them to define a reward function.

I have been thinking about two problems and it seems that they may be related in interesting ways.

Demand from Preferences

The first involves developing a demand model based on a non-parametric model for preferences.

From ABM to MDP by learning a reward function

I, along with over a million other students, took an online course called Model Thinking. You can see my notes in the link. When I took the courses in the Reinforcement Learning specialization, one of the points made by Scott Page kept resonating again and again, which was that there is the following hierarchy of behavioral models:

Rule based - In rule-based models agents follow simple predetermined rules, yet we often see how even one rule can lead to complex emergent behavior. Sugarscape is a classic example.
Formal Models - Formal models are not explained as well in the course, but clearly they are a step up from rule-based models, in that the modeler has formalized the behavior using mathematical equations. The predator-prey model, which is governed by the Lotka-Volterra equations, is a classic example. And it can be shown that the predator-prey model is a special case of the more general replicator dynamics model from evolutionary game theory.
Game Theoretic Models - in which agents have clearly defined actions and payoffs. The prisoner’s dilemma is such a game. One advantage of a game-theoretic model is that we can use game theory to find the equilibria and use these to study the different possible strategies.
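To illustrate what "find the equilibria" means for the last level of the hierarchy, here is a sketch that enumerates pure-strategy Nash equilibria of a two-player game by checking that neither player gains from a unilateral deviation. The payoff numbers are a common textbook parameterization of the prisoner's dilemma (years in prison as negative payoffs) and are an assumption, not taken from the course.

```python
from itertools import product

ACTIONS = ("C", "D")   # cooperate, defect
# payoffs[(row_action, col_action)] = (row player's payoff, column player's payoff)
payoffs = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}

def pure_nash(payoffs, actions):
    """Return all action profiles where no player benefits from deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(a2, b)][0] for a2 in actions)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, b2)][1] for b2 in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria

equilibria = pure_nash(payoffs, ACTIONS)
print(equilibria)  # [('D', 'D')]
```

Mutual defection is the only equilibrium even though mutual cooperation gives both players a better payoff, which is what makes the game a dilemma.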

After learning about RL I came to believe that game-theoretic models often require infinite sets to define, and that some RL algorithms can do their magic with a very limited sample from such a game. In retrospect it is an important point, though not a very precise one. RL requires an MDP, which has the same issues as game-theoretic models, at least as far as a formal specification is concerned. However, RL algorithms can find an optimal policy, at times with very limited resources. Still, RL was only considering single-agent scenarios and I was just as interested in the strategic case of multiple agents.

The second is about how to convert an agent-based model (ABM) into a Markov decision process (MDP). If we can do that, we can use an RL algorithm like Q-learning to find an optimal policy for an agent, and perhaps even consider how it relates to global optima that we may find by optimizing a global welfare function for many agents.
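As a reminder of what Q-learning does once an MDP is available, here is a minimal tabular sketch on a toy environment. The four-state chain, the reward of 1 for reaching the last state, and the hyperparameters are all assumptions made for illustration; nothing here is derived from the demand model yet.

```python
import random

N_STATES, GOAL = 4, 3      # states 0..3, state 3 is terminal
ACTIONS = (0, 1)           # 0 = move left, 1 = move right

def step(s, a):
    """Deterministic chain: reward 1.0 only on reaching the goal state."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == GOAL
    return s2, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                       # cap on episode length
            if rng.random() < eps:
                a = rng.choice(ACTIONS)            # explore
            else:                                  # exploit, ties broken at random
                a = max(ACTIONS, key=lambda x: (Q[s][x], rng.random()))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])  # Q-learning update
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # greedy action in each non-terminal state
```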

Some of the guest lectures in the RL courses I took discussed the issue of learning rewards. I later realized that both state transitions and rewards are often not known in advance. So this is a real problem, and we may often be able to learn to approximate them given sufficient samples of actual behavior.

More recently I learned about the Bayesian view of state space models. Here one can use algorithms like the Kalman filter and DLMs to learn to predict future and past states based on noisy trajectories.

Models of demand

In this post I will develop a Polya-Rubinstein microeconomic model of consumer demand. The model focuses on the aggregation of individual preferences.

In the next posts I will consider how to make the model more useful, so that we can develop notions like substitutes and complements. For this we may want to add a clustering structure to the products, perhaps extending the Polya urn to the closely related Chinese Restaurant Process, and so add another level of interest.

Although I have studied microeconomics, including supply and demand curves, elasticity, etc., it seems that a view of demand based on a variance-covariance matrix of demand functions, or even as a precursor to an elasticity and cross-elasticity matrix, can lead to inconsistency.

The authors of the … model of demand discuss their model, which is supposedly consistent.

I recall however that Ariel Rubinstein’s book on microeconomic theory starts with the concept of preferences, then builds it into utility. So perhaps if we can model preferences we could be on the way to a more consistent model of demand.

We will see that the Polya urn model is consistent with the Rubinstein axioms of preferences that make the preferences a rational ordering.

i.e.

For any two products, the consumer is able to state a preference for one over the other, or indifference.
There is no order effect: the answer does not depend on the order in which the two products are presented.
Transitivity.
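As a quick sanity check of these three conditions, the sketch below treats a snapshot of an agent's urn counts as a numeric score and answers the pairwise question "how do you compare x and y?" from it. This is only an illustration under the assumption that preferences are read off the current counts; the function names `compare` and `is_rational` and the example counts are hypothetical.

```python
from itertools import permutations

def compare(counts, x, y):
    """Answer to 'how do you compare x and y?': the preferred product, or 'I' for indifference."""
    if counts[x] > counts[y]:
        return x
    if counts[y] > counts[x]:
        return y
    return "I"

def is_rational(counts):
    """Check no order effect and transitivity; completeness holds because compare always answers."""
    items = list(counts)
    no_order_effect = all(
        compare(counts, x, y) == compare(counts, y, x)
        for x, y in permutations(items, 2)
    )
    def weakly(x, y):
        return counts[x] >= counts[y]
    transitive = all(
        weakly(x, z)
        for x, y, z in permutations(items, 3)
        if weakly(x, y) and weakly(y, z)
    )
    return no_order_effect and transitive

urn = {1: 4, 2: 3, 3: 2}          # product id -> number of balls
print(compare(urn, 1, 2))         # 1
print(is_rational(urn))           # True
```

Any ordering induced by a single numeric score passes these checks, so the interesting part of the model is how the counts evolve, not whether a snapshot is rational.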

P.S. One issue with preferences is how they map to price. I guess we will need to deal with that too.

There are k (10) products in the market each with a price and some id.
We have a square grid of side L with agents
Agents start with an endowment and some preferences over products
Preferences are based on numbers of colored balls in an agent’s Polya urn. Initially there is just one ball per product.
Each turn the agent draws a random colored ball from the urn but returns 2 such colored balls. This reinforces the agent’s preference for the product associated with the ball.

OK, so we have a process that creates heterogeneous preferences that reinforce over time.

That is a good start but we see that if we aggregate we get a very steady uniform demand.
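Both observations can be reproduced with a few lines of simulation. The sketch below is independent of the app code further down and uses assumed values (3 products, 1000 agents, 100 draws each): every agent runs its own Polya urn, and we compare the spread of individual preference shares with the population average.

```python
import random

def polya_shares(k, draws, rng):
    """Run one agent's Polya urn and return its final preference shares."""
    counts = [1] * k                      # one ball per product to start
    for _ in range(draws):
        i = rng.choices(range(k), weights=counts)[0]
        counts[i] += 1                    # drawn ball goes back with one extra
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(42)
k, n_agents, draws = 3, 1000, 100
agents = [polya_shares(k, draws, rng) for _ in range(n_agents)]

# Aggregate demand share per product: the average over agents.
aggregate = [sum(a[i] for a in agents) / n_agents for i in range(k)]
# Spread between the largest and smallest individual share across all agents.
spread = max(max(a) for a in agents) - min(min(a) for a in agents)

print("aggregate shares:", [round(x, 3) for x in aggregate])
print("individual spread:", round(spread, 3))
```

Individual agents end up with very different shares, but by symmetry each product's expected share is 1/k, so the aggregate stays close to uniform.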

We may want to see the welfare of the agents. How often can they satisfy their preferences?

To understand behavior we need to map preferences to a budget allocation. We have imposed a budget constraint on the agent by setting an individual endowment. Now we want to see how they will allocate their budget based on their preferences.

One idea is that agents should attempt to maximize their utility given their budget constraint.

But so far I have only talked about utility informally.

We may, however, assume that the agents will want to buy products in proportion to their preferences. We may also suppose that for these agents having more is better than having less, so they will want to maximize their consumption.

So here we see that the urn meets the market.

sku	count	price
1	4	30
2	3	20
3	2	2

They would want to buy a market basket that is the largest multiple of their preference counts that fits within their budget.

3 sku_1 + 2 sku_2 + 1 sku_3

k^* = \max \{ k \in \mathbb{N} : k \times \sum_i count_i \times price_i \leq budget \}, \qquad Q_i = k^* \times count_i
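Here is a small worked sketch of this rule, using the count and price columns of the table above as the preference weights and prices. The budgets of 400 and 100 are assumed values for illustration, and `proportional_basket` is a hypothetical helper name.

```python
def proportional_basket(counts, prices, budget):
    """Largest whole multiple k of the preference bundle that fits the budget."""
    bundle_cost = sum(counts[sku] * prices[sku] for sku in counts)
    k = int(budget // bundle_cost)
    basket = {sku: k * c for sku, c in counts.items()}
    spent = k * bundle_cost
    return k, basket, spent

counts = {1: 4, 2: 3, 3: 2}      # sku -> balls in the urn
prices = {1: 30, 2: 20, 3: 2}    # sku -> price

k, basket, spent = proportional_basket(counts, prices, budget=400)
print(k, basket, spent)          # 2 {1: 8, 2: 6, 3: 4} 368

# With a small budget not even one full bundle is affordable.
k_small, basket_small, _ = proportional_basket(counts, prices, budget=100)
print(k_small, basket_small)     # 0 {1: 0, 2: 0, 3: 0}
```

One bundle costs 4×30 + 3×20 + 2×2 = 184, so a budget of 400 buys two bundles and leaves 32 unspent, while a budget of 100 buys nothing under this rule.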

One important edge case is what will they do if they want more than they can afford. In this case preferences will be truncated to what they can afford.

Say they have a budget of 30. Should they buy

one of their #1 preference, or one of their #2 preference and five of their #3 preference?

This is the question of demand: if I like chicken twice as much as rice and I have some budget, how much would I allocate to each product?

Agents then buy the top preferred products and gain a happiness score in proportion to
1. number of top k-products they can afford
Prices for

#| standalone: true
#| viewerHeight: 600
#| components: [viewer] #[editor, viewer]
_='''{.markdown} 
from __future__ import annotations
## Tasks:

'''

# app.py — ShinyLive-ready ABM Polya-Urn demo (Mesa-lite core) + Altair 


from dataclasses import dataclass
from typing import Dict, List, Tuple
import random
from collections import Counter

import pandas as pd
import altair as alt

from shiny import App, Inputs, Outputs, Session, reactive, render, ui
from shinywidgets import render_altair, output_widget

import mesa
from mesa import Model, Agent, space

### Two NonParametric Helper Models (Polya Urn and Hoppe Urn)

class PolyaUrn:
    """A Polya urn model for generating reinforcing preferences based on a Dirichlet process.
       
       Note: 
    
       - Since products are generated by a CRP process (Hoppe Urn), we need to support adding new products via a draw_new_product() method.

       - We may consider using Moran steps to model a drift from many products to a few products, i.e. we draw a ball replacing it with a ball of another color based on the urn's Dirichlet distribution.
       or we may drift according to some exogenous process. e.g. advertising or social influence. We may do this once or until
       we converge to some top-k products.


    Attributes:
        alpha (float): The reinforcement parameter.
        num_products (int): The number of products (colors).
        urn (Dict[int, int]): A dictionary representing the urn with product indices as keys and bead counts as values.

    """

    def __init__(self, alpha: int = 1, num_products: int = 1):
        self.alpha = alpha
        self.num_products = num_products
        self.urn = {i: 1 for i in range(num_products)}  # one bead per product

    def draw(self) -> int:
        """Draw a ball from the urn and reinforce."""
        products = list(self.urn.keys())
        # random.choices normalizes the weights itself, so raw bead counts suffice.
        weights = [self.urn[i] for i in products]
        drawn = random.choices(products, weights=weights, k=1)[0]
        self.urn[drawn] += self.alpha  # Reinforce with alpha extra beads (default 1)
        return drawn

    def draw_new_product(self) -> int:
        """Draw a new product (color) from the urn."""
        new_product_idx = self.num_products
        self.urn[new_product_idx] = 1  # Add new product with one bead
        self.num_products += 1
        return new_product_idx

class HoppeUrn:
    """A Hoppe urn model for generating new products.
       
       Note: 
       The current model assumes all products are independent.
       
       We will later extend this to support the addition of product categories and affinity between products, allowing us to model substitutes and complements.
       We may also want to assign products to Maslow's hierarchy of needs.

       Ideally, though, the structure of the product space should be easily accessible from this Model.
    """

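    def __init__(self, initial_products: int = 2, innovation: float = 1.0):
        # Key 0 is the special "new product" ball; the remaining keys are products.
        self.urn = {i: 1 for i in range(initial_products)}  # one bead per product
        self.urn[0] = innovation  # weight of the "new product" ball
        self.innovation = innovation
        print(f"initial hoppe urn {self.urn}")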

    def draw(self) -> int:
        """ Draw a ball from the urn and reinforce."""
        total = sum(self.urn.values())
        probs = [self.urn[i] / total for i in self.urn.keys()]
        drawn = random.choices(list(self.urn.keys()), weights=probs, k=1)[0]
        if drawn == 0:  # "new product" 
            new_product_idx = max(self.urn.keys()) + 1
            self.urn[new_product_idx] = 1  # Add new product with one bead
            #drawn = new_product_idx
        else:
            self.urn[drawn] += 1  # Reinforce
        return drawn

### Mesa Agent code - Dirichlet Process for preferences

class DemandAgent(mesa.Agent):
    """An agent with fixed initial wealth and a Polya urn for its preferences."""

    def __init__(self, model):
        # Pass the parameters to the parent class.
        super().__init__(model)

        # Create the agent's attribute and set the initial values.
        self.endowment = random.uniform(5, 15)  # Random initial wealth 
        self.wealth = self.endowment  
        self.urn = PolyaUrn()

    def say_hi(self):
        # The agent's step will go here.
        # For demonstration purposes we will print the agent's unique_id
        #print(f"Hi, I am an agent, you can call me {self.unique_id!s}.")
        if self.model.draw == 0:
            self.urn.draw_new_product()
        
        drawn = self.urn.draw()
        print(f"Agent {self.unique_id!s:2} drew: {drawn} | preferences: {self.urn.urn.values()}")


### Server code

class DemandModel(mesa.Model):
    """A model with some number of agents."""

    def __init__(self, n,counter, innovation: float = 1.0, seed=None):
        super().__init__(seed=seed)
        self.num_agents = n
        self.counter=counter
        self.urn=HoppeUrn(innovation=innovation)
        # Create n agents
        DemandAgent.create_agents(model=self, n=n)
        self.ticks=0
        self.draw=None
        self.num_products_history = [len(self.urn.urn)]  # <-- track unique products over time

    def step(self):
        """Advance the model by one step."""
        # Draw from the market-level Hoppe urn; a draw of 0 signals a new product.
        self.draw = self.urn.draw()
        self.ticks += 1
        print(f"drew: {self.draw} | products : {self.urn.urn.values()}")
        # shuffle_do pseudo-randomly reorders the agents and then calls the
        # named method ("say_hi") on each of them.
        self.agents.shuffle_do("say_hi")
        self.num_products_history.append(len(self.urn.urn))  # <-- record after each step

    
### Mesa entry points

### Shiny UI

### Step button



### Shiny APP

from shiny.express import input, render, ui


## UI
ui.input_slider("n", "Total Agents", 1, 100, 5)
ui.input_slider("theta", "Innovation", 0, 100, 1)


ui.input_action_button("btnStep", "Step")
ui.input_action_button("btnReset", "Reset")
ui.tags.br()

model_val = reactive.Value(None)
counter_val = reactive.Value(0)

def build_model():
    # Build a fresh model using the current UI and counter
    return DemandModel(n=input.n(), counter=counter_val.get(), innovation=input.theta()/100.0)

@reactive.calc
def current_model():
    return model_val.get()


@reactive.effect
@reactive.event(input.btnReset)
def reset_sim():
    print("resetting")
    # Bump a counter if you want to track resets
    counter_val.set(counter_val.get() + 1)
    # Create and store a fresh model
    model_val.set(build_model())
    starter_model=model_val.get()
    print(f"model version {starter_model.counter}")   
    # Do an initial step if desired

@reactive.effect 
@reactive.event(input.btnStep)
def sim_step(): 
    if model_val.get() is None:
        model_val.set(build_model())
    starter_model=model_val.get()
    print(f"model version {starter_model.counter}")   
    model_val.get().step()

# --- helpers ---------------------------------------------------------------
def prefs_df(model) -> pd.DataFrame:
    """Rows: agent, product, count."""
    if model is None:
        return pd.DataFrame(columns=["agent", "product", "count"])
    rows = []
    for ag in model.agents:
        for prod, cnt in ag.urn.urn.items():
            rows.append({"agent": str(ag.unique_id), "product": str(prod), "count": cnt})
    return pd.DataFrame(rows)

# --- UI slot ---------------------------------------------------------------
ui.h3("Agent preference shares")
#output_widget("pref_chart")

# --- Server render ---------------------------------------------------------

@render_altair
@reactive.event(input.btnStep)
def pref_chart():
    m = current_model()
    df = prefs_df(m)
    if df.empty:
        return alt.Chart(pd.DataFrame({"x": []})).mark_text().encode()  # harmless placeholder

    # normalized, stacked bars like the barley example
    return (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=alt.X("sum(count):Q", stack="normalize", title="Preference share"),
            y=alt.Y("agent:N", sort="-x", title="Agent"),
            color=alt.Color("product:N", title="Product"),
            tooltip=["agent:N", "product:N", "count:Q"]
        )
        .properties(height=400)
    )


# --- helpers ---------------------------------------------------------------
import numpy as np

def kn_live_df(model) -> pd.DataFrame:
    if model is None or not getattr(model, "num_products_history", None):
        return pd.DataFrame(columns=["t", "K"])
    hist = model.num_products_history
    return pd.DataFrame({"t": np.arange(len(hist)), "K": hist})

def kn_theory_df(nmax: int, alphas=(0.1, 0.3, 1.0, 3.0, 10.0)) -> pd.DataFrame:
    t = np.arange(1, max(2, nmax) + 1)
    parts = []
    for a in alphas:
        parts.append(pd.DataFrame({"t": t, "alpha": str(a), "E[K_t]": a * np.log(t)}))
    return pd.concat(parts, ignore_index=True)

# --- UI slot ---------------------------------------------------------------
ui.h3("Unique products over time: theory vs. live")
#output_widget("kn_chart")

# --- Server render ---------------------------------------------------------
@render_altair
@reactive.event(input.btnStep)
def kn_chart():
    m = current_model()
    nmax = (m.ticks if m else 200) or 200

    df_theory = kn_theory_df(nmax)
    base = (
        alt.Chart(df_theory)
        .mark_line()
        .encode(
            x=alt.X("t:Q", title="Steps (t)"),
            y=alt.Y("E[K_t]:Q", title="Unique products K_t"),
            color=alt.Color("alpha:N", title="α (innovation)")
        )
    )

    df_live = kn_live_df(m)
    live_layer = (
        alt.Chart(df_live)
        .mark_line(point=True)
        .encode(
            x="t:Q",
            y="K:Q",
            tooltip=["t:Q", "K:Q"]
        )
        .properties(title="Live K_t")
    )

    return (base + live_layer).properties(height=350)


## file: requirements.txt
altair
anywidget
jsonschema
mesa

Citation

BibTeX citation:

@online{bochman2025,
  author = {Bochman, Oren},
  title = {Demand from {Preferences} {Part} 1},
  date = {2025-10-02},
  url = {https://orenbochman.github.io/posts/2025/2025-09-10-Demand-From-Prefrences-1/},
  langid = {en}
}

For attribution, please cite this work as:

Bochman, Oren. 2025. “Demand from Preferences Part 1.” October 2, 2025. https://orenbochman.github.io/posts/2025/2025-09-10-Demand-From-Prefrences-1/.