Appendix N — Wold’s theorem


This appendix explains Wold’s theorem, which provides a representation of stationary time series.
Category: Probability and Statistics

Author: Oren Bochman

Keywords: Wold’s theorem, stationary time series, ARMA representation, causal representation, orthogonal innovations, linear algebra

N.1 Wold’s theorem (extracurricular, circa 1938)

Whenever we talk about innovations in the time series course we are alluding to Wold’s theorem, a fundamental result in time series analysis that represents any covariance-stationary time series as a sum of deterministic and stochastic components.

I researched this appendix to understand the moving average (MA) representation of the AR(p) process, which is a key result in this course. I think this short appendix provides a historical perspective on the development of time series analysis and helps students place this and some other more esoteric concepts in a historical context rather than seeing them in a purely mathematical one.

I found that this approach helped me understand and remember many results in a number of fields, like geometry (Ostermann and Wanner 2012), analysis (Hairer and Wanner 2008; Ebbinghaus and Ewing 1991), complex analysis (Remmert and Burckel 2012), integration (Bressoud 2008), and algebra (Golan 2012).

In the 1920s George Udny Yule and the Soviet economist Eugen Slutsky were researching time series, and each came up with a different way to represent one.

  • Yule’s research led to the notion of the autoregressive scheme. \begin{aligned} Y_{t} & = \sum_{j=1}^{p} \phi_{j} Y_{t-j} + u_{t} \end{aligned} \tag{N.1}

  • Slutsky’s research led to the notion of the moving average scheme. \begin{aligned} Y_{t} & = \sum_{j=0}^{q} \theta_{j} u_{t-j} \end{aligned} \tag{N.2}

We can use the two schemes together and get the ARMA(p, q) model:

\begin{aligned} Y_{t} & = \sum_{j=1}^{p} \phi_{j} Y_{t-j} + u_{t} + \sum_{j=1}^{q} \theta_{j} u_{t-j} \end{aligned} \tag{N.3}

where \phi_{j} are the autoregressive coefficients, \theta_{j} are the moving average coefficients, and u_{t} is a white noise sequence.
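To make the three schemes concrete, here is a minimal simulation sketch in Python; the model orders, the coefficient values (\phi_1 = 0.7, \theta_1 = 0.4), and the sample size are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_arma(phi, theta, n=500, burn=100):
    """Simulate Y_t = sum_j phi_j Y_{t-j} + u_t + sum_j theta_j u_{t-j}
    with u_t iid N(0, 1).  AR-only (theta=[]) gives Yule's scheme,
    MA-only (phi=[]) gives Slutsky's scheme."""
    p, q = len(phi), len(theta)
    total = n + burn
    u = rng.standard_normal(total)
    y = np.zeros(total)
    for t in range(total):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
        ma = sum(theta[j] * u[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = ar + u[t] + ma
    return y[burn:]  # drop the burn-in so the series is close to stationary

# Illustrative ARMA(1, 1) with assumed coefficients
y = simulate_arma(phi=[0.7], theta=[0.4])
print(y[:5])
```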

The following is due in part to Wikipedia contributors (2025).

Wold’s decomposition, also known as the Wold representation theorem, states that:

Every covariance-stationary time series Y_{t} can be written as the sum of two time series, one deterministic and one stochastic.

Formally:

\begin{aligned} Y_{t} & = \sum_{j=0}^{\infty} \underbrace{b_{j}\varepsilon_{t-j}}_{\text{stochastic}} + \underbrace{\eta_{t}}_{\text{deterministic}} \\ &= \sum_{j=0}^{\infty} b_{j}\varepsilon_{t-j} + \sum_{j=1}^{\infty} \phi_{j}\eta_{t-j} \end{aligned}

where:

  • Y_{t} is the time series being considered,
  • \varepsilon_{t} is a white noise sequence, called the innovation process, that acts as an input to the linear filter \{b_{j}\},
  • b is the (possibly infinite) vector of moving average weights (coefficients or parameters),
  • \eta_{t} is a “deterministic” time series, in the sense that it is completely determined as a linear combination of its past values. It may include “deterministic terms” like sine/cosine waves of t, but it is still a stochastic process and it is also covariance-stationary; it cannot be an arbitrary deterministic process that violates stationarity.
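As a concrete illustration of such a “deterministic” component (my own example, not from the cited sources): a random-amplitude sinusoid is covariance-stationary yet perfectly predictable from two of its past values, because sines and cosines satisfy a linear recursion:

\begin{aligned} \eta_{t} = A\cos(\omega t) + B\sin(\omega t) \quad\Rightarrow\quad \eta_{t} = 2\cos(\omega)\,\eta_{t-1} - \eta_{t-2} \end{aligned}

where A and B are uncorrelated, zero-mean random variables with equal variance. The recursion follows from the identity \cos(\omega t) + \cos(\omega(t-2)) = 2\cos(\omega)\cos(\omega(t-1)) (and its sine analogue), so \eta_{t} is determined exactly by its own past while remaining covariance-stationary.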

The moving average coefficients have these properties:

  1. Stable, that is, square summable: \sum_{j=1}^{\infty} |b_{j}|^{2} < \infty
  2. Causal (i.e. there are no terms with j < 0)
  3. Minimum delay (minimum phase)
  4. Constant (b_j independent of t)
  5. It is conventional to define b_0=1
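A quick numerical check of these properties, sketched in Python (the AR(1) coefficient \phi = 0.7 is an assumed illustrative value, not from the course): for Y_{t} = \phi Y_{t-1} + \varepsilon_{t} the Wold coefficients are b_{j} = \phi^{j}, which can be recovered as the impulse response of the recursion, and their squares sum to 1/(1-\phi^{2}).

```python
import numpy as np

phi = 0.7      # assumed illustrative AR(1) coefficient, |phi| < 1
n_lags = 50

# Impulse response: feed a single unit innovation at t=0 into Y_t = phi*Y_{t-1} + e_t
b = np.zeros(n_lags)
b[0] = 1.0                  # convention b_0 = 1
for j in range(1, n_lags):
    b[j] = phi * b[j - 1]   # so b_j = phi**j

assert np.allclose(b, phi ** np.arange(n_lags))

# Square summability (stability): sum |b_j|^2 = 1 / (1 - phi^2)
print(np.sum(b**2))         # ~1.9608 (truncated at 50 lags)
print(1 / (1 - phi**2))     # 1.9607843...
```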

Any covariance-stationary process has this seemingly special representation. Not only is the existence of such a simple, linear, and exact representation remarkable, but even more so is the special nature of the moving average model.

This result is used, without being named, in the course when we are shown the representation of the AR(p) process in terms of moving averages.
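As a worked example of that connection (the standard AR(1) case, reproduced here for completeness): repeatedly substituting an AR(1) into itself recovers its Wold form with b_{j} = \phi^{j}.

\begin{aligned} Y_{t} &= \phi Y_{t-1} + \varepsilon_{t} \\ &= \varepsilon_{t} + \phi\varepsilon_{t-1} + \phi^{2} Y_{t-2} \\ &= \sum_{j=0}^{\infty} \phi^{j} \varepsilon_{t-j}, \qquad |\phi| < 1 \end{aligned}

Here the deterministic part \eta_{t} vanishes, and the coefficients b_{j} = \phi^{j} are causal, square summable, constant in t, and satisfy the convention b_{0} = 1.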