The Durbin-Levinson recursion is a method in linear algebra for computing the solution to an equation involving a Toeplitz matrix, i.e. a diagonal-constant matrix in which each descending diagonal is constant. The recursion runs in O(n^2) time rather than the O(n^3) time required by Gauss-Jordan elimination.
In the course on Bayesian time series analysis, the professor mentions the Durbin-Levinson recursion several times without explaining what it is. That is a shame, as it is a very elegant bit of linear algebra for solving the Yule-Walker equations more efficiently. I could not find a good explanation in the context of the course, so I wrote some notes that may help you understand this topic.
K.1 Durbin-Levinson recursion
Like me, you might be curious about the Durbin-Levinson recursion mentioned above. This is not covered in the course, and turned out to be an enigma wrapped in a mystery.
I present my findings in the note below; much of the material is due to (Wikipedia contributors 2024b) and (Wikipedia contributors 2024a).
In (Yule 1927) and (Walker 1931), Yule and Walker proposed a method for estimating the parameters of an autoregressive model. The method is based on the Yule-Walker equations, a set of linear equations relating the model's parameters to the autocovariances of the process.
Due to the autoregressive structure of the model, the matrix for these equations is highly structured, of a well-known form called a Toeplitz matrix, in which each descending diagonal is constant: \begin{bmatrix}a&b&c&d&e\\f&a&b&c&d\\g&f&a&b&c\\h&g&f&a&b\\i&h&g&f&a\end{bmatrix} \tag{K.1}
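Such a matrix is fully determined by its first row and first column, which is what makes the fast algorithms below possible. As a quick illustration (a sketch using scipy.linalg.toeplitz, with arbitrary numbers standing in for the symbolic entries a through i):

```python
import numpy as np
from scipy.linalg import toeplitz

# First column [a, f, g, h, i] and first row [a, b, c, d, e] of
# Equation K.1, with made-up numbers in place of the symbols.
col = np.array([1.0, 6.0, 7.0, 8.0, 9.0])
row = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

T = toeplitz(col, row)
print(T)
# Each descending diagonal is constant, so this 5x5 matrix is
# determined by just 2n - 1 = 9 numbers.
```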
In the 1930s, Yule and Walker would have had to solve these equations using Gauss-Jordan elimination, which has O(n^3) time complexity.
This is where Durbin and Levinson come in. A decade or two later, in (Levinson 1946) and (Durbin 1960), the authors came up with a weakly stable yet more efficient algorithm for solving these autocorrelated systems of equations, requiring only O(n^2) time. Their work was later refined in (Trench 1964) and (Zohar 1969) to just 3n^2 multiplications.
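The quadratic-versus-cubic gap is easy to see empirically. Here is a minimal sketch, assuming SciPy is available: scipy.linalg.solve_toeplitz exploits the Toeplitz structure (its documentation says it uses Levinson-Durbin recursion), while numpy.linalg.solve treats the matrix as dense. The size n and the decaying first column are arbitrary choices of mine:

```python
import time
import numpy as np
from scipy.linalg import toeplitz, solve_toeplitz

n = 2000
# A symmetric positive-definite Toeplitz matrix with decaying
# diagonals, mimicking an autocovariance matrix.
c = 0.9 ** np.arange(n)
b = np.ones(n)

t0 = time.perf_counter()
x_fast = solve_toeplitz(c, b)              # O(n^2), uses only the first column
t1 = time.perf_counter()
x_dense = np.linalg.solve(toeplitz(c), b)  # O(n^3), builds the full matrix
t2 = time.perf_counter()

print(f"solve_toeplitz: {t1 - t0:.3f}s, dense solve: {t2 - t1:.3f}s")
print("solutions agree:", np.allclose(x_fast, x_dense))
```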
A cursory search reveals that Toeplitz matrix inversion is still an area of active research, with papers covering parallel algorithms and stability studies. This is not surprising, as many of the most interesting deep learning models, including LLMs, are autoregressive!
So the Durbin-Levinson recursion is, at heart, an elegant bit of linear algebra that exploits this Toeplitz structure to solve the Yule-Walker equations more efficiently.
Here is what I dug up:
K.1.1 Durbin-Levinson and the Yule-Walker equations (Off-Course Reading)
Recall the definition above: the Durbin-Levinson recursion computes the solution of a Toeplitz system of equations in O(n^2) time rather than the O(n^3) time required by Gauss-Jordan elimination.
In the time series setting, the recursion computes the coefficients of the autoregressive model of a stationary time series. It is based on the Yule-Walker equations, and since it produces the partial autocorrelations \phi_{k,k} along the way, it is also used to compute the PACF of a time series.
The Yule-Walker equations can be stated as follows for an AR(p) process:
\gamma_m = \sum_{k=1}^p \phi_k \gamma_{m-k} + \sigma_\varepsilon^2\delta_{m,0} \qquad \text{(Yule-Walker equations)} \tag{K.2}
where:
- \gamma_m is the autocovariance function of the time series,
- \phi_k are the AR coefficients,
- \sigma_\varepsilon^2 is the variance of the white noise process, and
- \delta_{m,0} is the Kronecker delta function.
When m=0, the equation simplifies to:
\gamma_0 = \sum_{k=1}^p \phi_k \gamma_{-k} + \sigma_\varepsilon^2 \qquad \text{(Yule-Walker equations for m=0)} \tag{K.3}
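As a quick sanity check (my own worked example, not from the course), consider an AR(1) process. Equation K.2 with m=1 gives \gamma_1 = \phi_1 \gamma_0, and Equation K.3, using \gamma_{-1} = \gamma_1 for a stationary series, gives \gamma_0 = \phi_1 \gamma_1 + \sigma_\varepsilon^2. Substituting the first into the second yields the familiar closed forms:

\gamma_0 = \frac{\sigma_\varepsilon^2}{1 - \phi_1^2}, \qquad \rho_1 = \frac{\gamma_1}{\gamma_0} = \phi_1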
For m > 0, the equations can be collected in matrix form:
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \vdots \\ \gamma_p \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_{-1} & \gamma_{-2} & \cdots & \gamma_{1-p} \\ \gamma_1 & \gamma_0 & \gamma_{-1} & \cdots & \gamma_{2-p} \\ \gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{3-p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 \end{bmatrix} \begin{bmatrix} \phi_{1} \\ \phi_{2} \\ \phi_{3} \\ \vdots \\ \phi_{p} \end{bmatrix} \tag{K.4}
and since this matrix is Toeplitz (and symmetric, because \gamma_{-k} = \gamma_k for a stationary series), we can use the Durbin-Levinson recursion to solve the system efficiently for all of the \phi_k.
Once \{\phi_m : m=1,2,\dots,p\} are known, we can return to the m=0 case and solve for \sigma_\varepsilon^2 by substituting the \phi_k into Equation K.3.
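Putting the two steps together, here is a minimal sketch of the recursion in Python. This is my own illustration, not code from the course: the function name durbin_levinson and the made-up autocovariances are assumptions, and only NumPy and SciPy (for the cross-check) are required. Note that the reflection coefficient computed at stage k is exactly the lag-k partial autocorrelation, which is why the same recursion yields the PACF:

```python
import numpy as np
from scipy.linalg import toeplitz

def durbin_levinson(gamma):
    """Solve the Yule-Walker equations for an AR(p) model.

    gamma -- autocovariances [gamma_0, gamma_1, ..., gamma_p]
    Returns (phi, sigma2): coefficients phi_1..phi_p and the
    innovation variance sigma_eps^2, in O(p^2) operations.
    """
    p = len(gamma) - 1
    phi = np.zeros(p + 1)   # phi[j] holds phi_{k,j} for the current order k
    sigma2 = gamma[0]       # prediction-error variance, v_0 = gamma_0
    for k in range(1, p + 1):
        # Reflection coefficient phi_{k,k}; this is the lag-k PACF value.
        kappa = (gamma[k] - phi[1:k] @ gamma[k - 1:0:-1]) / sigma2
        # Order update: phi_{k,j} = phi_{k-1,j} - kappa * phi_{k-1,k-j}.
        phi[1:k] = phi[1:k] - kappa * phi[k - 1:0:-1]
        phi[k] = kappa
        sigma2 *= 1.0 - kappa**2   # v_k = v_{k-1} * (1 - phi_{k,k}^2)
        if sigma2 <= 0.0:
            raise ValueError("not a valid (positive-definite) autocovariance sequence")
    return phi[1:], sigma2

# Made-up, positive-definite autocovariances for lags 0..3.
gamma = np.array([2.0, 1.2, 0.8, 0.5])
phi, sigma2 = durbin_levinson(gamma)

# Cross-check against a dense solve of the matrix form above.
Gamma = toeplitz(gamma[:3])
print(np.allclose(phi, np.linalg.solve(Gamma, gamma[1:])))  # True
print(np.isclose(sigma2, gamma[0] - phi @ gamma[1:]))       # Equation K.3: True
```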
Of course, the Durbin-Levinson recursion is not the last word on solving this system of equations. There are now numerous refinements that are both faster and more numerically stable.
To restate this more explicitly: the Yule-Walker equations are a set of p+1 linear equations in the unknowns \phi_1, \phi_2, \ldots, \phi_p and \sigma_\varepsilon^2 that can be used to estimate the parameters of an autoregressive model of order p. They are obtained by setting the sample autocovariance function equal to the theoretical autocovariance function of an AR(p) model and solving for the unknown parameters:
\begin{aligned} \gamma(0) & = \phi_1 \gamma(1) + \phi_2 \gamma(2) + \ldots + \phi_p \gamma(p) + \sigma_\varepsilon^2 \\ \gamma(1) & = \phi_1 \gamma(0) + \phi_2 \gamma(1) + \ldots + \phi_p \gamma(p-1) \\ \gamma(2) & = \phi_1 \gamma(1) + \phi_2 \gamma(0) + \ldots + \phi_p \gamma(p-2) \\ \vdots \\ \gamma(p) & = \phi_1 \gamma(p-1) + \phi_2 \gamma(p-2) + \ldots + \phi_p \gamma(0) \\ \end{aligned}
where \gamma(k) is the sample autocovariance function at lag k. The last p equations form exactly the Toeplitz system above and can be solved for the estimates of the AR parameters \phi_1, \phi_2, \ldots, \phi_p; the first equation then yields the estimate of \sigma_\varepsilon^2.
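To close the loop, here is an end-to-end sketch under assumed settings: simulate an AR(2) series with coefficients chosen by me, form its sample autocovariances, and solve the Yule-Walker equations with scipy.linalg.solve_toeplitz. The coefficients, seed, and sample size are all illustrative:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

rng = np.random.default_rng(0)
phi_true = np.array([0.75, -0.25])  # assumed AR(2) coefficients (stationary)
n = 50_000
eps = rng.standard_normal(n)        # white noise with sigma_eps^2 = 1
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + eps[t]

# Sample autocovariances gamma(0), ..., gamma(p).
p = 2
x = x - x.mean()
gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])

# Solve the last p Yule-Walker equations (a Toeplitz system), then
# recover sigma_eps^2 from the gamma(0) equation.
phi_hat = solve_toeplitz(gamma[:p], gamma[1:])
sigma2_hat = gamma[0] - phi_hat @ gamma[1:]
print(phi_hat)     # expect values near [0.75, -0.25]
print(sigma2_hat)  # expect a value near 1.0
```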