107  Q&A for Chapter 2

Time Series: Modeling, Computation, and Inference

Author

Oren Bochman

107.1 Chapter 2 Problems

Chapter 2 of the book (Prado, Ferreira, and West 2023) is called "traditional time domain models" and deals primarily with AR(p) and ARMA models. The problems are taken from pp. 84-95 of the book; this is a much longer problem set. The next chapter deals with frequency domain models.

One source of complexity in the course is the convention and nomenclature of variables used in the modeling and later in the programming: it is puzzling, yet suggestive of some implicitly agreed-upon "canonical" Bayesian formulation that is either omitted or never clearly defined.

Some, like m and C, are initialisms for mean and covariance, often referenced as moments of normal and Student-t distributions; others, like Q, QQ, etc., are not so obvious.

Q, QQ, etc. are frequently used in the code. These quantities appear in the initial Bayesian formulation of the AR(1) model, and in the Gibbs step for the variance within the Metropolis-Hastings algorithm for the AR(1) model. The unconditional sum of squares Q^*(\phi) is defined as:

Q^*(\phi) = y_1^2 (1-\phi^2) + \sum_{t=2}^T (y_t - \phi y_{t-1})^2 \tag{107.1}

We also have a conditional sum of squares, defined as: Q(\phi) = \sum_{t=2}^T (y_t - \phi y_{t-1})^2 \tag{107.2}

Interpretations:

  1. The conditional sum of squares Q(\phi) arises in the conditional likelihood, which the authors (Prado, Ferreira, and West 2023, example 1.4) note approximates the unconditional likelihood, following (Box et al. 2015).
  2. These quantities also appear in Maximum Likelihood Estimation (MLE), Least Squares (LS), and Maximum A Posteriori (MAP) estimation, where again the unconditional likelihood may be approximated by the conditional likelihood in the respective estimators.

Exercise 107.1 Consider the process:

y_t = \phi y_{t-1} + \varepsilon_t , \quad \varepsilon_t \sim \mathcal{N} (0, v).

If |\phi| < 1 then y_t = \sum_{j=0}^\infty \phi^j \varepsilon_{t-j}.

Use this fact to prove that y_1 \sim \mathcal{N} (0, v/(1 - \phi^2 )) and that, as a consequence, the likelihood function has the form Equation 107.3.

p(y_{1:T} \mid \boldsymbol{\theta}) = \frac{(1-\phi^2)^{\frac{1}{2}}}{(2\pi v)^{T/2}} \exp \left\{\frac{-Q^*(\phi)}{2v}\right\} \tag{107.3}

where Q^*(\phi) is the unconditional sum of squares defined in Equation 107.1.
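A sketch of the argument: since y_1 = \sum_{j=0}^\infty \phi^j \varepsilon_{1-j} is a convergent linear combination of independent normals, y_1 is normal with mean zero and variance \mathbb{V}ar(y_1) = v \sum_{j=0}^\infty \phi^{2j} = v/(1-\phi^2). The joint density then factors by the Markov property as

\begin{aligned} p(y_{1:T}\mid\boldsymbol{\theta}) &= p(y_1\mid\boldsymbol{\theta})\prod_{t=2}^T p(y_t\mid y_{t-1},\boldsymbol{\theta})\\ &= \frac{(1-\phi^2)^{1/2}}{(2\pi v)^{1/2}}\exp\left\{-\frac{y_1^2(1-\phi^2)}{2v}\right\} \prod_{t=2}^T \frac{1}{(2\pi v)^{1/2}}\exp\left\{-\frac{(y_t-\phi y_{t-1})^2}{2v}\right\}\\ &= \frac{(1-\phi^2)^{1/2}}{(2\pi v)^{T/2}}\exp\left\{-\frac{Q^*(\phi)}{2v}\right\}, \end{aligned}

collecting the exponents into Q^*(\phi) of Equation 107.1.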

Exercise 107.2 Consider the AR(1) process y_t = \phi y_{t-1} + \varepsilon_t , with \varepsilon_t \sim \mathcal{N} (0, v). Show that the process is nonstationary when \phi = \pm 1.

y_t = \phi^t y_0 + \sum_{s=1}^t \phi^{ t-s} \varepsilon_s

Hence

\mathbb{E}[y_t]=\phi^t \mathbb{E}[y_0], \qquad \mathbb{V}ar(y_t) =\phi^{2t}\mathbb{V}ar(y_0)+v\sum_{i=0}^{t-1}\phi^{2i}.

  • If \phi=1: \mathbb{E}[y_t]=\mathbb{E}[y_0] (constant) but \displaystyle\mathbb{V}ar(y_t)=\mathbb{V}ar(y_0)+v t, which grows with t.

  • If \phi=-1: \mathbb{E}[y_t]=(-1)^t\mathbb{E}[y_0] (oscillates) and \displaystyle\mathbb{V}ar(y_t)=\mathbb{V}ar(y_0)+v t, again unbounded in t.

In both cases the variance (and in the \phi=-1 case the mean) depends on t, so the process cannot be (weakly) stationary.

Exercise 107.3 Suppose y_t follows a stationary AR(1) model with AR parameter \phi and innovation variance v. Define x = (y_1 , \ldots , y_n )^\top. We know that x \sim \mathcal{N} (0, s\Phi_n ) where s = v/(1 - \phi^2 ) is the marginal variance of the y_t process and the correlation matrix \Phi_n has (i, j) element \phi^{|i-j|}, i.e.

\Phi_n= \begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{n-3} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \phi^{n-1} & \phi^{n-2} & \cdots & 1 \end{pmatrix} \tag{107.4}

Find the precision matrix K_n = s^{-1} \Phi^{-1}_n and comment on its form.

Hint: One way to find this is "brute-force" matrix inversion using induction; but that is just linear algebra which, in particular, ignores the probability model that defines \Phi_n. There is a simpler and more instructive way to identify K_n based on reflecting on the probability model.

The precision matrix is tri-diagonal. We can see this either by brute-force inversion or, more instructively, by writing down the Gaussian Markov factorization p(y_{1:n}) = p(y_1)\prod_{t=2}^n p(y_t \mid y_{t-1}) and reading off the quadratic form in the exponent:

K_n \;=\; s^{-1}\,\Phi_n^{-1} = \frac{1}{v} \begin{pmatrix} 1 & -\phi & 0 & \cdots & 0\\ -\phi & 1+\phi^2 & -\phi & \cdots & 0\\ 0 & -\phi & 1+\phi^2 & \ddots & \vdots\\ \vdots & \vdots & \ddots & \ddots & -\phi\\ 0 & 0 & \cdots & -\phi & 1 \end{pmatrix} \tag{107.5}

The diagonal is

K_{11}=K_{nn}= \frac{1}{v},\quad K_{ii}=\frac{1+\phi^2}{v}\;(2\le i\le n-1),

The only nonzero off-diagonals are

K_{i,i+1}=K_{i+1,i}=-\,\frac{\phi}{v}.

Comment. Since all other entries are zero, K_n is sparse (bandwidth 1): each y_i is conditionally independent of all others given its two neighbors—exactly the Gaussian Markov property of an AR(1).
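As a quick numerical sanity check of the tridiagonal form (my own, not part of the exercise; the values of \phi, v, n below are arbitrary), the brute-force inverse of s\Phi_n can be compared with the banded matrix above:

```python
import numpy as np

# Arbitrary example values: any |phi| < 1, v > 0, n >= 2 should work
phi, v, n = 0.7, 2.0, 6
s = v / (1 - phi**2)                       # marginal variance of the AR(1)

# Correlation matrix with (i, j) element phi^|i-j|
idx = np.arange(n)
Phi = phi ** np.abs(idx[:, None] - idx[None, :])

# Brute-force precision matrix K_n = (s * Phi)^{-1}
K_brute = np.linalg.inv(s * Phi)

# Tridiagonal form: 1/v on the two corner entries, (1 + phi^2)/v inside, -phi/v off the diagonal
K_tri = np.diag(np.full(n, (1 + phi**2) / v))
K_tri[0, 0] = K_tri[-1, -1] = 1 / v
off = np.full(n - 1, -phi / v)
K_tri += np.diag(off, k=1) + np.diag(off, k=-1)

print(np.allclose(K_brute, K_tri))         # expect True
```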

Exercise 107.4 Consider an AR(2) process with AR coefficients \phi = (\phi_1 , \phi_2 )^\top .

  1. Show that the process is stationary for parameter values lying in the region -1 < \phi_2 < 1, \phi_1 < 1 - \phi_2 , and \phi_1 > \phi_2 - 1.
  2. Show that the partial autocorrelation function of this process is \phi_1 /(1- \phi_2 ) for the first lag, \phi_2 for the second lag, and equal to zero for any lag h with h \ge 3.

(a) Stationarity region via characteristic roots. The AR polynomial is

1 - \phi_1 z - \phi_2 z^2 = 0,

with roots

z_{1,2} = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2}.

Stationarity holds if and only if both roots lie outside the unit circle, |z_{1,2}|>1. This is equivalent to requiring the polynomial to be positive at z=\pm1 (giving 1-\phi_1-\phi_2>0 and 1+\phi_1-\phi_2>0) together with |z_1 z_2| = 1/|\phi_2| > 1 (i.e. |\phi_2|<1), which yields

-1<\phi_2<1,\quad \phi_1<1-\phi_2,\quad \phi_1> \phi_2-1.


(b) Partial autocorrelations. Let \rho_h = \frac{Cov(y_t,y_{t-h})}{\mathbb{V}ar(y_t)}.
From the Yule–Walker equations for AR(2):

\begin{aligned} \rho_1 &= \phi_1 + \phi_2 \rho_1, &&\text{(for lag 1)}\\ \rho_2 &= \phi_1 \rho_1 + \phi_2, &&\text{(for lag 2)}. \end{aligned}

Hence

\rho_1(1-\phi_2)=\phi_1\quad \implies \quad \rho_1=\frac{\phi_1}{1-\phi_2},

and

\rho_2=\phi_1\frac{\phi_1}{1-\phi_2}+\phi_2.

The PACF at lag h is the last coefficient in the regression of y_t on (y_{t-1},\dots,y_{t-h}). In particular:

\begin{aligned} \alpha_{11}&=\rho_1 = \frac{\phi_1}{1-\phi_2}, &&\text{(lag 1)}\\ \alpha_{22} &=\frac{\rho_2-\rho_1^2}{1-\rho_1^2} =\phi_2, &&\text{(lag 2)}\\ \alpha_{hh}&=0,\quad h\ge3, &&\text{(zero beyond the AR order).} \end{aligned}
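A small simulation sketch (my own check, not part of the book's solution; the coefficients below are arbitrary values inside the stationarity region): regressing a long simulated AR(2) series on one lag and then on two lags should recover \alpha_{11}\approx\phi_1/(1-\phi_2) and \alpha_{22}\approx\phi_2.

```python
import numpy as np

rng = np.random.default_rng(42)
phi1, phi2, v, T = 0.5, 0.3, 1.0, 100_000   # arbitrary point in the stationarity region

# Simulate the AR(2), discarding a burn-in period
y = np.zeros(T + 500)
eps = rng.normal(0.0, np.sqrt(v), size=y.shape)
for t in range(2, len(y)):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
y = y[500:]

# Lag-1 regression coefficient estimates alpha_11 = phi1 / (1 - phi2)
a11 = np.linalg.lstsq(y[:-1, None], y[1:], rcond=None)[0][0]

# Last coefficient of the two-lag regression estimates alpha_22 = phi2
X = np.column_stack([y[1:-1], y[:-2]])      # columns (y_{t-1}, y_{t-2})
a22 = np.linalg.lstsq(X, y[2:], rcond=None)[0][1]

print(a11, phi1 / (1 - phi2))               # both approximately 0.714
print(a22, phi2)                            # both approximately 0.3
```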

Exercise 107.5 This question concerns a time series model for continuous and positive outcomes y_t. Suppose a series x_t follows a stationary AR(1) model with parameters \phi, v and the usual normal innovations. Define a transformed time series y_t = \exp(\mu + x_t) for each t for some known constant \mu.

  1. Show that y_t is a first-order Markov process.
  2. Is y_t a stationary process?
  3. Find \mathbb{E}(y_t \mid y_{t-1}) as a function of y_{t-1} and show that it has the form \mathbb{E}(y_t \mid y_{t-1}) = a\, y_{t-1}^{\phi} for some positive constant a. Give an expression for a in terms of \mu, \phi, v.
  4. Can you imagine applied time series contexts that might utilize this simple model as a component? Comment on potential uses.
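A brief sketch for this exercise (mine; the book leaves it to the reader). (a) y_t = \exp(\mu + x_t) is an invertible transformation of x_t, so \sigma(y_1,\dots,y_{t-1}) = \sigma(x_1,\dots,x_{t-1}) and the first-order Markov property of x_t carries over to y_t. (b) Yes: x_t is (strictly) stationary and y_t is a fixed, time-invariant transformation of it, so y_t is strictly stationary (marginally log-normal). (c) Using the normal moment generating function,

\begin{aligned} \mathbb{E}(y_t\mid y_{t-1}) &= \mathbb{E}\bigl[\exp(\mu+\phi x_{t-1}+\varepsilon_t)\mid x_{t-1}\bigr] = \exp(\mu+\phi x_{t-1})\,e^{v/2}\\ &= \exp\{(1-\phi)\mu+v/2\}\,\bigl(e^{\mu+x_{t-1}}\bigr)^{\phi} = a\,y_{t-1}^{\phi}, \qquad a=\exp\{(1-\phi)\mu+v/2\}. \end{aligned}

(d) Log-AR(1) components of this kind are natural building blocks for positive series such as asset prices, volatilities, rainfall amounts, or pollutant concentrations, where the log of the series is plausibly (locally) AR(1).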

Exercise 107.6 Show that the eigenvalues of the matrix G given by (Prado, Ferreira, and West 2023, sec. (2.7)) correspond to the reciprocal roots of the AR(p) characteristic polynomial.

G = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & \cdots & 0& 0 \\ 0 & 1 & \cdots & 0& 0 \\ \vdots & \vdots & \ddots & 0 & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}

\begin{aligned} \det\bigl(G - \lambda I\bigr) &= \det \begin{pmatrix} \phi_1 - \lambda & \phi_2 & \cdots & \phi_p \\ 1 & -\lambda& \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \vdots & \ddots & -\lambda \end{pmatrix} &\text{(companion-matrix form)}\\ & = (-1)^p\bigl(\lambda^p - \phi_1\lambda^{p-1} - \phi_2\lambda^{p-2} - \cdots - \phi_p\bigr) &\text{(by cofactor expansion)}\\ & = 0 \quad\Longleftrightarrow\quad \lambda^p - \sum_{i=1}^p\phi_i \lambda^{p-i}=0\\ & \Longleftrightarrow 1 - \sum_{i=1}^p\phi_i \lambda^{-i}=0 & \bigl(\text{divide by }\lambda^p\bigr)\\ & \Longleftrightarrow 1 - \sum_{i=1}^p\phi_i z^i=0 \quad\text{with }z=\tfrac1\lambda \end{aligned}

Thus the eigenvalues \lambda of G satisfy the reversed-polynomial equation, so z=1/\lambda are exactly the roots of the AR(p) characteristic polynomial 1-\phi_1 z-\cdots-\phi_p z^p=0.
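A quick numerical check of this correspondence (my own sketch; the AR(3) coefficients are arbitrary):

```python
import numpy as np

phi = np.array([0.6, 0.2, -0.1])            # arbitrary AR(3) coefficients
p = len(phi)

# Companion matrix G as in the exercise
G = np.zeros((p, p))
G[0, :] = phi
G[1:, :-1] = np.eye(p - 1)
eig = np.linalg.eigvals(G)

# Roots of the characteristic polynomial 1 - phi_1 z - ... - phi_p z^p
# (np.roots expects coefficients ordered from the highest power down)
coeffs = np.concatenate((-phi[::-1], [1.0]))
roots = np.roots(coeffs)

print(np.sort_complex(eig))
print(np.sort_complex(1 / roots))           # the reciprocal roots match the eigenvalues
```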

Exercise 107.7 Consider the AR(2) series y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t with \varepsilon_t \sim \mathcal{N}(0, \nu). Following (Prado, Ferreira, and West 2023, sec. 2.1.2), rewrite the model in the standard DLM form y_t = \mathbf{F}^\top \mathbf{x}_t and \mathbf{x}_t = \mathbf{G}x_{t-1} + \mathbf{F}\varepsilon_t where \mathbf{F} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{x}_t = \begin{pmatrix} y_t \\ y_{t-1} \end{pmatrix}, \quad \mathbf{G} = \begin{pmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{pmatrix}

We know that this implies that, for any given t and over k \ge 0, the forecast function is E(y_{t+k} \mid \mathbf{x}_t ) = \mathbf{F}^\top \mathbf{G}^k \mathbf{x}_t .

  1. Show that the eigenvalues of \mathbf{G} denoted by \lambda_1 and \lambda_2 are the roots of the quadratic in \lambda given by \lambda^2 -\phi_1 \lambda-\phi_2 = 0. Deduce that \phi_1 = \lambda_1 +\lambda_2 and \phi_2 = -\lambda_1 \lambda_2.

  2. Suppose that the eigenvalues \lambda_1 , \lambda_2 are distinct, whether they be real or a pair of complex conjugates. Define \begin{aligned} \boldsymbol{\Lambda} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} & \boldsymbol{E} = \begin{pmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{pmatrix} \tau \end{aligned}

for any nonzero \tau. Note that \boldsymbol{E} is non-singular since \lambda_1 \ne \lambda_2. Verify that \mathbf{GE} = \mathbf{E} \boldsymbol{\Lambda}, so that \boldsymbol{G} = \boldsymbol{E} \Lambda \boldsymbol{E}^{-1}, that is, \boldsymbol{E} has columns that are eigenvectors of \boldsymbol{G} corresponding to eigenvalues (\lambda_1, \lambda_2).

  3. We can take \tau = 1 with no loss of generality as \tau cancels in the identity \mathbf{G} = \mathbf{E} \boldsymbol{\Lambda} \mathbf{E}^{-1}; do so from here on. Show that \begin{aligned} \boldsymbol{\Lambda}^k \mathbf{E}^{-1} =\frac{1}{\lambda_1-\lambda_2} \begin{pmatrix} \lambda_1^k & -\lambda_1^k\lambda_2 \\ -\lambda_2^k & \lambda_1 \lambda_2^k \end{pmatrix} \end{aligned}

  4. Deduce that \mathbf{E}(y_{t+k} \mid x_t ) = a_k y_t + b_k y_{t-1} with lagged coefficients a_k = \frac{\lambda_1^{k+1} - \lambda_2^{k+1}}{\lambda_1 - \lambda_2} and b_k = \frac{-\lambda_1^{k+1} \lambda_2 + \lambda_1 \lambda_2^{k+1}}{\lambda_1 - \lambda_2}

  5. Verify that this resulting expression \mathbf{E}(y_{t+k} \mid x_t ) = a_k y_t + b_k y_{t-1} gives the known results in terms of \phi_1 , \phi_2 when k = 0 and k = 1.

  6. Consider now the special case of complex eigenvalues \lambda_1 = re^{i\omega} and \lambda_2 = re^{-i\omega} for some real-valued modulus r > 0 and argument \omega > 0. Show that the lagged coefficients a_k, b_k become a_k = r^k \frac{\sin((k + 1)\omega)}{\sin(\omega)} and b_k = -r^{k +1} \frac{\sin(k\omega)}{\sin(\omega)}

  7. Continuing in the case of complex eigenvalues, use simple trigonometric identities to show that the forecast function can be reduced to \mathbf{E}(y_{t+k} \mid x_t) = r^k h_t \cos(k\omega + g_t ), \quad k = 0, 1, \ldots, a damped cosine form in k (in stationary models with 0 < r < 1). Give explicit expressions for the time-dependent amplitude h_t > 0 and phase g_t in terms of \omega and y_{t-1} , y_t .
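A numerical check of parts 4 and 5 (my sketch; r, \omega and the state vector below are arbitrary): with \lambda_{1,2} = re^{\pm i\omega} we have \phi_1 = 2r\cos\omega and \phi_2 = -r^2, and \mathbf F^\top\mathbf G^k\mathbf x_t should agree with a_k y_t + b_k y_{t-1}.

```python
import numpy as np

r, omega = 0.95, 2 * np.pi / 12             # arbitrary modulus and frequency
phi1, phi2 = 2 * r * np.cos(omega), -r**2

G = np.array([[phi1, phi2], [1.0, 0.0]])
F = np.array([1.0, 0.0])
x_t = np.array([1.3, -0.4])                 # arbitrary current state (y_t, y_{t-1})

for k in range(8):
    exact = F @ np.linalg.matrix_power(G, k) @ x_t
    a_k = r**k * np.sin((k + 1) * omega) / np.sin(omega)
    b_k = -r**(k + 1) * np.sin(k * omega) / np.sin(omega)
    print(k, np.isclose(exact, a_k * x_t[0] + b_k * x_t[1]))   # expect True for every k
```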

Exercise 107.8 Show that the general solution of the homogeneous difference equation governing the autocorrelation structure of an AR(p) process, \rho(h) - \phi_1 \rho(h - 1) - \cdots - \phi_p \rho(h - p) = 0, \quad h > 0,

has the form:

\rho(h) = \alpha_1^h p_1(h) + \alpha_2^h p_2(h) + \cdots + \alpha_r^h p_r(h), h > 0

  • where:
    • the \alpha_j, j=1,\dots,r, denote the distinct reciprocal roots of the characteristic polynomial \Phi(u),
    • with each root \alpha_j having multiplicity m_j,
    • p_j(h) is a polynomial of degree m_j - 1.

Use the back-shift operator B, so that B\rho(h)=\rho(h-1). Then the recurrence

\rho(h)-\phi_1\rho(h-1)-\cdots-\phi_p\rho(h-p)=0

can be written as

L(B) \rho(h) = (1-\phi_1B-\cdots-\phi_pB^p) \rho(h) = 0.

Let the characteristic polynomial factor as

L(z) = \prod_{j=1}^r (1-\alpha_j z)^{m_j},

where the \alpha_j are the (reciprocal) roots, with multiplicity m_j. Then

L(B)=\prod_{j=1}^r(1-\alpha_jB)^{m_j},

so each factor (1-\alpha_jB)^{m_j} must annihilate \rho(h). But one checks easily that

(1-\alpha_jB) \bigl [\alpha_j^h \bigr ]=0

and more generally, by taking finite differences, that

(1-\alpha_jB)^{m_j}\Bigl[h^k \alpha_j^h\Bigr]=0 \quad\text{for each }k=0,1,\dots,m_j-1.

Hence the general solution is the linear combination of those basis-solutions:

\rho(h) =\sum_{j=1}^r\sum_{k=0}^{m_j-1}c_{j,k} h^k \alpha_j^h =\sum_{j=1}^r\alpha_j^h p_j(h),

where each p_j(h) is an arbitrary polynomial of degree \le m_j-1. This is exactly the stated form.

Exercise 107.9 Show that, when the characteristic roots are all distinct, the forecast function f_t(h) of an AR(p) process has the representation given in Equation 107.6

f_t(h) =\sum_{j=1}^p c_{tj} \alpha_j^h , \tag{107.6}

  • where
    • c_{tj} are (possibly complex-valued) constants depending on \phi and the current state \mathbf{x}_t , and
    • the \alpha_j s are the p distinct eigenvalues/reciprocal roots.

Define the h-step forecast

f_t(h)=\mathbb{E}[y_{t+h}\mid \mathbf x_t]

which for an AR(p) satisfies the homogeneous recurrence (since the innovations have mean zero):

f_t(h)=\phi_1 f_t(h-1) + \cdots+ \phi_p f_t(h-p) \qquad h\ge1,

with “initial” values

f_t(0)=y_t, f_t(-1)=y_{t-1}, \dots, f_t(-p+1)=y_{t-p+1}

When the reverse characteristic polynomial

u^p - \phi_1 u^{p-1}-\cdots-\phi_{p-1}u - \phi_p

has p distinct roots \alpha_1,\dots,\alpha_p (these are the reciprocal roots of \Phi(u)), the general solution of the recurrence is a linear combination of the root-powers. Concretely,

\begin{aligned} f_t(h) &= \phi_1 f_t(h-1) + \cdots + \phi_p f_t(h-p) &&\text{(forecast recurrence)}\\ &= \sum_{j=1}^p c_{tj} \alpha_j^h && \text{(general solution of a p th-order linear recurrence)} \end{aligned}

where the constants c_{tj} are determined by matching the p initial values of the forecast function,

f_t(k)=\sum_{j=1}^p c_{tj} \alpha_j^k,\quad k=0,1,\dots,p-1,

with f_t(0)=y_t and f_t(1),\dots,f_t(p-1) computed from the recurrence and the known values y_t,\dots,y_{t-p+1}, i.e. by solving the p\times p Vandermonde system

\begin{pmatrix} 1 & 1 & \cdots & 1\\ \alpha_1 & \alpha_2 & \cdots & \alpha_p\\ \vdots & \vdots & & \vdots\\ \alpha_1^{ p-1} & \alpha_2^{ p-1} & \cdots & \alpha_p^{ p-1} \end{pmatrix} \begin{pmatrix} c_{t1}\\ c_{t2}\\ \vdots\\ c_{tp} \end{pmatrix} = \begin{pmatrix} f_t(0)\\ f_t(1)\\ \vdots\\ f_t(p-1) \end{pmatrix}

Thus, under distinct roots, the h-step forecast takes the form

\boxed{% f_t(h)=\sum_{j=1}^p c_{tj} \alpha_j^h}

as required.

Exercise 107.10 Show that if an AR(2) process has a pair of complex roots given by r\exp(\pm i\omega), they can be written in terms of the AR coefficients as r =\sqrt{- \phi_2} and \cos(\omega) = \phi_1 /(2r).

\begin{aligned} \phi(L)&=1-\phi_1L-\phi_2L^2 \\ &=(1 - r e^{i\omega}L)(1 - r e^{-i\omega}L) &&\text{by assumption of roots }r e^{\pm i\omega}\\ &=1 - (r e^{i\omega} + r e^{-i\omega}) L + (r e^{i\omega})(r e^{-i\omega}) L^2 &&\text{expanding the product}\\ &=1 - 2r\cos(\omega) L + r^2 L^2 &&\bigl(re^{i\omega} + re^{-i\omega}= 2r\cos\omega, e^{i\omega} e^{-i\omega}=1\bigr) \end{aligned}

Matching coefficients with 1-\phi_1L-\phi_2L^2 gives:

\phi_1=2r\cos\omega,\qquad \phi_2= -r^2

Solving these,

r=\sqrt{-\phi_2}, \qquad \cos(\omega) = \frac{\phi_1}{2r}

Exercise 107.11 Plot the corresponding forecast functions for the AR(2) processes considered in Example 2.1.

  1. \phi_1 = 0.1, \phi_2 = 0.8
  2. \phi_1 = 1.8, \phi_2 = -0.81
  3. \phi_1 = 1.2, \phi_2 = -0.9

TODO
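A starting-point sketch (mine): since the forecast function is f_t(k) = \mathbf F^\top \mathbf G^k \mathbf x_t, it can be plotted directly for each pair (\phi_1, \phi_2) from an arbitrary current state.

```python
import numpy as np
import matplotlib.pyplot as plt

cases = [(0.1, 0.8), (1.8, -0.81), (1.2, -0.9)]   # the three AR(2) parameter pairs
x_t = np.array([1.0, 0.5])                        # arbitrary current state (y_t, y_{t-1})
F = np.array([1.0, 0.0])
K = 40                                            # forecast horizon

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, (phi1, phi2) in zip(axes, cases):
    G = np.array([[phi1, phi2], [1.0, 0.0]])
    f = [F @ np.linalg.matrix_power(G, k) @ x_t for k in range(K + 1)]
    ax.plot(range(K + 1), f)
    ax.set_title(rf"$\phi_1={phi1},\ \phi_2={phi2}$")
    ax.set_xlabel("k")
plt.tight_layout()
plt.show()
```

The first case has two distinct real reciprocal roots, the second a repeated real root at 0.9, and the third a complex pair with modulus \sqrt{0.9}\approx 0.95, giving a damped oscillation.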

Exercise 107.12 Verify that the expressions for the conditional posterior distributions in (Prado, Ferreira, and West 2023, sec. 2.4.1) are correct.

TODO

Exercise 107.13 Show that a prior on the vector of AR(p) coefficients \phi of the form \mathcal{N} (\phi_1 \mid 0, w/\delta_1 ) and \mathcal{N} (\phi_j \mid \phi_{j-1} , w/ \delta_j) for 1 < j \le p can be written as p(\phi) = \mathcal{N} (\phi \mid 0, A^{-1} w), where A = H^\top \Delta H with H and \Delta defined in (Prado, Ferreira, and West 2023, sec. 2.4.2).

TODO

Exercise 107.14 Verify the ACF of a MA(q) process given in (2.33).

TODO

107.1.1 ARMA Models

The questions on ARMA are mostly beyond the scope of the courses I took, so I might get to them later: there are extra chapters on NDLMs, which we did cover in the course, that I plan to solve first.

Exercise 107.15 Find the ACF of a general ARMA(1,1) process.

Solution. An ARMA(1,1) process is given by:

y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2) \tag{107.7}

  • where
    • \phi is the AR parameter,
    • \theta is the MA parameter, and
    • \varepsilon_t are white noise innovations.

I assume the process is stationary, i.e. |\phi| < 1.


  • We want to compute the autocovariance function

\gamma(h) = \operatorname{Cov}(y_t, y_{t-h})

  • the ACF is:

\rho(h) = \frac{\gamma(h)}{\gamma(0)}

So let’s start with \gamma(0), then \gamma(1), and then find a recurrence for \gamma(h).


107.1.2 Computing \gamma(0)

We’ll compute the variance of y_t:

y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1} \tag{107.8}

To do this, let's write y_t in its MA(\infty) (Wold) representation:

y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} \tag{107.9}

where

\psi_0 = 1,\quad \psi_1 = \phi + \theta,\quad \psi_j = \phi \psi_{j-1} \text{ for } j \ge 2

So:

\begin{array}{c} \psi_j = \begin{cases} 1 & \text{if } j = 0 \\ (\phi + \theta)\phi^{j-1} & \text{if } j \ge 1 \end{cases} \end{array}

Then the variance is:

\begin{aligned} \gamma(0) &= \operatorname{Var}(y_t) \\ &= \sigma^2 \sum_{j=0}^{\infty} \psi_j^2 \\ &= \sigma^2 \left[ \psi_0^2 + \sum_{j=1}^\infty \psi_j^2 \right] \\ &= \sigma^2 \left[ 1 + \sum_{j=1}^\infty (\phi + \theta)^2 \phi^{2(j-1)} \right] & \text{(substituting from above)} \\ &= \sigma^2 \left[ 1 + (\phi + \theta)^2 \sum_{j=1}^\infty \phi^{2(j-1)} \right] \\ &= \sigma^2 \left[ 1 + (\phi + \theta)^2 \sum_{k=0}^\infty \phi^{2k} \right] & \text{(a geometric series assuming } |\phi| < 1 \text{)} \\ &= \sigma^2 \left[ 1 + \frac{(\phi + \theta)^2}{1 - \phi^2} \right] \end{aligned}
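Completing the computation promised above (my addition): using \gamma(h)=\sigma^2\sum_{j\ge0}\psi_j\psi_{j+h},

\begin{aligned} \gamma(1) &= \sigma^2\Bigl[\psi_0\psi_1+\sum_{j=1}^\infty \psi_j\psi_{j+1}\Bigr] = \sigma^2\Bigl[(\phi+\theta)+\frac{(\phi+\theta)^2\phi}{1-\phi^2}\Bigr] = \sigma^2\,\frac{(\phi+\theta)(1+\phi\theta)}{1-\phi^2}, \end{aligned}

while putting \gamma(0) over a common denominator gives \gamma(0)=\sigma^2\dfrac{1+2\phi\theta+\theta^2}{1-\phi^2}. For h\ge 2 the MA(1) term no longer overlaps, so \gamma(h)=\phi\,\gamma(h-1). Hence the ACF of the ARMA(1,1) is

\rho(1)=\frac{(\phi+\theta)(1+\phi\theta)}{1+2\phi\theta+\theta^2}, \qquad \rho(h)=\phi^{\,h-1}\rho(1), \quad h\ge 1.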

Exercise 107.16 Show that Equations Equation 107.12 and Equation 107.13 hold by taking expected values in Equation 107.10 and Equation 107.11 with respect to the whole past history y_{-\infty,t}.

y_{t+h} = \sum_{j=1}^\infty \phi_j^* y_{t+h-j} + \varepsilon_{t+h} \tag{107.10}

y_{t+h} = \sum _{j=1}^\infty \theta_j^* \varepsilon_{t+h-j} +\varepsilon_{t+h} \tag{107.11}

y_{t+h} - y_{t+h}^{-\infty} = \sum_{j=0}^{h-1} \theta_j^* \varepsilon_{t+h-j} \tag{107.12}

MSE_{t+h}^{-\infty} = \mathbb{E}\left[(y_{t+h} - y_{t+h}^{-\infty})^2\right] = v \sum_{j=0}^{h-1} (\theta_j^*)^2 \tag{107.13}

Exercise 107.17 Consider the AR(1) model given by (1 - \phi B)(y_t - \mu) = \varepsilon_t where \varepsilon_t \sim \mathcal{N} (0, \nu).

  1. Find the MLEs for \phi and \mu when \mu \ne 0.
  2. Assume that \nu is known, \mu=0, and that the prior distribution for \phi is \mathcal{U}(\phi \mid 0,1). Find an expression for the posterior distribution of \phi.

Solution. Cf. (Brockwell and Davis 1991, sec. 8.2), (Hamilton 2020, sec. 5.2), and (Prado, Ferreira, and West 2023, sec. 1.5). The conditional MLEs coincide with the OLS estimates from regressing y_t on (1, y_{t-1}) for t=2,\dots,T.

Conditioning on y_1, we can write the log-likelihood as:

Q(\phi,\mu)\;=\;\frac{1}{\nu}\sum_{t=2}^T\{y_t-\mu-\phi(y_{t-1}-\mu)\}^2, \quad \ell(\phi,\mu)\propto-\tfrac12 Q(\phi,\mu).

  1. Conditional MLEs (\mu\neq0)

Minimize Q(\phi,\mu) over (\phi,\mu). Let

\bar y=\frac{1}{T-1}\sum_{t=2}^T y_t,\qquad \bar y_-=\frac{1}{T-1}\sum_{t=2}^T y_{t-1},

S_{xx}=\sum_{t=2}^T (y_{t-1}-\bar y_-)^2,\qquad S_{xy}=\sum_{t=2}^T (y_{t-1}-\bar y_-)(y_t-\bar y).

Then the normal equations give the OLS solutions

\hat\phi=\frac{S_{xy}}{S_{xx}},\qquad \hat\alpha=\bar y-\hat\phi\,\bar y_-, \qquad \hat\mu=\frac{\hat\alpha}{1-\hat\phi}\quad(\hat\phi\neq1).

(Equivalently: regress y_t on (1,y_{t-1}) for t=2{:}T, then use \alpha=(1-\phi)\mu.)


  2. Posterior of \phi with known \nu, \mu=0, prior \phi \sim \mathcal{U}(0,1)

With \mu=0,

Q(\phi)=\frac{1}{\nu}\sum_{t=2}^T(y_t-\phi y_{t-1})^2.

Define

A=\sum_{t=2}^T y_t^2,\quad B=\sum_{t=2}^T y_t y_{t-1},\quad C=\sum_{t=2}^T y_{t-1}^2,\quad m=\frac{B}{C},\quad s^2=\frac{\nu}{C},

(with the understanding that C>0 unless y_{t-1}\equiv0). Completing the square:

\sum_{t=2}^T(y_t-\phi y_{t-1})^2 = C(\phi-m)^2+\Big(A-\frac{B^2}{C}\Big).

Thus the conditional likelihood in \phi is proportional to \exp\{-(\phi-m)^2/(2s^2)\}. With the uniform prior on (0,1),

p(\phi\mid y)\;\propto\;\exp\!\Big(-\frac{(\phi-m)^2}{2s^2}\Big)\,\mathbf 1_{(0,1)}(\phi),

i.e., a truncated normal:

p(\phi\mid y)= \frac{\displaystyle \frac{1}{\sqrt{2\pi}s}\exp\!\Big(-\frac{(\phi-m)^2}{2s^2}\Big)\,\mathbf 1_{(0,1)}(\phi)} {\displaystyle \Phi\!\Big(\tfrac{1-m}{s}\Big)-\Phi\!\Big(\tfrac{-m}{s}\Big)}.

Edge case: If C=0 (i.e., y_{t-1}=0 for all t), then Q(\phi) is constant in \phi and the posterior reduces to the prior, p(\phi\mid y)=\mathbf 1_{(0,1)}(\phi).
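A small sketch of how one might evaluate or sample this truncated-normal posterior in practice (my own illustration; the data are simulated and \phi_{\text{true}}, \nu, T are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate an AR(1) with mu = 0 and known innovation variance nu
phi_true, nu, T = 0.6, 1.0, 300
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + rng.normal(0.0, np.sqrt(nu))

B = np.sum(y[1:] * y[:-1])        # sum of y_t y_{t-1}
C = np.sum(y[:-1] ** 2)           # sum of y_{t-1}^2
m, s = B / C, np.sqrt(nu / C)

# Truncated normal on (0, 1); scipy expresses the bounds in standard units
a, b = (0.0 - m) / s, (1.0 - m) / s
posterior = stats.truncnorm(a, b, loc=m, scale=s)

print(posterior.mean(), posterior.interval(0.9))   # posterior mean and 90% interval
samples = posterior.rvs(size=5000, random_state=rng)
```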

Exercise 107.18 Suppose you observe y_t = x_t + \nu_t where:

  • x_t follows a stationary AR(1) process with AR parameter \phi and innovation variance v, i.e., x_t = \phi x_{t-1} + \varepsilon_t with independent innovations \varepsilon_t \sim \mathcal{N} (0, v)
  • The \nu_t are independent measurement errors with \nu_t \sim \mathcal{N}(0, w)
  • The \varepsilon_t and \nu_t series are mutually independent.

It easily follows that: q = V(y_t) = s + w where s = V (x_t) = \frac{v}{(1 - \phi ^2)} .

  1. Show that y_t = \phi y_{t-1} + \eta_t where \eta_t = \varepsilon_t + \nu_t - \phi \nu_{t-1} .
  2. Show that the lag-1 correlation in the \eta_t sequence is given by the expression -\phi w/(w(1 + \phi^2) + v) .
  3. Find an expression for the lag-k autocorrelation of the y_t process in terms of k, \phi, and the signal to noise ratio s/q . Comment on this result.
  4. Is y_t an AR(1) process? Is it Markov? Discuss and provide theoretical rationalization.

Solution. Some identities I'll use, cf. (Prado, Ferreira, and West 2023, secs. 1.2, 1.3, 2.2):

  • Stationary AR(1): x_t=\phi x_{t-1}+\varepsilon_t with \varepsilon_t\sim\mathcal N(0,v) has autocovariances \gamma_x(k)=\mathrm{Cov}(x_t,x_{t-k})=s\,\phi^{|k|}, where s=\dfrac{v}{1-\phi^2}.
  • If a_t and b_t are independent (across and within series), then \mathrm{Cov}(a_t+b_t,a_{t-k}+b_{t-k})=\mathrm{Cov}(a_t,a_{t-k})+\mathrm{Cov}(b_t,b_{t-k}).
  • Measurement noise \nu_t\sim\mathcal N(0,w) is i.i.d., so \gamma_\nu(k)=0 for k\neq 0.

  1. AR representation with colored innovations

\begin{aligned} y_t &=x_t+\nu_t \\ &=\phi x_{t-1}+\varepsilon_t+\nu_t & \text{subst. AR(1)}\\ &=\phi(y_{t-1}-\nu_{t-1})+\varepsilon_t+\nu_t & (y_{t-1}=x_{t-1}+\nu_{t-1})\\ &=\phi y_{t-1}+\underbrace{\big(\varepsilon_t+\nu_t-\phi\nu_{t-1}\big)}_{\eta_t} \\ &= \phi y_{t-1}+\eta_t & \blacksquare \end{aligned}


  2. \mathrm{Corr}(\eta_t,\eta_{t-1})

\begin{aligned} \eta_t &= \varepsilon_t+\nu_t-\phi\nu_{t-1},\\ \eta_{t-1} &= \varepsilon_{t-1}+\nu_{t-1}-\phi\nu_{t-2}. \end{aligned}

Independence wipes out all cross-terms except the shared \nu_{t-1}:

\begin{aligned} \mathrm{Cov}(\eta_t,\eta_{t-1})&=(-\phi)\cdot 1\cdot \mathrm{Var}(\nu_{t-1}) \\ &=-\phi w \end{aligned}

Also

\begin{aligned} \mathrm{Var}(\eta_t)&\stackrel{\triangle}{=}\mathrm{Var}(\varepsilon_t+\nu_t-\phi\nu_{t-1}) \\ &=\mathrm{Var}(\varepsilon_t)+\mathrm{Var}(\nu_t)+\phi^2\mathrm{Var}(\nu_{t-1}) & \text{(Var props)}\\ &\stackrel{\triangle}{=}v+w+\phi^2 w \\ &=v+w(1+\phi^2) & \text{(collecting terms)} \end{aligned}

Hence

\rho_\eta(1)=\frac{\mathrm{Cov}(\eta_t,\eta_{t-1})}{\mathrm{Var}(\eta_t)} =\frac{-\phi w}{v+w(1+\phi^2)} \quad \blacksquare


  3. ACF of y_t in terms of k,\phi, \frac{s}{q}

For k\ge 1:

\gamma_y(k) =\mathrm{Cov}(x_t+\nu_t,\,x_{t-k}+\nu_{t-k}) =\gamma_x(k)+0 =s\,\phi^k.

At lag 0: \gamma_y(0)=\mathrm{Var}(y_t)=s+w=q. Thus, for k\ge 1,

\rho_y(k)=\frac{\gamma_y(k)}{\gamma_y(0)}=\frac{s\,\phi^k}{q}=\Big(\frac{s}{q}\Big)\phi^k, \qquad \rho_y(0)=1.

Note: Measurement error leaves the shape of the AR(1) ACF (\phi^k) intact, yet shrinks its amplitude by the signal-to-total variance ratio s/q\in(0,1]. As SNR (s/q) drops, the sample ACF is uniformly attenuated toward 0.


  4. Is y_t AR(1)? Is it Markov?
  • AR(1)? In general no. From (a), y_t-\phi y_{t-1}=\eta_t and \eta_t is not white (it has \rho_\eta(1)\neq 0). This is an ARMA(1,1) process with AR parameter \phi and an MA(1) part determined by the noise mix (indeed \eta_t is MA(1)). Special cases that reduce to AR(1)/white noise:

    • w=0: no measurement noise ⇒ y_t=x_t is AR(1).
    • \phi=0: signal is white ⇒ y_t is white + measurement white ⇒ white.
  • Markov? The scalar observed process \{y_t\} is not first-order Markov (nor finite-order Markov in general) because \eta_t depends on the unobserved \nu_{t-1} that is not measurable from \sigma(y_{t-1},y_{t-2},\dots). However, the state-space representation with latent state (x_t,\nu_t) is first-order Markov (linear–Gaussian). So \{y_t\} is ARMA(1,1) and non-Markov in its own filtration, but the hidden state is Markov-1.

Note: Item (d) suggests a resolution to a long-standing issue in the course, whereby different views (AR and state space) yield an apparent contradiction regarding the Markov property: in one view we have potentially infinite, or at least long-term, memory, while in the other the process is independent of its past conditional on the present state!
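A simulation sketch checking the attenuated ACF \rho_y(k)=(s/q)\phi^k from part (c) (mine; the values of \phi, v, w below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
phi, v, w, T = 0.8, 1.0, 0.5, 200_000     # arbitrary signal and noise settings

# Latent AR(1) plus independent measurement noise
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(0.0, np.sqrt(v))
y = x + rng.normal(0.0, np.sqrt(w), size=T)

s = v / (1 - phi**2)
q = s + w

def sample_acf(z, k):
    z = z - z.mean()
    return np.dot(z[:-k], z[k:]) / np.dot(z, z)

for k in (1, 2, 3):
    print(k, sample_acf(y, k), (s / q) * phi**k)   # sample vs theoretical
```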

Exercise 107.19 You observe y_t = x_t + \mu \quad t \in \mathbb{N} where x_t follows a stationary AR(1) process with AR parameter \phi and innovation variance \nu, i.e., x_t = \phi x_{t-1} + \varepsilon_t with independent innovations \varepsilon_t \sim \mathcal{N} (0, \nu) . Assume all parameters (\mu, \phi, \nu) are known.

  1. Identify the ACF and PACF of y_t , and comment on comparisons with those of x_t .
  2. What is the marginal distribution of y_t ?
  3. What is the distribution of (y_t \mid y_{t-1}) ?
  4. What is the distribution of (y_t \mid y_1 , \ldots , y_{t-1}) ?
  5. Now consider \mu as a parameter to be estimated. As a function of \mu and conditioning on the initial value y_1, what is the likelihood function p(y_2 , \ldots , y_{T +1} \mid y_1 , \mu) ?
  6. Assume \phi, v are known. Under the reference prior p(\mu) \propto \text{constant} , show that the resulting posterior for \mu based on the conditional likelihood above is normal with precision (1 - \phi)^2 \frac{T}{v} , and give an expression for the mean of this posterior.
  7. Show that, for large T , the reference posterior mean above is approximately the sample mean of the y_t data.
  8. If \phi = 0, we have the usual normal random sampling problem. For nonzero values of \phi, the above posterior for the mean of the normal data y_t depends on \phi in the posterior variance. Comment on how the posterior changes with \phi and why this makes sense.

Solution.

Exercise 107.20 skipped question

Exercise 107.21 skipped question

Exercise 107.22 Let x_t be an AR(p) process with characteristic polynomial \Phi_x(u) and y_t be an AR(p) process with characteristic polynomial \Phi_y(u). What is the structure of the process z_t = x_t + y_t?

Exercise 107.23 skipped question

Exercise 107.24 Consider the AR(2) process: y_t = \phi_1\, y_{t-1} + \phi_2\, y_{t-2} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, v) where \varepsilon_t are independent with \phi_1 = 0.9 , and \phi_2 = -0.9 . Is this process stable? If so write the process as an infinite order MA process, y_t = \sum_{j=0}^\infty \psi_j\;\varepsilon_{t-j} . Find \psi_j \quad \forall j .
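A brief sketch (mine; the book leaves this as an exercise): the reciprocal roots solve \alpha^2 - 0.9\alpha + 0.9 = 0, a complex conjugate pair with modulus r = \sqrt{0.9} \approx 0.949 < 1, so the process is stable (stationary). The MA(\infty) weights satisfy \psi_0 = 1, \psi_1 = \phi_1, and \psi_j = \phi_1 \psi_{j-1} + \phi_2 \psi_{j-2} for j \ge 2; with distinct complex reciprocal roots r e^{\pm i\omega}, where \cos\omega = \phi_1/(2r), this recursion has the closed form \psi_j = r^j \sin\{(j+1)\omega\}/\sin(\omega), exactly the a_k coefficients of Exercise 107.7.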

Exercise 107.25 Consider a process of the form: y_t = -2t + \varepsilon_t + 0.5 \varepsilon_{t-1}, \qquad \varepsilon_t \stackrel{i.i.d.}{\sim} \mathcal{N}(0, \nu)

  1. Find the ACF of this process.
  2. Now define z_t = y_t - y_{t-1} + 2.
    • What kind of process is this?
    • Find its ACF
Solution.

  1. Find the ACF of the process. We're given:

y_t = -2t + \varepsilon_t + 0.5\; \varepsilon_{t-1}, \qquad \varepsilon_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, v) \tag{107.14}

Let’s break this down: we can separate the deterministic and stochastic parts

  • Deterministic trend: -2t
  • Stochastic part: x_t = \varepsilon_t + 0.5 \varepsilon_{t-1}

So we can write:

y_t = -2t + x_t

The autocovariance and autocorrelation come only from the stationary part x_t, since the trend -2t is deterministic and doesn’t contribute to the variance or autocorrelation between residuals.

So let’s focus on:

x_t = \varepsilon_t + 0.5 \varepsilon_{t-1}

This is a moving average of order 1, MA(1), with parameter \theta = 0.5. Recall the ACF of an MA(1) process:

\begin{cases} \rho(0) = 1 && h = 0 \\ \rho(1) = \dfrac{\theta}{1 + \theta^2}=\dfrac{0.5}{1 + 0.5^2}=\dfrac{0.5}{1.25}=0.4 && h = 1 \\ \rho(h) = 0 && h > 1 \end{cases}

  2. Now define

\begin{aligned} z_t &= y_t - y_{t-1} + 2 && \text{ definition} \\ &= (-2t + x_t) - (-2(t-1) + x_{t-1}) + 2 && \text{ subst. } y_t \text{ and } y_{t-1} \\ &= -2t + 2(t-1) + x_t - x_{t-1} + 2 && \text{ simplifying } \\ &= x_t - x_{t-1} && \text{ collecting terms } \\ &= \varepsilon_t + 0.5 \varepsilon_{t-1} - (\varepsilon_{t-1} + 0.5 \varepsilon_{t-2}) && \text{ subst. } x_t \text{ and } x_{t-1} \\ &= \varepsilon_t - 0.5 \varepsilon_{t-1} - 0.5 \varepsilon_{t-2} \\ &= \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} && \text{ where } \theta_1 = -0.5,\ \theta_2 = -0.5 \end{aligned}

This means z_t is an MA(2) process.

The ACF of z_t can be computed as follows:

\begin{cases} \gamma(h) &= \nu(1+\theta_1^2+\theta_2^2),\quad && h=0 \\ \gamma(h) &= \nu(\theta_1+\theta_1\theta_2),\quad && h=1 \\ \gamma(h) &= \nu(\theta_2), && h=2 \\ \gamma(h) &= 0 && \text{ for } h > 2 \end{cases}

since

\rho(h)=\gamma(h)/\gamma(0) we get

\rho(1)=\dfrac{\nu(\theta_1+\theta_1\theta_2)}{\nu(1+\theta_1^2+\theta_2^2)}=\dfrac{\theta_1+\theta_1\theta_2}{1+\theta_1^2+\theta_2^2}=\dfrac{-0.5+0.25}{1+0.25+0.25}= -\dfrac{1}{6}

\rho(2)=\dfrac{\nu\theta_2}{\nu(1+\theta_1^2+\theta_2^2)}=\dfrac{\theta_2}{1+\theta_1^2+\theta_2^2}=\dfrac{-0.5}{1.5}= -\dfrac{1}{3}

\rho(h)=0 \quad h>2

The MA(2) is not invertible: 1 - 0.5L - 0.5L^2 = (1 - L)(1 + 0.5L) has a root at L = 1, on the unit circle. This unit MA root is the usual symptom of over-differencing: x_t is already stationary, so differencing the trend away also differences the stationary part.
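A quick simulation check of these values (mine; the sample ACF should be approximately 1, -1/6, -1/3, 0, \dots):

```python
import numpy as np

rng = np.random.default_rng(3)
nu, T = 1.0, 200_000
eps = rng.normal(0.0, np.sqrt(nu), size=T)

# z_t = eps_t - 0.5 eps_{t-1} - 0.5 eps_{t-2}
z = eps[2:] - 0.5 * eps[1:-1] - 0.5 * eps[:-2]

def sample_acf(x, k):
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

print([round(sample_acf(z, k), 3) for k in (1, 2, 3)])   # approximately [-0.167, -0.333, 0.0]
```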

Exercise 107.26 Figure 2.14 plots the monthly changes in the US S&P stock market index over 1965 to 2016. Consider an AR(1) model as a very simple exploratory model—for understanding local dependencies but not for forecasting more than a month or two ahead. We know there is a great deal of variation across the years in the market economy and that we might expect “change” that an AR(1) model does not capture.

To explore this, we can simply fit the AR(1) model to shorter sections of the data and examine the resulting inferences on parameters to see if they seem to vary across time.

Do this as follows. The full series has T = 621 months of data; look at many separate time series by selecting a month m and taking some number k months either side; for example, you might take k = 84 and for any month m analyze the data over the “windowed period” from m - k to m + k inclusive. Repeat this for each month m running from m = k + 1 to m = T - k. These repeated analyses will define a “trajectory” of AR(1) analyses over time, one for each sub-series.

For each sub-series, subtract the sub-series mean (to roughly center the sub-series about zero) and then compute the summaries of the reference posterior for an AR(1) model applied to just those 2k+1 time points, treating each selected sub-series separately. Using the theoretical posterior T distribution for the \phi parameter, compute (and compare graphically) the exact posterior 90% credible intervals.

  1. Comment on what you see in the plot and comparison, and what you might conclude in terms of changes over time.
  2. Do you believe that short-term changes in S&P have shown real changes in month-month dependencies since 1965?
  3. How would you suggest also addressing the question of whether or not the underlying mean of the series is stable over time?
  4. What about the innovations variance?
  5. What does this suggest for more general models that might do a better job of imitating this data?
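A sketch of the windowed analysis described above (my own implementation; the file name sp500_monthly_changes.txt is a placeholder for the series, which is not reproduced here). Under the reference prior p(\phi, v) \propto 1/v and the conditional likelihood for a centered AR(1), the marginal posterior for \phi is a Student-t located at the least-squares estimate, so the exact 90% interval for each window can be computed directly:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def ar1_reference_posterior(y):
    """Location, scale, and degrees of freedom of the Student-t reference
    posterior for phi in a zero-centered AR(1), based on the conditional
    likelihood and the prior p(phi, v) proportional to 1/v."""
    y = y - y.mean()                       # roughly center the sub-series
    y0, y1 = y[:-1], y[1:]
    N = len(y1)                            # number of conditional observations
    C = np.dot(y0, y0)
    phi_hat = np.dot(y0, y1) / C
    R = np.sum((y1 - phi_hat * y0) ** 2)   # residual sum of squares
    df = N - 1
    scale = np.sqrt(R / (df * C))
    return phi_hat, scale, df

# Placeholder: a 1-d array holding the T = 621 monthly changes
sp500_changes = np.loadtxt("sp500_monthly_changes.txt")   # assumed file name

k = 84
T = len(sp500_changes)
months, centers, lows, highs = [], [], [], []
for m in range(k, T - k):                  # windows m-k ... m+k (0-based indexing)
    phi_hat, scale, df = ar1_reference_posterior(sp500_changes[m - k:m + k + 1])
    lo, hi = stats.t.interval(0.90, df, loc=phi_hat, scale=scale)
    months.append(m); centers.append(phi_hat); lows.append(lo); highs.append(hi)

plt.fill_between(months, lows, highs, alpha=0.3, label="90% credible interval")
plt.plot(months, centers, label=r"posterior location of $\phi$")
plt.xlabel("window center (month index)")
plt.legend()
plt.show()
```

Plotting the trajectory of intervals against the window center is then the basis for the comparisons asked for in parts 1 and 2.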