93  The AR(p) process

Time Series Analysis

The AR(p) process, its state-space representation, the characteristic polynomial, and the forecast function
Bayesian Statistics
Time Series
Author

Oren Bochman

Published

November 4, 2024

Keywords

Autoregressive Models, time series, stability, order of an AR process, characteristic lag polynomial, autocorrelation function, ACF, partial autocorrelation function, PACF, smoothing, State Space Model, ARMA process, ARIMA, moving average, AR(p) process, R code, notes, Coursera

Note: Learning Objectives
    • likelihood estimation,
    • Bayesian inference,
    • model order selection,
    • forecasting.

93.1 AR(p) Definition and State-space Representation 🎥

Figure 93.1: AR(p) process, characteristic polynomial, stability, stationarity and MA representation

The under-tildes used in the slides denote a vector or a matrix rather than a statistical property. Vectors and matrices are more commonly denoted with bold fonts, and under-tildes carry other meanings, so for clarity I have replaced them with bold font in this outline. I provide detailed derivations in this section, but there is a shorter outline in Section 93.7 which may be easier to review once you know the derivations.

In this segment we will see two important representations of the AR(p) process. You can follow along in (Prado, Huerta, and West 2000, ch. 2).

93.1.1 AR(p) definition

AR(p) is shorthand for an autoregressive process of order p, which generalizes the AR(1) process by expressing the current time step in terms of the previous p time steps. We denote the number of lag coefficients required to characterize the current value by p and call it the order of the autoregressive process; the order tells us how many lags we will be considering. The AR(1) process is therefore a special case of the more general AR(p) process with p=1.

order p

We will assume the AR(p) process has the following structure:

AR(p)

\textcolor{red}{y_t} = \textcolor{blue}{\phi_1} \textcolor{red}{y_{t-1}} + \textcolor{blue}{\phi_2} \textcolor{red}{y_{t-2}} + \ldots + \textcolor{blue}{\phi_p} \textcolor{red}{y_{t-p}} + \textcolor{grey}{\varepsilon_t}, \qquad \textcolor{grey}{\varepsilon_t} \overset{\text{iid}}{\sim} \mathcal{N}(0,v) \quad \forall t \tag{93.1}

where:

  • \textcolor{red}{y_t} is the value of the time series at time t
  • \textcolor{blue}{\phi_{1:p}} are the AR coefficients
  • \textcolor{grey}{\varepsilon_t} \overset{\text{iid}}{\sim} \mathcal{N}(0,v) \quad \forall t is a white noise process with zero mean and constant variance v.

93.1.2 AR(p) Characteristic Polynomial

Many central properties of the AR(p) process follow from the properties of the AR characteristic polynomial \Phi, which we now define.

\Phi AR characteristic polynomial

Recall that the backshift operator \operatorname{B} is defined as:

\operatorname{B} y_t = y_{t-1}

so that

\operatorname{B}^j y_t = y_{t-j}

We now use the backshift operator to rewrite the AR(p) as an inhomogeneous linear difference equation: \begin{aligned} y_t &= \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \varepsilon_t && \text{(AR(p) defn.)} \newline y_t &= \phi_1 \operatorname{B} y_t + \phi_2 \operatorname{B}^2 y_t + \ldots + \phi_p \operatorname{B}^p y_t + \varepsilon_t && \text{(B defn.)} \newline \varepsilon_t &= y_t - \phi_1 \operatorname{B} y_t - \phi_2 \operatorname{B}^2 y_t - \ldots - \phi_p \operatorname{B}^p y_t && \text{(rearranging)} \newline \varepsilon_t &= (1- \phi_1 \operatorname{B} - \phi_2 \operatorname{B}^2 - \ldots - \phi_p \operatorname{B}^p) y_t && \text{(factoring out $y_t$)} \end{aligned} \tag{93.2}

\Phi(u) = 1 - \phi_1 u - \phi_2 u^2 - \ldots - \phi_p u^p \qquad \text{(Characteristic polynomial)} \tag{93.3}

where:

  • u \in \mathbb{C}, i.e. the polynomial is defined over the complex numbers, so its roots may be complex-valued
  • \phi_j are the AR coefficients.

As far as the mathematics goes, it is not immediately obvious how we get from Equation 93.2 to the characteristic polynomial \Phi(u) in Equation 93.3.

As far as I can tell we can justify it by saying we have done three things:

  1. We set \varepsilon_t = 0 to study the homogeneous equation. This is not about minimizing residuals but about analyzing the underlying linear recurrence, which governs the dynamics of the AR process without noise.
  2. We assume y_t \neq 0. Since \Phi(B)y_t = 0, we can treat this as a linear operator equation. By analogy to polynomial equations, if ab = 0 and b \neq 0, then a = 0. Thus, we analyze the operator polynomial \Phi(B) = 0 as the condition for nontrivial solutions.
  3. We replace the backshift operator B with a complex number z, leading to the characteristic polynomial \Phi(z):
  • This is justified by assuming exponential solutions of the form y_t = z^t, a standard method in solving linear recurrences. Substituting into \Phi(B)y_t = 0 yields \Phi(z^{-1}) = 0, which we often rewrite as \Phi(z) = 0 for convenience.
  • Alternatively, we view B as a placeholder in a polynomial ring of operators, and replacing B by z evaluates that polynomial. This is a standard algebraic move in analyzing recurrence relations or z-transforms.
  • Another point is that the backshift operator B is just a linear operator on the space of time series. I.e it can be viewed as a matrix that shifts the time series back by one step.

Digging a little deeper, \operatorname{B} can be represented (on a finite stretch of the series) by a nilpotent matrix that shifts the time series back by one step:

B = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \tag{93.4}

This matrix is nilpotent because if you multiply it by itself enough times, it will eventually become the zero matrix. This is a key property that allows us to define the characteristic polynomial in terms of the backshift operator.
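
A quick numerical illustration (my own sketch, not from the course materials): we can build this shift matrix in R for a length-5 series and check both the shifting action and the nilpotency.

# the backshift operator on a length-5 series: ones on the subdiagonal, zeros elsewhere
p <- 5
B <- rbind(0, cbind(diag(p - 1), 0))

y <- 1:5
as.vector(B %*% y)   # (0, 1, 2, 3, 4): each value pushed back one step (the boundary is set to zero)

Bp <- diag(p)
for (k in 1:p) Bp <- Bp %*% B   # raise B to the p-th power
all(Bp == 0)                    # TRUE: B is nilpotent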

Note: the above derivation was not presented in the slides; it is my own attempt to clarify the steps involved. I hope it is helpful, but I am not 100% sure it is correct.

  • This polynomial and its roots tell us a lot about the process and its properties. In particular, they let us determine whether quasi-periodic behavior is present in a particular AR(p) process.
  • It allows us to think about whether a process is stationary or not, depending on some properties related to this polynomial.
  • In particular, we are going to say that the process is stable if all the roots of the characteristic polynomial have a modulus greater than one.
stability condition

Why are we interested in this autoregressive lag polynomial?

\Phi(u) = 0 \implies |u| > 1 \qquad \text{(stability condition)} \tag{93.5}

  • Every root of the characteristic polynomial must lie outside the unit circle, i.e. have modulus greater than one.

  • If a process is stable, it will also be stationary.

We can show this as follows:

If the AR(p) has all the roots of its characteristic polynomial outside the unit circle, it is stable and stationary and can be written in terms of an infinite order moving average process:

y_t = \Psi(\operatorname{B}) \varepsilon_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} \quad \text {with} \ \psi_0 = 1 \quad \text{ and } \sum_{j=0}^{\infty} |\psi_j| < \infty \tag{93.6}

where:

  • \varepsilon_t is a white noise process with zero mean and constant variance v.
  • \operatorname{B} is the lag operator, AKA the backshift operator, defined by \operatorname{B} \varepsilon_t = \varepsilon_{t-1}; it is applied to a time series such as \varepsilon_t to obtain the lagged values.
  • \Psi(\operatorname{B}) is the infinite order polynomial in \operatorname{B} that represents a linear filter applied to the noise process.​
  • \psi_j is the weight applied to the white noise term at lag j.
  • the constraint \psi_0 = 1 ensures that the current shock contributes directly to y_t
  • the constraint on the weights \sum_{j=0}^{\infty} |\psi_j| < \infty ensures that the weights decay sufficiently fast that the process does not explode, making it stable and therefore stationary.

The notation, with \Psi a polynomial in the operator \operatorname{B} and the \psi_j as constants, is confusing both in the reuse of the symbol and in its complexity.
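
As a quick check of Equation 93.6 (a sketch with illustrative coefficients), base R's ARMAtoMA() computes the \psi_j weights of the MA(\infty) representation from the AR coefficients; \psi_0 = 1 is implicit and not returned.

# psi weights of the MA(infinity) representation of a stable AR(2)
phi <- c(1.645, -0.9025)                 # illustrative stable AR(2) coefficients
psi <- ARMAtoMA(ar = phi, lag.max = 50)  # psi_1, ..., psi_50
round(psi[1:10], 3)                      # the weights decay (with oscillation here)
sum(abs(psi))                            # partial sums of |psi_j| converge because the process is stable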

We can also rewrite the characteristic polynomial in terms of the reciprocal roots of the polynomial.

The zeros of the characteristic polynomial are the roots of the AR(p) process.

\Phi(u) = \prod_{j=1}^{p} (1 - \alpha_j u) = 0 \implies u = \frac{1}{ \alpha_j} \qquad \text{(reciprocal roots)} \tag{93.7}

where:

  • \alpha_j are the reciprocal roots of the characteristic polynomial.

Here, u is any complex valued number.

93.1.3 State Space Representation of AR(p)

Figure 93.2: A state-space representation of AR(p)

This material is covered in (Prado, Huerta, and West 2000, sec. 2.1.2)

Another important representation of the AR(p) process is based on a state-space representation. This representation is useful because it allows us to study some important properties of the process, and we will make connections with it later when we talk about dynamic linear models. For an AR(p), it is given as follows:

y_t = \operatorname{F}^\top \mathbf{x}_t \qquad \text{(observational equation)} \tag{93.8}

where:

  • \operatorname{F} is a p-dimensional vector of coefficients that maps the state vector to the observation; specifically F = (1, 0, \ldots, 0)^\top for the AR(p) process. Its dimension is p so that it matches the dimension of the state vector \mathbf{x}_t.
  • \mathbf{x}_t is a vector with the state of the process at time t.

To demystify \operatorname{F}: since \mathbf{x}_t = (y_t, y_{t-1}, \ldots, y_{t-p+1})^\top collects the current value and the p-1 previous values, \operatorname{F} simply picks out the current value y_t.

\mathbf{x}_t = G \mathbf{x}_{t-1} + \mathbf{w}_t \qquad \text{(state equation)} \tag{93.9}

where:

  • \mathbf{x}_t is a vector of the current state of the process.
  • G is a state transition matrix that describes the relationship between the current state and the previous state.
  • \mathbf{x}_{t-1} is a vector of the previous state of the process.
  • \mathbf{w}_t is a p-dimensional vector of innovations at time t: its first component is \varepsilon_t \sim \mathcal{N}(0, v) and the remaining components are zero.

and state transition matrix G

The G matrix in this representation is very important. Its first row contains the p AR coefficients; below that sits a (p-1) \times (p-1) identity matrix with a column of zeros to its right.

G = \begin{pmatrix} \phi_1 & \phi_2 & \phi_3 & \dots & \phi_{p-1} & \phi_p \\ 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & 0 & \dots & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 1 & 0 \end{pmatrix}. \tag{93.10}

This state transition matrix \mathbf{G} is important because it is related to the characteristic polynomial; in particular, it is related to the reciprocal-roots representation of the characteristic polynomial that we discussed before. The structure of this \mathbf{G} matrix captures the Markovian dynamics of the autoregressive process, wherein each \mathbf{x}_t is a function of \mathbf{x}_{t-1}.

The eigenvalues of this matrix correspond precisely to the reciprocal roots of the characteristic polynomial.

Prado points out that if you perform the matrix operations described in Equation 93.8 and Equation 93.9, you recover the form of the autoregressive process from the definition in Equation 93.1.
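
As a quick numerical check (a sketch using illustrative AR(2) coefficients, the same ones that reappear in Section 93.6), the eigenvalues of \mathbf{G} match the reciprocal roots computed from the characteristic polynomial with polyroot():

# eigenvalues of G versus reciprocal roots of the characteristic polynomial
phi <- c(1.645448, -0.9025)   # illustrative AR(2) coefficients
p <- length(phi)
G <- rbind(phi, cbind(diag(p - 1), rep(0, p - 1)))  # first row: AR coefficients; below: [I | 0]
eigen(G)$values               # eigenvalues of G
1 / polyroot(c(1, -phi))      # reciprocal characteristic roots (same values, up to ordering)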

So we have the state-space representation of an AR(p). One advantage of working with this representation is that we can apply to it definitions that hold for dynamic linear models or state-space models. One such definition is the so-called forecast function.

93.1.4 The forecast function of AR(p)

The forecast function, which we denote as f_t(h), is a function of the time t you are standing at and of the number of steps h you want to forecast ahead in your time series. If you have observations up to today and you want the forecast five days later, you set h=5. The forecast function is just the expected value of y_{t+h} conditional on all the observations, or all the information, received up to time t.

\begin{aligned} f_t(h) &= \mathbb{E}[y_{t+h} \mid y_{1:t}] &\text{(defn)}\\ &= \mathbf{F}^\top \mathbb{E}[\mathbf{x}_{t+h} \mid y_{1:t}] &\text{(observation eq.)} \\ &= \mathbf{F}^\top \mathbf{G} \mathbb{E}[\mathbf{x}_{t+h-1} \mid y_{1:t}] &\text{(state eq.)} \\ &= \mathbf{F}^\top \mathbf{G}^h \mathbb{E}[\mathbf{x}_t\mid y_{1:t}], & \text{(repeat)} \\ &= \mathbf{F}^\top \mathbf{G}^h \mathbf{x}_t & h > 0, \forall t \ge p \end{aligned} \tag{93.11}

where:

  • y_{1:t} is the vector of all observations up to time t,

  • \mathbf{x}_t is the state vector at time t.

  • \mathbf{G}^h is the state transition matrix raised to the power h, which captures the dynamics of the AR process over h steps ahead. The eigenvalues of this matrix are the reciprocal roots of the characteristic polynomial of the AR(p) process.

  • In this derivation:

    • We start with the definition of the expectation and rewrite it using the state-space representation equations:
    • First we apply the observation equation, replacing y_{t+h} with \mathbf{F}^\top \mathbf{x}_{t+h} inside the expectation.
    • Next we apply the second (state) equation.
      • This introduces \mathbf{G}
      • and steps the expected value back one time step, to \mathbf{x}_{t+h-1}.
      • Note: that since the w_{t+h-1} terms are independent of the past observations, their expected value is zero, so we can leave them out.
    • We repeat applying the state equation until we get to time t, ending up with a product of h \mathbf{G} matrices, i.e. \mathbf{G}^h; each application drops one lag in the expectation of \mathbf{x}_{t+h}.
    • We are left with \mathbb{E}[\mathbf{x}_t \mid y_{1:t}], which is just the vector \mathbf{x}_t, the current state of the process.

This result is significant because it lets us connect the forecast function to \mathbf{G} and its eigenstructure. One of the features of \mathbf{G} is that its eigenstructure is related to the reciprocal roots of the characteristic polynomial. When we work with the case of exactly p distinct roots, we can simplify further by rewriting \mathbf{G} in terms of its eigendecomposition:

\mathbf{G}= \mathbf{E} \mathbf{\Lambda} \mathbf{E}^{-1} \qquad \text{(eigendecomposition)} \tag{93.12}

where:

  • \mathbf{\Lambda} = \operatorname{diag}(\alpha_1, \ldots, \alpha_p) is a diagonal matrix consisting of the reciprocal roots \alpha_i, from the reciprocal formulation of the characteristic polynomial in Equation 93.7.
    • While the order of the roots does not matter, it is traditional to order the eigenvalues by decreasing modulus, and this convention can help us identify our model.
  • \mathbf{E} = [\mathbf{e}_1 ; \cdots ; \mathbf{e}_p] is the matrix of eigenvectors of \mathbf{G}, where \mathbf{e}_i is the eigenvector corresponding to \alpha_i.
    • Since the roots are distinct, the eigenvectors are all different and linearly independent.
    • Note that the columns of \mathbf{E} must follow the order chosen for the eigenvalues in \mathbf{\Lambda}.

We can now rewrite \mathbf{G}^h as:

\mathbf{G}^h= \mathbf{E} \mathbf{\Lambda}^h \mathbf{E}^{-1} \qquad \text{(eigendecomposition of G)} \tag{93.13}

Why does this hold? If \mathbf{G}= \mathbf{E} \mathbf{\Lambda} \mathbf{E}^{-1}, then multiplying \mathbf{G} by itself h times makes all the interior \mathbf{E}^{-1}\mathbf{E} factors cancel, leaving only the first \mathbf{E} and the last \mathbf{E}^{-1}.

Whatever elements appear in the matrix of eigenvectors \mathbf{E} are functions of the reciprocal roots \alpha_j. The power h, the number of steps ahead we want to forecast, enters only through \mathbf{\Lambda}^h, i.e. through the \alpha_j^h. Carrying out the product \mathbf{F}^\top \mathbf{G}^h \mathbf{x}_t therefore expresses the forecast function as a linear combination of the \alpha_j^h:

f_t(h) = \mathbb{E}[y_{t+h} \mid y_{1:t}] = \mathbf{F}^\top \mathbf{G}^h \mathbf{x}_t = \sum_{j=1}^{p} c_{tj}\, \alpha_j^h, \quad h > 0, \quad \forall t \ge p \tag{93.14}

where:

  • c_{tj} are constants that depend on the \mathbf{E} matrix and on the current state \mathbf{x}_t.
  • \alpha_j^h are the reciprocal roots raised to the power h.

We can see from the form of Equation 93.14 that if the process is stable, all the moduli of the reciprocal roots are below one, so the forecast function decays exponentially as a function of h, and the AR(p) process is stationary.
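
To make this concrete, here is a small sketch (with illustrative AR(2) coefficients and an arbitrary current state, not values from the course) that computes f_t(h) = \mathbf{F}^\top \mathbf{G}^h \mathbf{x}_t and confirms it matches the step-by-step AR recursion:

# forecast function of an AR(2) via the state-space representation
phi <- c(0.9, -0.5)           # illustrative AR(2) coefficients (stable)
G   <- rbind(phi, c(1, 0))    # state transition matrix
Fv  <- c(1, 0)                # observation vector F
x_t <- c(1.2, 0.7)            # current state (y_t, y_{t-1}); arbitrary values

h_max <- 10
f  <- numeric(h_max)
Gh <- diag(2)
for (h in 1:h_max) {
  Gh   <- Gh %*% G                        # accumulate G^h
  f[h] <- drop(t(Fv) %*% Gh %*% x_t)      # f_t(h) = F' G^h x_t
}

# equivalent recursion: E[y_{t+h}] = phi_1 E[y_{t+h-1}] + phi_2 E[y_{t+h-2}]
y_ext <- c(x_t[2], x_t[1])                # (y_{t-1}, y_t)
for (h in 1:h_max) y_ext <- c(y_ext, sum(phi * rev(tail(y_ext, 2))))
all.equal(f, tail(y_ext, h_max))          # TRUE: both give the same forecasts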

Important: Interpreting the forecast function

I recall that in physics we often view the eigenvectors as resonances of the system’s dynamics, and the eigenvalues as the corresponding resonant frequencies; this is a good analogy for the reciprocal roots of the AR(p) process. The contribution of each root \alpha_i to f_t(h) depends on how close the modulus of that reciprocal root is to 1. Roots with relatively large moduli contribute more to what happens in the future.

Depending on whether the reciprocal roots \alpha_i are real-valued or complex-valued, the behavior may be quasi-periodic (complex-valued roots) or non-quasi-periodic (real-valued roots).

In the textbook, AR(p) forecasting is covered in (Prado, Huerta, and West 2000, sec. 2.2), and the mean square errors can be estimated using an algorithm by (Brockwell and Davis 1991).

AR(p) process and its Order

We will now discuss the general autoregressive process of order p. As in the AR(1), we’re going to think about expressing current values of the time series in terms of past values of the time series. In this case, we’re going to think about expressing the current value in terms of the past p values of the time series process. That’s why it’s called an autoregressive model of order p. The order tells you how many lags you are going to be considering.

We’re going to then assume a process that is going to have this structure. Again, as I said, we are regressing on the past p values. As before, in this case, we’re going to assume that the epsilon t’s are independent identically distributed random variables with a normal distribution centered at zero and variance v. This is the assumption that we are going to use here. As you see now, the number of parameters has increased.

We had one coefficient before, now we’re going to have p coefficients, and we’re going to have also the variance of the process. One thing that is very important used to characterize and understand the properties of autoregressive processes is the so-called characteristic polynomial. The AR characteristic polynomial is a polynomial, I’m going to denote it like this, and it’s going to be a function of the phi coefficients here. It’s going to look like a polynomial here, and is a polynomial of order p. Here, U is any complex valued number.

Characteristic Polynomial, Stationarity and Stability

Why do we study this polynomial? This polynomial tells us a lot about the process and a lot about the properties of this process.

One of the main characteristics is it allows us to think about things like quasi-periodic behavior, whether it’s present or not in a particular AR p process. It allows us to think about whether a process is stationary or not, depending on some properties related to this polynomial. In particular, we are going to say that the process is stable here.

This is the stability condition. If all the roots of this polynomial have a modulus, that is that they all have modulus that are greater than one, if, I’m going to write it like this, Phi of U, this polynomial is zero for a root, so for any value of U such that this happens, then we say that the process is stable. For any of the roots, it has to be the case that the modulus of that root, they have to be all outside the unit circle. When a process is stable, it’s also going to be stationary. In this case, if the process is stable, then we have a stationary process. This is going to characterize also the stationarity of the process in terms of the roots of the characteristic polynomial.

Infinite Order Moving Average Representation

Once the process is stationary, and if all the roots of the characteristic polynomial are outside the unit circle, then we will be able to write this process in terms of an infinite order moving average process.

In this case, if the process is stable, then we are going to be able to write it like this. I’m sorry, this should be epsilon t. I am going to have an infinite order polynomial here on B, the backshift operator that I can write down just as the sum, j goes from zero to infinity. Here Psi_0 is one. Then there is another condition on the Psi’s for this to happen. We have to have finite sum of these on these coefficients. Once again, if the process is stable, then it would be stationary and we will be able to write down the AR as an infinite order moving average process here. If you recall, B is the backshift operator. Again, if I apply this to y_t, I’m just going to get y_t minus j. I can write down Psi of B, as 1 plus Psi_1 B, B squared, and so on. It’s an infinite order process.

Reciprocal roots representation

The AR characteristic polynomial can also be written in terms of the reciprocal roots of the polynomial. So instead of considering the roots, we can consider the reciprocal roots. In that case, let’s say the Phi of u for Alpha 1, Alpha 2, and so on. The reciprocal roots.

Why do we care about all these roots? Why do we care about this structure? Again, we will be able to understand some properties of the process based on these roots as we will see.

State-Space Representation

We will now discuss another important representation of the AR(P) process, one that is based on a state-space representation of the process. Again, we care about this type of representations because they allow us to study some important properties of the process.

In this case, our state-space or dynamic linear model representation, we will make some connections with these representations later when we talk about dynamic linear models, is given as follows for an AR(P). I have my y_t. I can write it as F transpose and then another vector x_t here. Then we’re going to have x_t is going to be a function of x_t minus 1. That vector there is going to be an F and a G. I will describe what those are in a second. Then I’m going to have another vector here with some distribution. In our case, we are going to have a normal distribution also for that one. In the case of the AR(P), we’re going to have x_t to be y_t, y_t minus 1. It’s a vector that has all these values of the y_t process. Then F is going to be a vector. It has to match the dimension of this vector. The first entry is going to be a one, and then I’m going to have zeros everywhere else. The w here is going to be a vector as well. The first component is going to be the Epsilon t. That we defined for the ARP process. Then every other entry is going to be a zero here. Again, the dimensions are going to match so that I get the right equations here.

G-matrix

Then finally, my G matrix in this representation is going to be a very important matrix, the first row is going to contain the AR parameters, the AR coefficients.
We have p of those. That’s my first row. In this block, I’m going to have an identity matrix. It’s going to have ones in the diagonal and zeros everywhere else. I’m going to have a one here, and then I want to have zeros everywhere else. In this portion, I’m going to have column vector here of zeros.

This is my G matrix. Why is this G matrix important? This G matrix is going to be related to the characteristic polynomial, in particular, is going to be related to the reciprocal roots of the characteristic polynomial that we discussed before. The eigenvalues of this matrix correspond precisely to the reciprocal roots of the characteristic polynomial.

We will think about that and write down another representation related to this process. But before we go there, I just want you to look at this equation and see that if you do the matrix operations that are described these two equations, you get back the form of your autoregressive process. The other thing is, again, this is called a state-space representation because you have two equations here. One, you can call it the observational level equation where you are relating your observed y’s with some other model information here. Then there is another equation that has a Markovian structure here, where x_t is a function of x_t minus 1.

This is why this is a state-space representation.

One of the nice things about working with this representation is we can use some definitions that apply to dynamic linear models or state-space models, and one of those definitions is the so-called forecast function. The forecast function, we can define it in terms of, I’m going to use here the notation f_t h to denote that is a function f that depends on the time t that you’re considering, and then you’re looking at forecasting h steps ahead in your time series. If you have observations up to today and you want to look at what is the forecast function five days later, you will have h equals 5 there. It’s just the expected value. We are going to think of this as the expected value of y_t plus h. Conditional on all the observations or all the information you have received up to time t. I’m going to write it just like this.

Using the state-space representation, you can see that if I use the first equation and I think about the expected value of y_t plus h is going to be F transpose, and then I have the expected value of the vector x_t plus h in that case. I can think of just applying this, then I would have expected value of x_t plus h given y_1 up to t. But now when I look at the structure of x_t plus h, if I go to my second equation here, I can see that x_t plus h is going to be dependent on x_t plus h minus 1, and there is a G matrix here.

I can write this in terms of the expected value of x_t plus h, which is just G, expected value of x_t plus h minus 1, and then I also have plus expected value of the w_t’s. But because of the structure of the AR process that we defined, we said that all the Epsilon T’s are independent normally distributed random variables center at zero. In this case, those are going to be all zero. I can write down this as F transpose G, and then I have the expected value of x_t plus h minus 1 given y_1 up to t. If I continue with this process all the way until I get to time t, I’m going to get a product of all these G matrices here, and because we are starting with this lag h, I’m going to have the product of that G matrix h times. I can write this down as F transpose G to the power of h, and then I’m going to have the expected value of, finally, I get up to here. This is simply is going to be just my x_t vector. I can write this down as F transpose G^h, and then I have just my x_t.

Again, why do we care? Now we are going to make that connection with this matrix and the eigenstructure of this matrix. I said before, one of the features of this matrix is that the eigenstructure is related to the reciprocal roots of the characteristic polynomial. In particular, the eigenvalues of this matrix correspond to the reciprocal roots of the characteristic polynomial.

If we are working with the case in which we have exactly p different roots. We have as many different roots as the order of the AR process. Let’s say, p distinct. We can write down then G in terms of its eigendecomposition. I can write this down as E, a matrix Lambda here, E inverse. Here, Lambda is going to be a diagonal matrix, you just put the reciprocal roots, I’m going to call those Alpha 1 up to Alpha p. They are all different. You just put them in the diagonal and you can use any order you want. But the eigendecomposition, the eigenvectors, have to follow the order that you choose for the eigenvalues. Then what happens is, regardless of that, you’re going to have a unique G.

But here, the E is a matrix of eigenvectors. Again, why do we care? Well, if you look at what we have here, we have the power G to the power of h. Using that eigendecomposition, we can get to write this in this form. Whatever elements you have in the matrix of eigenvectors, they are now going to be functions of the reciprocal roots. The power that appears here, which is the number of steps ahead that you want to forecast in your time series for prediction, I’m just going to have the Alphas to the power of h. When I do this calculation, I can end up writing the forecast function just by doing that calculation as a sum from j equals 1 up to p of some constants.

Those constants are going to be related to those E matrices but the important point is that what appears here is my Alpha to the power of h. What this means is I’m breaking this expected value of what I’m going to see in the future in terms of a function of the reciprocal roots of the characteristic polynomial. You can see that if the process is stable, is going to be stationary, all the moduli of my reciprocal roots are going to be below one. This is going to decay exponentially as a function of h. You’re going to have something that decays exponentially.

Depending on whether those reciprocal roots are real-valued or complex-valued, you’re going to have behavior here that may be quasiperiodic for complex-valued roots or just non-quasiperiodic for the real valued roots. The other thing that matters is, if you’re working with a stable process, are going to have moduli smaller than one. The contribution of each of the roots to these forecasts function is going to be dependent on how close that modulus of that reciprocal root is to one or minus one. For roots that have relatively large values of the modulus, then they are going to have more contribution in terms of what’s going to happen in the future. This provides a way to interpret the AR process.

93.2 Examples 🎥

93.2.1 AR(1) Process

Figure 93.3: AR(1)
  • State-space form: X_t = \phi X_{t-1} + \omega_t
  • Forecast function: \mathbb{E}[y_{t+h} \mid \mathcal{F}_t] = c \cdot \phi^h
  • Behavior: Exponential decay (oscillatory if \phi < 0), mimicking the autocorrelation function.
  • Stability: |\phi| < 1 (the characteristic root 1/\phi has modulus > 1; equivalently, the reciprocal root \phi has modulus < 1). A quick numerical sketch of the forecast decay follows below.
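
A quick sketch of the forecast function above (with illustrative values): the AR(1) forecast is just the current value scaled by \phi^h, so it decays exponentially and oscillates when \phi < 0.

# AR(1) forecast function: E[y_{t+h} | y_{1:t}] = phi^h * y_t
phi <- -0.8                  # illustrative AR(1) coefficient
y_t <- 2                     # illustrative current value
round(phi^(1:10) * y_t, 3)   # oscillating exponential decay towards zero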

93.2.2 AR(2) Process

Figure 93.4: AR(2) two positive roots
Figure 93.5: AR(2) complex roots
  • Characteristic polynomial: 1 - \phi_1 z - \phi_2 z^2

  • Three root types (classified by the sign of \phi_1^2 + 4\phi_2; see the R sketch after this list):

    1. Two real distinct reciprocal roots: Forecast function:

      \mathbb{E}[y_{t+h} \mid \mathcal{F}_t] = c_{t1} \alpha_1^h + c_{t2} \alpha_2^h

      Exponential decay, dominated by root with larger modulus.

    2. Two complex conjugate reciprocal roots: Let the roots be r e^{\pm i\omega}. Forecast function:

      \mathbb{E}[y_{t+h} \mid \mathcal{F}_t] = A_t r^h \cos(\omega h + \delta_t)

      Behavior: Quasiperiodic with exponential envelope.

    3. Repeated reciprocal root (\alpha with multiplicity 2): Forecast function:

      \mathbb{E}[y_{t+h} \mid \mathcal{F}_t] = (\alpha^h)(a_t + b_t h)

      Polynomial-exponential form due to root multiplicity.
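
The three cases above are distinguished by the discriminant \phi_1^2 + 4\phi_2 of \alpha^2 - \phi_1 \alpha - \phi_2 = 0, the equation satisfied by the reciprocal roots. A small R sketch (my own, using the illustrative coefficients from Section 93.6) classifies them:

# classify the reciprocal-root structure of an AR(2) from its coefficients
ar2_root_type <- function(phi1, phi2) {
  disc <- phi1^2 + 4 * phi2              # discriminant of alpha^2 - phi1*alpha - phi2 = 0
  if (disc > 0)      "two distinct real reciprocal roots"
  else if (disc < 0) "a pair of complex conjugate reciprocal roots"
  else               "one real reciprocal root with multiplicity 2"
}
ar2_root_type(1.645448, -0.9025)  # complex pair -> quasiperiodic forecast function
ar2_root_type(1.450, -0.475)      # two real roots -> pure exponential decay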

93.2.3 Key Concepts

  • Forecast structure mirrors the roots of the characteristic polynomial.
  • Stability depends on reciprocal roots (modulus < 1).
  • Complex roots → sinusoidal terms;
  • Repeated roots → polynomial multipliers.

This analysis connects forecast behavior to the algebraic properties of AR model roots.

Now, that we have discussed the characteristic polynomial and the state-space representation for a general AR(P) process, we are going to discuss a few examples just to try to understand what this means in particular cases of AR processes. One of them is going to be, we’re going to go back to the AR(1) and see what is the forecast function and how that relates to what we learned before. The other example we are going to consider here is the AR(2), the autoregressive process of order 2. Let’s begin with the first case, which is again the AR(1). In the AR(1), we are assuming that as usual the model looks like this. Now if I think about the state-space representation for these process is going to be, remember we need to have something that looks like this X_t, X_t G X_t minus 1 and then plus Omega t. In this case, because I have order 1, my X_t is going to be a one-dimensional scalar component, the same thing with F, and then G is going to be just my Phi. Here F is going to be one, X_t is going to be y_t, and then G is going to be Phi. If you think about how we define the forecast function, we said the forecast function is just the suspected value of y_t plus h given everything that we have observed. I think about predicting h steps ahead given all the information I have up to time t. We said that the state-space representation allows us to have something that looks like this. Given that these are all scalars, what we are going to have is that we can write down the forecast function in terms of the reciprocals root of the characteristic polynomial. In the case of the characteristic polynomial, I’m going to have for the AR(1), so before I write down the expression here, is going to be an order 1 polynomial. This is the form of the polynomial. If I want to find the roots of the characteristic polynomial and this implies that I have one over Phi. That’s my root of the characteristic polynomial. For the process to be stable, I need this root to have modulus greater than one or the reciprocal root, which is going to be phi, to have modulus smaller than one. I have that this is, we are going to make here the assumption that we are working with processes that have Phi values that are between minus 1 and one. Again, going back to the forecast function, since the G matrix is Phi, I can write these down as just something that is the power of Phi. It’s going to be, I’m going to have some constant here, whatever that is that comes from this expression. Then I’m going to have my Phi to the power of h. That’s the expression we get for the AR(1), and we can see that it has the same form that we discussed for the auto-correlation function of an AR process. If you remember, the auto-covariance and the auto-correlation function of an AR process is decaying exponentially as a power of h just in the same form that this forecast function is decaying exponentially on Phi. If the Phi negative, we’re going to have an oscillatory decay. If the Phi positive, we’re just going to have an exponential decay. It follows exactly the same structure as the auto-covariance function for the AR(1). There is that correspondence and here we are just thinking about how we’re going to predict the process h steps ahead. Our second example is just the AR(2) process. Let’s think about what kind of structure we have for the forecast function of this process. In an AR(2) we are going to regress on the first two values of the process. We have to AR coefficients, Phi_1 and Phi_2. 
The characteristic polynomial is going to be given another two polynomial in this case. Then again, if we think about stable processes, they are going to have characteristic roots that have moduli larger than 1 and the reciprocal roots have moduli smaller than 1. If I think about finding the roots of this polynomial, this is the same as the just getting 2 squared plus Phi_1u minus 1 equals to 0. Then this is a quadratic polynomial. If I want to find the roots, I can just think about this equation. Looking at this expression, we can think about three possible cases depending on what happens with the expression that is here in the square root. I can have values of Phi_1 and Phi_2 that make this expression different from zero and positive. In that case, I’m going to have two different real reciprocal roots. I can think about having values of Phi_1 and Phi_2 that make this expression equal to 0. In that case, we can again think about having a single reciprocal root with multiplicity 2. Then I can think about cases in which I’m going to have values that make this expression inside the square root negative. In that case we are going to have complex-valued roots. Let’s think about those cases. The first case we can describe this the case in which Phi_1 squared plus 4 Phi_2 is greater than 0. In that case, we are going to have two characteristic. They are both real valued. I can write them down as u_1. If I think now about the reciprocal roots, I’m going to have just these are real valued. I’m going to have Alpha 1 is going to be 2 Phi_2 over minus Phi_1. Then I’m going to have my Alpha_2 plus Phi_1. So I have these two real-valued reciprocal roots. So I can write down the forecast function again f_t(h), which is my expected value of y_t plus h, given all the observations up to time t, I can write it down. I’m going to have two components. One is going to be related to the first reciprocal root, and the other one is going to be related to the second reciprocal root. I have a power of h here for both. Then I’m going to have to let’s call it c_t1 and c_t2 here. These are constants. As you can see, I’m going to have two components in my forecast function. If one root has a modulus that is larger than the other one, that one is going to have a larger contribution to the forecast function h steps ahead than the other one. Again, we have an exponential decay assuming that the process is stable, but we have two components are contributing to the forecast function. The second case I’m going to discuss in terms of the characteristic roots here is the case in which you have values of Phi_1 and Phi_2 that lead to negative values inside this square root. So here we would have the case in which Phi_1 square plus 4. This is a negative value. Because we have something negative inside of the square root, what this means is we’re going to have two complex valued roots and therefore two complex valued reciprocal roots. Since they are complex valued, they are going to appear as a pair of complex conjugates. We can do the calculation so I can write down u1 and u2 and then think about the reciprocal of those. I’m just writing the reciprocal roots. In this case we can write them like this over two. Once you do all your calculations and take the reciprocals. This is a pair of complex conjugates. One root is a complex conjugate of the other one. These are different reciprocal roots. Because they are complex valued, we can write them in a different way. 
We can write them as Alpha 1 r exponential of i Omega, let’s call it, and Alpha 2 r exponential of minus i Omega. If I write these in polar coordinates, I’m going to have a modulus. The modulus is the same for each of these and then I’m going to have a frequency. The modulus r the frequency is Omega here. One is the complex pair of the other one and if you think about again, the form that this is going to have in the forecast function, we can write down the forecast function. Again, we have the contribution. You can think of this as a constant times Alpha 1 to the power of h and then plus another constant times Alpha 2 to the power of h. But because they are complex conjugates, we can write down the expression in terms of sinusoidal form as follows. If I just do this, this is what we were describing before. There is an Alpha 2 to the power of h. Using these here, we can rewrite everything and the fact that they are complex conjugates, also the eigenstructure is going to be like that. We can write down in terms of something that looks like this. This is the cosine of that frequency times h. h is the number of steps ahead, plus some random phase here that depends on the information up to time t. Then here for my amplitude, I’m going to have again, something that depends on the information I have up to time t, and then I have the modulus of the reciprocal root to the power of h. You can see here, there is also some exponential decay, assuming that we are in the stable case where r is essentially between zero and one, because that’s the modulus of the reciprocal root. Then I have an exponential decay as a power of h, but it’s a sinusoidal. It has a sinusoidal form. This is just a cosine wave and it’s going to be having this is what we call quasiperiodic behavior. The periodicity here is given by this Omega. There is a random phase and that’s why we say that the form of the forecast function is quasiperiodic. When you have an ar(2) depending on the structure of the roots, you can also have a case in which you have a quasiperiodic process. The final case for this is the case in which you have values of Phi 1 and Phi 2 that give you a zero here inside the square root. In that case, you are going to have a single root with multiplicity two. You can think about how many times the root repeats. We did not write the general expression for the forecast function in the case where you have repeated roots with multiplicity higher than one, but you can show that the forecast function of this, in this particular case, when you have a repeated root, is going to be of the form. You have your Alpha, which is that single reciprocal root that we have here to the power of h times a_t plus b_t times h. You have here a polynomial of order one in h multiplied by that Alpha h. This is the case of the process when you have a single root. In practice, we always work with cases one and two, and we will see that in the examples.

93.3 ACF of the AR(p) 🎥

Figure 93.6: ACF of the AR(p)

For a stationary AR(p) process, the autocorrelation function (ACF) satisfies a homogeneous linear difference equation whose solution is a sum of terms involving the reciprocal roots of the characteristic polynomial. Key points:

  • If there are r distinct reciprocal roots \alpha_1, \ldots, \alpha_r with multiplicities m_1, \ldots, m_r such that \sum m_j = p, the ACF has the general form:

    \rho(h) = \sum_{j=1}^r P_j(h)\alpha_j^h,

    where each P_j(h) is a polynomial of degree m_j - 1.

  • For distinct reciprocal roots (common case), all m_j = 1, so \rho(h) is a linear combination of powers of the roots.

  • AR(1): ACF is \rho(h) = \phi^h, where \phi is the AR coefficient.

  • AR(2): Three cases arise:

    1. Two distinct real roots → exponential decay.
    2. Complex conjugate roots → damped sinusoidal behavior r^h \cos(\omega h + \delta).
    3. One real root with multiplicity 2 → decay with polynomial factor.
  • ACF decays exponentially if all reciprocal roots lie inside the unit circle.

  • The Partial ACF (PACF) of AR(p) is zero for all lags > p.

  • PACF values can be computed recursively via the Durbin–Levinson algorithm, using sample autocorrelations (see the numerical illustration after this list).
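
As a quick numerical illustration of the last two points (a sketch using the same AR(2) coefficients as in Section 93.6), base R's ARMAacf() returns the theoretical ACF, and with pacf = TRUE the theoretical PACF, which is zero beyond lag p:

# theoretical ACF and PACF for an AR(2) with complex reciprocal roots (modulus 0.95, period 12)
phi <- c(2 * 0.95 * cos(2 * pi / 12), -0.95^2)
acf_theo  <- ARMAacf(ar = phi, lag.max = 20)               # damped sinusoidal decay
pacf_theo <- ARMAacf(ar = phi, lag.max = 20, pacf = TRUE)  # theoretical PACF
round(acf_theo, 3)
round(pacf_theo, 3)   # only the first two partial autocorrelations are non-zero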

In the case of an autoregressive process of order p, the autocorrelation function can be written then as the solution of a homogeneous difference equation. Again, we are working with the stationary case. You can recall the notation we were using. Rho is a function of the lag here. For an AR(P) we can write this difference equation. The autocorrelation function has to satisfy this equation. You can show that the solution of this equation is related to the characteristic roots of the AR polynomial. In particular, if we think that the AR process that we are working with has r reciprocal roots. I’m going to call them Alpha_1 up to Alpha_r and they are all distinct. Let’s assume that each of these roots has a specific multiplicity. Then, again, for each of these roots and their multiplicities, they have to satisfy that when we add them all we get the order of the model which is P. For this case, we can write down the solution or the autocorrelation function as a function that is a sum of r components where each of those components is related to one of the roots. I’m going to have Alpha_1^h. Each of these p’s here are polynomials on h. The degree of the polynomial is going to depend on the multiplicity of the root. Each of the polynomials here on h is a polynomial of order, the multiplicity of the root minus 1, m_j minus 1. In the case in which you have exactly P different reciprocal roots, so the number of different reciprocal roots you have in the AR characteristic polynomial is exactly P. Each root is going to have multiplicity 1. Therefore, each of these polynomials is going to be a polynomial of order 0 on h, so it’s just going to be a constant. Usually we work with that case. I’m now going to talk about two examples that we have considered before. In the case of the AR(1), we saw earlier that we can write down the autocorrelation function is just, has this form, is the form Phi^h, where Phi is the AR coefficient. This form is consistent with this result. In the case of a stationary AR 1, we can show that the reciprocal root of the characteristic polynomial is Phi. So what we get here is we have a single root, multiplicity 1. We have the root^h multiplied by a constant. In this case, a constant is equal to 1. You can show that that constant is equal to 1 just using properties of autocorrelations. In the case of the AR 2, we’re also going to have this form. We discussed three cases in terms of the number of reciprocal roots you can have. You can have two real valued different reciprocal roots, each with multiplicity 1. You can have a pair of complex roots. They are different. They have a modulus and a frequency associated to them. You also have the case of one single reciprocal rule, real valued with multiplicity 2. For instance, if you have something that looks like this, so I have a pair of complex valued reciprocal roots, I can write them in this polar representation. R is the modulus of the reciprocal root and Omega here is the frequency. Using this result because each has multiplicity 1, I’m going to again have that. I can write a constant times Alpha_1^h plus another constant times Alpha_2^h, given that these are complex reciprocal roots and they appear as complex conjugates. I can simplify this equation and then get something of the form plus aother constant. Here, c and d are constants. My Omega is my frequency of the roots, r is the modulus. Then we can see again that in the case of stable, stationary processes, my modulus of the reciprocal root is smaller than one. 
We are going to have an exponential decay multiplied by these cosine waves. so there is going to be this oscillatory behavior. In this case again, we have exponential decay. We can see that we can find the properties of the autocorrelation function. They are related to the reciprocal roots of the characteristic polynomial. We can also think about the partial autocorrelation function. For the partial autocorrelation function is possible to use the Durbin Levinson recursion again to express the partial autocorrelation coefficients as functions of the autocovariance and the autocorrelation. In the case of an AR(P), you can show that it has to be the case that after the lag of the order of the model, all the partial autocorrelation coefficients are going to be zero. That’s a property for the partial autocorrelation for the AR(P), it is zero after lag equal to the model order P. In the case of the ACF, the autocorrelation function, we have that exponential decay assuming that all the roots have moduli smaller than one. If we want to get the sample partial autocorrelation, we can also put the sample autocorrelations into the Durbin Levinson recursion and we can get the expressions for that.

93.4 Simulating data from an AR(p) 🎥

This video goes through the code in the following sections, which simulates data from an AR(p) process and plots the sample ACF and PACF.

  1. Characteristic Roots from AR Coefficients:

    • Given AR coefficients (e.g. for AR(8)), compute characteristic roots using polyroot() on the reversed sign polynomial (first term 1, followed by negative AR coefficients).
    • Reciprocal roots are obtained as 1/\text{root}.
    • Use Mod() for modulus and 2\pi / \text{Arg}() for approximate periods of the reciprocal roots.
  2. Example AR(8):

    • Yields 4 complex-conjugate pairs.
    • Most persistent: modulus ≈ 0.97, period ≈ 12.7.
    • Others show lower modulus and shorter periods, contributing less to persistence.
  3. Simulating AR(2) with Complex Roots:

    • Reciprocal root modulus 0.95, period 12 → converted to AR coefficients (≈ 1.65, -0.902).
    • Simulated data shows quasi-periodic behavior.
    • ACF: decaying sinusoidal pattern.
    • PACF: significant at lags 1 and 2, then drops, consistent with AR(2).
  4. Simulating AR(2) with Real Roots:

    • Roots: 0.95 and 0.5.
    • AR coefficients derived from these.
    • No quasi-periodic pattern in data; resembles damped random walk.
    • ACF: smooth decay.
    • PACF: only first two lags significant.
  5. Simulating AR(3) with Complex + Real Root:

    • Complex root pair: modulus 0.95, period 12; real root: modulus 0.8.
    • Three AR coefficients derived.
    • Simulated series shows quasi-periodic behavior plus extra persistence.
    • ACF: still shows decaying periodicity.
    • PACF: more than two significant lags, consistent with AR(3).

Key Insight:

The modulus and type (real vs. complex) of reciprocal roots determine persistence and periodicity. The ACF reflects these traits, while the PACF helps identify AR order.

93.5 Computing the roots of the AR polynomial 🗒️ \mathcal{R}

Compute AR reciprocal roots given the AR coefficients

# Assume the following AR coefficients for an AR(8)
phi=c(0.27, 0.07, -0.13, -0.15, -0.11, -0.15, -0.23, -0.14)
roots=1/polyroot(c(1, -phi)) # compute reciprocal characteristic roots
r=Mod(roots) # compute moduli of reciprocal roots
lambda=2*pi/Arg(roots) # compute periods of reciprocal roots

# print results modulus and frequency by decreasing order
print(cbind(r, abs(lambda))[order(r, decreasing=TRUE), ][c(2,4,6,8),]) 
             r          
[1,] 0.9722428 12.731401
[2,] 0.8094950  5.103178
[3,] 0.7196221  2.987712
[4,] 0.6606487  2.232193

In this code, we will see how to compute the characteristic roots of the AR polynomial given a particular set of AR coefficients. I’m just going to work here with an example in which we have the AR coefficients of an AR(8). This is just setting some values for these coefficients, the first line here is just keeping these in a vector. Based on these coefficients for these AR(8), we’re going to compute the roots of the characteristic polynomial and the reciprocal roots in this case of the characteristic polynomial, so we’re going to use the function polyroot to do that. Because we want the reciprocals, is going to be one over the roots of the characteristic polynomial. For the polyroots function, you just enter the AR coefficients here, remember you have the AR characteristic polynomial, the first coefficient is one, and then you have the negative values of the AR coefficients. Just doing that, if we now want to compute the moduli of the reciprocal roots, we can use a function Mod in R. If we want to compute the periods of the reciprocal roots, we are going to use a function Arg, these gives me the arguments, but if I compute two times Pi over the argument of the reciprocal roots, I get the periods. We can now bring the results in terms of modulus and frequency, and we’re printing them by decreasing order. If you look at the results down here, you’re going to have an AR. Again, an AR(8) for this particular set of AR coefficients, we are going to have exactly four pairs of complex reciprocal roots. This here, the first line corresponds to the modulus and the period of the first root, so it’s a very persistent modulus of 0.97. Then we have a period that is around 12.7. Then the rest of the roots are also complex valued, they are in decreasing moduli, so we can see that the second one has modulus around 0.81 With period around five, the third one 0.72 with period around three, and then the last 0.66 with period around 2.23. If you remember, is going to matter in terms of the contribution to both, the autocorrelation function and to the forecast function of this process. Now, I’m just going to show you what we can do in terms of simulating from different AR processes of orders two and three, and then looking at the properties of those processes. Here, the first example I’m considering is an AR(2) that happens to have one pair of complex valued reciprocal roots the modulus 0.95 and the period is 12, so we just keep those, we transform to the Phi coefficients using the equations we have described, and then we get the AR coefficients. Now that we have the AR coefficients and this is my AR coefficient, the one corresponding to the first lag is about 1.65 and minus 0.902 for the second lag. We can now generate data from an AR(2) use the function arima.sim. I’m setting the random seed first and then I’m going to select here the number of data points that I want to simulate, in this case is going to be 300 data points. The variance of the process at the observational level here, so it’s just the innovations standard deviation and variance is one, then I’m going to use the function and I’m going to say that I’m going to keep my data here in the yt object and I’m going to use the function arima.sim so it is 300 observations. The model here is a pure auto-regressive model. The first component here corresponds to the AR coefficients, so I just pass these as a vector containing the AR coefficients for this AR(2) and then I specify the standard deviation as well. 
Once I do that, I can plot my simulated data is here and we can see that it has these quasi periodic behavior. The quasi periodic behavior is consistent with these. Reciprocal roots are complex valued with period 12 and a persistent modulus. Then if we look at properties of this process in terms of the autocorrelation function and partial autocorrelation function, we can plot them both, and if we zoom in here, just the first function that we’re plotting here is the sample ACF, and we can see again, I’m just plotting the first 50 lags of the sample ACF. We have that behavior that we expect for this type of process is process has a pair of complex valued roots with period 12. We can see these quasi-periodicity appears in the ACF, and then there is a decay in the ACF coefficients in the sample ACF coefficients as the lag increases, which is consistent with the fact that the root has a modulus, that is, the reciprocal root has a modulus that is below one. If we think about the partial ACF, remember for an autoregressive process of order p, the partial ACF, these are the sampled. Partial ACFs are going to be zero after the lag of the process. This is the theoretical property here for the sample. We can see that this is also the case. The first two coefficients are significant and then the other ones are within the confidence bands provided by the R function. This is for this particular example. I’m now going to simulate an AR(2) with two real reciprocal roots rather than having one pair of complex valued, reciprocal roots, we have two real reciprocal roots, one with modulus 0.95 and another one with modulus 0.5. If we do the same here, we have the Phi coefficients, so we just transform the reciprocal roots to the AR coefficients. These are now my two AR coefficients and I’m going to just do something similar to what I did before. I’m going to sample 300 data points using arima.sim function. The standard deviation of the innovation continues to be one. I just changed the AR coefficients. This is my yt, we can plot the simulated observations and we can see that the process is different. The data set is different to what we had before, so it shows different characteristics. There is no quasiperiodic behavior that is apparent in this particular plot. There is this behavior that is almost similar, a little bit like a random walk type of behavior, which is consistent with the fact that we have, two real reciprocal roots and one of them is very persistent, is the modulus is 0.95. When we look at the ACF and the PACF of this data, so this is again the sample ACF and the sample PACF, just zooming in here. We can see that for the sample ACF, we again have this decay as the lag decreases, we don’t have any quasiperiodic behavior here. This is a process that is just decaying over time. Then for the sample PACF, we have the first two coefficients that are significantly different from zero, and then the remaining coefficients are mostly within the confidence bands. This is again consistent with what the characteristics of the process that we were expecting. The next example is an AR(3). Here, I’m going to simulate from an autoregressive process of order 3. There are different alternatives that we can have in terms of the complex or real reciprocal roots. In this case, I’m going to assume that we have one pair of complex valued reciprocal roots and one real reciprocal root, the complex valued reciprocal root, I’m going to assume that that pair has modulus 0.95. 
The complex pair is fairly persistent and has a period of 12, so in that sense it is similar to the AR(2) we considered first, which had this same root structure. In addition, we now have a real-valued reciprocal root with modulus 0.8. Again we use the transformation to obtain the phi coefficients; there are now three of them, phi[1], phi[2], and phi[3], which go with the past three values of the series. We once again sample 300 data points using the arima.sim function, with the same innovation standard deviation as before, and plot the process. You can see that this process has many characteristics that are similar to the first AR(2) example, since it has the same structure for the complex-valued pair of roots, but in addition we also have a real-valued reciprocal root. If we look at the sample ACF and the sample PACF, the behavior is again similar to what we had before, only with more structure: the quasi-periodic behavior is present and decays as a function of the lag, while the partial ACF suggests that more than the first two lags are significantly different from zero, as expected for an AR(3). The process is still dominated by the root with the largest modulus, which in this case happens to be the complex-valued pair, but the additional root changes the structure of the process a bit and therefore also changes the ACF and the PACF.
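
The root computation described at the start of this walk-through can be sketched as follows. The exact AR(8) coefficients used in the lecture are not reproduced here; instead, as a minimal sketch, we construct a hypothetical AR(8) whose four pairs of complex reciprocal roots approximately match the quoted moduli (0.97, 0.81, 0.72, 0.66) and periods (12.7, 5, 3, 2.23), and then recover them with polyroot, Mod, and Arg.

## hypothetical AR(8): four complex pairs with the moduli and periods quoted above
mods    <- c(0.97, 0.81, 0.72, 0.66)
periods <- c(12.7, 5, 3, 2.23)
recip_roots <- c(mods*exp(2i*pi/periods), mods*exp(-2i*pi/periods))
## expand Phi(u) = prod_j (1 - alpha_j u); its coefficients are (1, -phi_1, ..., -phi_8)
poly_coefs <- Reduce(function(p, a) c(p, 0) - a*c(0, p), recip_roots, init = 1)
phi <- -Re(poly_coefs[-1])      # AR coefficients of the hypothetical AR(8)
## recover the reciprocal roots from the AR coefficients
roots <- 1/polyroot(c(1, -phi)) # reciprocal roots of the characteristic polynomial
modulus <- Mod(roots)           # moduli of the reciprocal roots
period  <- 2*pi/Arg(roots)      # periods (appear in conjugate +/- pairs)
print(cbind(modulus, period)[order(modulus, decreasing = TRUE), ])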

93.6 Simulating data from an AR(p) 🗒️ \mathcal{R}

  1. R code to simulate data from an AR(2) with one pair of complex-valued reciprocal roots and plot the corresponding sample ACF and sample PACF
## simulate data from an AR(2)
set.seed(2021)
## AR(2) with a pair of complex-valued roots with modulus 0.95 and period 12 
r=0.95
lambda=12 
phi=numeric(2) 
phi[1]<- 2*r*cos(2*pi/lambda) 
phi[2] <- -r^2
phi
[1]  1.645448 -0.902500
T=300 # number of time points
sd=1 # innovation standard deviation
yt=arima.sim(n=T, model = list(ar = phi), sd=sd)

par(mfrow = c(3, 1), mar = c(3, 4, 2, 1), cex.lab = 1.5)
## plot simulated data
ts.plot(yt)
## draw sample autocorrelation function
acf(yt, lag.max = 50,
    type = "correlation", ylab = "sample ACF", 
    lty = 1, ylim = c(-1, 1), main = " ")

## draw sample partial autocorrelation function
pacf(yt, lag.max = 50, main = "sample PACF")

  2. R code to simulate data from an AR(2) with two different real-valued reciprocal roots and plot the corresponding sample ACF and sample PACF
### Simulate from AR(2) with two real reciprocal roots (e.g., 0.95 and 0.5)
set.seed(2021)
recip_roots=c(0.95, 0.5) ## two different real reciprocal roots
phi=c(sum(recip_roots), -prod(recip_roots)) ## compute ar coefficients
phi
[1]  1.450 -0.475
T=300 ## set up number of time points
sd=1 ## set up standard deviation
yt=arima.sim(n=T,model = list(ar=phi),sd=sd) # generate ar(2)

par(mfrow = c(3, 1), mar = c(3, 4, 2, 1),  cex.lab = 1.5, cex.main = 1.5)
### plot simulated data 
ts.plot(yt)
### plot sample ACF
acf(yt, lag.max = 50, type = "correlation",  main = "sample ACF")
### plot sample PACF
pacf(yt, lag.max = 50, main = "sample PACF")

  3. R code to simulate data from an AR(3) with one real reciprocal root and a pair of complex-valued reciprocal roots and plot the corresponding sample ACF and sample PACF
### Simulate from AR(3) with one real root 
### and a pair of complex roots (e.g., r=0.95 and lambda = 12 and real root with
### 0.8 modulus)
set.seed(2021)
r= c(0.95, 0.95, 0.8) ## modulus
lambda=c(-12, 12) ## lambda
recip_roots=c(r[1:2]*exp(2*pi/lambda*1i), r[3]) ## reciprocal roots
phi <- numeric(3) # placeholder for phi
phi[1]=Re(sum(recip_roots)) # ar coefficients at lag 1
phi[2]=-Re(recip_roots[1]*recip_roots[2] + recip_roots[1]*recip_roots[3] + recip_roots[2]*recip_roots[3]) # ar coefficients at lag 2
phi[3]=Re(prod(recip_roots))
phi
[1]  2.445448 -2.218859  0.722000
T=300 # number of time points
sd=1 # standard deviation
yt=arima.sim(n=T,model = list(ar=phi), sd = sd) # generate ar(3)

par(mfrow = c(3,1),  mar = c(3, 4, 2, 1), cex.lab = 1.5, cex.main = 1.5)
### plot simulated data 
ts.plot(yt)
### plot sample ACF
acf(yt, lag.max = 50, type = "correlation",  main = "sample ACF")
### plot sample PACF
pacf(yt, lag.max = 50, main = "sample PACF")
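
As an additional check (not part of the original code listing), one can fit an AR(3) to the simulated series yt with arima() and recover the reciprocal roots of the fitted characteristic polynomial; under these simulation settings they should be close to the values used to generate the data (moduli 0.95, 0.95, and 0.8, with period 12 for the complex pair). This is a sketch only, using a plain AR(3) specification with no mean term.

### fit an AR(3) to the simulated series and recover the reciprocal roots
fit <- arima(yt, order = c(3, 0, 0), include.mean = FALSE)
phi_hat <- coef(fit)[1:3]               # estimated AR coefficients
recip_hat <- 1/polyroot(c(1, -phi_hat)) # fitted reciprocal roots
Mod(recip_hat)                          # compare with 0.95, 0.95, 0.8
2*pi/abs(Arg(recip_hat))                # ~12 for the complex pair; Inf for the (positive) real root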

93.7 The AR(p): Review 🗒️

This section is based on material from the handout but we also covered it in greater detail at the beginning of the lecture.

93.7.1 AR(p): Definition, stability, and stationarity

AR(p)

A time series follows a zero-mean autoregressive process of order p, or AR(p), if:

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \varepsilon_t \qquad \tag{93.15}

where \phi_1, \ldots, \phi_p are the AR coefficients and \varepsilon_t is a white noise process with \varepsilon_t \sim \text{i.i.d. } N(0, v), for all t.

The AR characteristic polynomial is given by

\Phi(u) = 1 - \phi_1 u - \phi_2 u^2 - \ldots - \phi_p u^p,

with u complex-valued.

The AR(p) process is stable if \Phi(u) = 0 only when |u| > 1. In this case, the process is also stationary and can be written as

y_t = \psi(B) \varepsilon_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},

with \psi_0 = 1 and \sum_{j=0}^{\infty} |\psi_j| < \infty. Here B denotes the backshift operator, so B^j \varepsilon_t = \varepsilon_{t-j} and

\psi(B) = 1 + \psi_1 B + \psi_2 B^2 + \ldots + \psi_j B^j + \ldots

The AR polynomial can also be written as

\Phi(u) = \prod_{j=1}^{p} (1 - \alpha_j u),

with \alpha_j being the reciprocal roots of the characteristic polynomial. For the process to be stable (and consequently stationary), |\alpha_j| < 1 for all j = 1, \ldots, p.
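
To make the MA(\infty) representation concrete, here is a small sketch that computes the \psi_j weights with ARMAtoMA, using as an assumed example the AR(2) coefficients from the simulation section, and checks that the roots of \Phi(u) lie outside the unit circle.

## assumed example: stable AR(2) whose complex reciprocal roots have modulus 0.95
phi <- c(1.645448, -0.9025)
Mod(polyroot(c(1, -phi)))                # roots of Phi(u): both moduli > 1, so the process is stable
psi <- ARMAtoMA(ar = phi, lag.max = 200) # psi_1, psi_2, ... of the MA(infinity) representation
sum(abs(psi))                            # partial sum of |psi_j| up to lag 200; the full series converges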

93.7.2 AR(p): State-space representation

An AR(p) can also be represented using the following state-space or dynamic linear model (DLM) representation:

y_t = F^{\top} x_t, \tag{93.16}

x_t = G x_{t-1} + \omega_t, \tag{93.17}

with

x_t = (y_t, y_{t-1}, \dots, y_{t-p+1})^{\top} \tag{93.18}

where F is a mapping from the state vector to the observed variable:

F = (1, 0, \dots, 0)^{\top} \tag{93.19}

\omega_t = (\varepsilon_t, 0, \dots, 0)^{\top} \tag{93.20}

and the state transition matrix G is given by

G = \begin{pmatrix} \phi_1 & \phi_2 & \phi_3 & \dots & \phi_{p-1} & \phi_p \\ 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 1 & 0 & \dots & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & 1 & 0 \end{pmatrix}. \tag{93.21}

Using this representation, the expected behavior of the process in the future can be exhibited via the forecast function:

f_t(h) = E(y_{t+h} \mid y_{1:t}) = F^\top G^h x_t, \quad h > 0, \quad \forall t \ge p \tag{93.22}

where G^h is the h-th power of the matrix G. The eigenvalues of the matrix G are the reciprocal roots of the characteristic polynomial.

NoteEigenvalues
  • The eigenvalues can be real-valued or complex-valued.
  • If they are complex-valued, the eigenvalues/reciprocal roots appear in conjugate pairs.

Assuming the matrix G has p distinct eigenvalues, we can decompose G into G = E \Lambda E^{-1}, with

\Lambda = \text{diag}(\alpha_1, \dots, \alpha_p),

where E is the matrix of corresponding eigenvectors. Then, G^h = E \Lambda^h E^{-1} and we have:

f_t(h) = \sum_{j=1}^{p} c_{tj} \alpha_j^h.
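
Here the constants c_{tj} depend on the current state x_t and on the eigenstructure of G. As a minimal sketch, using the AR(2) coefficients from the simulation section as an assumed example and a hypothetical state vector, we can build F and G, confirm that the eigenvalues of G coincide with the reciprocal roots of \Phi(u), and evaluate the forecast function through the eigendecomposition.

## assumed AR(2) example; xt below is a hypothetical current state (y_t, y_{t-1})
phi <- c(1.645448, -0.9025)
p <- length(phi)
F <- c(1, rep(0, p - 1))                           # observation vector F
G <- rbind(phi, cbind(diag(p - 1), rep(0, p - 1))) # state transition matrix G
eig <- eigen(G)
eig$values             # eigenvalues of G ...
1/polyroot(c(1, -phi)) # ... the same values as the reciprocal roots of Phi(u)
xt <- c(1.2, 0.8)      # hypothetical state vector
ft <- function(h) Re(t(F) %*% eig$vectors %*% diag(eig$values^h) %*% solve(eig$vectors) %*% xt)
sapply(1:5, ft)        # forecast function f_t(h) for h = 1, ..., 5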

93.7.3 ACF of AR(p)

For a general AR(p), the ACF is given in terms of the homogeneous difference equation:

\rho(h) - \phi_1 \rho(h-1) - \ldots - \phi_p \rho(h-p) = 0, \quad h > 0.

Assume that \alpha_1, \dots, \alpha_r denote the distinct reciprocal roots of the characteristic polynomial, with multiplicities m_1, \ldots, m_r respectively, such that \sum_{i=1}^{r} m_i = p. Then the general solution is

\rho(h) = \alpha_1^h p_1(h) + \ldots + \alpha_r^h p_r(h),

with p_j(h) being a polynomial of degree m_j - 1.

93.7.4 Example: AR(1)

We already know that for h \ge 0, \rho(h) = \phi^h. Using the result above, we have

\rho(h) = a \phi^h,

and so to find a, we take \rho(0) = 1 = a \phi^0, hence a = 1.

93.7.5 Example: AR(2)

Similarly, using the result above for an AR(2) with a pair of complex-valued reciprocal roots \alpha_{1,2} = r e^{\pm i \omega}, we have

\rho(h) = a \alpha_1^h + b \alpha_2^h = c r^h \cos(\omega h + d).
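
A small numerical check of this damped-cosine form, as a sketch with an assumed AR(2) whose reciprocal roots have modulus 0.95 and period 12: ARMAacf gives the theoretical ACF, and the constants c and d can be obtained from the initial conditions \rho(0) = 1 and \rho(1).

## assumed AR(2) with complex reciprocal roots r*exp(+/- i*omega), r = 0.95, period 12
r <- 0.95; lambda <- 12; omega <- 2*pi/lambda
phi <- c(2*r*cos(omega), -r^2)
rho <- ARMAacf(ar = phi, lag.max = 24)            # theoretical ACF at lags 0, ..., 24
d <- atan((r*cos(omega) - rho[2])/(r*sin(omega))) # phase d, from rho(1)
cc <- 1/cos(d)                                    # amplitude c, from rho(0) = 1
h <- 0:24
max(abs(rho - cc*r^h*cos(omega*h + d)))           # ~0: the ACF matches c r^h cos(omega h + d)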

93.7.6 PACF of AR(p)

We can use the Durbin-Levinson recursion to obtain the PACF of an AR(p); cf. Section K.1.

Using the same representation but substituting the true autocovariances and autocorrelations with their sampled versions, we can also obtain the sample PACF.

It is possible to show that the PACF of an AR(p) is equal to zero for h > p.
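
As a numerical illustration of this property, here is a sketch using an assumed AR(2) example: the theoretical PACF computed by ARMAacf with pacf = TRUE vanishes beyond lag p, while the sample PACF of a simulated series is only approximately zero there.

## assumed AR(2) example: the theoretical PACF is exactly zero beyond lag 2
phi <- c(1.645448, -0.9025)
ARMAacf(ar = phi, lag.max = 6, pacf = TRUE) # lags 1-2 nonzero; lags 3-6 numerically zero
## the sample PACF only approximates this
set.seed(2021)
yt <- arima.sim(n = 300, model = list(ar = phi), sd = 1)
pacf(yt, lag.max = 6, plot = FALSE)         # values beyond lag 2 should be close to zero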