88  The AR(1) process: definitions and properties - M1L2

Time Series Analysis

In this lesson we define the AR(1) process and discuss stationarity, the ACF, the PACF, differencing, and smoothing
coursera
notes
bayesian statistics
autoregressive models
time series
Author

Oren Bochman

Published

November 2, 2024

Keywords

AR(1) process, Yule-Walker equations, Durbin-Levinson recursion, R code

We will next introduce the autoregressive process of order one, or AR(1) process, which is a fundamental model in time series analysis. We will discuss the definition of the AR(1) process, its properties, and how to simulate data from an AR(1) process.

88.1 The AR(1) process 🎥

Figure 88.1: AR(1) definition
Figure 88.2: AR(1) properties

88.1.1 AR(1) Definition

The AR(1) process is defined as:

y_t = \phi y_{t-1} + \varepsilon_t \qquad \varepsilon_t \overset{iid}{\sim} \mathcal{N}(0, v) \tag{88.1}

  • where:
    • \phi is the AR(1) coefficient
    • \varepsilon_t are the innovations (or shocks) at time t, assumed to be independent and identically distributed (i.i.d.) with mean 0 and variance v.
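To make the definition concrete, here is a minimal R sketch that simulates an AR(1) directly from the defining recursion. The values of \phi, v, and the series length are illustrative choices, not taken from the lesson.

# Minimal sketch: simulate an AR(1) from y_t = phi * y_{t-1} + eps_t
# (phi, v, and T below are illustrative choices)
set.seed(42)
T   <- 200
phi <- 0.6
v   <- 1
eps <- rnorm(T, mean = 0, sd = sqrt(v))   # i.i.d. N(0, v) innovations
y   <- numeric(T)
y[1] <- eps[1]                            # start the recursion at the first innovation
for (t in 2:T) {
  y[t] <- phi * y[t - 1] + eps[t]         # the AR(1) recursion
}
plot(y, type = "l", xlab = "t", ylab = expression(y[t]))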

88.1.2 AR(1) Recursive Expansion

Recursive substitution yields:

\begin{aligned} y_t &= \phi(\phi y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= \phi^2 y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t \\ &\;\;\vdots \\ &= \phi^k y_{t-k} + \sum_{j=0}^{k-1} \phi^j \varepsilon_{t-j} \end{aligned} \tag{88.2}

For |\phi| < 1, as k \to \infty, this becomes:

y_t = \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j} \tag{88.3}

This can be interpreted as an infinite-order moving average, \operatorname{MA}(\infty), process.
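As a quick numerical sanity check (a sketch with illustrative values, not part of the lecture), we can compare the recursive simulation with a truncated version of the \operatorname{MA}(\infty) sum; for |\phi| < 1 the truncation error is of order \phi^k.

set.seed(7)
phi <- 0.8; v <- 1
T <- 300; k <- 100                       # k = truncation point of the MA(infinity) sum
eps <- rnorm(T + k, sd = sqrt(v))
# AR(1) built from the recursion over all T + k innovations
y <- numeric(T + k)
y[1] <- eps[1]
for (t in 2:(T + k)) y[t] <- phi * y[t - 1] + eps[t]
# truncated MA(infinity): y_t is approximately sum_{j=0}^{k-1} phi^j eps_{t-j}
y_ma <- sapply((k + 1):(T + k),
               function(t) sum(phi^(0:(k - 1)) * eps[t:(t - k + 1)]))
max(abs(y[(k + 1):(T + k)] - y_ma))      # tiny, on the order of phi^k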

88.1.3 AR(1) Mean

Since \mathbb{E}[\varepsilon_t] = 0,

\mathbb{E}[y_t] = \sum_{j=0}^{\infty} \phi^j \, \mathbb{E}[\varepsilon_{t-j}] = 0, \tag{88.4}

so the AR(1) process has zero mean.

88.1.4 AR(1) Variance

Using independence and identical distribution:

\mathbb{V}ar[y_t] = \sum_{j=0}^{\infty} \phi^{2j} v = \frac{v}{1 - \phi^2} \tag{88.5}

Requires |\phi| < 1 for convergence (i.e., stationarity).
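A quick simulation-based check of this formula (a sketch with illustrative values; any long stationary AR(1) sample would do):

set.seed(123)
phi <- 0.9; v <- 1
y <- arima.sim(n = 1e5, model = list(ar = phi), sd = sqrt(v))
c(sample_variance = var(y),
  theoretical     = v / (1 - phi^2))     # v / (1 - 0.81) = 5.26...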

88.1.5 AR(1) Autocovariance Function \gamma(h)

For lag h, the autocovariance:

\begin{aligned} \gamma(h) &= \mathbb{E}[y_t y_{t-h}] \\ &= \mathbb{E} \left[ \left( \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}\right )\left (\sum_{k=0}^{\infty} \phi^k \varepsilon_{t-h-k}\right) \right] \\ &= \mathbb{E}\left[(\varepsilon_{t} + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \ldots ) \times (\varepsilon_{t-h} + \phi \varepsilon_{t-h-1} + \phi^2 \varepsilon_{t-h-2} + \ldots ) \right] \\ &= \mathbb{E}\left[\phi^{h} \varepsilon_{t-h}^2 + \phi^{h+2} \varepsilon_{t-h-1}^2 + \ldots\right] \\ &= v \sum_{j=0}^{\infty} \phi^{h+j} \phi^{j} \\ &= v \phi^h \sum_{j=0}^{\infty} \phi^{2j} \\ &= \frac{v \phi^{|h|}}{1 - \phi^2} \qquad \text{when } |\phi| < 1 \end{aligned} \tag{88.6}

We used linearity of the expectation, the independence of the innovations \varepsilon_t, and the fact that \mathbb{E}[\varepsilon_t^2] = v. In the cross product, only terms in which the two innovations share the same time index (i.e., j = h + k) contribute; all other cross terms have zero expectation because the innovations are independent with mean zero. In the final step, we used the formula for the sum of a geometric series. The derivation takes h \ge 0; since \gamma(-h) = \gamma(h), the result is stated with |h|.

88.1.6 AR(1) Autocorrelation Function \rho(h)

Defined by:

\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^{|h|} \tag{88.7}
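As a sketch (with an arbitrary \phi), we can compare the sample ACF from a long simulated series to the theoretical \phi^{|h|}:

set.seed(1)
phi <- 0.8; v <- 1
y <- arima.sim(n = 5e4, model = list(ar = phi), sd = sqrt(v))
h <- 1:5
cbind(lag        = h,
      sample_acf = acf(y, lag.max = max(h), plot = FALSE)$acf[h + 1],
      theory     = phi^h)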

88.1.7 Other AR(1) properties

  1. For any lag h:
  • \rho(h) = \phi^{|h|}
  • \gamma(h) = \frac{v \phi^{|h|}}{1 - \phi^2}
  2. Both decay exponentially in |h| when |\phi| < 1.
  3. If \phi > 0, the decay is monotonic.
  4. If \phi < 0, the decay is oscillatory (it alternates in sign), as illustrated in the short snippet below.
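A small illustration of the last two points (illustrative \phi values): with a positive coefficient \rho(h) decays monotonically, while with a negative coefficient it alternates in sign.

h <- 0:6
rbind("phi =  0.9" = 0.9^h,     # monotone decay
      "phi = -0.9" = (-0.9)^h)  # oscillating decay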

88.1.8 Stationarity

  • The process is stationary when |\phi| < 1 (a quick numerical check is sketched below):
    • The mean and variance are constant over time.
    • The autocovariance depends only on the lag h, not on t.
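In the general AR(p) framework, stationarity corresponds to the roots of the AR characteristic polynomial lying outside the unit circle; for the AR(1) this reduces to |\phi| < 1. A quick check in R (a sketch, with an illustrative \phi):

phi <- 0.9
root <- polyroot(c(1, -phi))   # root of the AR(1) characteristic polynomial 1 - phi * z
Mod(root) > 1                  # TRUE: the root lies outside the unit circle
abs(phi) < 1                   # the equivalent AR(1) condition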

The first process we are going to study is the autoregressive process of order 1. A process is an autoregressive process of order 1 if you can write y_t as a linear function of the past value of the process, y_{t-1}. Phi here is the so-called AR coefficient. There is that linear structure plus a noise term: we assume the epsilon_t's are innovations that are independent, identically distributed normal random variables with mean zero and variance v. That is the basic process we are going to study, and we will show it is a zero-mean process.

The first thing you notice when you look at this equation is that y_t is written as a function of y_{t-1}, and the same structure is valid for y_{t-1}: you can write y_{t-1} as a function of y_{t-2}. If we do that, y_t becomes phi times (phi y_{t-2} plus epsilon_{t-1}) plus epsilon_t, which gives phi squared y_{t-2} plus phi epsilon_{t-1} plus epsilon_t. What we see is that when the process is written as a function of y_{t-2}, phi appears to the power of 2. If you continue applying this recursion, the power of phi goes with the lag on the y term, and you accumulate terms that are just linear combinations of the epsilon_t's. If you apply this recursion k times, you can write the process as phi^k y_{t-k} plus the sum from j = 0 up to k - 1 of phi^j epsilon_{t-j}.

If you were to repeat this infinitely many times, and if you assume that phi is between minus 1 and 1, you can write y_t as an infinite sum of functions of the epsilon_t's. As long as phi is between -1 and 1, we are writing y_t as an infinite-order moving average process; this is like a linear filter on the epsilon_{t-j}'s.

Using the distributional assumptions on the epsilon_t's, we can find the moments of the process. Consider the expected value of y_t: the expected value of the sum is the sum of the expected values, phi^j is a constant, and each epsilon_t has mean zero, so the expected value of the process is zero. This representation gives us a zero-mean autoregressive process of order 1.

Using the same reasoning to compute the variance of the process: the variance of this sum is the sum of the variances, because the epsilons are independent and identically distributed. Each phi^j is a constant, so it appears squared, and the variance of each epsilon is v. This gives the sum over j from zero to infinity of phi^{2j} times v. Because we are assuming that phi is between minus 1 and 1, this summation converges, and the variance is v over 1 minus phi squared. As you can see, it does not depend on time.
We are assuming that phi is between minus 1 and 1, which implies that the process is going to be stationary; we will talk about that too. This gives us the mean and the variance. We can also compute the ACF, and we will do that next.

When phi is between minus 1 and 1, we can use the equation that writes the AR(1) as an infinite sum of present and past epsilon_t values. We use this moving average way of writing the AR(1) process to compute the autocorrelation function and the autocovariance function. Let's work first with the autocovariance function. Since we showed that the expected value of the AR(1) is zero, we can write the autocovariance gamma(h) as the expected value of y_t times y_{t-h}. Now we write y_t as that infinite sum of present and past epsilon_t's, and we do the same for y_{t-h}: the expected value of the sum over j from zero to infinity of phi^j epsilon_{t-j}, times the sum over k (just to use a different index) of phi^k epsilon_{t-h-k}.

Expanding the first summation: when j is zero we get epsilon_t, then phi epsilon_{t-1}, then phi squared epsilon_{t-2}, and so on, eventually including the terms phi^h epsilon_{t-h}, phi^{h+1} epsilon_{t-h-1}, and so on. The second sum expands as epsilon_{t-h} plus phi epsilon_{t-h-1} plus phi squared epsilon_{t-h-2}, and so on.

If you look at the cross products in this expression, any product of an epsilon at one lag times an epsilon at a different lag has expected value zero, because that corresponds to the covariance of two independent innovations. The only terms that survive are the ones where the two epsilons have the same time index. For example, I can combine phi^h epsilon_{t-h} from the first sum with epsilon_{t-h} from the second sum, which gives epsilon_{t-h} squared, whose expectation is the variance v. Similarly, phi^{h+1} epsilon_{t-h-1} combines with phi epsilon_{t-h-1}, and so on. Taking the expected value, the constants come out, the expected values of the squared terms are all equal to v, and I can write this as v times the sum over j of phi^{h+j} times phi^j. This is the covariance function.
If you simplify this expression further, phi to the power of h comes out of the summation. If phi is again assumed to be between minus 1 and 1, the remaining summation converges, and the autocovariance is phi^h times v over 1 minus phi squared; here I am assuming that h is a positive integer. If I want the autocorrelation function, it is simply the autocovariance divided by the variance of the process, which we computed before to be v over 1 minus phi squared. Those terms cancel, and I get phi to the power of h. If h happens to be a negative integer, you get the same structure, only now phi is raised to the absolute value of h. In general, for any h, positive or negative, the autocorrelation is phi to the power of the absolute value of h, and the autocovariance is phi to the power of the absolute value of h times v over 1 minus phi squared. These are the theoretical autocovariance and autocorrelation functions for the AR(1) process.

Assuming again that phi is between minus 1 and 1, this is a decaying function of the lag. If you have a phi that is positive and close to one, the decay is slower than if phi is positive and close to zero. If phi is negative, you also have exponential decay as a function of h, but it is an oscillatory decay, alternating between positive and negative values depending on h.

88.2 The PACF of the AR(1) process 🗒️

It is possible to show that the PACF of an AR(1) process is zero after the first lag. We can use the Durbin-Levinson recursion to show this.

For lag n = 0 we have \phi(0, 0) = 0

For lag n = 1 we have:

\phi(1, 1) = \rho(1) = \phi \tag{88.8}

For lag n = 2 we compute \phi(2, 2) as:

\begin{aligned} \phi(2, 2) &= \frac{\rho(2) - \phi(1, 1)\rho(1)}{1 - \phi(1, 1)\rho(1)} \\ &= \frac{\phi^2-\phi^2}{1- \phi^2}\\ &=0 \end{aligned} \tag{88.9}

and we also obtain:

\phi(2, 1) = \phi(1, 1) - \phi(2, 2)\phi(1, 1) = \phi. \tag{88.10}

For lag n = 3 we compute \phi(3, 3) as

\begin{aligned} \phi(3, 3) &= \frac{\rho(3) - \sum_{h=1}^2 \phi(2, h)\rho(3 - h)}{1 - \sum_{h=1}^2 \phi(2, h)\rho(h)} \\ &= \frac{\phi^3 - \phi(2,1) \rho(2) - \phi(2,2) \rho(1)}{1 - \phi(2,1)\rho(1) - \phi(2,2)\rho(2)} \\ &= \frac{\phi^3 - \phi^3 - 0}{1 - \phi^2 } \\ &= 0 \end{aligned} \tag{88.11}

and we also obtain

\phi(3, 1) = \phi(2, 1) - \phi(3, 3)\phi(2, 2) = \phi \tag{88.12}

\phi(3, 2) = \phi(2, 2) - \phi(3, 3)\phi(2, 1) = 0 \tag{88.13}

We can prove by induction that in the case of an AR(1), for any lag n,

\phi(n, 1) = \phi \quad \text{and} \quad \phi(n, h) = 0 \text{ for } 2 \le h \le n.

Thus, the PACF of an AR(1) is zero at every lag greater than 1, and the PACF coefficient at lag 1 equals the AR coefficient \phi.
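To complement the derivation, here is a small R sketch of the Durbin-Levinson recursion applied to the theoretical AR(1) ACF \rho(h) = \phi^h. The function name dl_pacf and the value of \phi are illustrative; the output is \phi at lag 1 and zeros thereafter, as shown above.

# Durbin-Levinson recursion: phi(n, n) computed from the ACF rho(1), ..., rho(n.max)
dl_pacf <- function(rho, n.max) {
  pacf_vals <- numeric(n.max)
  phi_prev  <- numeric(0)                      # phi(n-1, 1:(n-1))
  for (n in 1:n.max) {
    if (n == 1) {
      phi_nn   <- rho[1]
      phi_curr <- phi_nn
    } else {
      num      <- rho[n] - sum(phi_prev * rho[(n - 1):1])
      den      <- 1 - sum(phi_prev * rho[1:(n - 1)])
      phi_nn   <- num / den
      phi_curr <- c(phi_prev - phi_nn * rev(phi_prev), phi_nn)
    }
    pacf_vals[n] <- phi_nn
    phi_prev     <- phi_curr
  }
  pacf_vals
}

phi <- 0.7
rho <- phi^(1:10)              # theoretical AR(1) ACF at lags 1..10
round(dl_pacf(rho, 10), 6)     # 0.7 at lag 1, 0 at all higher lags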

88.3 Simulate data from an AR(1) process 🎥

This video walks through the code snippet below and provides examples of how to sample data from an AR(1) process and plot the ACF and PACF functions of the resulting time series.

Prado demonstrates how to simulate AR(1) processes using arima.sim in R:

  • Simulation Setup:

    • set.seed() ensures reproducibility.
    • Simulate 500 time points from an AR(1) with \phi = 0.9 and variance = 1.
    • The process is stationary since |\phi| < 1.
  • arima.sim Function:

    • Can simulate ARIMA(p,d,q) processes; here, only AR(1) is used.
    • Model specified via a list: list(ar = phi), with sd as the standard deviation (√variance).
  • Comparative Simulation:

    • Second AR(1) simulated with \phi = -0.9 to show the impact of negative \phi.
    • The positive \phi process shows persistent values (random walk-like).
    • The negative \phi process shows oscillatory behavior.
  • ACF and PACF Analysis:

    • True ACF: Exponential decay for both cases, oscillatory when \phi < 0.

    • Sample ACF: Matches theoretical ACF for each process.

    • Sample PACF: Only lag 1 is non-negligible, aligning with AR(1) properties:

      • Positive at lag 1 for \phi = 0.9.
      • Negative at lag 1 for \phi = -0.9.
      • All other lags ≈ 0.

The demonstration confirms our theoretical results regarding ACF/PACF behavior in AR(1) processes.

I will now show you how to get samples from a simulated AR process using the arima.sim function in R. I begin by setting the random seed so that you can replicate the example, and I set the number of observations, or time points, that I want to sample to 500 for these processes. I am going to sample from an autoregressive process of order one with AR coefficient 0.9 and variance 1.

The arima.sim function takes different kinds of parameters; it is more general than just sampling from an AR process. You can sample from an autoregressive integrated moving average process, that is, a process that has an autoregressive part, an integrated part (meaning how many times you difference the series), and a moving average part, each with its own order and parameters. We will only use this function to simulate from AR processes. In the model argument I specify a list of components: the list contains the AR coefficients, information about the integrated part of the process (we don't have one here), and information about the moving average component (we don't have one either). You also specify the standard deviation of the process, which is just the square root of the variance. So I set the variance to 1, the standard deviation to the square root of the variance, the AR coefficient phi to 0.9, and simulate 500 observations.

I am also going to simulate 500 observations from another AR(1) process, keeping the same variance 1 but using a negative AR coefficient of -0.9 instead of a positive one, just to show you how those processes look very different in terms of their properties.

If I plot those two time series, the first one is the set of simulated observations from an AR(1) with AR coefficient 0.9. You can see this almost random-walk-like behavior that is consistent with a high value of the AR coefficient. Because I am using 0.9, the process is stable and it is going to be stationary. If I were to use a phi value above 1 or below minus 1, I would have an explosive process, and the function would tell you there is a problem with that process. The second AR process we simulated had AR coefficient minus 0.9. You can see the behavior here is very different: with a negative coefficient there is an oscillatory, quasi-periodic behavior.

I can now plot the true autocorrelation function for both processes, and I will also show you how the sample ACF and the sample PACF look. Here I set the maximum lag to 50 and compute first the autocovariance at lag zero, which is the variance of the process, v over 1 minus phi squared. Then, as you know, to get the autocovariance at lag h you multiply that by the AR coefficient to the power of the lag, which is what this line of code is doing.
If I now plot the true ACF of the first process, we see what we learned: because phi is smaller than one in absolute value, there is exponential decay as a function of the lag. For the second process, simulated with -0.9 as the AR coefficient, we can again compute the true autocovariance and autocorrelation functions. The decay is also exponential as a function of the lag, but the difference between the two plots is that here the exponential decay occurs with an oscillatory behavior, which has to do with the fact that the coefficient is negative.

Now look at the sample ACF for each of the two processes. If you use the acf function and pass in the data, you can see that the sample ACF based on the 500 observations simulated from each process resembles the structure of the true ACF: for the first process you have the exponential decay consistent with the true ACF, and for the second you have the oscillatory behavior, also with exponential decay, consistent with the -0.9 AR coefficient.

You can then plot the sample PACF for the two processes. Because these are simulated observations from an AR(1) process, the partial autocorrelation at any lag after the first one should be zero, or negligible in the case of the sample PACF. Plotting the two, we can see that this is precisely what happens with the sample PACF based on those two data sets. For the first case, only the first lag is not negligible, which is consistent with simulated observations from an AR(1), and it is a positive quantity that should be close to the coefficient 0.9. For the -0.9 case, you have a negative PACF coefficient at lag 1, which should be close to -0.9, and everything else is negligible. So for these two examples the sample PACF is also doing what you expect it to do.

88.3.1 R code: Sample data from AR(1) processes 🗒️

Sample data from two AR(1) processes and plot their ACF and PACF functions.

set.seed(2021)
T=500

v=1.0
sd=sqrt(v)
phi1=0.9
yt1=arima.sim(
  n = T, 
  model = list(ar = phi1), 
  sd = sd)

phi2=-0.9
yt2=arima.sim(
  n = T, 
  model = list(ar = phi2), 
  sd = sd)
  1. set seed for reproducibility
  2. number of time points
  3. innovation variance
  4. innovation standard deviation
  5. AR coefficient for the first process
  6. sample data from an AR(1) with \phi = 0.9 and v = 1
  7. AR coefficient for the second process
  8. sample data from an AR(1) with \phi = -0.9 and v = 1

88.3.2 Plot the time series of both processes

par(mfrow = c(1, 1),mar = c(3, 4, 2, 1), cex.lab = 1.3)
plot(yt1,main=expression(phi==0.9))
par(mfrow = c(1, 1),mar = c(3, 4, 2, 1), cex.lab = 1.3)
plot(yt2,main=expression(phi==-0.9))
(a) \phi = 0.9
(b) \phi = -0.9
Figure 88.3: Simulated AR(1) processes

88.3.3 Plot true ACFs for both processes

par(mfrow = c(3, 2),mar = c(3, 4, 2, 1), cex.lab = 1.3)
lag.max=50 # max lag

cov_0=sd^2/(1-phi1^2)
cov_h=phi1^(0:lag.max)*cov_0
plot(0:lag.max, cov_h/cov_0, pch = 1, 
     type = 'h', col = 'red',
     ylab = "true ACF", 
     xlab = "Lag",
     ylim=c(-1,1), 
     main=expression(phi==0.9))
  1. compute the auto-covariance at h = 0
  2. compute the auto-covariance at lag h
  3. plot the autocorrelation function (ACF) for the first process
Figure 88.4: True ACF for the first AR(1) process
cov_0=sd^2/(1-phi2^2)
cov_h=phi2^(0:lag.max)*cov_0
# Plot autocorrelation function (ACF)
plot(0:lag.max, cov_h/cov_0, pch = 1, 
     type = 'h', col = 'red',
     ylab = "true ACF", 
     xlab = "Lag",
     ylim=c(-1,1),
     main=expression(phi==-0.9))
  4. compute the auto-covariance at h = 0 for the second process
  5. compute the auto-covariance at lag h for the second process
  6. plot the autocorrelation function (ACF) for the second process
Figure 88.5: True ACF for the second AR(1) process

88.3.4 Plot sample ACFs and PACFs for both processes

acf(yt1, lag.max = lag.max, type = "correlation", ylab = "sample ACF",
    lty = 1, ylim = c(-1, 1), main = " ")
acf(yt2, lag.max = lag.max, type = "correlation", ylab = "sample ACF",
    lty = 1, ylim = c(-1, 1), main = " ")
## plot sample PACFs for both processes

pacf(yt1, lag.max = lag.max, ylab = "sample PACF", ylim=c(-1,1), main="")
pacf(yt2, lag.max = lag.max, ylab = "sample PACF", ylim=c(-1,1), main="")
Figure 88.6: Sample ACF for the first AR(1) process
Figure 88.7: Sample ACF for the second AR(1) process
Figure 88.8: Sample PACF for the first AR(1) process
Figure 88.9: Sample PACF for the second AR(1) process