86  Stationarity, the ACF and the PACF

Time Series Analysis

In this lesson we define the AR(1) process, stationarity, the ACF and PACF, differencing, and smoothing
coursera
notes
bayesian statistics
autoregressive models
time series
Author

Oren Bochman

Published

November 1, 2024

Keywords

time series, stationarity, strong stationarity, weak stationarity, lag, autocorrelation function (ACF), partial autocorrelation function (PACF), smoothing, trend, seasonality, quasi-periodicity, differencing operator, back shift operator, moving average, R code, diff function, filter function

86.1 Introduction

86.1.1 Welcome to Bayesian Statistics: Time Series

86.1.2 Introduction to R

86.2 Stationarity the ACF and the PACF 🎥

Before diving into the material, here is a brief overview of the notation for time series.

Tip 86.1: Notation
  • \{y_t\} - the time series process, where each y_t is a univariate random variable and the time points t are equally spaced.
  • y_{1:T} or y_1, y_2, \ldots, y_T - the observed data.
  • ' denotes the transpose of a matrix or vector,
  • and \sim denotes "is distributed as".
  • Under-tildes, \utilde{y}, denote estimates of the true values y.
  • E - the matrix of eigenvectors.
  • \Lambda = \text{diag}(\alpha_1, \alpha_2, \ldots, \alpha_p) - a diagonal matrix with the eigenvalues of \Sigma on the diagonal.
  • J_p(1) - a p \times p Jordan form matrix with 1s on the super-diagonal.

See also (Prado, Ferreira, and West 2023, 2–3).

86.2.1 Stationarity 🎥

Figure 86.1: strong and weak stationarity

Stationarity c.f. (Prado, Ferreira, and West 2023, sec. 1.2) is a fundamental concept in time series analysis.

Important: TL;DR – Stationarity

Stationarity is a key concept in time series analysis. A time series is said to be stationary if its statistical properties, such as the mean, variance, and auto-correlation, do not change over time.

  • We make this definition more formal in the definitions of strong and weak stationarity below.

Definition 86.1 (Strong Stationarity) Let \{y_t\} be a time series. We say that \{y_t\} is strongly stationary if the following condition holds:

Strong Stationarity

For any n > 0 and any lag h > 0, the distribution of y_t, y_{t+1}, \ldots, y_{t+n} is the same as the distribution of y_{t+h}, y_{t+h+1}, \ldots, y_{t+h+n}.

As it’s difficult to verify strong stationarity in practice, we will often use the following weaker notion of stationarity.

Definition 86.2 (Weak Stationarity) The mean and variance are constant over time, and the auto-covariance depends only on the lag:

Weak Stationarity (Second-order Stationarity)

\begin{aligned} \mathbb{E}[y_t] &= \mu \quad \forall t \\ \mathbb{V}ar[y_t] &= \nu = \sigma^2 \quad \forall t \\ \mathbb{C}ov[y_t, y_s] &= \gamma(t - s) \end{aligned} \tag{86.1}

  • Strong stationarity \implies weak stationarity, but
  • the converse is not true.
  • For Gaussian processes, our typical use case in this course, the two notions are equivalent!
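To make the weak-stationarity conditions in Equation 86.1 concrete, here is a small R sketch (my own addition, not from the lecture; the seed and sample size are arbitrary): it simulates Gaussian white noise, which is stationary by construction, and checks that the sample mean, variance, and lag-1 autocorrelation are roughly the same on the two halves of the series.

# a minimal check of weak stationarity on simulated white noise
set.seed(42)
y <- rnorm(2000, mean = 0, sd = 1)                 # Gaussian white noise: stationary by construction
halves <- split(y, rep(1:2, each = length(y) / 2)) # split the series into two time windows

# for a weakly stationary process these summaries should agree up to sampling error
sapply(halves, function(w) c(mean = mean(w),
                             var  = var(w),
                             acf1 = acf(w, plot = FALSE)$acf[2]))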

I will describe the concept of stationarity. This is an important concept because many of the standard models for time series analysis such as the autoregressive and moving average processes work under the assumption of stationarity.

Notation

So first we will talk a little bit about notation for the class. Usually when I write down {yt} like this, I'm referring to the time series process. In this case I'm thinking of each of the yt as a random variable, and we are recording this process sequentially over time when we take the observations. Here we are working, as we mentioned before, with equally spaced time series processes, and we are working with yt that are univariate, not multivariate, in this course. I will also use the notation y1 up to capital T, this time without brackets. This refers to the data. So here I am thinking that I have observed data from y1 all the way up to capital T. We can also write the sequence out in full; y1 up to capital T is just a short notation for that.

Strong stationarity

So we will now talk about two concepts of stationarity; one is strong stationarity. This is a distributional concept and it works in the following way. Here we are going to assume again that we have a time series process yt. The process is going to be strongly stationary if, for any integer n greater than 0, any sequence of times t1 up to tn (you can have only one, or several, up to n here), and any integer h > 0, the distribution of this collection of random variables, which I'm going to write in the form of a vector, is the same as the distribution of the collection we obtain by shifting everything by h. This has to hold again for any n and any h: the distribution of the first vector of random variables is the same as the distribution of the shifted vector of random variables. Then we say the process is strongly stationary. So there is this idea that the properties that characterize the process are maintained over time.

Weak stationarity

Then there is another notion, weak stationarity. In practice it is usually hard to determine whether a process is strongly stationary, so sometimes we work with this other concept, which is weak stationarity or second-order stationarity. In this case we say again that we are working with the time series process {yt}, and we assume that the first and second moments of this process exist and are finite. Then, instead of making a distributional assumption, we require that the first and second moments do not change when we shift in time. So in this case we can write down that the expected value of yt is mu, meaning that the expected value of the process does not depend on time; it is just constant over time. The variance of yt is also constant over time. And the covariance between any two yt and ys does not depend on the particular points t and s, only on the distance between t and s. So we can write it as a function of the distance between t and s. This is when the process is weakly or second-order stationary.

Relationship between strong and weak stationarity

Strong stationarity implies weak stationarity; the reverse is not true. If we work with Gaussian processes, then the two concepts are equivalent.

Strong stationarity implies weak (second-order) stationarity, provided that the first and second moments are finite.

And in the case of Gaussian processes, which is usually the case we are going to describe in this class, we have equivalence between strong stationarity and weak stationarity.

Going over the transcript I have a couple of points.

Prado mentions a couple of concepts, the moving average process and the Gaussian process, in this lesson but does not define them here; in fact, neither is defined at this point in the specialization. While the Gaussian process is a non-parametric model and a bit more advanced, the moving average process is part of the ARMA family of models, and we do delve into these in both this and the next course of the specialization.

CautionCheck your understanding

Q. Can you explain with an example when a time series is weakly stationary but not strongly stationary?

86.2.2 The auto-correlation function ACF 🎥

Figure 86.2: The auto correlation function ACF

The auto-correlation is simply how correlated a time series is with itself at different lags.

  • Correlation in general is defined in terms of the covariance of two variables.
  • The covariance is a measure of the joint variability of two random variables.
Important

Recall that the Covariance between two random variables y_t and y_s is defined as:

\begin{aligned} \mathbb{C}ov[y_t, y_s] &= \mathbb{E}[(y_t-\mathbb{E}[y_t])(y_s-\mathbb{E}[y_s])] \\ &= \mathbb{E}[(y_t-\mu_t)(y_s-\mu_s)] \\ &= \mathbb{E}[y_t y_s] - \mu_t \mu_s \end{aligned} \qquad \tag{86.2}

We get the second line by substituting \mu_t = \mathbb{E}(y_t) and \mu_s = \mathbb{E}(y_s), using the definition of the mean of a random variable. The third line follows by multiplying out and using the linearity of the expectation operator.
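As a quick numerical sanity check of this identity (a sketch I added, not from the course; the simulated data are arbitrary), the two forms agree exactly when both are computed with the same 1/n convention:

# numerical check of Cov[x, y] = E[xy] - mu_x * mu_y (population-style, dividing by n)
set.seed(1)
x <- rnorm(500)
y <- 0.5 * x + rnorm(500)

cov_centered <- mean((x - mean(x)) * (y - mean(y)))  # E[(x - mu_x)(y - mu_y)]
cov_moments  <- mean(x * y) - mean(x) * mean(y)      # E[xy] - mu_x * mu_y
all.equal(cov_centered, cov_moments)                 # TRUE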

Tip 86.2: ACF notation

We will frequently use the notation \gamma(h) to denote the autocovariance at lag h, i.e. between y_t and y_{t+h}:

\gamma(h) = \mathbb{C}ov[y_t, y_{t+h}] \qquad \tag{86.3}

When the time series is stationary, the covariance depends only on the lag h = |t-s| and we can write it as \gamma(h).

Let \{y_t\} be a time series. Recall that the covariance between two random variables y_t and y_s is defined as:

\gamma(t,s)=\mathbb{C}ov[y_t, y_s] = \mathbb{E}[(y_t-\mu_t)(y_s-\mu_s)] \qquad \tag{86.4}

where \mu_t = \mathbb{E}(y_t) and \mu_s = \mathbb{E}(y_s) are the means of y_t and y_s respectively.

\mu_t = \mathbb{E}(y_t) \qquad \mu_s = \mathbb{E}(y_s) \tag{86.5}

\text{Stationarity} \implies \mathbb{E}[y_t] = \mu \quad \forall t \qquad \therefore \quad \gamma(t,s)=\gamma(|t-s|)

If h>0 \qquad \gamma(h)=\mathbb{C}ov[y_t,y_{t-h}]

Important: Autocorrelation Function (ACF)

\rho(t,s) = \frac{\gamma(t,s)}{\sqrt{\gamma(t,t)\gamma(s,s)}} \tag{86.6}

auto-correlation ACF

\text{Stationarity} \implies \rho(h)=\frac{\gamma(h)}{\gamma(0)} \qquad \gamma(0)=\mathbb{V}ar(y_t)

Figure 86.3: sample ACF

Given observed data

y_{1:T} \tag{86.7}

we can compute sample versions of the autocovariance and autocorrelation functions.

Important: The sample ACF

\hat\gamma(h)= \frac{1}{T} \sum_{t=1}^{T-h}(y_{t+h}-\bar y )(y_t-\bar y) \tag{86.8}

where \bar y is the sample mean of the time series y_{1:T}:

\bar y = \frac{1}{T} \sum_{t=1}^{T}y_t \tag{86.9}

\hat \rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)} \tag{86.10}
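The sample ACF is easy to compute directly from Equation 86.8 and Equation 86.10. The sketch below is my own (not part of the course materials; the white noise data are arbitrary); R's acf() uses the same 1/T convention, so the hand-rolled values should match its output.

set.seed(2021)
y <- rnorm(200)          # any observed series y_{1:T}
T <- length(y)
ybar <- mean(y)

# sample autocovariance (Equation 86.8) and sample autocorrelation (Equation 86.10)
gamma_hat <- function(h) sum((y[(1 + h):T] - ybar) * (y[1:(T - h)] - ybar)) / T
rho_hat   <- function(h) gamma_hat(h) / gamma_hat(0)

sapply(1:5, rho_hat)                          # sample ACF at lags 1 to 5, by hand
acf(y, lag.max = 5, plot = FALSE)$acf[2:6]    # R's acf(), same values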

86.2.3 The partial auto-correlation function PACF 🗒️

Definition 86.3 (Partial Auto-correlation Function (PACF)) Let {y_t} be a zero-mean stationary process, and let

\hat{y}_t^{h-1} = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \ldots + \beta_{h-1} y_{t-(h-1)} \tag{86.11}

be the best linear predictor of y_t based on the previous h − 1 values \{y_{t−1}, \ldots , y_{t−h+1}\}, i.e., the linear predictor that minimizes the mean squared prediction error

\mathbb{E}[(y_t − \hat{y}_t^{h-1})^2] \tag{86.12}

The partial autocorrelation of this process at lag h, denoted by \phi(h, h) is defined as:

partial auto-correlation PACF

\phi(h, h) = Corr(y_{t+h} − \hat{y}_{t+h}^{h-1}, y_t − \hat{y}_t^{h-1}) \tag{86.13}

for h \ge 2 and \phi(1, 1) = Corr(y_{t+1}, y_{t}) = \rho(1).

The partial autocorrelation function can also be computed via the Durbin-Levinson recursion for stationary processes as \phi(0, 0) = 0,

\phi(n, n) = \frac{\rho(n) − \sum_{h=1}^{n-1} \phi(n − 1, h)\rho(n − h)}{1- \sum_{h=1}^{n-1}\phi(n − 1, h)\rho(h)} \tag{86.14}

for n \ge 1, and

\phi(n, h) = \phi(n − 1, h) − \phi(n, n)\phi(n − 1, n − h), \tag{86.15}

for n \ge 2, and h = 1, \ldots , (n − 1).

Note that the sample PACF can be obtained by substituting the sample autocorrelations and the sample auto-covariances in the Durbin-Levinson recursion.
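To make the recursion concrete, here is a small R implementation (my own sketch, not course code; the simulated data and lag choice are arbitrary) that plugs the sample autocorrelations into the Durbin-Levinson recursion; its diagonal \phi(n, n) should agree with R's pacf(), which computes the sample PACF in the same way.

set.seed(2021)
y <- rnorm(200)
max_lag <- 5
rho <- drop(acf(y, lag.max = max_lag, plot = FALSE)$acf)  # rho[k + 1] = sample rho(k), rho[1] = 1

phi <- matrix(0, max_lag, max_lag)   # phi[n, h] stores phi(n, h)
phi[1, 1] <- rho[2]                  # phi(1, 1) = rho(1)
for (n in 2:max_lag) {
  h <- 1:(n - 1)
  phi[n, n] <- (rho[n + 1] - sum(phi[n - 1, h] * rho[n - h + 1])) /
               (1          - sum(phi[n - 1, h] * rho[h + 1]))
  phi[n, h] <- phi[n - 1, h] - phi[n, n] * phi[n - 1, n - h]
}

diag(phi)                                            # phi(n, n): the sample PACF
drop(pacf(y, lag.max = max_lag, plot = FALSE)$acf)   # should match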

The Autocorrelation Function (ACF)

We will now talk about the auto-covariance function as it allows us to characterize a lot of the properties of a stationary time series process. In general, even if the process is not stationary, we can talk about the auto-covariance function of the process.

Again, we’re working with y_t here. We’re going to assume that the process has first and second moments that are finite, so we’re going to be able to compute expected values and covariances. We define the auto-covariance function here between, this is the covariance, any two points.

We’re going to have this as the covariance between y_t and y_s. This is just by definition expected value of y_t minus Mu_t times y_s minus Mu_s. Here Mu_t is the expected value of y_t, and Mu_s is the expected value of y_s. If the process is second-order stationary or strongly stationary then we know that the expected value is really not dependent on time and the covariance is also going to be a function of the distance between t and s. When we have stationarity, we have the expected value of y_t is Mu for all t. We can write down the covariance is really a function of the distance between t and s. In particular, we can write down the covariance as a function of the lag. If you have an h, here is an integer greater than zero. We can write down the covariance as a function of that lag h. This just we can think of it as the covariance between y_t and y_t minus h, and it’s just a function of the lag. This is just for the auto-covariance function.

We can also talk about the autocorrelation. If Gamma is the function we are using for the covariance, we can define the autocorrelation, which is usually called the ACF. In the general case it is a function of t and s: we have the auto-covariance in the numerator, and then we standardize by dividing by the standard deviations of the process at times t and s. Gamma t, t gives me the variance of the process at time t, and if I take the square root I get the standard deviation. Then again, Gamma s, s gives me the variance of the process at time s, and if I take the square root I get the standard deviation. This is the standardized version; this is now an autocorrelation, and because it's a correlation it's always going to be between minus one and one.

When we have stationarity, then we can also write in the same way that we wrote the auto-covariance as a function of a lag, we can write down the autocorrelation, the ACF as, now I’m going to use rho here of the lag and this is just, we take the auto-covariance for lag h and divide by the variance, the variance is when I have h equals to zero here. Here we have again, the autocorrelation function written as a function of the lag when the process is stationary. Here again, this is just the variance of the process y_t which we know is going to be a constant over time. It doesn’t depend on time.

We just described the autocovariance function of a time series process.

Now, we can obtain estimates of this function by using the data that is available. Here we’re going to assume that we collect the data from one up to capital T, and we are going to use these to obtain what is called the sample ACF, so you can get the sample autocovariance function and the sample autocorrelation function as well.

Here, assuming stationarity, we can obtain Gamma hat h, an estimate for the function Gamma h, which is the auto-covariance function of the process y_t; we're using the hat to indicate that this is an estimate. The estimate is going to be based on the observed data y1 up to capital T. It is just the sum, for t equal to 1 up to capital T minus h, of y_t plus h minus y-bar times y_t minus y-bar. Here y-bar is simply the average of all the observations, and we can write that down as just the average. This is the estimate that gives us the sample autocovariance based on my data.

Sample ACF

I can also get the sample auto-correlation. This is the sample autocovariance, I can get the ACF usually refers to the autocorrelation functions, so I can get the sample auto-correlation function by simply plugging in the estimate for the autocovariance and divide that by the estimate for the variance. This gives me my estimate for the sample ACF.

What we will do in the class is talk about some classes of models like the autoregressive processes. They have a particular characterization of the auto-covariance and the autocorrelation functions. Once we observe some data we can get estimates for those functions and see whether or not those estimates are consistent with the theoretical properties of a particular model. For example, the case of the autoregressive process.

86.3 Differencing and smoothing 🗒️

Differencing and smoothing are techniques used to remove trends and seasonality in time series data. They are covered in (Prado, Ferreira, and West 2023, sec. 1.4).

Many standard time series models are built under the assumption of stationarity. However, in the real world, time series data often present non-stationary features such as trends or seasonality. These features render such a time series non-stationary and therefore not suitable for analysis with the tools and methods we have discussed so far. However, practitioners can use techniques for detrending, deseasonalizing, and smoothing that, when applied to the observed data, transform it into a new time series that is consistent with the stationarity assumption.

We briefly discuss two methods that are commonly used in practice for detrending and smoothing.

86.3.1 Differencing

Differencing is a method that removes the trend from time series data. The first difference of a time series is defined in terms of the difference operator, denoted D, which produces the transformation

differencing operator D

Dy_t \doteqdot y_t - y_{t-1} \tag{86.16}

Higher order differences are obtained by successively applying the operator D. For example,

D^2y_t = D(Dy_t) = D(y_t - y_{t-1}) = y_t - 2y_{t-1} + y_{t-2} \tag{86.17}

Differencing can also be written in terms of the so called back-shift operator B, with

back-shift operator B

By_t \doteqdot y_{t-1}, \tag{86.18}

so that

Dy_t \doteqdot (1 - B) y_t \tag{86.19}

and

D^dy_t \doteqdot (1 - B)^d y_t. \tag{86.20}

This notation lets us write differences by referencing items backwards in time, which is often more intuitive and also useful, for example, when we want to write the differencing operator as a polynomial in B.
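As a quick check of the relations above (my own sketch; the random-walk series is arbitrary), R's diff() applied twice reproduces y_t - 2y_{t-1} + y_{t-2} exactly:

set.seed(1)
y <- cumsum(rnorm(20))                       # a random walk, i.e. a series with a stochastic trend

d2_diff <- diff(y, differences = 2)          # D^2 y_t via the diff function
d2_hand <- y[3:20] - 2 * y[2:19] + y[1:18]   # y_t - 2 y_{t-1} + y_{t-2}, written out by hand
all.equal(d2_diff, d2_hand)                  # TRUE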

86.3.2 Smoothing

Moving averages are commonly used to "smooth" a time series by removing certain features (e.g., seasonality) in order to highlight other features (e.g., trends).

A moving average is a weighted average of the time series around a particular time t. In general, if we have data y_{1:T}, we could obtain a new time series such that

moving average

z_t = \sum_{j=-q}^{p} w_j y_{t+j} \qquad \tag{86.21}

for t = (q + 1) : (T − p), with weights w_j \ge 0 and \sum^p_{j=−q} w_j = 1

We will frequently work with moving averages for which

p = q \qquad \text{(centered)}

and

w_j = w_{−j} \quad \forall j \qquad \text{(symmetric)}

Assume we have periodic data with period d. Then, symmetric and centered moving averages can be used to remove such periodicity as follows:

  • If d = 2q :

z_t = \frac{1}{d} \left(\frac{1}{2} y_{t−q} + y_{t−q+1} + \ldots + y_{t+q−1} + \frac{1}{2} y_{t+q}\right ) \tag{86.22}

  • if d = 2q + 1 :

z_t = \frac{1}{d} \left( y_{t−q} + y_{t−q+1} + \ldots + y_{t+q−1} + y_{t+q}\right ) \tag{86.23}

Example 86.1 (Seasonal Moving Average) To remove seasonality in monthly data (i.e., seasonality with a period of d = 12 months), we use a moving average with p = q = 6, w_6 = w_{−6} = 1/24, and w_j = w_{−j} = 1/12 for j = 0, \ldots , 5, resulting in:

z_t = \frac{1}{24} y_{t−6} + \frac{1}{12}y_{t−5} + \ldots + \frac{1}{12}y_{t+5} + \frac{1}{24}y_{t+6} \tag{86.24}
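These are exactly the weights used in the filter() call of Section 86.5; as a small sketch (my own), we can build the weight vector, confirm it is symmetric and sums to one, and apply it to the CO2 series:

# weights for removing a period-12 (monthly) seasonal component: 1/24 at the ends, 1/12 in between
w <- c(1/24, rep(1/12, 11), 1/24)
sum(w)            # 1: the weights sum to one
all(w == rev(w))  # TRUE: symmetric, w_j = w_{-j}

data(co2)
co2_ma <- filter(co2, filter = w, sides = 2)  # centered (two-sided) moving average of the CO2 series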

86.4 ACF PACF Differencing and Smoothing Examples 🎥

This video walks us through the code snippets in Figure 86.4 and Figure 86.5 below and provides examples of how to compute the ACF and PACF of a time series, how to use differencing to remove trends, and how to use moving averages to remove seasonality.

  • We begin by simulating data using the code in Section 86.6
  • We simulate white noise data using rnorm(T, mean = 0, sd = 1) in R, with T = 200 observations
  • We plot the white noise data, which we can see lacks temporal structure.
  • We plot the ACF using the acf function in R:
    • we specify the number of lags using lag.max = 20
    • the plot shows a confidence interval for the ACF values
  • We plot the PACF using the pacf function in R
  • Next we define some time series objects in R using the ts function (see the sketch after this list):
    • we define and plot monthly data starting in January 1960
    • we define and plot yearly data with one observation per year starting in 1960
    • we define and plot yearly data with four observations per year starting in 1960
  • We move on to smoothing and differencing in Section 86.3
  • We load the CO2 dataset in R and plot it
  • we plot the ACF and PACF of the CO2 dataset
  • we use the filter function in R to remove the seasonal component of the CO2 dataset, and plot the resulting time series, which highlights the trend
  • To remove the trend we use the diff function in R to take the first and second differences of the CO2 dataset
    • the diff function takes a parameter differences which specifies the number of differences to take
  • we plot the resulting time series after taking the first and second differences
  • the ACF and PACF of the resulting time series are plotted; they look different, in that they no longer have the slow decay characteristic of time series with a trend.
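The ts objects described in the list above can be defined as follows. This is my reconstruction of the calls used in the video (the object names are my own), based on the same 200 simulated white-noise observations.

set.seed(2021)
y_white_noise <- rnorm(200, mean = 0, sd = 1)

# monthly data starting in January 1960 (frequency = 12)
y_monthly   <- ts(y_white_noise, start = c(1960, 1), frequency = 12)
# yearly data, one observation per year, starting in 1960
y_yearly    <- ts(y_white_noise, start = 1960, frequency = 1)
# yearly data with four observations per year (quarterly), starting in 1960
y_quarterly <- ts(y_white_noise, start = c(1960, 1), frequency = 4)

plot(y_monthly)
plot(y_yearly)
plot(y_quarterly)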

We will now talk about how to do a few things related to time series. In particular, I will show you how to plot the auto covariance and the correlation function, the sample autocorrelation function and the sample autocovariance function as well as the partial autocorrelation functions.

Simulating White Noise

Here, the first thing I'm going to do is just simulate some data that don't have a temporal structure, to begin at the very basics with a so-called white noise time series, and to show you a few things related to time series objects. Here I'm just simulating some white noise, which means 200 observations from a normal distribution with mean zero and standard deviation one. I'm going to plot what I have simulated, and you can see that in this case, because the observations are not temporally related, I should not see anything very structured in the autocorrelation function and in the partial autocorrelation function.

Sample ACF and PACF

These are the sample ACF and PACF. There are two functions in R, one is acf, the other one is pacf. You can specify your data here, and you can specify the maximum number of lags you want to show in the plot. Here I just used the 200 samples that I got from that white noise. If I call acf on that, it's going to show me the values of the sample ACF for as many lags as I told the function to plot. In this case, you see that the first value is the only one that is relevant; it is one here because it is just the variance divided by itself. You're going to have that, and then you're going to have all the other sample ACF values for the different lags. The function shows you a confidence interval for the values.

We see that all the sample ACF values that we get for this white noise process are negligible as expected and then for the PACF, you again can specify the maximum number of lags.

Here it starts at lag 1, and again all the values are within these confidence intervals, showing that there is really no temporal structure in this time series. Another thing that may be handy when you're working with data that are temporally recorded is to define the data as a time series object in R. There is a function ts that allows you, if you read your data from somewhere else, to specify what kind of time series object the data is. For example, if I take the 200 observations that I sampled from the normal distribution, all identically distributed normal 0, 1, I can define a time series that says, well, this is a time series of monthly observations. I'm going to have frequency 12, and I'm going to specify that the time series begins in January, month 1, of 1960. When I plot the time series object, I automatically get the timestamp of each of the observations. This is monthly data and I'm representing it like that. I could instead say that I have yearly data, so I have one observation per year. In that case, I just specify that the start is 1960 and the frequency is one. If I were to plot it, it's the same data but with a different assumed frequency, and you can see that I have 200 years of observations because I changed the scale. Then I have another example.

You can also have yearly data where you have four observations per year. If you have quarters there, you can see how the number of years and you just have again four observations per year. This is just in terms of defining a time series object, if you want to work with that.

Differencing and Smoothing

We explained that in many cases, we want to do differencing and smoothing by moving averages just to either get rid of some components in your time series or you may want to highlight some components in your time series.

CO2 Dataset

I'm going to do an example here with a dataset that is available in R, the so-called CO2 dataset, which shows the atmospheric concentrations of CO2 in parts per million. These are monthly data over a particular period; you can find more information about the data in R. You can see that the time series is showing two main features. One is this increasing trend, telling you that the atmospheric concentrations of CO2 are increasing over time. The other is a very marked periodic component. Again, these are monthly data.

I’m going to do two things. I’m going to get rid of the seasonal component in the time series to highlight the trend. Then we can also try to get rid of the trend by using differencing.

If I were to just plot the ACF and PACF for these time series just to see how they look. This here is first the auto-correlation function. You can see two things.

First, it’s decaying very slowly, which is consistent with that upward trend that we see in the data.

The other thing you can see here is this quasi-periodicity.

Here there is a periodicity in the data that is very marked that we saw in the data as well.

You can see that periodicity in the ACF. If you plot the PACF, you're going to see also this kind of periodic behavior. You have some partial ACF values that are large, and then eventually they decay as the lag increases. If I wanted to remove the effects of a given periodic component with period 12, I could run the filter that we described before, and then I can plot what happens when I apply this filter. The filter function in R allows you to specify your filter here. I'm defining this moving average with these weights: I have one 1/24, then 1/12 for 11 of the weights, and then another 1/24. The sides argument tells me whether I want the moving average to use values on both sides around each time point.

This creates a new time series.

If I were to plot the time series, I get this resulting one. After removing the seasonal component, I only highlight the trend in the data. If I were to look at the ACF and PACF now, again, this is the ACF. After removing that periodic component, we can see that we still have that slow decay on the ACF coefficients. But the periodicity has disappeared, that slow decay is consistent with that trend that is still in the smooth data. For the PACF, we see that essentially we have a very high value for the partial auto-correlation function at Lag 1 and then everything else is negligible.

If I want to now remove the trend from these data, already smoothed by moving averages, I can use differences. I can take the first difference of the time series, and I can take the second difference of the time series. Let's just do that using the diff function; by default, if you don't say anything, it's just going to take the first difference, but you can specify how many times you want to difference a particular time series. Here I'm using the first differences and the second differences, and I'm going to plot both resulting time series. You can see this is what results after first removing the seasonal component using moving averages and then looking at the first differences. Then the next is the time series that results after taking differences again.

That is, looking at the second differences of the time series that already had the seasonality removed by moving averages.

Here we can see now that the trend has disappeared. When we look at the ACF of the second differences, for example, it looks very different to what we had before. It doesn't have that very slow decay, with lots of coefficients different from zero, that is characteristic of time series that have a trend. Same thing with the PACF; it looks different for the second differences. This is how you can use the function diff to take differences of a given time series, and the function filter to remove components using moving averages.

86.5 Code for Differencing and filtering via moving averages 🗒️ \mathcal{R}

data(co2)                                                               # Load the CO2 dataset in R
co2_1stdiff = diff(co2, differences = 1)                                # Take first differences to remove the trend
co2_ma = filter(co2, filter = c(1/24, rep(1/12, 11), 1/24), sides = 2)  # Filter via moving averages to remove the seasonality

#par(mfrow = c(3,1),  mar   = c(3, 4, 2, 1),  cex.lab=1.2, cex.main=1.2)

plot(co2)          # plot the original data
plot(co2_1stdiff)  # plot the first differences (removes the trend, highlights the seasonality)
plot(co2_ma)       # plot the filtered series via moving averages (removes the seasonality, highlights the trend)
(a) the original data
(b) the first difference (TS - trend), highlights the seasonality
(c) the moving averages (TS - seasonality), highlights the trend
Figure 86.4: Differencing and filtering via moving averages
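The transcript also takes first and second differences of the smoothed (deseasonalized) series and inspects their ACF and PACF; that code is not shown above, so here is a sketch of how it might look (the object names are my own):

data(co2)
co2_ma <- filter(co2, filter = c(1/24, rep(1/12, 11), 1/24), sides = 2)

# first and second differences of the smoothed series (the filter leaves NAs at both ends)
co2_ma_1stdiff <- diff(na.omit(co2_ma), differences = 1)
co2_ma_2nddiff <- diff(na.omit(co2_ma), differences = 2)

plot(co2_ma_1stdiff)   # trend largely removed after one difference
plot(co2_ma_2nddiff)   # and after two differences

# ACF and PACF of the second differences: no longer the slow decay characteristic of a trended series
acf(co2_ma_2nddiff,  lag.max = 20, main = "")
pacf(co2_ma_2nddiff, lag.max = 20, main = "")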

86.6 Code: Simulate data from a white noise process 🗒️ \mathcal{R}

set.seed(2021)
T = 200
t = 1:T
y_white_noise = rnorm(T, mean = 0, sd = 1)    # simulate 200 iid N(0, 1) observations (white noise)

# Define a time series object in R - assume the data correspond to annual observations starting in 1960
yt = ts(y_white_noise, start = c(1960), frequency = 1)

#par(mfrow = c(3, 1), mar = c(3, 4, 2, 1),  cex.lab = 1.3, cex.main = 1.3) 

# Plot the simulated time series
plot(yt, type = 'l', col = 'red',
     xlab = 'time (t)', 
     ylab = "Y(t)")

# Plot the sample ACF
acf(yt,
    lag.max = 20, 
    xlab = "lag",
    ylab = "Sample ACF",
    ylim = c(-1, 1), main = "")

# Plot the sample PACF
pacf(yt, lag.max = 20,
     xlab = "lag",  ylab = "Sample PACF",
     ylim = c(-1, 1), main = "")
(a) Simulate data with no temporal structure (white noise)
(b) Sample ACF
(c) Sample PACF
Figure 86.5: Simulate white noise data