100  Bayesian Inference in the NDLM: Part 2

Time Series Analysis

Normal Dynamic Linear Models (NDLMs) are a class of models used for time series analysis that allow for flexible modeling of temporal dependencies.
Bayesian Statistics
Time Series
Author

Oren Bochman

Published

November 9, 2024

Keywords

Time series, Filtering, Smoothing, NDLM, Normal Dynamic Linear Models, Seasonal NDLM, Superposition Principle, R code

100.1 Filtering, Smoothing and Forecasting: Unknown observational variance 🎥

  • This material seems unnecessarily complicated:
  1. The section reads like a lot of mathematical machinery presented without context or motivation for each of the equations.
  • After writing everything out and reviewing it multiple times, I began to develop a purely mathematical intuition, i.e. I can relate the equations to the general Bayesian inference framework.
  • I can’t, however, say I am much closer to using them in an engineering solution based on this mathematics.
  • I doubt that doing a single capstone project will make much of a difference.
  2. We will see three applications in this module (Nile river data, Google Trends, and EEG data). What I still need to understand, for each dataset and challenge, is why the filtering, smoothing and forecasting distributions are more useful than the original ones.

  3. Perhaps the root cause is that this course fails to acknowledge the connection between the NDLM and Kalman filters, and the much wider literature on Kalman filtering. The first course of the Kalman filtering specialization (the KF bootcamp) presents a number of applications, each with different requirements; mapping those challenges onto the NDLM framework should make the mathematics easier to navigate.

  4. In any case, a quick lookup indicates that in state-space terminology this model is known as the local level model, used for local-level or trend filtering (e.g. in structural time series models), and in Kalman filtering terminology as a Kalman filter with variance learning.

We now consider the case of the NDLM in which the observational variance is constant over time, i.e. v_t = v for all t, with v unknown.

100.1.1 Inference in the NDLM with unknown constant observational variance

Let v_t = v\ \forall t with v unknown, and consider a DLM with the following structure: \begin{aligned} y_t &= \mathbf{F}_t^\top \boldsymbol{\theta}_t + \nu_t, &\nu_t &\sim \mathcal{N} (0, v) &\text{(obs)}\\ \boldsymbol{\theta}_t &= \mathbf{G}_t \boldsymbol{\theta}_{t-1} + \boldsymbol{\omega}_t, & \boldsymbol{\omega}_t &\sim \mathcal{N} (0, v \mathbf{W}^*_t) &\text{(sys)} \end{aligned} \tag{100.1}

  • where we have the usual suspects:
    • an observation equation, relating the observed y_t to the hidden state \boldsymbol{\theta}_t plus some observation (sensor) noise \nu_t;
    • a state equation, describing how the hidden state \boldsymbol{\theta}_t evolves, plus some process noise \boldsymbol{\omega}_t;
    • \mathbf{F}_t is a known observation operator;
    • \mathbf{G}_t is a known state transition operator;
    • \boldsymbol{\theta}_t is the hidden state vector;
    • \boldsymbol{\omega}_t is the process noise, e.g. friction or wind; and
    • \mathbf{W}^*_t is a known, scale-free covariance matrix for the process noise, so the full process-noise covariance is v\,\mathbf{W}^*_t.
Important — Question: why does v appear in the distribution of the system noise \boldsymbol{\omega}_t?

Prado notes that we are also assuming the system noise is scaled by the observational variance v. This assumption deserves justification: scaling \mathbf{W}^*_t by v is precisely what makes the normal/inverse-gamma prior below conjugate, so the filtering and smoothing distributions remain available in closed form.

What is still missing, in my view, is a concrete motivating use case for this model.

with conjugate prior distributions: \begin{aligned} (\boldsymbol{\theta}_0 \mid \mathcal{D}_0, v) &\sim \mathcal{N} (\mathbf{m}_0, v\ \mathbf{C}^*_0) \\ (v \mid \mathcal{D}_0) &\sim \mathcal{IG}\left(\frac{n_0}{2}, \frac{d_0}{2}\right ) \\ d_0 &= n_0 s_0 \end{aligned} \tag{100.2}

Here n_0 is a degrees-of-freedom (prior sample size) parameter and s_0 is a prior point estimate of the variance v.
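To make the model concrete, here is a minimal simulation sketch (my own code, not part of the course material) of the simplest instance of Equation 100.1: a first-order polynomial (local level) model with \mathbf{F}_t = \mathbf{G}_t = 1. The latent level \theta_t drifts as a random walk while y_t observes it with noise of variance v; the values chosen for v and W^* are arbitrary and only for illustration.

## Minimal simulation sketch of Equation 100.1 with F_t = G_t = 1 (my own code)
set.seed(42)
n_sim  <- 200
v      <- 1.5        # observational variance (unknown in practice)
W_star <- 0.05       # scale-free system variance, so Var(omega_t) = v * W_star
theta  <- numeric(n_sim)
y      <- numeric(n_sim)
theta[1] <- 25
y[1]     <- theta[1] + rnorm(1, 0, sqrt(v))
for (t in 2:n_sim) {
  theta[t] <- theta[t - 1] + rnorm(1, 0, sqrt(v * W_star))  # system equation
  y[t]     <- theta[t] + rnorm(1, 0, sqrt(v))               # observation equation
}
plot(y, type='l', col='grey40', xlab='t', ylab='y_t')
lines(theta, col='red', lwd=2)   # latent level we will try to recover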

100.1.2 Filtering

We now consider filtering, i.e. sequentially updating our uncertainty as observations arrive: it lets us infer the state at the current time step, and predict the current observation, given the information from the previous time step.

We will be interested in the following distributions:

  • p(\boldsymbol{\theta}_t \mid \mathcal{D}_{t−1}) the prior density for the state vector at time t given information up to the preceding time;
  • p(y_t \mid \mathcal{D}_{t−1}) the one-step-ahead predictive density for the next observation;
  • p(\boldsymbol{\theta}_t \mid \mathcal{D}_t) the posterior density for the state vector at time t given \mathcal{D}_{t−1} and y_t;
  • the h-step-ahead forecasts p(y_{t+h} \mid \mathcal{D}_t) and p(\boldsymbol{\theta}_{t+h} \mid \mathcal{D}_t);
  • p(\boldsymbol{\theta}_t \mid \mathcal{D}_T) the smoothing density for \boldsymbol{\theta}_t where T > t.

Assuming:

(\boldsymbol{\theta}_{t-1} \mid \mathcal{D}_{t-1}, v) \sim \mathcal{N} (\mathbf{m}_{t-1}, v\ \mathbf{C}^*_{t-1})

By filtering we obtain the distribution of the state at time t, conditional on the previous time step’s information and the variance v, which is normal:

\begin{aligned} (\boldsymbol{\theta}_t \mid \mathcal{D}_{t-1}, v) &\sim \mathcal{N} (\mathbf{a}_t, v \mathbf{R}^*_t) \\ \mathbf{a}_t &= \mathbf{G}_t \mathbf{m}_{t-1} \\ \mathbf{R}^*_t &= \mathbf{G}_t \mathbf{C}^*_{t-1} \mathbf{G}'_t + \mathbf{W}^*_t \end{aligned} \tag{100.3}

We can also marginalize v by integrating it out, which gives the distribution unconditional on v:

\begin{aligned} (\boldsymbol{\theta}_t \mid \mathcal{D}_{t-1}) &\sim T_{n_{t-1}} (\mathbf{a}_t, \mathbf{R}_t) \\ \mathbf{R}_t &\equiv s_{t-1} \mathbf{R}^*_t \end{aligned} \tag{100.4}

with s_t\ \forall t given in Equation 100.5.

  • (y_t \mid \mathcal{D}_{t-1}, v) \sim \mathcal{N} (f_t, v q^*_t)
    • with
      • f_t = \mathbf{F}'_t \mathbf{a}_t
      • q^*_t = 1 + \mathbf{F}'_t \mathbf{R}^*_t \mathbf{F}_t

and unconditional on v we have

  • (y_t \mid \mathcal{D}_{t-1}) \sim T_{n_{t-1}} (f_t, q_t)

    • with
      • q_t = s_{t-1} q^*_t
  • (v \mid \mathcal{D}_t) \sim \mathcal{IG}\left( \tfrac{n_t}{2}, \tfrac{d_t}{2}\right )

    • with d_t = n_t s_t and
      • n_t = n_{t-1} + 1, \qquad s_t = s_{t-1} + \frac{s_{t-1}}{n_t} \left ( \frac{e^2_t}{q^*_t} - 1 \right ), \tag{100.5}

    where e_t = y_t - f_t is the one-step-ahead forecast error.

  • (\boldsymbol{\theta}_t \mid \mathcal{D}_t, v) \sim \mathcal{N} (\mathbf{m}_t, v \mathbf{C}^*_t),

    • with
      • \mathbf{m}_t = \mathbf{a}_t + \mathbf{A}_t e_t, and
      • \mathbf{C}^*_t = \mathbf{R}^*_t - \mathbf{A}_t \mathbf{A}'_t q^*_t,
      • where \mathbf{A}_t = \mathbf{R}^*_t \mathbf{F}_t / q^*_t is the adaptive coefficient (Kalman gain) vector.

Similarly, unconditional on v we have

  • (\boldsymbol{\theta}_t\mid \mathcal{D}_t) \sim T_{n_t} (\mathbf{m}_t, \mathbf{C}_t)
    • with
      • \mathbf{C}_t = s_t \mathbf{C}^*_t
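The filtering recursions above translate almost line by line into code. The following univariate sketch (my own code, not the course’s forward_filter_unknown_v, which handles the general matrix case) performs a single update following Equations 100.3–100.5 and the posterior moments above.

## One filtering update for the unknown-v NDLM, univariate case (my own sketch)
filter_step <- function(y, m, Cstar, n_df, s, F_ = 1, G = 1, W_star = 0.05) {
  a     <- G * m                       # prior mean      a_t = G_t m_{t-1}
  Rstar <- G * Cstar * G + W_star      # prior scale     R*_t
  f     <- F_ * a                      # forecast mean   f_t
  qstar <- 1 + F_ * Rstar * F_         # forecast scale  q*_t
  A     <- Rstar * F_ / qstar          # adaptive coefficient A_t
  e     <- y - f                       # forecast error  e_t
  n_new <- n_df + 1                    # updated degrees of freedom n_t
  s_new <- s + (s / n_new) * (e^2 / qstar - 1)  # updated estimate s_t of v
  m_new     <- a + A * e               # posterior mean  m_t
  Cstar_new <- Rstar - A^2 * qstar     # posterior scale C*_t
  list(m = m_new, Cstar = Cstar_new, n = n_new, s = s_new,
       f = f, q = s * qstar)           # q_t = s_{t-1} q*_t (unconditional scale)
}

## run it sequentially over the simulated series y from the earlier sketch
state <- list(m = 20, Cstar = 10, n = 1, s = 1)
for (t in seq_along(y)) {
  state <- filter_step(y[t], state$m, state$Cstar, state$n, state$s)
}
state$s   # sequential estimate of the observational variance v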

100.1.3 Forecasting

Similarly, we have the forecasting distributions:

  • (\boldsymbol{\theta}_{t+h} \mid \mathcal{D}_t) \sim T_{n_t} (\mathbf{a}_{t}(h), \mathbf{R}_{t}(h)),

  • (y_{t+h} \mid \mathcal{D}_t) \sim T_{n_t} (f_{t}(h), q_{t}(h)),

  • with

    • \mathbf{a}_{t}(h) = \mathbf{G}_{t+h} \mathbf{a}_{t}(h-1),
    • \mathbf{a}_{t}(0) = \mathbf{m}_t, and

\begin{aligned} \mathbf{R}_{t}(h) &= \mathbf{G}_{t+h} \mathbf{R}_{t}(h-1) \mathbf{G}'_{t+h} + \mathbf{W}_{t+h} \\ \mathbf{R}_{t}(0) &= \mathbf{C}_t \end{aligned}

f_{t}(h) = \mathbf{F}'_{t+h} \mathbf{a}_{t}(h) \qquad q_{t}(h) = \mathbf{F}'_{t+h} \mathbf{R}_{t}(h) \mathbf{F}_{t+h} + s_{t}
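As a companion to these recursions, here is a univariate sketch (my own code) that computes the forecast moments f_t(h) and q_t(h) for h = 1, \dots, k with constant F, G and W; the state scale passed in is \mathbf{C}_t = s_t \mathbf{C}^*_t and W is taken on that same scale.

## h-step-ahead forecast moments, univariate sketch (my own code)
forecast_moments <- function(m, C, s, h, F_ = 1, G = 1, W = 0.05) {
  a <- m; R <- C
  out <- data.frame(h = 1:h, f = NA_real_, q = NA_real_)
  for (j in 1:h) {
    a <- G * a                     # a_t(j) = G_{t+j} a_t(j-1),  a_t(0) = m_t
    R <- G * R * G + W             # R_t(j) = G_{t+j} R_t(j-1) G'_{t+j} + W_{t+j}
    out$f[j] <- F_ * a             # f_t(j) = F'_{t+j} a_t(j)
    out$q[j] <- F_ * R * F_ + s    # q_t(j) = F'_{t+j} R_t(j) F_{t+j} + s_t
  }
  out   # (y_{t+j} | D_t) ~ T_{n_t}(f_t(j), q_t(j))
}

## continuing from the filtering sketch above: forecast five steps ahead
forecast_moments(m = state$m, C = state$s * state$Cstar,
                 s = state$s, h = 5, W = state$s * 0.05)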

100.1.4 Smoothing

Finally, the smoothing distributions have the form:

(\boldsymbol{\theta}_t \mid \mathcal{D}_T) \sim T_{n_T} \left(\mathbf{a}_T (t - T),\ \mathbf{R}_T (t - T)\, s_T / s_t \right)

  • where \mathbf{a}_T (t - T) and \mathbf{R}_T (t - T) are computed via the backward recursions

\mathbf{a}_T (t - T) = \mathbf{m}_t - \mathbf{B}_t [\mathbf{a}_{t+1} - \mathbf{a}_T (t - T + 1)]

\mathbf{R}_T (t - T) = \mathbf{C}_t - \mathbf{B}_t [\mathbf{R}_{t+1} - \mathbf{R}_T (t - T + 1)] \mathbf{B}'_t

with

\mathbf{B}_t = \mathbf{C}_t \mathbf{G}'_{t+1} \mathbf{R}^{-1}_{t+1}, \quad \mathbf{a}_T (0) = \mathbf{m}_T , \quad \mathbf{R}_T (0) = \mathbf{C}_T.
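A univariate sketch of these backward recursions (my own code; the course’s backward_smoothing_unknown_v handles the general matrix case and, with unknown v, additionally rescales by s_T / s_t). It assumes the vectors m, C, a, R were stored during filtering.

## Backward smoothing sketch, univariate case (my own code)
## m, C, a, R: vectors of filtering moments for t = 1, ..., T
smooth_backward <- function(m, C, a, R, G = 1) {
  T_ <- length(m)
  mn <- m; Cn <- C                  # at t = T: a_T(0) = m_T, R_T(0) = C_T
  for (t in (T_ - 1):1) {
    B     <- C[t] * G / R[t + 1]    # B_t = C_t G'_{t+1} R^{-1}_{t+1}
    mn[t] <- m[t] - B * (a[t + 1] - mn[t + 1])     # a_T(t - T)
    Cn[t] <- C[t] - B^2 * (R[t + 1] - Cn[t + 1])   # R_T(t - T)
  }
  list(mnt = mn, Cnt = Cn)          # smoothed means and scales
}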

We will now discuss the filtering, and smoothing, and forecasting distributions. In the case of the normal dynamic linear model, where we assume that the observational variance is constant over time. So we’re going to call that variance v and so vt is equal to v for all t, but v is unknown. So we’re going to place a prior distribution on that variance. We are going to write down the model in terms of again, an observation equation and a system equation. The observation equation has the same form that we discussed before. We’re now the observational variance vt is changed to be always v, but now v is unknown. And then the system equation we are going to write down as an equation that is conditional on the variance v. So we have that say theta t is equal to Gt theta t minus one plus the system noise. And we are assuming that this is normally distributed with mean zero and now the variance co-variance matrix is v times Wt star. So here we assume then that Wt star is a known quantity for all the times t from one and so on. And then we’re also assuming that Ft and Gt are given. So what kind of prior distribution can we use now? The structure of the prior in this case that is conjugated with the model that we are proposing is a normal inverse gamma prior that means for theta zero given the zero we condition on v. We’re going to use a normal prior distribution with mean m0 and variance co-variance matrix v times C0 star. And then the distribution for v given D0 is an inverse gamma with parameters n0 half and d0 half. d0 can be written in terms of n0 times s0, so here n0 is the degrees of freedom parameter. So the higher the value of the degrees of freedom is the more information you are inputting in this prior. And then s0 is a prior estimate of the variance v before you obtain any observations, so this is how you can view the prior. So this is a conditional prior structure, if you use this prior structure, you can obtain the filtering equations in closed form. The smoothing equations also in closed form. So we are going to discuss those equations now, so again, we are assuming that at time t- 1. We have this conditional structure in which the prior the posterior at time t -1, given the t -1, and v has this normal structure conditional on v. And then we given the t minus one, has this inverse gamma structure. Now, my parameters are indexed it in time, do I have nt- 1/2, and dt- 1/2, and the dt- 1 has the same structure, nt- 1 times St- 1. So, if I have this condition structure, we can show, and this structure, as we said, is true for t = 0, as we wrote down before. We can show that the prior distribution at time t given Dt- 1 and conditional on v is a normal at vRt star. And now I have equations just to compute the value of the mean at is Gt * mt- 1. So this is exactly the same equation we had before when we were discussing the case of the DLM with known variances at both the observational and system level. And now we have thi Rt star that has the structure Gt Ct- 1, Gt transpose plus Wt star. And then we can obtained as well the one step ahead forecast distribution for yt given Dt- 1, conditional on v is given by this equation here. It’s a normal Ft vqt star, where Ft is Ft transports t, and qt star is Ft transpose, t, Rt star Ft + 1. So we have again, conditional structures on v and then we can obtain the posterior distribution for the variance after I get my first observation, yt. And this is going to be given in terms of an inverse gamma nt over to dt Over two. 
And the degrees of freedom parameter is updated in such a way that it was the number of degrees of freedom you had at step t- 1 + 1. This plus one has to do with the observation you just collected. And then you can compute the dt as nt * St where St is given by this equation here, so it depends on St- 1. It depends on the number of degrees of freedom, it depends on et which is the given by yt- ft, so that’s the prediction error. So yt is the new observation you just obtained and then qt is st- 1qt star. Finally, you can then obtain the posterior distribution for thetat given Dt conditional on v that’s a normal mt vCt star. And you can obtain the equations for the moments of that normal distribution using this equation for the mean. And then we have Ct star this is the equation that allows us to update the co-variance matrix. So this is the distributions we have conditional on v for theta t and yt. We can also write down, we can integrate out the variance here. And when we integrate out the variance, given that the variance has an inverse gamma distribution, we obtain for theta t now unconditional on v. So it only depends on the information up to t- 1, that’s a student t distribution with nt- 1 degrees of freedom and at and Rt are. The location and scale parameters for that distribution, so Rt is St- 1Rt star. Similarly, we get students distributions for the one step ahead prediction distribution for yt given Dt- 1 and for theta t given Dt the posterior for theta t at time, t is a student T nt mt Ct. So we can just write everything conditional on v as we did before. Or if we want to just look at the marginal distributions and conditional on v, we obtained student t distributions. We now discuss the smoothing equations in the case of the unknown variance at the observational level. So we have the same type of notation we’re assuming here we’re looking at the distribution of theta t given D capital T. So capital T here is greater or equal than the lower case t. And before we have a normal distribution now we’re going to have a student T distribution. The number of degrees of freedom is going to be n, capital T. And then we’re going to have mean a capital T, lowercase t- capital T. And there is going to be a recursion for the means of this distribution. And then we have for the scale parameter here we have R capital T(t- T). And then that’s multiplied by s capital T, divided by s lower case t. So in this case we have if you recall this is going to be the estimate that we’re going to get for the variance after receiving capital T observations. While this one is the estimate for the observational variance that we get from the model after receiving lower case t observations. So the recursions are going to work in the same way. I’m not writing down the equations, they are just exactly the equations that we had before. And then for forecasting if I want to look at the distribution of theta t + h. So h here is greater equal than zero given the information up to Dt, that one is going to be a student T with nt degrees of freedom. And then I have my recursions also for at(h) and Rt(h) as we had before. So as in the case with the known observational variance, we are going to need the Gt + h here for all the values of h to compute these moments and also the Wt + h for all the values of h. So we need to specify those matrices in there. Everything else stays the same, and now we have that for the predictive distribution for h steps ahead for yt. 
Given Dt is also now a student T with nt degrees of freedom, we have ft(h) and qt(h). The ft(h) is as before, the qt(h) is slightly different now because we don’t have vt + h. We’re assuming that v is constant and unknown. So what goes in here, you can see is the estimate that we have for that variance v at time t. So this allows us to compute the forecasting distributions.

100.2 Summary of Filtering, Smoothing and Forecasting Distributions, NDLM unknown observational variance 🗒️

100.3 Specifying the system covariance matrix via discount factors 🎥

We have discussed different cases in terms of the specification of the vt, the observational variance. Now, I will talk about how to specify the system variance, assuming that that system variance is not known just to recall the cases we have discussed, suppose that we have a DLM. That looks like this, so we have talked about the case in which vt and Wt, the observation and system variances are known for all the values of t. And we said that we can perform filtering and smoothing in that case using a normal prior distribution. We get results that are for the filtering distributions, we also get normal distributions for the smoothing distribution, we get normal distributions as well. Then we also discussed the case in which, so I guess, let me just clarify here, so we’ve done vt and Wt known for all t. Okay, we also said discussed the case in which we assume that vt is v, so it’s constant over time v known and sorry, v unknown and Wt known for all t. Again, we can use a conjugated structure for v unknown, we impose a normal inverse gamma prior structure. And then we can obtain closed form filtering equations and smoothing equations in that particular case. But we’re assuming here that all the Wts, the system covariance matrices are unknown. Now we are going to discuss the case in which the Wt is unknown, so how do we deal with this? So I’ll begin with assuming that vt is known for all t and we will discuss then what happens if the vt is unknown and now I want to specify my Wt. So if you recall when we were working with the equations of the normal dynamic linear model. There was the variance of the state vector theta_t that the given the information we have up to t- 1. And if you remember the equations, we had two components in here, we had called that Rt and that Rt has Gt, Ct- 1, Gt transpose plus Wt. I’m going to call this first component here in this nation, I’m going to call it Pt and I can think of this Pt as the variance, the prior variance of theta_t given Dt- 1. If I were to consider a model in which this Wt is just zero, in other words, a model in which there is no additional stochastic component in each time at the system level. So in this case, if I call Pt ,Gt, Ct- 1, Gt transpose corresponds to the prior variance in a model of the form Ft, Gt, vt and then I have a zero here. So there is no so fast, there are no stochastic changes which is equivalent to saying that there are no changes over time in the parameters of the model. In the state, parameters of the model. A way to specify then my WT here, which is what I want to do in this particular case is to assume that my Rt, my prior variance in the general model. The variance of theta_t given Dt-1 is this Pt component but I’m going to increase the uncertainty when I think about that variance. So I’m going to say that my Rt is just going to be Pt divided by delta with delta. some factor here that I’m going to call discount factor that is between zero and one. And I’m including the one here when delta is exactly one, what I’m doing is I’m saying there is no stochastic component. I’m going to be having essentially my variance for theta_t given Dt-1 corresponds to the variance of model that looks like this. And in that case, the parameters of the model are not changing over time, I don’t have any changes over time, every, all the parameters stay the same over time. That’s the case in which delta is equal to one and if delta is smaller than one, what I’m doing here by dividing by this delta factor is. 
I am increasing the uncertainty with respect to whatever I had in t- 1, so the closer delta is to zero the more uncertainty I’m going to have. And therefore the more variation I’m going to have in the parameters of the model, the closer delta is to one, the less variation I’m going to have in the model parameters. So if I were to now combine this expression with also this expression, right, what I get, so if I say Rt is Pt plus Wt but it’s also Pt divided by delta. What I get here is that Wt is fully specified as 1 minus delta divided by delta Pt, so I have an expression now for Wt that is given in terms of this discount factor and my Pt. If you look at the expression for Pt, Pt always depends on Gt Ct-1 and Gt transpose here, so you only need the Gt and Ct-1. The first time you compute that is when you consider P1, which is just, you’re going to use your G1, your C0, which is your prior variance for the state vector. And that one you provide and this is something that the modeler provides, the user provides and then you have the G. So once you have this P1, you plug it in here and you’re going to have the W1 and you can continue going through this process, you continue with your filtering equations. All you need is this C0 and the discount factor to fully specify the Wt, in the case, so this is again for the case in which Vt is known for all time t, for all the times t. And you are specifying wt in this way, you can also consider the case in which the W, the vt is constant and unknown. So there is the case in which vt is v with v unknown and the structure in this case for the model is going to be very similar, you can just specify the discount factor in the same way. Now, instead of having Wt, if you go back to the equations that we discussed, you’re going to have a Wtstar. And the Wt star, you can write down in terms of a P star, where the P star is going to be Gt, Ct minus one star Gt. So again, everything comes here with a star, that is just writing that model in terms of the conjugate structure as we described before. So, here, you can fully specify Wt star as one minus delta, divided by delta and then you have that Pt star with and Pt star is going to be given by Gt, Ct- 1 star, Gt transpose. And again, this provides the structure for the Wt star, now, you write down your model, your equations conditional on v. You have the normal distributions for the state vector and then you also impose a prior distribution of v that is an inverse comma. And then go through your equations again for the filtering and smoothing and as we described before. So once again, all you need here is you need your CO star and your discount factor and you have your specification for the Wt star. So how do we choose that discount factor in practice, we have delta in (0,1], so that’s what the theory says and I have to pick a particular value in practice. How do we proceed if I have a model and what is the ideal discount factor value that I can use to specify that Wt matrix. Again, in practice, the first thing that happens is we usually consider discount factor values that are above .8. So usually we just work, it’s not always the case but usually we consider discount factor values that are above .8. So you can think of just considering a grid of values in the interval .8 to 1, including the one and then find a method that allows you to pick that value optimally. You could use different kinds of properties of the model to do this. 
In particular, you can use the one step ahead predictive distributions to give you an idea of what is the best discount factor value you can take. So you can write down if you think about the predictive distributions, you can write down p. So you have all your data here and then you have the information at, time zero and then you have specified a particular DLM with an Ft, Gt and so on. And then when you write this down, you can write this down as a product of those one step ahead densities here. So if you take the log and this, let me just write down that this is also going to depend on whatever discount factor you have for that particular model specification. So if I take the log of this and you can write this p here inside as a product of the individual wise. And so when you take the log, you’re going to have the sum of the logs from one or 2 capital T of the log of these p, yt, Dt minus one. And then this also depends on delta, so this is the one step ahead predictive distribution. So you have those densities and it’s going to be a function of delta in the case of a model in which you know, vt and Wt for all the values of t. You know that those are normal in the case here, we’re specifying Wt using a discount factor, so I’m going to have a normal density if I have vt known. So I can look at this as a function of, let’s call it l of delta as a function of delta and I can view this as a likelihood function for delta, so if I evaluate this in a grid of values. I can pick the so called optimal delta by maximizing and depending on whether or not you’re using a model with the vt known or a model in which vt is v. But v is unknown and you have the conjugate prior structure, you’re going to have normal distributions in the first case here and students distributions in the second case. But you can compute this for each value of delta, if you have a grid of values, you just pick the optimal delta to be this, the one that maximizes this function. This is not the only function you could use, you could also consider other alternatives, you could consider, so this is one option. You have another option in which you can write down a mean square error again as a function of the discount factor for each model that you specify. And you can write down this as, over T, so here, I’m writing et as you remember, it’s just that whatever you observe minus the prediction you get from your model is usually ft. But that prediction depends on delta, so that’s why I’m writing down et as a function of this delta. So here you could pick the optimal discount factor by minimizing this mean square error, so here the optimal delta is chosen by minimizing this function of delta. You could also use other measurements, like the mean absolute deviation and then again minimize this. There are other approaches that could be using practice, you could consider that delta is another parameter in your model. You could put a prior on the delta and you can try to get inference on that value and then again, another thing you could do is if you have multiple components in your model. You could use a discount factor that is dependent on the components, so you could have as many discount factors as components in your model. And use the same idea to specify the W matrix in this block diagonal form, if you were to do that instead of having a function that is just, is univariant on delta. You will have multiple deltas and you will have to pick the optimal values by maximizing that function. 
Or in the case of the MSE, you will have to pick the optimal values of those deltas by minimizing these functions, so there is multiple alternatives. In this course, we are going to be working only with a single discount factor and we can pick the optimal either using this method or this other method that I just described.
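The following sketch (my own code, for the simplest case of a first-order polynomial DLM with known observational variance v) makes the discount idea concrete: for each \delta on a grid the filter uses R_t = P_t / \delta, i.e. W_t = \frac{1-\delta}{\delta} P_t, and we pick the \delta that minimizes the one-step-ahead mean squared error.

## Discount-factor selection by one-step-ahead MSE, first-order polynomial DLM
## with known v (my own sketch; the course provides more general functions)
mse_for_delta <- function(y, delta, v = 1, m0 = 0, C0 = 10) {
  m <- m0; C <- C0
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    a <- m                    # G_t = 1
    R <- C / delta            # R_t = P_t / delta, i.e. W_t = (1 - delta)/delta * P_t
    f <- a                    # F_t = 1
    Q <- R + v                # one-step-ahead forecast variance
    A <- R / Q                # adaptive coefficient
    e[t] <- y[t] - f
    m <- a + A * e[t]
    C <- R - A^2 * Q
  }
  mean(e^2)
}

deltas <- seq(0.80, 1.00, by = 0.005)
mse    <- sapply(deltas, function(d) mse_for_delta(y, d))  # y: any univariate series
deltas[which.min(mse)]   # 'optimal' delta under the MSE criterion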

100.4 NDLM, Unknown Observational Variance: Example 🎥

This is a walkthrough of the R code listed in Section 100.5 below.

We now discuss and illustrate some R code that allows you to obtain the filtering, smoothing, and forecasting distributions of a normal dynamic linear model that assumes that the observational variance is unknown. We call that V and the system covariance matrix Wt is assumed either known or specified using a discount factor. The code allows you to consider both options. The code also allows you to work with time-varying Fts and Gts, and state parameters that incorporate different components such as polynomial trends, regression, periodic components via Fourier representations, and so on. I’m going to go first through the functions, and then we will illustrate these analyzing a dataset that is available in R. As before, I’m putting all the functions into a single file that you can source. The first function that we have is a function that setups the DLM matrices that you specify, assuming that the observational variances is unknown, so you can pass Ft and Gt for all the times that you are considering. Then you have the option also of passing Wt_star, which is just in the case where you assume that you are going to specify the structure of the covariance of the noise at the system level, then you would pass this Wt_star. If you don’t pass it, then the function assumes that you’re working with a discount factor. Here, the function just returns, puts all those matrices into a single object. Similarly, there is a function that puts all the prior parameters into a single list. This is again an object that you can then pass to the filtering, smoothing, and forecasting functions. Here, if you remember the conjugate structure conditional on the observational on variance V, the state vector, Theta0 is distributed as normal m0, vC0*. Then the prior distribution for the variance has a degrees of freedom parameter that we call in n0, and then you have these S0 that you can think of as an initial estimate for the variance. This is just a structure of the prior distribution that completes this normal inverse Gamma structure. Then you have the forward filtering function. Again, here you pass the data, the matrices that define the structure of the model, the initial states that you created with these functions. Then you have the option of passing a Delta or a grid of values. We will also talk about this, but this is just a value Delta for the discount factor. If you don’t specify the Delta, the function assumes that you are working with a case in which you know the structure of the Wt_ star. If you pass the Delta, then the function assumes that you are using a discount factor to specify that Wt matrix. If you look into the function, you can see that it has a similar structure to the functions we have discussed before now, the Fts are allowed to be time-varying. Then you have a different prior structure, and then you can obtain the moments for the prior distribution of Theta t given Dt minus 1. There are two options if you look into the Code, one for the case in which you don’t have a discount factor and one for the case in which you have a discount factor. Then you can continue with the moments of the one-step ahead forecast. Again, these are the equations and the moments of the posterior distribution for Theta t given Dt. The function returns an object that contains not only mt, Ct, at, Rt, ft, Qt and et, as we had before, it also has the degrees of freedom parameters and the Sts for each time t. All these is contained into this object that the function returns. 
Then there is a function to compute the smoothing distributions, the function receives as input, the data, the matrices that define the model structure, and this object that is the posterior states. You first have to run the filtering function, keep whatever the function returns into an object and then pass that object. This is the posterior states object. Then again, you have the option of providing a discount factor or not. If you look into the equations, you will see again, that you have the smoothing equations are updated in terms of the equations we discussed for the case in which there is no delta. Then in that case, we are assuming that the Wt structure is provided by the user, and in the case in which a discount factor is specified. Then we have also the list that returns. Essentially this function returns a list with the moments for the smoothing distribution of the state parameter vector, and also for the mean response. We also have then a function that allows you to compute the k steps ahead forecast distribution. The function receives the posterior states. k here is the number of steps ahead you want to compute. You have the matrices and then again, you may or may not have a delta, which is a discount factor. It computes the moments of that forecast distribution and it returns the moments for, again, at and Rt that correspond to the state vector, and then for the mean response or the k steps ahead, forecast function here. Going now to the example, I’m going to show you how to use these functions using a dataset that R provides. We will first source the functions that we just discussed and then we’re going to use these dataset that I’m plotting right here. This corresponds to 100 observations. They are annual measurements of the Nile river level in a particular location. The units are 10^8 square meters in a period of time that goes from 1871-1970. We’re going to use our first order polynomial DLM, which is a structure that we have used before. But now we are going to assume that the variance of the observational level is unknown, and constant over time. We’re going to also estimate this variance. The data contains 100 observations. We’re going to break these into two components. We are going to use 95 observations to fit the model, and then we’re going to predict the last five observations and see what we obtain. Here we’re just specifying the data, what I call the test data here it shows what you want to forecast. We set up the matrices for the first order polynomial. You notice here that I’m defining arrays that have the dimension of whatever is the model that you are using. In this case, a first order polynomial model has only one parameter. We have to specify Ft for all the values of t. Here, n is the total number of observations. I have to specify that for all the time points that I’m going to use to fit the model, and then also the ones that I’m going to predict the five last observations. That’s why the dimension is n, the Gt in the first order polynomial is constant equal to one. We specify that. Then we have this Wt star that I’m going to assume is just also one for all the values of t. After that we can specify the prior parameters for theta0 given V and D0. Here I’m assuming a priori, the mean level is 800 and the C0star is 10. The degrees of freedom parameter, I’m setting it to be one, and then the initial estimate for the observational variance is 10. 
Once I define this structure, I use those functions to set up all these matrices and initial parameters in terms of objects to pass through the filtering distribution, this smoothing distributions and so on. Here the first thing, we compute the moments of the forward filtering distributions, assuming that V is unknown, so we pass the data, the matrices, and the initial states. We can also compute 95 percent credible intervals for the state vectors. In this case is a one-dimensional vector so we just pass the results that we obtain using the filtering equations. These are the moments, the expected value of the distribution, and the variance of that distribution as well as the degrees of freedom parameter. Remember that when we integrate out V, assuming this distributional structure that we have for V, we obtain that the distribution of Thetat given Dt is a student t-distribution, so you have to pass the degrees of freedom parameter here. We then can proceed with the smoothing. The smoothing function is going to use the data, the matrices, and the object that contains all the filtering results in here. Again, we can compute credible intervals for, now it would be the distribution of the state parameter vector at each time t, considering that we have received capital T observations. We then proceed with the one step ahead or k steps ahead forecast. In this case, k is going to be equal to 5. Then again, I can compute credible intervals for the distribution of yt plus k given Dt, in this case the ts we’re going to go all the way to capital T. We then plot the results. Here we first plot the dots here correspond to the actual observations and then the first thing that I’m going to plot is the filtering distribution, meaning the mean of the filtering distribution as well as the 95 percent credible interval. This corresponds to Theta t given Dt as I receive observations here. Then you can plot also the smoothing distribution so once you compute the filtering, you are going to go and revisit these distributions. You can see that the mean of this distribution is smoother than the one we computed with the filtering equations. Also you can see the uncertainty here also is smaller, so these are the smooth distributions. Because this is a first-order polynomial, if we think about the forecast function is going to be constant after that last data point, capital T so it’s going to have that structure over time. Then I can also compute credible intervals associated to that distribution. You can see that the pattern that as the number of steps ahead increases, the uncertainty increases. This results in the the posterior distributions again for filtering distributions for Theta t given Dt and then the smoothing distributions and the forecast distributions.

100.5 code: NDLM, Unknown Observational Variance Example 🗒️ \mathcal{R}

This code allows time-varying F_t, G_t and W_t matrices and assumes an unknown but constant \nu. It also allows the user to specify W_t using a discount factor \delta \in (0,1] or assume W_t known.

100.5.1 Nile River Level Filtering Smoothing and Forecasting Example

source('all_dlm_functions_unknown_v.R')
source('discountfactor_selection_functions.R')

## Example: Nile River Level (in 10^8 m^3), 1871-1970 
## Model: First order polynomial DLM
plot(Nile) 

n=length(Nile) #n=100 observations 
k=5
T=n-k
data_T=Nile[1:T]
test_data=Nile[(T+1):n]
data=list(yt = data_T)


## set up matrices for first order polynomial model 
Ft=array(1, dim = c(1, 1, n))
Gt=array(1, dim = c(1, 1, n))
Wt_star=array(1, dim = c(1, 1, n))
m0=as.matrix(800)
C0_star=as.matrix(10)
n0=1
S0=10

## wrap up all matrices and initial values
matrices = set_up_dlm_matrices_unknown_v(Ft, Gt, Wt_star)
initial_states = set_up_initial_states_unknown_v(m0, 
                                      C0_star, n0, S0)

## filtering 
results_filtered = forward_filter_unknown_v(data, matrices, 
                                            initial_states)
# Forward filtering is completed!
ci_filtered=get_credible_interval_unknown_v(results_filtered$mt, 
                                    results_filtered$Ct, 
                                     results_filtered$nt)

## smoothing
results_smoothed=backward_smoothing_unknown_v(data, matrices, 
                                             results_filtered)
# Backward smoothing is completed!
ci_smoothed=get_credible_interval_unknown_v(results_smoothed$mnt, 
                                         results_smoothed$Cnt, 
                                         results_filtered$nt[T])

## one-step ahead forecasting
results_forecast=forecast_function_unknown_v(results_filtered, 
                                                k,  matrices)
# Forecasting is completed!
ci_forecast=get_credible_interval_unknown_v(results_forecast$ft, 
                                          results_forecast$Qt, 
                                     results_filtered$nt[T])
## plot results
index=seq(1871, 1970, length.out = length(Nile))
index_filt=index[1:T]
index_forecast=index[(T+1):(T+k)]

plot(index, Nile, main = "Nile River Level",type='l',
     xlab="time",ylab="annual flow (1e8 m^3)",lty=3,ylim=c(400,1500))
points(index,Nile,pch=20)

lines(index_filt,results_filtered$mt, type='l', col='red',lwd=2)
lines(index_filt,ci_filtered[, 1], type='l', col='red', lty=2)
lines(index_filt,ci_filtered[, 2], type='l', col='red', lty=2)
lines(index_filt,results_smoothed$mnt, type='l', col='blue',lwd=2)
lines(index_filt, ci_smoothed[, 1], type='l', col='blue', lty=2)
lines(index_filt, ci_smoothed[, 2], type='l', col='blue', lty=2)

lines(index_forecast, results_forecast$ft, type='l', 
      col='green',lwd=2)
lines(index_forecast, ci_forecast[, 1], type='l', 
      col='green', lty=2)
lines(index_forecast, ci_forecast[, 2], type='l', 
      col='green', lty=2)
legend("topright",
       legend = c("Observed", "Filtered mean", "Filtered CI", 
                  "Smoothed mean", "Smoothed CI", 
                  "Forecast mean", "Forecast CI"),
       col = c("black", "red", "red", "blue", "blue", "green", "green"),
       lty = c(NA, 1, 2, 1, 2, 1, 2),
       pch = c(20, NA, NA, NA, NA, NA, NA),
       lwd = c(NA, 2, 1, 2, 1, 2, 1),
       bty = "n",
       ncol = 2)
Figure 100.1: Nile River Level Filtering, Smoothing and Forecasting Results
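The listing above holds out the last five observations in test_data but never compares them with the forecasts; a short out-of-sample check (my own addition, assuming results_forecast$ft holds the forecast means plotted in green) is:

## compare the 5-step-ahead forecast means with the held-out observations
forecast_errors <- test_data - results_forecast$ft
round(forecast_errors, 1)
sqrt(mean(forecast_errors^2))   # RMSE over the five held-out years
mean(abs(forecast_errors))      # mean absolute deviation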
Note: Overthinking the Nile River Data

We use the Nile dataset throughout the course even though it is notorious for long-term dependence and nonlinearity, which present real challenges for the NDLM framework; cf. the work of two giants, Harold Edwin Hurst and Benoit Mandelbrot, on this series. That line of work led to the Hurst exponent and to fractional Brownian motion (fractional Gaussian noise) and fractional ARIMA (ARFIMA) models, which are better suited to the Nile data but are not covered in this course.

Just a couple of concepts here:

1. In fractional Brownian motion (fBm), the Gaussian increments need not be independent. The dependence is governed by the Hurst exponent H, a measure of the long-term memory of a time series, and the covariance of fBm is \mathbb{E}[B_{H}(t)B_{H}(s)]={\tfrac {1}{2}}(|t|^{2H}+|s|^{2H}-|t-s|^{2H}), \tag{100.6}

  • where:
    • B_{H}(t) is the fractional Brownian motion, and
    • H is the Hurst exponent.
  2. In ARFIMA, the differencing order d is allowed to be a real number rather than an integer, i.e. we allow for fractional differencing via a formal binomial series expansion of the operator (1-B)^d:

\begin{aligned} (1-B)^{d} &= \sum_{k=0}^{\infty} \binom{d}{k} (-B)^{k} \\ &= \sum_{k=0}^{\infty} \frac{\prod_{a=0}^{k-1}(d-a)}{k!} (-B)^{k} \\ &= 1 - dB + \frac{d(d-1)}{2!}B^{2} - \cdots \end{aligned} \tag{100.7}
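A tiny illustration of Equation 100.7 (my own code): the fractional-differencing weights, i.e. the coefficients of B^k in the expansion of (1-B)^d, computed from the product formula. Their slow, hyperbolic decay is what gives ARFIMA its long-memory behaviour.

## fractional-differencing weights: coefficients of B^k in (1 - B)^d
frac_diff_weights <- function(d, n_lags = 10) {
  w <- numeric(n_lags + 1)
  w[1] <- 1                                        # k = 0 term
  for (k in 1:n_lags) {
    # binom(d, k) (-1)^k = (-1)^k prod_{a=0}^{k-1} (d - a) / k!
    w[k + 1] <- (-1)^k * prod(d - 0:(k - 1)) / factorial(k)
  }
  w
}
round(frac_diff_weights(d = 0.3), 4)   # 1, -0.3, -0.105, ... decaying slowly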

100.6 Case Study EEG data 🎥

### EEG example ##################
source('all_dlm_functions_unknown_v.R')
source('discountfactor_selection_functions.R')

eeg_data <- scan("data/eeg.txt")
length(eeg_data)
# [1] 400
eeg_data_gh <- scan("data/eeg_gh.txt")
length(eeg_data_gh)
# [1] 400
plot(eeg_data, type='l', main="EEG Data", xlab="Time", ylab="EEG Signal")

head(eeg_data)
# [1] -42.565610 -56.345140  -7.788712 -41.253280  33.549870 147.066900
head(eeg_data_gh)
# [1] -69.553802 -83.333328 -34.776901 -68.241470   6.561679 120.078735
T <- length(eeg_data)
data <- list(yt = eeg_data[])
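The listing above stops after loading the data; the walkthrough below fits a time-varying AR(12) (TVAR) model with a discount factor. The sketch that follows is my own reconstruction of that set-up: the array layout and function calls mirror the Nile example, Wt_star is omitted so that a discount factor is used, and I assume (as described in the walkthrough) that forward_filter_unknown_v accepts the discount factor as an optional fourth argument; delta = 0.994 is the value the walkthrough reports as optimal under the MSE criterion.

## TVAR(12) set-up sketch for the EEG series (my own reconstruction)
p     <- 12                                  # model order
yt    <- eeg_data[(p + 1):T]                 # responses y_{p+1}, ..., y_T
n_fit <- length(yt)

Ft <- array(0, dim = c(p, 1, n_fit))         # F_t holds the p lagged values
for (t in 1:n_fit) {
  Ft[, 1, t] <- eeg_data[(t + p - 1):t]      # y_{t-1}, ..., y_{t-p}
}
Gt <- array(diag(p), dim = c(p, p, n_fit))   # G_t = I: AR coefficients evolve as random walks

m0      <- matrix(0, p, 1)                   # prior mean for the AR coefficients
C0_star <- 10 * diag(p)                      # prior scale (times v)
n0      <- 1                                 # prior degrees of freedom
S0      <- 100                               # prior estimate of v

matrices_tvar <- set_up_dlm_matrices_unknown_v(Ft, Gt)   # no Wt_star: discount factor used
initial_tvar  <- set_up_initial_states_unknown_v(m0, C0_star, n0, S0)

## delta = 0.994: the discount factor selected in the walkthrough
results_tvar <- forward_filter_unknown_v(list(yt = yt), matrices_tvar,
                                         initial_tvar, 0.994)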

We know discuss our first case study type of example. In this case, we’re going to use a time-varying autoregressive model to analyze some EEG data. EEG stands for electroencephalogram data. This is a brain signal of a particular channel that was recorded on a subject that received electroconvulsive therapy. It’s just an electroencephalogram recorded at a particular location of the patient’s scalp. There is a lot of information that you can get about these data. This dataset was recorded at Duke University. There are multiple papers where we analyze these data. You can also look at a full analysis and discussions and references to the papers in the book, Prado Ferreira and West 2021. The idea here is also to provide a little bit of comparison and relationship between what we are learning here with dynamic linear models and what we learned earlier in the course with autoregressive processes. Let me just show you the data. Then before we do that, we will need here the file that contains all the functions to compute the filtering, smoothing, and forecasting distributions using an unknown v at the observational level. Then I’m putting all the functions to compute the optimal discount factor also in this other file that you will have to source this. If you open this file, you will see that it contains the functions we described before to obtain the optimal discount factors. Let’s just source those in. Here I’m providing the EEG data. It corresponds to the middle portion of an EEG that was recorded at a sampling rate of 256 hertz or 256 observations per second. The data was subsampled every fifth observation to make it a little smaller. Here I’m just plotting that dataset. What you look at here is the voltage level for the EEG data. We have these 3,600 observations. We’re going to use a model that is a time-varying autoregressive model, meaning you have an autoregressive structure in terms of a particular model order. Here we’re going to use a model order of 12. We’re going to regress yt on the 12th past values, of yt. What happens here is the AR coefficients are going to be time-varying. That’s why we are using a dynamic linear model for this particular example. The total number of observations is 3,600, I’m just going to do the analysis with the first 3,200 only and then we’ll see what the results are. I’m just going to use this first portion here of this EEG. When you look at the picture of the data, again, if you were to look at the characteristics in terms of the frequency and the amplitude, you would see that the data has a higher frequency or lower quasi-periodicity at the beginning and a higher quasi-periodicity towards the end. Then also the amplitude changes over time. Here we are just defining the data and now we are setting up the matrices. If you remember how we set up the matrices here, the Gt is an identity, and then the Ft is going to contain the past values of the yts. For each time t we are going to have 12 past values. We build that and get the Ft matrices over time and then the Gt matrix. The Gt is constant over time, the Ft changes over time. We set up the initial state matrices. The model order is 12, so we are going to have a distribution for the state parameter vector of dimension 12 that is a normal with m0 v times C0_star. Here we have just that matrix. Here this is a vector of zeros. I’m just setting everything at zero. Then the C_star matrix is a diagonal matrix of dimension 12 times 10. Each of the components has a variance of 10 here. 
This is conditional, again, this would be multiplied by v. The full covariance structure will be given by v times C0_star is a conditional structure. Then the parameters for the prior distribution, the number of degrees of freedom I’m assuming here is 1. Then the estimate, a priori for the observational variance is 100. We set up those. We also keep those in objects that then we can pass to the functions that compute the filtering, smoothing, and forecasting distributions. Here I’m going to specify the Wt using a discount factor so we are going to define a set of values here. I’m going to have a lower bound that is 0.95 and an upper bound that is one and then I’m going to have increments of 0.001 to define that grid of values. Just to compute the optimal discount factor. Then I’m going to use the mean square error as the measure to pick that optimal discount factor. We run this. It takes a little bit just because we have a fairly large range of grid of values. Once we compute the optimal discount factor, the function, if you remember, is going return the filtering moments or the moments for the filtering distribution, the moments for the smoothing distributions, as well as the forecasting distributions with the optimal value. If you change, you may choose a different measurement. There are, if you recall, four different measurements to compute the optimal value of the discount factor. Once this is done, I will show you the results of the analysis. Let me just show you here. The optimal discount factor that was selected here is 0.994. It’s a fairly large discount factor, meaning that the variations in the state parameters are occurring smoothly over time, but there are changes. We can now retrieve the results in terms of the filtering. distributions. We can compute the credible intervals for that. We can compute the results for the smoothing distributions and obtain credible intervals for that as well. I’m just going to plot here the results of these. This would be my data and these would be the blue that I superimpose on that, is just the mean of the smoothing distribution based on the model and this discount factor that we chose. It’s capturing the main characteristics of the data. But in this particular case, this information is not necessarily what we want to look at. We want to try to interpret what results this is giving us in terms of the time-frequency characteristics of the data. One way of obtaining that information is to think back in terms of the characteristic polynomial and the reciprocal roots of that characteristic polynomial. We are going to have a characteristic polynomial for each time t now, because we have AR coefficients that change over time. For each of those roots and for each polynomial we know that we can also obtain a spectral representation that changes over time. If you think about this spectral density that is associated to each of those autoregressive processes at time t, you can just represent that in terms of different spectral density. Let’s look at what the results are. The next piece of code is just computing those reciprocal roots of the characteristic polynomial at each time t. Then looking at what is the modulus of the reciprocal roots and what is the period of the reciprocal roots. Here we have a model order of 12. As you recall, you can have complex valued or real valued reciprocal roots, as we did in the case of the the standard auto-regressive model where the parameters are not changing over time. 
Because you can have a different number of roots in terms of how many pairs of complex roots you have and how many real roots you have, this model is time-varying. You could potentially have also changing number of complex valued and real valued reciprocal roots. What I’m going to plot here, is I’m going to look at the root that has the largest modulus. So the reciprocal rule that has the largest modulus over time. Then you will see here that root happens to be complex so you do get a pair of complex valued roots. There is a period that is associated to that dominant root in terms of the modulus. I’m going to plot the modulus of that root and also the periodicity. What you see here, once you run the code. This is that trajectory based on the mean of this smooth distribution for the state parameter vector. The AR coefficients at each time t here. You can see it starts very high, is very persistent at the beginning of the time period and then it oscillates essentially between 0.9. One, and you can see that it decreases towards the end of the seizure here or towards the end of the EEG recording. But this is a fairly persistent group. When you look at the corresponding frequency, and here I’m plotting that frequency in a scale that is hertz, so if you remember the original data was recorded at 256 hertz, but it was sub-sampled every fifth observation. So if I want to plot the results in terms of hertz here for that corresponding frequency, I have to do this transformation. What you see here, is that this dominant quasiperiodic component has a frequency that starts a little bit higher than five hertz, and then it decreases towards the end of the EEG recording which is something we saw also when we were plotting the data. If you were to look at the data and zoom in into different portions at the beginning of the EEG and at the end you would see that this is consistent with that behavior. In particular, neuroscientists like to talk about different frequency bands in the brain, so we see here that this dominant component is mostly in the delta frequency band for the period of recording, at the beginning is in the Theta frequency band. At the beginning, when the seizure is starting, it begins in the Theta band and it decreases to the delta band. Just to again, remind you of how this works, if I think about the spectral representation of this process over time, you can specify some time points here, and then look at what is the estimate of the spectral density that corresponds to that autoregressive process for that particular time. I’m just going to use the function, arma.spec from the library astsa, we have used this function before, and I’m just going to plot. Here we’re just plotting the spectral density estimated using again the results from the smooth distribution, so this is based on that mean of the smooth distribution at each time t, and we’re going to use the estimate of the observational variance that we get after fitting the entire data set. What we get from the dynamic linear model. Here the results, so this corresponds to just one spectral density, you see that if you were to translate this, so now the frequency appears plotted between 0.5, if you were to do the transformation to the original scale, you would see that the peak of these spectral density is precisely at that value that was around those that went from essentially five hertz down towards the end of the seizure. 
We can see, I’m now going to plot the different spectral densities here, so just going back just to show you the plot a little bit larger here, so I’m picking again different times during the EEG recordings, so this corresponds to the 1.96 seconds, 9.77 seconds and so on, so this is towards the beginning of the recording, and then this is towards the end of the recording. You can see that at the beginning of the recording, I have a power that is dominated mostly by a single quasiperiodic component, that is that one that we just plotted at the beginning, that begins in the Theta band. Then if you look at the location of the peak, you see that the location of this dominant peak is decreasing, so the frequency is decreasing towards the end of the recording. Also, we see that the power of these spectral densities tends to decrease also as the recording gets to the end. There are other components that are also contributing in terms of the spectral characteristics. This is the nice feature of using a time-varying autoregressive model. You can capture what’s happening with the data and what is the spectral representation, and what is the interpretation that we can give to these analyses in terms of something that neuroscientists can understand.
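The walkthrough above computes, at each time t, the reciprocal roots of the TVAR(12) characteristic polynomial from the smoothed AR coefficients and converts the argument of the dominant root into hertz using the effective sampling rate of 256/5 Hz. A minimal sketch of that computation for a single time point (my own code, using base R's polyroot) is:

## dominant reciprocal root of the characteristic polynomial (my own sketch)
## phi: vector of AR coefficients at one time point, e.g. taken from the smoothed means
tvar_dominant_root <- function(phi, sampling_rate = 256 / 5) {
  roots <- polyroot(c(1, -phi))    # characteristic polynomial 1 - phi_1 z - ... - phi_p z^p
  recip <- 1 / roots               # reciprocal roots
  mods  <- Mod(recip)
  args  <- abs(Arg(recip))
  i <- which.max(mods)             # root with the largest modulus
  freq_hz <- if (args[i] > 1e-8) args[i] / (2 * pi) * sampling_rate else 0
  c(modulus = mods[i], frequency_hz = freq_hz)
}

## example: an AR(2)-like quasi-periodic coefficient pair
tvar_dominant_root(c(1.8, -0.95))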