This document provides extensive details about the object that is
returned by `statespacer()`. To do so, we start by introducing the
form of the general linear Gaussian state space model, following the
notation used by Durbin and Koopman (2012). Getting a grasp of this
notation will help you get the most out of the statespacer package!

There are many ways to write down the form of the general linear Gaussian state space model. We use the form used by Durbin and Koopman (2012):

\[ \begin{aligned} y_t ~ &= ~ Z_t\alpha_t ~ + ~ \varepsilon_t, &\varepsilon_t ~ &\sim ~ N(0, ~ H_t), \\ \alpha_{t+1} ~ &= ~ T_t\alpha_t ~ + ~ R_t\eta_t, &\eta_t ~ &\sim ~ N(0, ~ Q_t), \\ & &\alpha_1 ~ &\sim ~ N(a_1, ~ P_1), \end{aligned} \]

where \(y_t\) is the *observation vector*, a \(p ~ \times ~ 1\)
vector of dependent variables at time \(t\), \(\alpha_t\) is the
unobserved *state vector*, an \(m ~ \times ~ 1\) vector of state
variables at time \(t\), and \(\varepsilon_t\) and \(\eta_t\) are the
disturbance vectors of the observation equation and the state equation,
respectively. To initialise the model, \(a_1\) is used as the initial
guess of the state vector, and \(P_1\) is the corresponding uncertainty
of that guess. The matrices \(Z_t\), \(H_t\), \(T_t\), \(R_t\), and
\(Q_t\) are called the *system matrices* of the state space model.
Different specifications of these system matrices lead to different
interpretations of the model at hand.
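To make the notation concrete, the following is a minimal sketch that simulates the simplest special case, a univariate local level model. It assumes \(p = m = 1\), \(Z_t = T_t = R_t = 1\), and constant \(H\) and \(Q\); all numeric values are chosen purely for illustration and are not taken from the package.

```r
# Simulate a univariate local level model (p = 1, m = 1).
# Assumptions for illustration: Z = T = R = 1, constant H and Q.
set.seed(1)

n  <- 100   # number of time points
H  <- 2     # variance of the observation disturbance epsilon_t
Q  <- 0.5   # variance of the state disturbance eta_t
a1 <- 10    # mean of the initial state alpha_1
P1 <- 1     # variance of the initial state alpha_1

alpha <- numeric(n)
y     <- numeric(n)

# Initial state: alpha_1 ~ N(a1, P1)
alpha[1] <- rnorm(1, mean = a1, sd = sqrt(P1))

for (t in seq_len(n)) {
  # Observation equation: y_t = Z * alpha_t + epsilon_t
  y[t] <- alpha[t] + rnorm(1, sd = sqrt(H))
  # State equation: alpha_{t+1} = T * alpha_t + R * eta_t
  if (t < n) {
    alpha[t + 1] <- alpha[t] + rnorm(1, sd = sqrt(Q))
  }
}

head(cbind(y = y, alpha = alpha))
```

Richer models only change the system matrices: a slope or a seasonal component, for example, enlarges the state vector and the corresponding \(Z\), \(T\), \(R\), and \(Q\).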

Having obtained a better understanding of the notation used, it is
easier to find our way around the object that is returned by
`statespacer()`. Let’s say we store the object returned by
`statespacer()` in a variable called `fit`, that is,
`fit <- statespacer(...)`. `fit` is then a list, containing many items,
including other lists. This section describes the items that are
included in `fit` one by one.

`function_call` is a list that contains, as the name suggests, the call
to the `statespacer()` function, including default values for the input
arguments that were not specified. For details about the various input
arguments, check out `?statespacer`.

`system_matrices` is a list containing all of the system matrices of
each of the components. For the variance - covariance matrices \(H\)
and \(Q\), it also contains two decompositions, namely the Cholesky
\(LDL^{\top}\) decomposition, where \(L\) is the loading matrix and
\(D\) is the diagonal matrix, and the correlation / standard deviation
decomposition. The initial guess for the state vector, `a1`, is also
included, together with the corresponding uncertainty split into its
diffuse component, `P_inf`, and its stationary component, `P_star`.
Further, it contains `Z_padded`, which is a list containing the \(Z\)
matrices of the components augmented with zeroes, such that each has
dimension \(p ~ \times ~ m\). These matrices are useful to extract
individual components (which is already done for you), or to extract
standard deviations of the components. There’s also a vector called
`state_label`, which labels the state vector to indicate which state
parameters belong to which components. If components are specified that
introduce parameters into the system matrices, then these parameters
are also included here. At the moment, these parameters are `lambda`
(frequency) and `rho` (damping factor) for the cycles, `AR` and `MA`
for the ARIMA components, `SAR` and `SMA` for the SARIMA components,
and `self_spec` for the self specified component. Note that
coefficients of explanatory variables are put into the state vector, so
these are treated as state parameters, and readily returned by the
Kalman filter.
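The two decompositions mentioned above can be illustrated with base R. The sketch below uses a made-up \(2 ~ \times ~ 2\) variance - covariance matrix; the variable names (`Sigma`, `stdev`, `correlation`) are hypothetical and are not the names used inside `system_matrices`.

```r
# A made-up variance - covariance matrix, for illustration only.
Sigma <- matrix(c(4.0, 1.2,
                  1.2, 1.0), nrow = 2, byrow = TRUE)

# Cholesky LDL' decomposition, derived from the ordinary Cholesky factor.
C <- t(chol(Sigma))            # lower triangular, Sigma = C %*% t(C)
D <- diag(diag(C)^2)           # diagonal matrix D
L <- C %*% diag(1 / diag(C))   # loading matrix L with unit diagonal

# Correlation / standard deviation decomposition.
stdev       <- sqrt(diag(Sigma))
correlation <- cov2cor(Sigma)

# Both decompositions reproduce the original matrix.
all.equal(L %*% D %*% t(L), Sigma)
all.equal(diag(stdev) %*% correlation %*% diag(stdev), Sigma)
```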

`predicted` is a list that contains the one-step ahead predicted
(predicting time \(t\) using data up to time \(t ~ - ~ 1\)) objects as
returned by the Kalman filter:

- `yfit` is the predicted value of \(y\).
- `v` is the prediction error.
- `Fmat` is the uncertainty of the prediction.
- `a` is the predicted state.
- `P` is the uncertainty of the predicted state.
- `P_inf` is the diffuse part of `P`.
- `P_star` is the non-diffuse part of `P`.
- `a_fc` is the predicted state for time \(N ~ + ~ 1\) (\(N\) being the last observed time point).
- `P_fc` is the uncertainty of `a_fc`.
- `P_inf_fc` is the diffuse part of `P_fc`.
- `P_star_fc` is the non-diffuse part of `P_fc`.

Further, the contributions of the components to the predicted values are extracted separately.

`filtered` is a list that contains the filtered (estimates for time
\(t\) using data up to time \(t\)) objects as returned by the Kalman
filter. Here, `a` is the filtered state, `P` is the uncertainty of the
filtered state, `P_inf` is the diffuse part of `P`, and `P_star` is the
non-diffuse part of `P`. Further, the filtered values of the components
are extracted separately.
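The difference between predicted and filtered quantities is easiest to see in a scalar Kalman filter. The following is a minimal sketch for the local level model (\(Z = T = R = 1\)), not the package's implementation. The function `local_level_filter()` is hypothetical; its output names merely mimic the items described above, the values of `H` and `Q` are illustrative, and a large `P1` is used as a rough stand-in for a diffuse initialisation.

```r
# Scalar Kalman filter for the local level model (Z = T = R = 1).
# Hypothetical helper for illustration; not part of statespacer.
local_level_filter <- function(y, H, Q, a1, P1) {
  n <- length(y)
  a      <- numeric(n + 1)   # predicted state, a[t] uses data up to t - 1
  P      <- numeric(n + 1)   # variance of the predicted state
  v      <- numeric(n)       # one-step ahead prediction error
  Fmat   <- numeric(n)       # variance of the prediction error
  a_filt <- numeric(n)       # filtered state, uses data up to t
  P_filt <- numeric(n)       # variance of the filtered state
  a[1] <- a1
  P[1] <- P1
  for (t in seq_len(n)) {
    v[t]      <- y[t] - a[t]
    Fmat[t]   <- P[t] + H
    K         <- P[t] / Fmat[t]        # Kalman gain
    a_filt[t] <- a[t] + K * v[t]       # update with observation y[t]
    P_filt[t] <- P[t] * (1 - K)
    a[t + 1]  <- a_filt[t]             # predict the next state
    P[t + 1]  <- P_filt[t] + Q
  }
  list(
    predicted = list(yfit = a[1:n], v = v, Fmat = Fmat,
                     a = a[1:n], P = P[1:n],
                     a_fc = a[n + 1], P_fc = P[n + 1]),
    filtered  = list(a = a_filt, P = P_filt)
  )
}

y   <- as.numeric(datasets::Nile)
out <- local_level_filter(y, H = 15000, Q = 1500, a1 = y[1], P1 = 1e7)

# The filtered state is never more uncertain than the predicted state.
all(out$filtered$P <= out$predicted$P)
```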

`smoothed` is a list that contains smoothed (estimates for time \(t\)
using all of the time points) objects as returned by the Kalman
smoother:

- `a` is the smoothed state.
- `V` is the uncertainty of the smoothed state.
- `eta` is the smoothed state disturbance.
- `eta_var` is the uncertainty of `eta`.
- `epsilon` is the smoothed observation disturbance.
- `epsilon_var` is the uncertainty of `epsilon`.

Further, the smoothed values of the components are extracted separately.

`diagnostics` is a list that contains items useful for diagnostic tests
and model selection:

- `initialisation_steps` is the number of time steps required before the diffuse elements of the state vector were initialised.
- `loglik` is the loglikelihood value at the estimated parameters.
- `AIC` is the Akaike Information Criterion for the model.
- `BIC` is the Bayesian Information Criterion for the model.
- `r` is the scaled smoothed state disturbance.
- `N` is the uncertainty of `r`.
- `param_indices` is a list containing the indices of the parameters in the parameter vector for each of the components.
- `hessian` is the Hessian of the loglikelihood evaluated at the estimated parameters.

The following objects are only returned if `diagnostics = TRUE`:

- `e` is the smoothing error.
- `D` is the uncertainty of `e`.
- `Tstat_observation` is the T-statistic for testing whether deviations from the observation equation are significant.
- `Tstat_state` is the T-statistic for testing whether deviations from the state equation are significant.
- `v_normalised` is the normalised prediction error.
- `Skewness` is the skewness of `v_normalised`.
- `Kurtosis` is the kurtosis of `v_normalised`.
- `Jarque_Bera` is the Jarque-Bera statistic for testing for normality.
- `Jarque_Bera_criticalvalue` is the critical value of the Jarque-Bera test.
- `correlogram` is the correlogram of `v_normalised`.
- `Box_Ljung` are the Box-Ljung statistics for testing for serial correlation.
- `Box_Ljung_criticalvalues` are the critical values of the Box-Ljung tests.
- `Heteroscedasticity` are statistics for testing for heteroscedasticity.
- `Heteroscedasticity_criticalvalues` are the critical values of the heteroscedasticity tests.
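For readers unfamiliar with these statistics, the sketch below shows how a few of them are conventionally computed in base R from a vector of normalised prediction errors. It uses simulated white noise as a stand-in for `v_normalised`. The 5% significance level and the lag of 10 are assumptions made here, and the package's exact definitions (for example, degrees of freedom corrections) may differ.

```r
set.seed(42)
# Stand-in for the normalised prediction errors; simulated, not from a fit.
v_normalised <- rnorm(200)
n <- length(v_normalised)

# Sample skewness and kurtosis based on central moments.
z        <- v_normalised - mean(v_normalised)
m2       <- mean(z^2)
skewness <- mean(z^3) / m2^1.5
kurtosis <- mean(z^4) / m2^2

# Jarque-Bera statistic, asymptotically chi-squared with 2 degrees of freedom.
jarque_bera          <- n * (skewness^2 / 6 + (kurtosis - 3)^2 / 24)
jarque_bera_critical <- qchisq(0.95, df = 2)

# Box-Ljung statistic for serial correlation up to lag 10 (assumed lag).
box_ljung          <- Box.test(v_normalised, lag = 10, type = "Ljung-Box")
box_ljung_critical <- qchisq(0.95, df = 10)

c(JB = jarque_bera, JB_critical = jarque_bera_critical)
c(BL = unname(box_ljung$statistic), BL_critical = box_ljung_critical)
```

A statistic that exceeds its critical value suggests that the corresponding assumption (normality, or absence of serial correlation) is violated.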

`optim` is the list as returned by `stats::optim` or `optimx::optimr`,
depending on whether optimx is installed. See `?stats::optim` and
`?optimx::optimr` for details. Only returned if `fit = TRUE`.

`loglik_fun` is the loglikelihood function that takes `param` as its
only argument. It returns the loglikelihood at the specified
parameters.
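To illustrate the idea of a loglikelihood function of `param` that is handed to an optimiser, here is a self-contained sketch for the local level model. The factory `make_loglik()` is hypothetical and only imitates the role of `loglik_fun`. It assumes that both variances are parameterised through \(\exp(2x)\), approximates the diffuse initialisation with a large initial variance, and uses the default Nelder-Mead method of `stats::optim`, none of which is claimed to match what the package does internally.

```r
# Hypothetical factory returning a loglikelihood function of `param`
# for the local level model (Z = T = R = 1).
make_loglik <- function(y) {
  function(param) {
    H <- exp(2 * param[1])   # observation disturbance variance
    Q <- exp(2 * param[2])   # state disturbance variance
    a <- y[1]                # initial guess of the state
    P <- 1e7                 # large variance, rough stand-in for diffuse
    loglik <- 0
    for (t in seq_along(y)) {
      v  <- y[t] - a         # prediction error
      Fm <- P + H            # variance of the prediction error
      loglik <- loglik - 0.5 * (log(2 * pi) + log(Fm) + v^2 / Fm)
      K <- P / Fm            # Kalman gain
      a <- a + K * v
      P <- P * (1 - K) + Q
    }
    loglik
  }
}

y          <- as.numeric(datasets::Nile)
loglik_fun <- make_loglik(y)
initial    <- rep(0.5 * log(var(y)), 2)

# optim() minimises, so the negative loglikelihood is supplied.
opt <- optim(initial, function(p) -loglik_fun(p))

exp(2 * opt$par)   # estimated variances H and Q on the original scale
```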

`standard_errors` is a list that contains the standard errors for the
transformed parameters. Its structure mimics the structure of
`system_matrices`, but only represents those system matrices that
depend on the parameters. Only returned if `standard_errors = TRUE`.

This section provides details about the parameter vector that’s
supplied to `statespacer()`. It clarifies which elements are used for
which components.

Most components use a variance - covariance matrix, which is constructed using the Cholesky \(LDL^{\top}\) decomposition. The parameters supplied to build the variance - covariance matrix are ordered as follows: first, parameters are used for the diagonal matrix \(D\), each transformed by \(\exp(2x)\). Second, the remaining parameters are assigned columnwise to the loading matrix \(L\), so first the 1st column, then the 2nd column, and so on.
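The ordering described above can be sketched as follows. The helper `build_covariance()` is hypothetical. It assumes a full-rank \(p ~ \times ~ p\) matrix in which \(L\) is unit lower triangular with \(p(p-1)/2\) free elements; the package may support other formats that need fewer parameters.

```r
# Hypothetical helper: map a parameter vector to a variance - covariance
# matrix via the LDL' construction described in the text.
build_covariance <- function(param, p) {
  stopifnot(length(param) == p + p * (p - 1) / 2)
  # First p parameters: diagonal of D, transformed by exp(2x).
  D <- diag(exp(2 * param[seq_len(p)]), nrow = p)
  # Remaining parameters: below-diagonal elements of L, filled columnwise.
  L <- diag(1, nrow = p)
  L[lower.tri(L)] <- param[-seq_len(p)]
  L %*% D %*% t(L)
}

# Two dependent variables: 2 parameters for D, 1 parameter for L.
param <- c(0.5 * log(4), 0.5 * log(1), 0.3)
Sigma <- build_covariance(param, p = 2)
Sigma
```

Because the diagonal of \(D\) is strictly positive for any real-valued parameters, the resulting matrix is always a valid variance - covariance matrix, which lets the optimiser search without constraints.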

The parameters are assigned to the components in the following order:

- The variance - covariance matrix, \(H\), of the observation equation, unless the \(H\) matrix is self-specified.
- The Local Level component.
- The Local Level + Slope component: first the Level, then the Slope.
- The BSM components, in the order of the specified `BSM_vec`.
- Explanatory Variables, if the coefficients are time-varying. The coefficients themselves go into the state vector, so they don’t need any parameters!
- Local Level + Explanatory Variables in the Level. First the parameters go to the variance - covariance matrix of the Level, after which the remaining parameters go to the variance - covariance matrix of the Explanatory Variables (if time-varying).
- Local Level + Slope + Explanatory Variables in the Level. First the parameters go to the variance - covariance matrix of the Level, then they go to the variance - covariance matrix of the Slope, after which the remaining parameters go to the variance - covariance matrix of the Explanatory Variables (if time-varying).
- The Cycle components, in the order of the specified cycles. The first parameter is used for the frequency, \(\lambda\), of the cycle. The second parameter is used for the damping factor, \(\rho\), of the cycle, but only if `damping_factor_ind = TRUE`. The remaining parameters are used for the variance - covariance matrix.
- ARIMA, in the order of the specified ARIMA components. First, the parameters are used for the variance - covariance matrix. Then, the remaining parameters are first used for the AR coefficients, and then the MA coefficients.
- SARIMA, in the order of the specified SARIMA components. First, the parameters are used for the variance - covariance matrix. Then, the remaining parameters are used in the order of the specified seasonalities `s`: first for the AR coefficients of the first seasonality, then for the MA coefficients of the first seasonality, and so on for the subsequent seasonalities.
- The self-specified part.

Care should be taken in specifying the initial parameters! Usually, I check out the variances of the dependent variables, apply the transformation \(0.5\log(x)\) to them (the inverse of the \(\exp(2x)\) transformation used for the diagonal matrix \(D\)), and specify the results as initial values for the parameters that go to the various variance - covariance matrices. For the AR and MA coefficients, it might be beneficial to initialise them close to 0, to prevent them from converging to unit root solutions. The information in this section should make the trial and error process of finding proper initial parameters less cumbersome!
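As a small worked example of this rule of thumb, consider a univariate local level model, which needs two parameters: following the ordering above, the first goes to \(H\) and the second to the variance of the Level. The sketch uses the `Nile` dataset from base R. The names given to the elements of `initial` are only labels for the reader, and the final call is guarded so that it only runs if statespacer is installed.

```r
y <- matrix(datasets::Nile)

# 0.5 * log(variance) for each variance parameter, in the order H, Level.
initial <- c(H = 0.5 * log(var(y)), level = 0.5 * log(var(y)))
initial

# Applying exp(2x) recovers the variance of y for both parameters.
exp(2 * initial)

# Only runs when the statespacer package is available.
if (requireNamespace("statespacer", quietly = TRUE)) {
  fit <- statespacer::statespacer(y = y,
                                  local_level_ind = TRUE,
                                  initial = unname(initial))
  names(fit)
}
```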

Durbin, James, and Siem Jan Koopman. 2012. *Time Series Analysis by
State Space Methods*. Oxford University Press.