UCM Procedure

The UCMs as State Space Models

The UCMs considered in PROC UCM are special cases of more general models, called (linear) state space models (SSM). The section State Space Model and Notation in Chapter 33, SSM Procedure, provides an elaborate notation for such models. However, for most of the UCMs considered in PROC UCM, much simpler notation suffices. This section describes a treatment of UCMs in terms of this simplified notation. At times the description and mathematical treatment (such as the expressions of likelihood) of state space models in PROC UCM and PROC SSM can appear different. However, these differences are only notational and the underlying mathematical quantities coincide. For example, the diffuse Kalman filter (DKF) described in this section is called the exact initial Kalman filter whereas the DKF described in the section Filtering, Smoothing, Likelihood, and Structural Break Detection in Chapter 33, SSM Procedure, is called the augmented Kalman filter. Both of these algorithms produce the same final output (see Durbin and Koopman (2012, chap. 5) for more information).

An SSM can be described as follows:

\[
\begin{aligned}
y_t &= Z_t \alpha_t \\
\alpha_{t+1} &= T_t \alpha_t + \zeta_{t+1}, \quad \zeta_t \sim N(0, Q_t) \\
\alpha_1 &\sim N(0, P)
\end{aligned}
\]

The first equation, called the observation equation, relates the response series $y_t$ to a state vector $\alpha_t$ that is usually unobserved. The second equation, called the state equation, describes the evolution of the state vector in time. The system matrices $Z_t$ and $T_t$ are of appropriate dimensions and are known, except possibly for some unknown elements that become part of the parameter vector of the model. The noise series $\zeta_t$ consists of independent, zero-mean, Gaussian vectors with covariance matrices $Q_t$. For most of the UCMs considered here, the system matrices $Z_t$ and $T_t$, and the noise covariances $Q_t$, are time invariant; that is, they do not depend on time. In a few cases, however, some or all of them can depend on time. The initial state vector $\alpha_1$ is assumed to be independent of the noise series, and its covariance matrix $P$ can be partially diffuse. A random vector has a partially diffuse covariance matrix if it can be partitioned such that one part of the vector has a properly defined probability distribution, while the covariance matrix of the other part is infinite; that is, you have no prior information about this part of the vector. The covariance of the initial state $\alpha_1$ is assumed to have the form

\[
P = P_* + \kappa P_\infty
\]

where $P_*$ and $P_\infty$ are nonnegative definite, symmetric matrices and $\kappa$ is a constant that is assumed to be close to $\infty$. In the case of UCMs considered here, $P_\infty$ is always a diagonal matrix that consists of zeros and ones, and, if a particular diagonal element of $P_\infty$ is one, then the corresponding row and column in $P_*$ are zero.
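
The structure of the initial covariance can be illustrated with a small sketch. It is not part of PROC UCM: the function name, the finite stand-in value for kappa, and the three-element state are assumptions made for illustration only.

```python
# Sketch: assemble P = P_star + kappa * P_inf for a state [irregular, level, slope].
# KAPPA is an assumed large finite stand-in for "close to infinity".
KAPPA = 1e7

def initial_covariance(p_star, p_inf_diag, kappa=KAPPA):
    """Return P = P_star + kappa * Diag(p_inf_diag) as a nested list."""
    n = len(p_inf_diag)
    for i, flag in enumerate(p_inf_diag):
        # P_inf is diagonal with entries that are either zero or one.
        if flag not in (0, 1):
            raise ValueError("P_inf diagonal must contain only 0 or 1")
        # A diffuse element must have a zero row and column in P_star.
        if flag == 1 and any(p_star[i][j] != 0 or p_star[j][i] != 0 for j in range(n)):
            raise ValueError("diffuse element %d needs a zero row/column in P_star" % i)
    return [[p_star[i][j] + (kappa * p_inf_diag[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]

sigma2_eps = 2.0  # illustrative irregular variance
P_star = [[sigma2_eps, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
P = initial_covariance(P_star, [0, 1, 1])
print(P[0][0], P[1][1], P[2][2])
```

The irregular element keeps its proper variance, while the level and slope elements receive the large variance that represents the absence of prior information.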

The state space formulation of a UCM has many computational advantages. In this formulation there are convenient algorithms for estimating and forecasting the unobserved states $\{\alpha_t\}$ by using the observed series $\{y_t\}$. These algorithms also yield the in-sample and out-of-sample forecasts and the likelihood of $\{y_t\}$. The state space representation of a UCM does not need to be unique. In the representation used here, the unobserved components in the UCM often appear as elements of the state vector. This makes the elements of the state interpretable and, more important, the sample estimates and forecasts of these unobserved components are easily obtained. For additional information about the computational aspects of the state space modeling, see Durbin and Koopman (2012). Next, some notation is developed to describe the essential quantities computed during the analysis of the state space models.

Let $\{y_t, t = 1, \ldots, n\}$ be the observed sample from a series that satisfies a state space model. Next, for $1 \le t \le n$, let the one-step-ahead forecasts of the series, the states, and their variances be defined as follows, using the usual notation to denote the conditional expectation and conditional variance:

\[
\begin{aligned}
\hat{\alpha}_t &= E(\alpha_t \mid y_1, y_2, \ldots, y_{t-1}) \\
\Gamma_t &= \mathrm{Var}(\alpha_t \mid y_1, y_2, \ldots, y_{t-1}) \\
\hat{y}_t &= E(y_t \mid y_1, y_2, \ldots, y_{t-1}) \\
F_t &= \mathrm{Var}(y_t \mid y_1, y_2, \ldots, y_{t-1})
\end{aligned}
\]

These are also called the filtered estimates of the series and the states. Similarly, for $t \ge 1$, let the following denote the full-sample estimates of the series and the state values at time $t$:

\[
\begin{aligned}
\tilde{\alpha}_t &= E(\alpha_t \mid y_1, y_2, \ldots, y_n) \\
\Delta_t &= \mathrm{Var}(\alpha_t \mid y_1, y_2, \ldots, y_n) \\
\tilde{y}_t &= E(y_t \mid y_1, y_2, \ldots, y_n) \\
G_t &= \mathrm{Var}(y_t \mid y_1, y_2, \ldots, y_n)
\end{aligned}
\]

If the time $t$ is in the historical period (that is, if $1 \le t \le n$), then the full-sample estimates are called the smoothed estimates, and if $t$ lies in the future then they are called out-of-sample forecasts. Note that if $1 \le t \le n$, then $\tilde{y}_t = y_t$ and $G_t = 0$, unless $y_t$ is missing.

All the filtered and smoothed estimates ($\hat{\alpha}_t, \tilde{\alpha}_t, \ldots, G_t$, and so on) are computed by using the Kalman filtering and smoothing (KFS) algorithm, which is an iterative process. If the initial state is diffuse, as is often the case for the UCMs, its treatment requires modification of the traditional KFS, which is called the diffuse KFS (DKFS). The details of DKFS implemented in the UCM procedure can be found in De Jong and Chu-Chun-Lin (2003). Additional information on the state space models can be found in Durbin and Koopman (2012). The likelihood formulas described in this section are taken from the latter reference.

In the case of diffuse initial condition, the effect of the improper prior distribution of $\alpha_1$ manifests itself in the first few filtering iterations. During these initial filtering iterations the distribution of the filtered quantities remains diffuse; that is, during these iterations the one-step-ahead series and state forecast variances $F_t$ and $\Gamma_t$ have the following form:

\[
\begin{aligned}
F_t &= F_{*t} + \kappa F_{\infty t} \\
\Gamma_t &= \Gamma_{*t} + \kappa \Gamma_{\infty t}
\end{aligned}
\]

The actual number of iterations affected by this improper prior, say $I$, depends on the nature of the vectors $Z_t$, the number of nonzero diagonal elements of $P_\infty$, and the pattern of missing values in the dependent series. After $I$ iterations, $\Gamma_{\infty t}$ and $F_{\infty t}$ become zero and the one-step-ahead series and state forecasts have proper distributions. These first $I$ iterations constitute the initialization phase of the DKFS algorithm. The post-initialization phase of the DKFS and the traditional KFS is the same. In the state space modeling literature the initialization and post-initialization phases are sometimes called the pre-collapse and post-collapse phases of the diffuse Kalman filtering. In certain missing value patterns it is possible for $I$ to exceed the sample size; that is, the sample information can be insufficient to create a proper prior for the filtering process. In these cases, parameter estimation and forecasting are done on the basis of this improper prior, and some or all of the series and component forecasts can have infinite variances (or zero precision). The forecasts that have infinite variance are set to missing. The same situation can occur if the specified model contains components that are essentially multicollinear. In these situations no residual analysis is possible; in particular, no residuals-based goodness-of-fit statistics are produced.
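
The collapse of the diffuse variance can be seen in a minimal filtering sketch for a local level model. This is not the DKFS algorithm used by PROC UCM: it approximates the diffuse prior with a large finite variance (an assumed kappa of 1e7), and it keeps the irregular term as separate observation noise instead of placing it in the state vector. The function name and the data values are hypothetical.

```python
def local_level_filter(y, sigma2_eps, sigma2_eta, kappa=1e7):
    """One-step-ahead forecasts for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_{t+1}.

    Returns lists (y_hat, F, nu); nu[t] is None where y[t] is missing.
    """
    a, gamma = 0.0, kappa            # diffuse prior approximated by a huge variance
    y_hat, F, nu = [], [], []
    for obs in y:
        f = gamma + sigma2_eps       # variance of the one-step-ahead series forecast
        y_hat.append(a)
        F.append(f)
        if obs is None:              # missing value: predict only, no update
            nu.append(None)
            gamma = gamma + sigma2_eta
            continue
        v = obs - a                  # one-step-ahead residual
        k = gamma / f                # Kalman gain
        a = a + k * v
        gamma = gamma * (1.0 - k) + sigma2_eta
        nu.append(v)
    return y_hat, F, nu

y = [4.8, 5.1, None, 5.4, 5.0]       # illustrative series with one missing value
y_hat, F, nu = local_level_filter(y, sigma2_eps=1.0, sigma2_eta=0.5)
for t, (yh, f) in enumerate(zip(y_hat, F), start=1):
    print(t, round(yh, 3), round(f, 3))
```

In this sketch the forecast variance is dominated by kappa only at the first time point, which corresponds to an initialization phase of length one for a single diffuse state with a nonmissing first observation.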

The log likelihood of the sample ($L_\infty$), which takes account of this diffuse initialization step, is computed by using the one-step-ahead series forecasts as follows:

\[
L_\infty(y_1, \ldots, y_n) = -\frac{(n-d)}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{I} w_t - \frac{1}{2} \sum_{t=I+1}^{n} \left( \log F_t + \frac{\nu_t^2}{F_t} \right)
\]

where $d$ is the number of diffuse elements in the initial state $\alpha_1$, $\nu_t = y_t - Z_t \hat{\alpha}_t$ are the one-step-ahead residuals, and

\[
w_t =
\begin{cases}
\log F_{\infty t} & \text{if } F_{\infty t} > 0 \\
\log F_{*t} + \dfrac{\nu_t^2}{F_{*t}} & \text{if } F_{\infty t} = 0
\end{cases}
\]

If $y_t$ is missing at some time $t$, then the corresponding summand in the log likelihood expression is deleted, and the constant term is adjusted suitably. Moreover, if the initialization step does not complete (that is, if $I$ exceeds the sample size), then the value of $d$ is reduced to the number of diffuse states that are successfully initialized.

The portion of the log likelihood that corresponds to the post-initialization period is called the nondiffuse log likelihood ($L_0$). The nondiffuse log likelihood is given by

\[
L_0(y_1, \ldots, y_n) = -\frac{1}{2} \sum_{t=I+1}^{n} \left( \log F_t + \frac{\nu_t^2}{F_t} \right)
\]

In the case of UCMs considered in PROC UCM, it often happens that the diffuse part of the likelihood, $\sum_{t=1}^{I} w_t$, does not depend on the model parameters, and in these cases the maximization of nondiffuse and diffuse likelihoods is equivalent. However, in some cases, such as when the model contains lags of the dependent variable, the diffuse part does depend on the model parameters. In these cases the maximization of the diffuse and nondiffuse likelihood can produce different parameter estimates.
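
The diffuse and nondiffuse log likelihoods can be evaluated directly from the filter output, as the following sketch shows. It assumes that no observation is missing and that the initialization phase completes. The function names and the numeric inputs are hypothetical and are not output from PROC UCM.

```python
import math

def w_term(f_inf, f_star, nu):
    """Contribution w_t of one time point in the initialization phase."""
    if f_inf > 0:
        return math.log(f_inf)
    return math.log(f_star) + nu * nu / f_star

def log_likelihoods(w, nu, F, d):
    """Return (L_inf, L_0).

    w     : w_t for t = 1..I
    nu, F : residuals and forecast variances for t = I+1..n
    d     : number of diffuse elements in the initial state
    """
    n = len(w) + len(nu)
    L0 = -0.5 * sum(math.log(f) + v * v / f for v, f in zip(nu, F))
    L_inf = -0.5 * (n - d) * math.log(2.0 * math.pi) - 0.5 * sum(w) + L0
    return L_inf, L0

# Illustrative numbers only: one diffuse state and I = 1.
w = [w_term(f_inf=1.0, f_star=1.0, nu=0.0)]
nu = [0.3, -0.2]
F = [2.5, 2.0]
L_inf, L0 = log_likelihoods(w, nu, F, d=1)
print(L_inf, L0)
```

Because the initialization-phase term here does not involve any model parameter, the two likelihoods in this toy case differ only by a constant.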

In some situations it is convenient to reparameterize the nondiffuse initial state covariance $P_*$ as $\sigma^2 P_*$ and the state noise covariance $Q_t$ as $\sigma^2 Q_t$ for some common scalar parameter $\sigma^2$. In this case the preceding log-likelihood expression, up to a constant, can be written as

\[
L_\infty(y_1, \ldots, y_n) = -\frac{1}{2} \sum_{t=1}^{I} w_t - \frac{1}{2} \sum_{t=I+1}^{n} \log F_t - \frac{1}{2\sigma^2} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} - \frac{(n-d)}{2} \log \sigma^2
\]

Solving analytically for the optimum, the maximum likelihood estimate of $\sigma^2$ can be shown to be

\[
\hat{\sigma}^2 = \frac{1}{(n-d)} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t}
\]

When this expression of $\sigma^2$ is substituted back into the likelihood formula, an expression called the profile likelihood ($L_{\mathrm{profile}}$) of the data is obtained:

\[
-2 L_{\mathrm{profile}}(y_1, \ldots, y_n) = \sum_{t=1}^{I} w_t + \sum_{t=I+1}^{n} \log F_t + (n-d) \log \left( \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} \right)
\]
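
The intermediate steps behind the profile likelihood can be sketched as follows. The derivation takes the reparameterized expression for the diffuse log likelihood as given and treats the initialization-phase terms as free of the scale parameter; the final bracketed term is the constant that the profile likelihood expression omits.

```latex
% Sketch: profiling out the common scale parameter sigma^2.
\begin{aligned}
\frac{\partial L_\infty}{\partial \sigma^2}
  &= \frac{1}{2\sigma^4} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t}
     - \frac{n-d}{2\sigma^2} = 0
  \quad\Longrightarrow\quad
  \hat{\sigma}^2 = \frac{1}{n-d} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} \\
-2 L_\infty \Big|_{\sigma^2 = \hat{\sigma}^2}
  &= \sum_{t=1}^{I} w_t + \sum_{t=I+1}^{n} \log F_t
     + \frac{1}{\hat{\sigma}^2} \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t}
     + (n-d) \log \hat{\sigma}^2 \\
  &= \sum_{t=1}^{I} w_t + \sum_{t=I+1}^{n} \log F_t
     + (n-d) \log \left( \sum_{t=I+1}^{n} \frac{\nu_t^2}{F_t} \right)
     + (n-d) \left[ 1 - \log (n-d) \right]
\end{aligned}
```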

In some situations the parameter estimation is done by optimizing the profile likelihood (see the section Parameter Estimation by Profile Likelihood Optimization and the PROFILE option in the ESTIMATE statement).

You can also request that parameter estimation be based on an alternate form of the likelihood, called the marginal likelihood ($\mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$). You can switch to the marginal-likelihood-based parameter estimation by specifying LIKE=MARGINAL in the ESTIMATE statement. This alternate likelihood and two additional likelihoods are described in the section Likelihood Computation and Model-Fitting Phase in Chapter 33, SSM Procedure. The diffuse likelihood, $L_\infty$, described in this section is equivalent to the diffuse likelihood, $\mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$, described in that section. However, do not confuse the profile likelihood, $L_{\mathrm{profile}}$, described in this section with the profile likelihood, $\mathbf{L}_p(\mathbf{Y}, \boldsymbol{\theta})$, described in that section. The profiling in $\mathbf{L}_p(\mathbf{Y}, \boldsymbol{\theta})$ refers to the profiling of the diffuse effects, whereas the profiling in $L_{\mathrm{profile}}$ refers to the profiling of a common scalar parameter $\sigma^2$. For each of the three likelihoods (diffuse, marginal, and profile) that are described in that section, it is possible to profile out (also called concentrate out) a common scalar parameter $\sigma^2$ and obtain expressions similar to the $L_{\mathrm{profile}}$ likelihood that is described in this section. In fact, when you request that parameter estimation be based on the marginal likelihood by specifying LIKE=MARGINAL in the ESTIMATE statement, the profile version of marginal likelihood ($\mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$) is used if the PROFILE option is in effect (by default or when the PROFILE option is specified). The discussion in the section Parameter Estimation by Profile Likelihood Optimization also applies to marginal likelihood. As explained in the section Likelihood Computation and Model-Fitting Phase in Chapter 33, SSM Procedure, the estimates that are based on marginal likelihood and the estimates that are based on diffuse likelihood coincide in many cases. In PROC UCM, estimates that are based on marginal likelihood and diffuse likelihood will differ only if at least one of the following conditions holds:

  • The DEPLAG statement is present and the NOEST option is not specified.

  • In a TF statement, at least one denominator factor is present and the NOEST option is not specified.

  • In a CYCLE statement, RHO is fixed at 1 and the period is to be estimated—that is, RHO=1 and NOEST=RHO or NOEST=(RHO VARIANCE).

Whenever you specify LIKE=MARGINAL in the ESTIMATE statement, the FitSummary table that displays the likelihood-based fit statistics includes fit statistics and information criteria that are based on the marginal likelihood in addition to fit statistics that are based on diffuse likelihood.

In the remainder of this section, the state space formulation of UCMs is further explained by using some particular UCMs as examples. The examples show that the state space formulation of the UCMs depends on the components in the model in a simple fashion; for example, the system matrix $T$ is usually a block diagonal matrix with blocks that correspond to the components in the model. The only exception to this pattern is a UCM that includes lags of the dependent variable. This case is considered at the end of the section.

In what follows, $\mathrm{Diag}[a, b, \ldots]$ denotes a diagonal matrix with diagonal entries $[a, b, \ldots]$, and the transpose of a matrix $T$ is denoted as $T'$.

Locally Linear Trend Model

Recall that the dynamics of the locally linear trend model are

\[
\begin{aligned}
y_t &= \mu_t + \epsilon_t \\
\mu_t &= \mu_{t-1} + \beta_{t-1} + \eta_t \\
\beta_t &= \beta_{t-1} + \xi_t
\end{aligned}
\]

Here $y_t$ is the response series and $\epsilon_t$, $\eta_t$, and $\xi_t$ are independent, zero-mean Gaussian disturbance sequences with variances $\sigma_\epsilon^2$, $\sigma_\eta^2$, and $\sigma_\xi^2$, respectively. This model can be formulated as a state space model where the state vector $\alpha_t = [\epsilon_t \; \mu_t \; \beta_t]'$ and the state noise $\zeta_t = [\epsilon_t \; \eta_t \; \xi_t]'$. Note that the elements of the state vector are precisely the unobserved components in the model. The system matrices $T$ and $Z$ and the noise covariance $Q$ corresponding to this choice of state and state noise vectors can be seen to be time invariant and are given by

\[
Z = [1 \; 1 \; 0], \quad
T = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad
Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2]
\]

The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1]$. The parameter vector $\theta$ consists of all the disturbance variances; that is, $\theta = (\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2)$.
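
As a quick check of this formulation, the following sketch applies one step of the state equation to a locally linear trend state and then forms the next response value. The helper names, the state values, and the disturbance draws are made up for illustration.

```python
def mat_vec(m, v):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# System matrices of the locally linear trend model, state = [eps, mu, beta].
Z = [1.0, 1.0, 0.0]
T = [[0.0, 0.0, 0.0],
     [0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0]]

def step(alpha, zeta):
    """State equation: alpha_{t+1} = T alpha_t + zeta_{t+1}."""
    return [a + z for a, z in zip(mat_vec(T, alpha), zeta)]

alpha = [0.2, 10.0, 0.5]     # [eps_t, mu_t, beta_t], illustrative values
zeta = [-0.1, 0.3, 0.05]     # [eps_{t+1}, eta_{t+1}, xi_{t+1}], illustrative draws
alpha_next = step(alpha, zeta)

# Observation equation: y_{t+1} = Z alpha_{t+1} = eps_{t+1} + mu_{t+1}.
y_next = sum(z * a for z, a in zip(Z, alpha_next))
print(alpha_next, y_next)
```

The zero first row of the transition matrix discards the old irregular value, so the new irregular is just the fresh disturbance, while the level absorbs the previous slope.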

Basic Structural Model

The basic structural model (BSM) is obtained by adding a seasonal component, $\gamma_t$, to the locally linear trend model. To save space, the state space formulation of a BSM with a relatively short season length, season length = 4 (quarterly seasonality), is considered here. The pattern for longer season lengths such as 12 (monthly) and 52 (weekly) is easy to see.

Let us first consider the dummy form of seasonality. In this case the state and state noise vectors are $\alpha_t = [\epsilon_t \; \mu_t \; \beta_t \; \gamma_{1,t} \; \gamma_{2,t} \; \gamma_{3,t}]'$ and $\zeta_t = [\epsilon_t \; \eta_t \; \xi_t \; \omega_t \; 0 \; 0]'$, respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\gamma_{i,t}$, are lagged versions of the seasonal component $\gamma_t$: $\gamma_{1,t}$ corresponds to lag zero (that is, it is the same as $\gamma_t$), $\gamma_{2,t}$ to lag 1, and $\gamma_{3,t}$ to lag 2. The system matrices are

\[
Z = [1 \; 1 \; 0 \; 1 \; 0 \; 0], \quad
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

and $Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, 0, 0]$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$.
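
The block diagonal structure of the transition matrix and the behavior of the dummy seasonal block can be checked with a short sketch. The helper functions and the starting seasonal values are hypothetical; the disturbance is set to zero so that only the deterministic part of the recursion is visible.

```python
def block_diag(*blocks):
    """Place square nested-list blocks along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, val in enumerate(row):
                out[offset + i][offset + j] = float(val)
        offset += len(b)
    return out

def mat_vec(m, v):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

irregular = [[0]]
trend = [[1, 1], [0, 1]]
dummy_seasonal = [[-1, -1, -1], [1, 0, 0], [0, 1, 0]]
T = block_diag(irregular, trend, dummy_seasonal)

# Iterate the seasonal block alone, with the disturbance omega_t set to zero.
gamma = [3.0, -1.0, -4.0]    # [gamma_t, gamma_{t-1}, gamma_{t-2}], illustrative
state = gamma
path = [state[0]]
for _ in range(4):
    state = mat_vec(dummy_seasonal, state)
    path.append(state[0])
print(path)
```

Without disturbances, any four consecutive seasonal values in the path sum to zero and the state returns to its starting value after four steps, which is the defining property of the quarterly dummy seasonal.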

In the case of the trigonometric type of seasonality, $\alpha_t = [\epsilon_t \; \mu_t \; \beta_t \; \gamma_{1,t} \; \gamma^*_{1,t} \; \gamma_{2,t}]'$ and $\zeta_t = [\epsilon_t \; \eta_t \; \xi_t \; \omega_{1,t} \; \omega^*_{1,t} \; \omega_{2,t}]'$. The disturbance sequences, $\omega_{j,t}$, $1 \le j \le 2$, and $\omega^*_{1,t}$, are independent, zero-mean, Gaussian sequences with variance $\sigma_\omega^2$. The system matrices are

\[
Z = [1 \; 1 \; 0 \; 1 \; 0 \; 1], \quad
T = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & \cos \lambda_1 & \sin \lambda_1 & 0 \\
0 & 0 & 0 & -\sin \lambda_1 & \cos \lambda_1 & 0 \\
0 & 0 & 0 & 0 & 0 & \cos \lambda_2
\end{bmatrix}
\]

and $Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, \sigma_\omega^2, \sigma_\omega^2]$. Here $\lambda_j = (2 \pi j)/4$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$. The parameter vector in both the cases is $\theta = (\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2)$.
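
The trigonometric seasonal blocks can be generated numerically, as in the sketch below. The function name is hypothetical. The extension to a general season length (one two-state block per harmonic, with a single state for the last harmonic when the season length is even) is the standard construction and is assumed here; the text above states only the case of season length 4.

```python
import math

def trig_seasonal_blocks(s):
    """Return the transition blocks of a trigonometric seasonal of length s."""
    blocks = []
    for j in range(1, s // 2 + 1):
        lam = 2.0 * math.pi * j / s          # lambda_j = 2 * pi * j / s
        c, sn = math.cos(lam), math.sin(lam)
        if 2 * j == s:
            # lambda_j = pi: the sine term vanishes, so one state is enough.
            blocks.append([[c]])
        else:
            blocks.append([[c, sn], [-sn, c]])
    return blocks

blocks = trig_seasonal_blocks(4)
for b in blocks:
    print([[round(x, 6) for x in row] for row in b])
```

For season length 4 this yields a two-state rotation block for the first harmonic and a single-state block for the second, which matches the lower-right corner of the transition matrix shown above.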

Seasons with Blocked Seasonal Values

Block seasonals are special seasonal components that impose a special block structure on the seasonal effects. Let us consider a BSM with monthly seasonality that has a quarterly block structure; that is, months within the same quarter are assumed to have identical effects except for some random perturbation. Such a seasonal component is a block seasonal with block size $m$ equal to 3 and the number of blocks $k$ equal to 4. The state space structure for such a model with dummy-type seasonality is as follows. The state and state noise vectors are $\alpha_t = [\epsilon_t \; \mu_t \; \beta_t \; \gamma_{1,t} \; \gamma_{2,t} \; \gamma_{3,t}]'$ and $\zeta_t = [\epsilon_t \; \eta_t \; \xi_t \; \omega_t \; 0 \; 0]'$, respectively. The first three elements of the state vector are the irregular, level, and slope components, respectively. The remaining elements, $\gamma_{i,t}$, are lagged versions of the seasonal component $\gamma_t$: $\gamma_{1,t}$ corresponds to lag zero (that is, it is the same as $\gamma_t$), $\gamma_{2,t}$ to lag $m$, and $\gamma_{3,t}$ to lag $2m$. All the system matrices are time invariant, except the matrix $T$. They can be seen to be $Z = [1 \; 1 \; 0 \; 1 \; 0 \; 0]$, $Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, 0, 0]$, and

\[
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

when $t$ is a multiple of the block size $m$, and

\[
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

otherwise. Note that when $t$ is not a multiple of $m$, the portion of the $T_t$ matrix corresponding to the seasonal is identity. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$.
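
The effect of the time-varying transition matrix can be traced with a small sketch that iterates only the seasonal part of the state over one year of monthly data. The function names and starting values are hypothetical, and the disturbance is set to zero so that the block pattern is easy to see.

```python
DUMMY = [[-1.0, -1.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def seasonal_block(t, m):
    """Seasonal part of T_t: dummy block when t is a multiple of m, else identity."""
    return DUMMY if t % m == 0 else IDENTITY

def mat_vec(mat, v):
    """Multiply a nested-list matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in mat]

m = 3                               # block size: months per quarter
state = [3.0, -1.0, -4.0]           # [gamma_{1,1}, gamma_{2,1}, gamma_{3,1}], illustrative
effects = []
for t in range(1, 13):
    effects.append(state[0])        # seasonal effect gamma_t at time t
    state = mat_vec(seasonal_block(t, m), state)   # alpha_{t+1} = T_t alpha_t
print(effects)
```

The seasonal effect stays constant within each quarter and changes only when the time index crosses a multiple of the block size; without disturbances, the twelve monthly effects sum to zero.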

Similarly, in the case of the trigonometric form of seasonality, $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta_t \;\; \gamma_{1,t} \;\; \gamma_{1,t}^* \;\; \gamma_{2,t} \,]'$ and $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; \xi_t \;\; \omega_{1,t} \;\; \omega_{1,t}^* \;\; \omega_{2,t} \,]'$. The disturbance sequences, $\omega_{j,t}$, $1 \le j \le 2$, and $\omega_{1,t}^*$, are independent, zero-mean, Gaussian sequences with variance $\sigma_\omega^2$. The system matrices are $Z = [\, 1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; 1 \,]$, $Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2, \sigma_\omega^2, \sigma_\omega^2]$, and

\[
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & \cos\lambda_1 & \sin\lambda_1 & 0 \\
0 & 0 & 0 & -\sin\lambda_1 & \cos\lambda_1 & 0 \\
0 & 0 & 0 & 0 & 0 & \cos\lambda_2
\end{bmatrix}
\]

when $t$ is a multiple of the block size $m$, and

\[
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

otherwise. As before, when $t$ is not a multiple of $m$, the portion of the $T_t$ matrix corresponding to the seasonal is identity. Here $\lambda_j = (2 \pi j) / 4$. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1]$. The parameter vector in both the cases is $\theta = (\sigma_\epsilon^2, \sigma_\eta^2, \sigma_\xi^2, \sigma_\omega^2)$.
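
A similar noise-free sketch can be written for the trigonometric form. The helper names `trig_block_transition` and `step` and the starting values below are hypothetical; the code only mirrors the two transition matrices given above, with the harmonic frequencies computed as 2*pi*j/4.

```python
import math


def trig_block_transition(t, m=3, k=4):
    """6x6 transition matrix for the trigonometric block seasonal (k = 4)."""
    lam1 = 2 * math.pi * 1 / k   # lambda_1 = pi / 2
    lam2 = 2 * math.pi * 2 / k   # lambda_2 = pi
    T = [[0.0] * 6 for _ in range(6)]
    T[1][1] = T[1][2] = T[2][2] = 1.0   # level and slope rows
    if t % m == 0:
        # Block boundary: rotate the first harmonic pair, flip the second.
        T[3][3], T[3][4] = math.cos(lam1), math.sin(lam1)
        T[4][3], T[4][4] = -math.sin(lam1), math.cos(lam1)
        T[5][5] = math.cos(lam2)
    else:
        # Inside a block the seasonal states are carried over unchanged.
        T[3][3] = T[4][4] = T[5][5] = 1.0
    return T


def step(T, state):
    """One noise-free transition: T times the state vector."""
    return [sum(a * b for a, b in zip(row, state)) for row in T]


# Assumed starting values for the three seasonal states; level and slope are 0.
state = [0.0, 0.0, 0.0, 1.0, 0.5, -0.25]
seasonal = []
for t in range(1, 13):
    # The observation vector picks out the 4th and 6th state elements.
    seasonal.append(state[3] + state[5])
    state = step(trig_block_transition(t), state)
```

After twelve months (four block boundaries) the seasonal states return to their starting values up to rounding, and the seasonal contributions over the year sum to approximately zero.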

Cycles and Autoregression

The preceding examples have illustrated how to build a state space model corresponding to a UCM that includes components such as irregular, trend, and seasonal. There you can see that the state vector and the system matrices have a simple block structure with blocks corresponding to the components in the model. Therefore, here only a simple model consisting of a single cycle and an irregular component is considered. The state space form for more complex UCMs consisting of multiple cycles and other components can be easily deduced from this example.

Recall that a stochastic cycle $\psi_t$ with frequency $\lambda$, $0 < \lambda < \pi$, and damping coefficient $\rho$ can be modeled as

\[
\begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix}
= \rho \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix}
\begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix}
+ \begin{bmatrix} \nu_t \\ \nu_t^* \end{bmatrix}
\]

where $\nu_t$ and $\nu_t^*$ are independent, zero-mean, Gaussian disturbances with variance $\sigma_\nu^2$. In what follows, a state space form for a model consisting of such a stochastic cycle and an irregular component is given.

The state vector $\alpha_t = [\, \epsilon_t \;\; \psi_t \;\; \psi_t^* \,]'$, and the state noise vector $\zeta_t = [\, \epsilon_t \;\; \nu_t \;\; \nu_t^* \,]'$. The system matrices are

\[
Z = [\, 1 \;\; 1 \;\; 0 \,], \quad
T = \begin{bmatrix}
0 & 0 & 0 \\
0 & \rho\cos\lambda & \rho\sin\lambda \\
0 & -\rho\sin\lambda & \rho\cos\lambda
\end{bmatrix}, \quad
Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\nu^2, \sigma_\nu^2]
\]

The distribution of the initial state vector $\alpha_1$ is proper, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\psi^2, \sigma_\psi^2]$, where $\sigma_\psi^2 = \sigma_\nu^2 (1 - \rho^2)^{-1}$. The parameter vector $\theta = (\sigma_\epsilon^2, \rho, \lambda, \sigma_\nu^2)$.
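
One way to see why this initial covariance is proper is to check that it is unchanged by one step of the covariance recursion T P T' + Q; that is, it is the stationary covariance of the state when the damping coefficient is less than 1 in absolute value. The following Python sketch does this numerically. The function name `cycle_system` and the parameter values are assumptions made for illustration only.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def cycle_system(rho, lam, var_eps, var_nu):
    """Return T, Q, and the initial covariance for cycle plus irregular."""
    c, s = rho * math.cos(lam), rho * math.sin(lam)
    T = [[0.0, 0.0, 0.0],
         [0.0, c, s],
         [0.0, -s, c]]
    Q = [[var_eps, 0.0, 0.0],
         [0.0, var_nu, 0.0],
         [0.0, 0.0, var_nu]]
    var_psi = var_nu / (1.0 - rho ** 2)   # stationary variance of the cycle
    P = [[var_eps, 0.0, 0.0],
         [0.0, var_psi, 0.0],
         [0.0, 0.0, var_psi]]
    return T, Q, P


# Assumed illustrative parameter values.
T, Q, P = cycle_system(rho=0.9, lam=math.pi / 6, var_eps=1.0, var_nu=0.5)

# One covariance propagation step: T P T' + Q.
TPT = matmul(matmul(T, P), transpose(T))
P_next = [[TPT[i][j] + Q[i][j] for j in range(3)] for i in range(3)]
max_diff = max(abs(P_next[i][j] - P[i][j]) for i in range(3) for j in range(3))
```

Here `max_diff` is zero up to rounding error, because the rotation leaves the cycle block proportional to the identity and rho^2 * var_psi + var_nu equals var_psi.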

An autoregression $r_t$ can be considered as a special case of cycle with frequency $\lambda$ equal to $0$ or $\pi$. In this case the equation for $\psi_t^*$ is not needed. Therefore, for a UCM consisting of an autoregressive component and an irregular component, the state space model simplifies to the following form.

The state vector $\alpha_t = [\, \epsilon_t \;\; r_t \,]'$, and the state noise vector $\zeta_t = [\, \epsilon_t \;\; \nu_t \,]'$. The system matrices are

\[
Z = [\, 1 \;\; 1 \,], \quad
T = \begin{bmatrix} 0 & 0 \\ 0 & \rho \end{bmatrix}, \quad \text{and} \quad
Q = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\nu^2]
\]

The distribution of the initial state vector $\alpha_1$ is proper, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, \sigma_r^2]$, where $\sigma_r^2 = \sigma_\nu^2 (1 - \rho^2)^{-1}$. The parameter vector $\theta = (\sigma_\epsilon^2, \rho, \sigma_\nu^2)$.

Incorporating Predictors of Different Types

In the UCM procedure, you can incorporate predictors in a UCM in a variety of ways: you can specify simple time-invariant linear predictors in the MODEL statement, you can specify predictors that have time-varying coefficients in the RANDOMREG statement, and you can specify predictors that have a nonlinear relationship with the response variable in the SPLINEREG statement. You can also specify a transfer-function relationship by using the TF statement. As with earlier examples, the first part of this section uses a simple special case to show how to obtain a state space form of a UCM that consists of a variety of predictors (except the transfer-function relationship). The state space form that is associated with a transfer-function relationship is described in the section State Space Form of a Transfer Function Relationship.

Consider a random walk trend model that has predictors $x$, $u_1$, $u_2$, and $v$. Assume that $x$ is a simple regressor that is specified in the MODEL statement, $u_1$ and $u_2$ are random regressors with time-varying regression coefficients that are specified in the same RANDOMREG statement, and $v$ is a nonlinear regressor that is specified in a SPLINEREG statement. Further assume that the spline that is associated with $v$ has degree one and is based on two internal knots. As explained in the section SPLINEREG Statement, using $v$ is equivalent to using $(\mathit{nknots} + \mathit{degree}) = (2 + 1) = 3$ derived (random) regressors: for example, $s_1, s_2, s_3$. There are $(1 + 2 + 3) = 6$ regressors in all, the first one being a simple regressor and the others being time-varying coefficient regressors. The time-varying regressors are in two groups: the first group consists of $u_1$ and $u_2$, and the other group consists of $s_1$, $s_2$, and $s_3$. The dynamics of this model are as follows:

\[
\begin{aligned}
y_t &= \mu_t + \beta x_t + \kappa_{1t} u_{1t} + \kappa_{2t} u_{2t} + \sum_{i=1}^{3} \gamma_{it} s_{it} + \epsilon_t \\
\mu_t &= \mu_{t-1} + \eta_t \\
\kappa_{1t} &= \kappa_{1(t-1)} + \xi_{1t} \\
\kappa_{2t} &= \kappa_{2(t-1)} + \xi_{2t} \\
\gamma_{1t} &= \gamma_{1(t-1)} + \zeta_{1t} \\
\gamma_{2t} &= \gamma_{2(t-1)} + \zeta_{2t} \\
\gamma_{3t} &= \gamma_{3(t-1)} + \zeta_{3t}
\end{aligned}
\]

All the disturbances $\epsilon_t, \eta_t, \xi_{1t}, \xi_{2t}, \zeta_{1t}, \zeta_{2t}$, and $\zeta_{3t}$ are independent, zero-mean, Gaussian variables, where $\xi_{1t}, \xi_{2t}$ share a common variance parameter $\sigma_\xi^2$ and $\zeta_{1t}, \zeta_{2t}, \zeta_{3t}$ share a common variance $\sigma_\zeta^2$. These dynamics can be captured in the state space form by taking state $\alpha_t = [\, \epsilon_t \;\; \mu_t \;\; \beta \;\; \kappa_{1t} \;\; \kappa_{2t} \;\; \gamma_{1t} \;\; \gamma_{2t} \;\; \gamma_{3t} \,]'$, state disturbance $\zeta_t = [\, \epsilon_t \;\; \eta_t \;\; 0 \;\; \xi_{1t} \;\; \xi_{2t} \;\; \zeta_{1t} \;\; \zeta_{2t} \;\; \zeta_{3t} \,]'$, and the system matrices

\[
\begin{aligned}
Z_t &= [\, 1 \;\; 1 \;\; x_t \;\; u_{1t} \;\; u_{2t} \;\; s_{1t} \;\; s_{2t} \;\; s_{3t} \,] \\
T &= \mathrm{Diag}[0, 1, 1, 1, 1, 1, 1, 1] \\
Q &= \mathrm{Diag}[\sigma_\epsilon^2, \sigma_\eta^2, 0, \sigma_\xi^2, \sigma_\xi^2, \sigma_\zeta^2, \sigma_\zeta^2, \sigma_\zeta^2]
\end{aligned}
\]

Note that the regression coefficients are elements of the state vector and that the system vector $Z_t$ is not time-invariant. The distribution of the initial state vector $\alpha_1$ is diffuse, with $P_* = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0, 0, 0, 0]$ and $P_\infty = \mathrm{Diag}[0, 1, 1, 1, 1, 1, 1, 1]$. The parameters of this model are the disturbance variances, $\sigma_\epsilon^2$, $\sigma_\eta^2$, $\sigma_\xi^2$, and $\sigma_\zeta^2$, which are estimated by maximizing the likelihood. The regression coefficients, time-invariant $\beta$, and time-varying $\kappa_{1t}, \kappa_{2t}, \gamma_{1t}, \gamma_{2t}$, and $\gamma_{3t}$ are implicitly estimated during the state estimation (smoothing).
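
Because the regressors enter only through the observation vector, assembling the system for this example amounts to stacking the regressor values at each time point. The Python sketch below illustrates this under stated assumptions: the helper names `observation_row` and `system_matrices` are hypothetical, the derived spline regressors are taken as already computed (the sketch does not reproduce the spline basis used by the SPLINEREG statement), and all numeric values are made up.

```python
def observation_row(x_t, u_t, s_t):
    """Observation vector [1, 1, x_t, u_1t, u_2t, s_1t, s_2t, s_3t]."""
    return [1.0, 1.0, x_t] + list(u_t) + list(s_t)


def system_matrices(var_eps, var_eta, var_xi, var_zeta):
    """Diagonals of the time-invariant transition and disturbance matrices."""
    T_diag = [0.0] + [1.0] * 7
    # The simple regressor has zero disturbance variance; each group of
    # time-varying coefficients shares one variance parameter.
    Q_diag = [var_eps, var_eta, 0.0, var_xi, var_xi,
              var_zeta, var_zeta, var_zeta]
    return T_diag, Q_diag


# Assumed state values: [eps, mu, beta, kappa_1, kappa_2, gamma_1, gamma_2, gamma_3].
alpha_t = [0.1, 5.0, 2.0, 0.5, -1.0, 0.25, 0.0, 1.5]

# Assumed regressor values at one time point.
Z_t = observation_row(x_t=3.0, u_t=(1.0, 2.0), s_t=(0.0, 0.5, 0.5))

# Observation equation: the response is the inner product of Z_t and the state.
y_t = sum(z * a for z, a in zip(Z_t, alpha_t))
```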

State Space Form of a Transfer Function Relationship

This section illustrates the state space form of a simple transfer-function relationship. The state space form of more complicated transfer-function relationships can be deduced using the same logic. Suppose that a predictor x enters the model for a response variable y as

\[
\begin{aligned}
y_t &= f_t + \epsilon_t \\
f_t &= \frac{(\gamma_0 + \gamma_1 B)}{(1 - \delta_1 B - \delta_2 B^2)} \, x_t
\end{aligned}
\]

where $f_t$ is the transfer-function component and $\epsilon_t$ is a sequence of independent, zero-mean, Gaussian variables. In this description, the transfer-function component is described using the backward shift operator $B$. Alternatively, it can be described as follows:

\[
f_t = \delta_1 f_{t-1} + \delta_2 f_{t-2} + \gamma_0 x_t + \gamma_1 x_{t-1}
\]

This model can be easily put in a state space form by taking state $\alpha_t = (\epsilon_t \;\; f_t \;\; f_{t-1} \;\; \gamma_0 \;\; \gamma_1)'$, state disturbance $\zeta_t = (\epsilon_t \;\; 0 \;\; 0 \;\; 0 \;\; 0)'$, the system matrices $Z = [\, 1 \;\; 1 \;\; 0 \;\; 0 \;\; 0 \,]$, $Q = \mathrm{Diag}[\sigma_\epsilon^2, 0, 0, 0, 0]$, and

\[
T_t = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & \delta_1 & \delta_2 & x_{t+1} & x_t \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]

The initial state $\alpha_1$ is partially diffuse. The precise form of the initial state depends on the value of the TFSTART= option in the TF statement. If the TFSTART= option is not specified, all elements of $\alpha_1$ except for the first element ($\epsilon_1$) are treated as diffuse. On the other hand, if a value is specified in the TFSTART= option, the initial transfer function values ($f_1$ and $f_0$) in $\alpha_1$ are fixed at that specified value. In this formulation of the model, the numerator coefficients of the transfer-function relationship ($\gamma_0$ and $\gamma_1$) are part of the state. They are implicitly estimated during the state estimation (smoothing). On the other hand, the denominator coefficients ($\delta_1$ and $\delta_2$) and the noise variance ($\sigma_\epsilon^2$) are estimated by maximizing the likelihood.
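
As an informal check of this state space form, the following Python sketch propagates the state without noise and compares the resulting transfer-function values with those from the difference equation. The function names, the predictor values, and the coefficient values are all assumptions made for illustration; starting values of zero for the first two transfer-function values are likewise just a choice for the example.

```python
def tf_transition(delta1, delta2, x_next, x_now):
    """Transition matrix for the state (eps_t, f_t, f_{t-1}, gamma0, gamma1)."""
    return [
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, delta1, delta2, x_next, x_now],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]


def f_by_state_space(x, f1, f0, delta1, delta2, gamma0, gamma1):
    """Propagate the state without noise; x[t] holds x_t for t = 0, ..., n."""
    state = [0.0, f1, f0, gamma0, gamma1]
    f = {0: f0, 1: f1}
    for t in range(1, len(x) - 1):
        T = tf_transition(delta1, delta2, x[t + 1], x[t])
        state = [sum(a * b for a, b in zip(row, state)) for row in T]
        f[t + 1] = state[1]
    return f


def f_by_recursion(x, f1, f0, delta1, delta2, gamma0, gamma1):
    """Evaluate the difference equation for f_t directly."""
    f = {0: f0, 1: f1}
    for t in range(2, len(x)):
        f[t] = (delta1 * f[t - 1] + delta2 * f[t - 2]
                + gamma0 * x[t] + gamma1 * x[t - 1])
    return f


x = [0.0, 1.0, 2.0, 0.5, -1.0, 3.0]   # assumed predictor values x_0, ..., x_5
args = dict(f1=0.0, f0=0.0, delta1=0.6, delta2=-0.2, gamma0=1.5, gamma1=0.5)
fs = f_by_state_space(x, **args)
fr = f_by_recursion(x, **args)
max_gap = max(abs(fs[t] - fr[t]) for t in fr)
```

The two computations agree, which reflects that the second row of the transition matrix reproduces the difference equation one step ahead while the numerator coefficients are simply carried along in the state.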

Reporting Parameter Estimates for Random Regressors

If the random walk disturbance variance that is associated with a random regressor is held fixed at 0, then its coefficient is no longer time-varying. In the UCM procedure, the random regressor parameter estimates are reported differently if the random walk disturbance variance that is associated with a random regressor is held fixed at 0. The following points explain how the parameter estimates are reported in the parameter estimates table and in the OUTEST= data set:

  • If the random walk disturbance variance that is associated with a random regressor is not held fixed, then its estimate is reported in the parameter estimates table and in the OUTEST= data set.

  • If more than one random regressor is specified in a RANDOMREG statement, then the first regressor in the list is used as a representative of the list when the corresponding common variance parameter estimate is reported.

  • If the random walk disturbance variance is held fixed at 0, then the parameter estimates table and the OUTEST= data set contain the corresponding regression parameter estimate rather than the variance parameter estimate.

  • Similar considerations apply in the case of the derived random regressors that are associated with a spline regressor.

Forecasting with Predictor Variables

If regression effects are included in the model (in a MODEL statement or in one or more of the RANDOMREG, SPLINEREG, and TF statements) and the FORECAST statement is used to compute multistep forecasts, then future values of the predictor variables must be included in the DATA= data set for the forecast horizon that is defined by the BACK= and LEAD= options in the FORECAST statement. For more information about how the forecast horizon is defined, see the FORECAST statement.

ARMA Irregular Component

The state space form for the irregular component that follows an $\mathrm{ARMA}(p,q) \times (P,Q)_s$ model is described in this section. The notation for ARMA models is explained in the IRREGULAR statement. A number of alternate state space forms are possible in this case; the one given here is based on Jones (1980). With slight abuse of notation, let $p = p + sP$ denote the effective autoregressive order and $q = q + sQ$ denote the effective moving average order of the model. Similarly, let $\phi$ be the effective autoregressive polynomial and $\theta$ be the effective moving average polynomial in the backshift operator with coefficients $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$, obtained by multiplying the respective nonseasonal and seasonal factors. Then, a random sequence $\epsilon_t$ that follows an $\mathrm{ARMA}(p,q) \times (P,Q)_s$ model with a white noise sequence $a_t$ has a state space form with state vector of size $m = \max(p, q + 1)$. The system matrices, which are time invariant, are as follows: $Z = [\, 1 \;\; 0 \;\; \ldots \;\; 0 \,]$. The state transition matrix $T$, in a blocked form, is given by

\[
T = \begin{bmatrix}
0 & I_{m-1} \\
\phi_m \; \ldots & \phi_1
\end{bmatrix}
\]

where $\phi_i = 0$ if $i > p$ and $I_{m-1}$ is an $(m-1)$ dimensional identity matrix. The covariance of the state disturbance matrix $Q = \sigma^2 \psi \psi'$ where $\sigma^2$ is the variance of the white noise sequence $a_t$ and the vector $\psi = [\, \psi_0 \;\; \ldots \;\; \psi_{m-1} \,]'$ contains the first $m$ values of the impulse response function; that is, the first $m$ coefficients in the expansion of the ratio $\theta / \phi$. Since $\epsilon_t$ is a stationary sequence, the initial state is nondiffuse and $P_\infty = 0$. The description of $P_*$, the covariance matrix of the initial state, is a little involved; the details are given in Jones (1980).
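
The construction of the transition matrix and the disturbance covariance can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation: the function name `arma_state_space` is hypothetical, the effective coefficients are assumed to be supplied already multiplied out, and the sign convention (both polynomials written as 1 minus the sum of coefficient terms) is an assumption that should be checked against the IRREGULAR statement. The initial covariance is not computed.

```python
def arma_state_space(phi, theta, sigma2):
    """Return (Z, T, Q) for effective AR coefficients phi and MA coefficients theta.

    Assumed convention: phi(B) = 1 - phi_1 B - ..., theta(B) = 1 - theta_1 B - ...
    """
    p, q = len(phi), len(theta)
    m = max(p, q + 1)
    phi_pad = list(phi) + [0.0] * (m - p)       # phi_i = 0 for i > p
    theta_pad = list(theta) + [0.0] * (m - q)   # theta_j = 0 for j > q

    # Impulse responses: first m coefficients of theta(B) / phi(B).
    psi = [1.0]
    for j in range(1, m):
        ar_part = sum(phi_pad[i - 1] * psi[j - i] for i in range(1, j + 1))
        psi.append(ar_part - theta_pad[j - 1])

    Z = [1.0] + [0.0] * (m - 1)
    T = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        T[i][i + 1] = 1.0                       # identity block of size m - 1
    T[m - 1] = [phi_pad[m - 1 - j] for j in range(m)]   # phi_m, ..., phi_1
    Q = [[sigma2 * a * b for b in psi] for a in psi]    # sigma^2 * psi * psi'
    return Z, T, Q


# ARMA(1,1) with assumed values phi_1 = 0.5, theta_1 = 0.25, sigma^2 = 2.
Z, T, Q = arma_state_space([0.5], [0.25], 2.0)
```

For this ARMA(1,1) example the state has size 2, the impulse responses are 1 and 0.25, and the last row of the transition matrix is (0, 0.5).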

Models with Dependent Lags

The state space form of a UCM consisting of the lags of the dependent variable is quite different from the state space forms considered so far. Let us consider an example to illustrate this situation. Consider a model that has random walk trend, two simple time-invariant regressors, and that also includes a few—for example, k—lags of the dependent variable. That is,

\[
\begin{aligned}
y_t &= \sum_{i=1}^{k} \phi_i y_{t-i} + \mu_t + \beta_1 x_{1t} + \beta_2 x_{2t} + \epsilon_t \\
\mu_t &= \mu_{t-1} + \eta_t
\end{aligned}
\]

The state space form of this augmented model can be described in terms of the state space form of a model that has random walk trend with two simple time-invariant regressors. A superscript dagger ($\dagger$) has been added to distinguish the augmented model state space entities from the corresponding entities of the state space form of the random walk with predictors model. With this notation, the state vector of the augmented model $\alpha_t^\dagger = [\, \alpha_t' \;\; y_t \;\; y_{t-1} \;\; \ldots \;\; y_{t-k+1} \,]'$ and the new state noise vector $\zeta_t^\dagger = [\, \zeta_t' \;\; u_t \;\; 0 \;\; \ldots \;\; 0 \,]'$, where $u_t$ is the matrix product $Z_t \zeta_t$. Note that the length of the new state vector is $k + \mathrm{length}(\alpha_t) = k + 4$. The new system matrices, in block form, are

\[
Z_t^\dagger = [\, 0 \;\; 0 \;\; 0 \;\; 0 \;\; 1 \;\; \ldots \;\; 0 \,], \quad
T_t^\dagger = \begin{bmatrix}
T_t & 0 & \ldots & 0 \\
Z_{t+1} T_t & \phi_1 & \ldots & \phi_k \\
0 & I_{k-1,k-1} & & 0
\end{bmatrix}
\]

where $I_{k-1,k-1}$ is the $k - 1$ dimensional identity matrix and

\[
Q_t^\dagger = \begin{bmatrix}
Q_t & Q_t Z_t' & 0 \\
Z_t Q_t & Z_t Q_t Z_t' & 0 \\
0 & 0 & 0
\end{bmatrix}
\]

Note that the $T$ and $Q$ matrices of the random walk with predictors model are time invariant, and in the expressions above their time indices are kept because they illustrate the pattern for more general models. The initial state vector is diffuse, with

\[
P_*^\dagger = \begin{bmatrix} P_* & 0 \\ 0 & 0 \end{bmatrix}, \quad
P_\infty^\dagger = \begin{bmatrix} P_\infty & 0 \\ 0 & I_{k,k} \end{bmatrix}
\]

The parameters of this model are the disturbance variances $\sigma_\epsilon^2$ and $\sigma_\eta^2$, the lag coefficients $\phi_1, \phi_2, \ldots, \phi_k$, and the regression coefficients $\beta_1$ and $\beta_2$. As before, the regression coefficients get estimated during the state smoothing, and the other parameters are estimated by maximizing the likelihood.
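
The block structure of the augmented transition and observation matrices can be made concrete with a small Python sketch for k = 2. The function name `augment` and every numeric value are assumptions made for illustration; the disturbance covariance and the initial covariances are not built here, and the propagation is noise-free.

```python
def augment(T, Z_next, phi):
    """Build the augmented transition and observation matrices for len(phi) lags."""
    n, k = len(T), len(phi)
    size = n + k
    T_aug = [[0.0] * size for _ in range(size)]
    for i in range(n):                       # top-left block: original T
        T_aug[i][:n] = T[i]
    # Row that produces y_{t+1}: [Z_{t+1} T, phi_1, ..., phi_k].
    ZT = [sum(Z_next[i] * T[i][j] for i in range(n)) for j in range(n)]
    T_aug[n] = ZT + list(phi)
    for i in range(1, k):                    # shift the stored lags down by one
        T_aug[n + i][n + i - 1] = 1.0
    Z_aug = [0.0] * n + [1.0] + [0.0] * (k - 1)
    return T_aug, Z_aug


# Random walk trend with two regressors: state [eps, mu, beta1, beta2].
T = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
phi = [0.5, 0.25]                            # assumed lag coefficients, k = 2
x_next = (2.0, -1.0)                         # assumed regressor values at t + 1
Z_next = [1.0, 1.0, x_next[0], x_next[1]]

T_aug, Z_aug = augment(T, Z_next, phi)

# Augmented state [eps_t, mu_t, beta1, beta2, y_t, y_{t-1}] with assumed values.
state = [0.3, 4.0, 1.5, 2.0, 10.0, 8.0]
new_state = [sum(a * b for a, b in zip(row, state)) for row in T_aug]
y_next = sum(z * a for z, a in zip(Z_aug, new_state))
```

With these values the noise-free next response is 0.5 * 10 + 0.25 * 8 + 4 + 1.5 * 2 + 2 * (-1) = 12, which is what the augmented observation vector extracts from the propagated state.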

Last updated: June 19, 2025