SSM Procedure

Likelihood Computation and Model-Fitting Phase

Because the response vector is Gaussian, the likelihood of $\mathbf{Y}$ can be computed by using the prediction-error decomposition, which is obtained from the filtering pass described in the previous section (Filtering Pass). When the state space model under consideration has a nondiffuse initial condition and no regression effects are present in the observation and state equations, $\mathbf{Y}$ has a proper Gaussian distribution and its likelihood is defined unambiguously. Otherwise, the definition of the likelihood depends on the treatment of the diffuse quantities: $\boldsymbol{\delta}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$. Francke, Koopman, and de Vos (2010) describe three variants of the likelihood function (diffuse likelihood, marginal likelihood, and profile likelihood) that are commonly considered for state space models that have a diffuse initial condition. In the SSM procedure, you can use either the diffuse likelihood ($\mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$) or the marginal likelihood ($\mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$) for parameter estimation. By default, the diffuse likelihood is used.
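As an illustration of the prediction-error decomposition in the simplest proper case, the following Python sketch evaluates $-2 \log \mathbf{L}$ for a local level model with a nondiffuse initial state. The model, the function name `local_level_neg2loglik`, and its arguments are assumptions made for this example; they are not part of PROC SSM, and the sketch does not reproduce the procedure's internal filter.

```python
import math

def local_level_neg2loglik(y, a1, p1, sigma2_eps, sigma2_eta):
    """Return -2 log L by the prediction-error decomposition.

    Hypothetical model (chosen for illustration only):
        y_t         = alpha_t + eps_t,  eps_t ~ N(0, sigma2_eps)
        alpha_{t+1} = alpha_t + eta_t,  eta_t ~ N(0, sigma2_eta)
        alpha_1 ~ N(a1, p1)             (nondiffuse initial condition)
    Missing responses (None) contribute no term to the likelihood.
    """
    a, p = a1, p1          # predicted state mean and variance
    n_used = 0             # number of nonmissing responses (N)
    total = 0.0            # running sum of log F + nu^2 / F
    for obs in y:
        if obs is not None:
            nu = obs - a               # one-step-ahead prediction error
            f = p + sigma2_eps         # variance of the prediction error
            total += math.log(f) + nu * nu / f
            n_used += 1
            gain = p / f               # Kalman gain
            a = a + gain * nu          # filtered state mean
            p = p * (1.0 - gain)       # filtered state variance
        p = p + sigma2_eta             # predict the next state (mean unchanged)
    return n_used * math.log(2.0 * math.pi) + total

value = local_level_neg2loglik([1.0, 2.0], a1=0.5, p1=2.0,
                               sigma2_eps=1.0, sigma2_eta=0.5)
```

For the two observations above, the result agrees with the bivariate Gaussian density of $(y_1, y_2)$ evaluated directly, which is the property that the decomposition relies on.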

In terms of the quantities described in Table 5, diffuse, marginal, and profile likelihoods are defined as follows:

$$
\begin{aligned}
-2 \log \mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta}) &= N_0 \log 2\pi + \sum_{t=1}^{n} \sum_{i=1}^{q*p_t} \left( \log F_{t,i} + \frac{\nu_{t,i}^2}{F_{t,i}} \right) - \mathbf{b}_{n,p_n}' \mathbf{S}_{n,p_n}^{-1} \mathbf{b}_{n,p_n} + \log\left( |\mathbf{S}_{n,p_n}| \right) \\
-2 \log \mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta}) &= N_0 \log 2\pi + \sum_{t=1}^{n} \sum_{i=1}^{q*p_t} \left( \log F_{t,i} + \frac{\nu_{t,i}^2}{F_{t,i}} \right) - \mathbf{b}_{n,p_n}' \mathbf{S}_{n,p_n}^{-1} \mathbf{b}_{n,p_n} + \log\left( |\mathbf{S}_{n,p_n}| \right) \\
&\quad - \log\left( |\mathbf{S}_{n,p_n}^{*}| \right) \\
-2 \log \mathbf{L}_p(\mathbf{Y}, \boldsymbol{\theta}) &= N \log 2\pi + \sum_{t=1}^{n} \sum_{i=1}^{q*p_t} \left( \log F_{t,i} + \frac{\nu_{t,i}^2}{F_{t,i}} \right) - \mathbf{b}_{n,p_n}' \mathbf{S}_{n,p_n}^{-1} \mathbf{b}_{n,p_n}
\end{aligned}
$$

In the preceding formulas, the terms that are associated with the missing response values $y_{t,i}$ are excluded and $N$ denotes the total number of nonmissing response values in the sample. In addition, $N_0 = (N - k - g - d)$; $|\mathbf{S}_{n,p_n}|$ and $|\mathbf{S}_{n,p_n}^{*}|$ denote the determinants of $\mathbf{S}_{n,p_n}$ and $\mathbf{S}_{n,p_n}^{*}$, respectively; and $\mathbf{b}_{n,p_n}'$ denotes the transpose of the column vector $\mathbf{b}_{n,p_n}$. If $\mathbf{S}_{n,p_n}$ is not invertible, then a generalized inverse is used in place of $\mathbf{S}_{n,p_n}^{-1}$, and $|\mathbf{S}_{n,p_n}|$ and $|\mathbf{S}_{n,p_n}^{*}|$ are computed based on the nonzero eigenvalues of $\mathbf{S}_{n,p_n}$ and $\mathbf{S}_{n,p_n}^{*}$, respectively. Moreover, in this case, $N_0 = N - \mathrm{Rank}(\mathbf{S}_{n,p_n})$. When $(d + k + g) = 0$, the terms that involve $\mathbf{S}_{n,p_n}$, $\mathbf{S}_{n,p_n}^{*}$, and $\mathbf{b}_{n,p_n}$ are absent.
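The three formulas share most of their terms, which the following Python sketch makes explicit by assembling them from quantities that a filtering pass would already have produced. The function name `neg2_log_likelihoods` and its argument names are hypothetical, and the sketch assumes that $\mathbf{S}_{n,p_n}$ has full rank, so that $N_0 = N - (d + k + g)$.

```python
import math

def neg2_log_likelihoods(nu, f, quad_form, logdet_s, logdet_s_star, n_diffuse):
    """Assemble -2 log L_d, -2 log L_m, and -2 log L_p.

    nu, f          prediction errors and their variances, one entry per
                   nonmissing response value
    quad_form      the scalar b' S^{-1} b
    logdet_s       log |S|
    logdet_s_star  log |S*|
    n_diffuse      d + k + g (S is assumed to have full rank)
    """
    n = len(nu)                      # N: nonmissing response values
    n0 = n - n_diffuse               # N_0 in the full-rank case
    core = sum(math.log(fi) + vi * vi / fi for vi, fi in zip(nu, f))
    if n_diffuse == 0:
        # No diffuse quantities: the terms that involve S, S*, and b are absent.
        quad_form = logdet_s = logdet_s_star = 0.0
    log_2pi = math.log(2.0 * math.pi)
    diffuse = n0 * log_2pi + core - quad_form + logdet_s
    marginal = diffuse - logdet_s_star          # one extra term
    profile = n * log_2pi + core - quad_form
    return {"diffuse": diffuse, "marginal": marginal, "profile": profile}

# Hypothetical filter output with one diffuse element.
result = neg2_log_likelihoods(nu=[0.4, -1.1, 0.3], f=[2.0, 1.5, 1.2],
                              quad_form=0.25, logdet_s=0.7,
                              logdet_s_star=1.1, n_diffuse=1)
```

With `n_diffuse=0` the three returned values coincide, matching the statement that the terms involving $\mathbf{S}_{n,p_n}$, $\mathbf{S}_{n,p_n}^{*}$, and $\mathbf{b}_{n,p_n}$ are then absent.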

The expression for the marginal likelihood is derived by treating the diffuse quantities as fixed but unknown parameters. The expression can be shown to be based on a linear transformation of the $N$-dimensional response vector $\mathbf{Y}$. For a suitably chosen $N \times N$ matrix $\mathbf{H}$, let $\mathbf{U} = \mathbf{H}\mathbf{Y}$. $\mathbf{H}$ is chosen such that the $N$-dimensional transformed vector $\mathbf{U}$ partitions into two uncorrelated (and, because they are Gaussian, independent) subvectors: an $N_0$-dimensional vector $\mathbf{U}_1$ and a $(d + k + g)$-dimensional vector $\mathbf{U}_2$. Furthermore, the distribution of $\mathbf{U}_1$ does not depend on the diffuse vectors ($\boldsymbol{\delta}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$), and $\mathbf{U}_2$ stores the generalized least squares estimates of the diffuse vectors: $\mathbf{U}_2' = (\hat{\boldsymbol{\delta}}\ \hat{\boldsymbol{\beta}}\ \hat{\boldsymbol{\gamma}})'$. It turns out that the $N_0$-dimensional vector $\mathbf{U}_1$ has a proper Gaussian distribution and the marginal likelihood, $\log \mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$, is the proper likelihood of $\mathbf{U}_1$. The diffuse likelihood, $\log \mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$, is also based on $\mathbf{U}_1$. However, rather than treating the diffuse quantities as unknown parameters, the expression of the diffuse likelihood is derived by assuming that $\boldsymbol{\delta}$, $\boldsymbol{\beta}$, and $\boldsymbol{\gamma}$ are random vectors with diffuse priors. Even though the marginal and diffuse likelihoods are based on different interpretations of the diffuse quantities, their expressions differ by only one term: the diffuse likelihood does not have the term $-\log\left( |\mathbf{S}_{n,p_n}^{*}| \right)$. Since the marginal likelihood is the proper likelihood of $\mathbf{U}_1$, the diffuse likelihood can be interpreted as a quasi-likelihood of $\mathbf{U}_1$. Apart from being essential to make it a proper likelihood, the extra term in the marginal likelihood plays another useful role: it makes the marginal likelihood invariant to linear rescaling of the diffuse effects, a desirable property in a likelihood. The diffuse likelihood is not invariant to linear rescaling of the diffuse effects. The profile likelihood, $\log \mathbf{L}_p(\mathbf{Y}, \boldsymbol{\theta})$, is the likelihood of the response vector $\mathbf{Y}$ evaluated at the generalized least squares estimates of the diffuse vectors: $(\boldsymbol{\delta}\ \boldsymbol{\beta}\ \boldsymbol{\gamma})' = (\hat{\boldsymbol{\delta}}\ \hat{\boldsymbol{\beta}}\ \hat{\boldsymbol{\gamma}})'$. It is derived by treating the diffuse quantities as fixed but unknown parameters and, like the marginal likelihood, is invariant to linear rescaling of the diffuse effects. For an illustration of this invariance property, see Example 33.18.
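The invariance property can be checked numerically in a small static regression, which is the simplest model with a diffuse effect. The following Python sketch is only an illustration under stated assumptions: it takes a single regressor with a diffuse coefficient and independent errors with known variances `omega`, and it assumes (following the standard restricted-likelihood results for regression, not anything stated in this section) that the analogues of $\mathbf{S}_{n,p_n}$ and $\mathbf{S}_{n,p_n}^{*}$ are $\sum_i x_i^2/\omega_i$ and $\sum_i x_i^2$. The function name and data are hypothetical.

```python
import math

def regression_neg2loglik(y, x, omega, scale=1.0):
    """Return (-2 log L_d, -2 log L_m) for y_i = beta * (scale * x_i) + e_i.

    Assumptions for this sketch: beta is a single diffuse coefficient and
    e_i ~ N(0, omega_i) are independent with known variances.
    """
    xs = [scale * xi for xi in x]               # rescaled regressor
    n = len(y)
    s = sum(xi * xi / wi for xi, wi in zip(xs, omega))        # analogue of S
    s_star = sum(xi * xi for xi in xs)                         # analogue of S*
    b = sum(xi * yi / wi for xi, yi, wi in zip(xs, y, omega))  # analogue of b
    core = sum(math.log(wi) + yi * yi / wi for yi, wi in zip(y, omega))
    diffuse = (n - 1) * math.log(2.0 * math.pi) + core - b * b / s + math.log(s)
    marginal = diffuse - math.log(s_star)
    return diffuse, marginal

y = [1.0, 2.1, 2.9, 4.2]
x = [1.0, 2.0, 3.0, 4.0]
omega = [0.5, 1.0, 1.5, 2.0]

d1, m1 = regression_neg2loglik(y, x, omega, scale=1.0)
d10, m10 = regression_neg2loglik(y, x, omega, scale=10.0)
# m1 and m10 agree up to rounding, whereas d10 - d1 equals 2 * log(10).
```

Multiplying the regressor by 10 leaves the marginal value unchanged because the shifts in $\log S$ and $\log S^{*}$ cancel, while the diffuse value moves by $2 \log 10$.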

In the literature, both the marginal likelihood and the diffuse likelihood are also called the restricted likelihood, and the estimate of the parameter vector $\boldsymbol{\theta}$ that is obtained by maximizing $\log \mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$ (or $\log \mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$) is called the restricted maximum likelihood estimate (REML). In this section, the REML estimate that is based on marginal likelihood is denoted by REML_M and the REML estimate that is based on diffuse likelihood is denoted by REML_D. In addition, the estimate of $\boldsymbol{\theta}$ that is obtained by maximizing the profile likelihood is called the maximum likelihood estimate (ML). In the absence of the diffuse quantities, all three likelihoods are the same and the REML_M, REML_D, and ML estimates coincide. When diffuse quantities are present, there is some evidence to prefer REML_M, the estimate of $\boldsymbol{\theta}$ that is based on the marginal likelihood. For more information, see Francke, Koopman, and de Vos (2010) and the references therein. By default, the SSM procedure uses diffuse likelihood for parameter estimation. You can switch to marginal likelihood by using the LIKE=MARGINAL option in the PROC SSM statement. In the current release of the SSM procedure, you cannot request parameter estimation by using profile likelihood.

Interestingly, for many types of state space models, REML_M and REML_D coincide even when diffuse effects are present. This is because for these models the extra term in the marginal likelihood, $-\log\left( |\mathbf{S}_{n,p_n}^{*}| \right)$, turns out to be independent of the parameter vector $\boldsymbol{\theta}$. Specifically, for models that are specified by using the PROC SSM syntax, REML_M and REML_D differ only if at least one of the following conditions holds:

  • The transition matrix ($\mathbf{T}_t$) that is implied by a STATE statement depends on at least one unknown parameter and the diffuse dimension of the associated state subsection is nonzero.

  • The list of variables that is specified in a COMPONENT statement depends on at least one unknown parameter and the diffuse dimension of the associated state subsection is nonzero.

  • At least one lag term in a DEPLAG statement depends on an unknown parameter.

  • A TREND statement with GROWTH or GROWTH(OU) type is present and the growth parameter, $\phi$, is unknown.

In particular, if the parameter vector affects only the disturbance covariances ($Q_t$) in the state equation and the error variances ($\sigma_{t,i}^2$) in the observation equation (see Table 4), REML_M and REML_D coincide. These observations also imply that REML_M and REML_D coincide for the most commonly used univariate and multivariate unobserved component models and for ARIMA models, with or without regression effects.

Note: For many examples in the section Examples: SSM Procedure, one of the preceding conditions does hold and the REML_M and REML_D estimates do differ. However, in all these cases, it turns out that the differences in REML_M and REML_D are not large enough to change the overall conclusions of the analysis. As verification, you can rerun the analyses that are described in Example 33.10, Example 33.13, and Example 33.14 by using the LIKE=MARGINAL option in the PROC SSM statement. Of course, this will not be true in general.

The REML_D and REML_M estimates of the unknown parameter vector $\boldsymbol{\theta}$ (each denoted as $\hat{\boldsymbol{\theta}}$) are computed by maximizing the diffuse (or marginal) likelihood. This is done by using a nonlinear optimization process that involves repeated evaluations of $\mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$ (or $\mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$) at different values of $\boldsymbol{\theta}$. Approximate standard errors of $\hat{\boldsymbol{\theta}}$ are computed by taking the square root of the diagonal elements of its (approximate) covariance matrix. This covariance is computed as $-\mathbf{H}^{-1}$, where $\mathbf{H}$ is the Hessian (the matrix of the second-order partials) of $\log \mathbf{L}_d(\mathbf{Y}, \boldsymbol{\theta})$ (or $\log \mathbf{L}_m(\mathbf{Y}, \boldsymbol{\theta})$) evaluated at the optimum $\hat{\boldsymbol{\theta}}$. It is known that under mild regularity assumptions (as the number of distinct time points tends toward infinity), $\hat{\boldsymbol{\theta}}$ is consistent and efficient. For good discussions about REML_D, REML_M, and ML estimates, see Searle, Casella, and McCulloch (1992); Laird (2004); and Francke, Koopman, and de Vos (2010).
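The standard-error calculation can be sketched in Python on a deliberately simple stand-in problem. The example below is not the optimizer or the Hessian routine that PROC SSM uses; it assumes an i.i.d. Gaussian sample with parameters (mean, variance), approximates the Hessian of the log likelihood by central differences at the known optimum, and takes the square roots of the diagonal of $-\mathbf{H}^{-1}$. All function names and data are hypothetical.

```python
import math

def gaussian_loglik(theta, data):
    """Log likelihood of an i.i.d. N(mu, var) sample; theta = (mu, var)."""
    mu, var = theta
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return -0.5 * (n * math.log(2.0 * math.pi * var) + ss / var)

def numerical_hessian(func, theta, step=1e-4):
    """Central-difference approximation to the matrix of second-order partials."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(si, sj):
                point = list(theta)
                point[i] += si * step
                point[j] += sj * step
                return func(point)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step * step)
    return hess

def standard_errors_2x2(hess):
    """Square roots of the diagonal elements of -H^{-1} for a 2 x 2 Hessian."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    # inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det; negate it
    cov = [[-d / det, b / det], [c / det, -a / det]]
    return [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]

data = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]
n = len(data)
mu_hat = sum(data) / n                                   # optimum for mu
var_hat = sum((y - mu_hat) ** 2 for y in data) / n       # optimum for var
hess = numerical_hessian(lambda th: gaussian_loglik(th, data), (mu_hat, var_hat))
se_mu, se_var = standard_errors_2x2(hess)
```

In this stand-in problem the results can be compared with the closed forms $\sqrt{\hat{\sigma}^2/n}$ and $\hat{\sigma}^2\sqrt{2/n}$, which is a convenient sanity check before applying the same recipe to a likelihood without closed-form derivatives.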

If the marginal likelihood is used for parameter estimation, the SSM procedure reports the values of all three likelihoods at the parameter estimate $\hat{\boldsymbol{\theta}}$. Otherwise, PROC SSM reports the values of the diffuse and profile likelihoods that are calculated at the parameter estimate $\hat{\boldsymbol{\theta}}$. Let $\mathrm{dim}(\boldsymbol{\theta})$ denote the dimension of the parameter vector $\boldsymbol{\theta}$. After PROC SSM completes the parameter estimation, it prints the "Likelihood Computation Summary" table, which summarizes the likelihood calculations at $\hat{\boldsymbol{\theta}}$, as shown in Table 6.

Table 6: Likelihood Computation Summary

Quantity                              Formula
Nonmissing response values used       $N$
Estimated parameters                  $\mathrm{dim}(\boldsymbol{\theta})$
Initialized diffuse state elements    $\mathrm{rank}(\mathbf{S}_{n,p_n})$
Normalized residual sum of squares    $\sum_{t=1}^{n} \sum_{i=1}^{q*p_t} \left( \frac{\nu_{t,i}^2}{F_{t,i}} \right) - \mathbf{b}_{n,p_n}' \mathbf{S}_{n,p_n}^{-1} \mathbf{b}_{n,p_n}$
Diffuse log likelihood                $\log \mathbf{L}_d(\mathbf{Y}, \hat{\boldsymbol{\theta}})$
Marginal log likelihood               $\log \mathbf{L}_m(\mathbf{Y}, \hat{\boldsymbol{\theta}})$
Profile log likelihood                $\log \mathbf{L}_p(\mathbf{Y}, \hat{\boldsymbol{\theta}})$


In addition to the likelihood computation summary, PROC SSM also reports the information criteria that are based on the diffuse and profile likelihoods. It also reports the information criteria that are based on the marginal likelihood if marginal likelihood is used for parameter estimation. A variety of information criteria are reported. All these criteria are functions of twice the negative log likelihood ($-2 \log \mathbf{L}$, where the likelihood can be diffuse, marginal, or profile), $N_*$ (the effective sample size), and $\mathit{nparm}$ (the effective number of model parameters). For information criteria that are based on the diffuse and marginal likelihoods, the effective sample size, $N_*$, is equal to $N_0$ and the effective number of model parameters, $\mathit{nparm}$, is equal to $\mathrm{dim}(\boldsymbol{\theta})$. For information criteria that are based on the profile likelihood, the effective sample size, $N_*$, is equal to $N$ and the effective number of model parameters, $\mathit{nparm}$, is equal to $\mathrm{dim}(\boldsymbol{\theta}) + d + k + g$. Table 7 summarizes the reported information criteria in smaller-is-better form.

Table 7: Information Criteria

Criterion   Formula                                                                  Reference
AIC         $-2 \log \mathbf{L} + 2\,\mathit{nparm}$                                 Akaike (1974)
AICC        $-2 \log \mathbf{L} + 2\,\mathit{nparm}\,N_* / (N_* - \mathit{nparm} - 1)$   Hurvich and Tsai (1989); Burnham and Anderson (1998)
HQIC        $-2 \log \mathbf{L} + 2\,\mathit{nparm} \log\log(N_*)$                   Hannan and Quinn (1979)
BIC         $-2 \log \mathbf{L} + \mathit{nparm} \log(N_*)$                          Schwarz (1978)
CAIC        $-2 \log \mathbf{L} + \mathit{nparm}\,(\log(N_*) + 1)$                   Bozdogan (1987)
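The formulas in Table 7 are straightforward to evaluate once $-2 \log \mathbf{L}$, $N_*$, and $\mathit{nparm}$ are known. The following Python sketch shows one way to do so; the function names are hypothetical, and the helper `effective_sizes` assumes the full-rank case in which $N_0 = N - (d + k + g)$.

```python
import math

def effective_sizes(kind, n, dim_theta, n_diffuse):
    """Return (N_*, nparm) for a likelihood kind.

    kind is "diffuse", "marginal", or "profile"; n_diffuse is d + k + g.
    Assumes S has full rank, so N_0 = N - (d + k + g).
    """
    if kind in ("diffuse", "marginal"):
        return n - n_diffuse, dim_theta
    if kind == "profile":
        return n, dim_theta + n_diffuse
    raise ValueError("unknown likelihood kind: " + kind)

def information_criteria(neg2loglik, n_eff, nparm):
    """Evaluate the criteria of Table 7 in smaller-is-better form."""
    if n_eff - nparm - 1 <= 0:
        raise ValueError("AICC requires N_* > nparm + 1")
    log_n = math.log(n_eff)
    return {
        "AIC": neg2loglik + 2.0 * nparm,
        "AICC": neg2loglik + 2.0 * nparm * n_eff / (n_eff - nparm - 1),
        "HQIC": neg2loglik + 2.0 * nparm * math.log(log_n),
        "BIC": neg2loglik + nparm * log_n,
        "CAIC": neg2loglik + nparm * (log_n + 1.0),
    }

# Hypothetical values: N = 52 responses, 3 parameters, 2 diffuse elements.
n_eff, nparm = effective_sizes("diffuse", n=52, dim_theta=3, n_diffuse=2)
criteria = information_criteria(neg2loglik=100.0, n_eff=n_eff, nparm=nparm)
```

Calling `effective_sizes("profile", ...)` with the same inputs returns a larger sample size and parameter count, so criteria based on the profile likelihood are not directly comparable with those based on the diffuse or marginal likelihood.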


Last updated: June 19, 2025