Because of the Gaussian nature of the response vector, its likelihood can be computed by using the prediction-error decomposition, which is obtained by the filtering pass described in the previous section, Filtering Pass. When the state space model under consideration has a nondiffuse initial condition and no regression effects are present in the observation and state equations, the response vector has a proper Gaussian distribution and its likelihood is defined unambiguously. Otherwise, the definition of the likelihood depends on how the diffuse quantities are treated: the regression coefficients in the observation equation, the regression coefficients in the state equation, and the diffuse part of the initial state. Francke, Koopman, and de Vos (2010) describe three variants of the likelihood function (diffuse likelihood, marginal likelihood, and profile likelihood) that are commonly considered for state space models that have a diffuse initial condition. In the SSM procedure, you can use either the diffuse likelihood or the marginal likelihood for parameter estimation. By default, the diffuse likelihood is used.
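As a minimal sketch of the prediction-error decomposition, assuming a univariate local level model with a known, nondiffuse initial state (the model, the function names, and the parameter values are illustrative assumptions, not PROC SSM internals), the following Python code accumulates the log likelihood from one-step-ahead prediction errors and checks it against the joint Gaussian density of the whole series:

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def filter_loglik(y, a1, p1, var_eps, var_eta):
    """Log likelihood of a local level model, accumulated one observation at a
    time from prediction errors (the prediction-error decomposition).
    Assumes a known, nondiffuse initial state mu_1 ~ N(a1, p1)."""
    a, p, loglik = a1, p1, 0.0
    for obs in y:
        v = obs - a                   # one-step-ahead prediction error
        f = p + var_eps               # variance of the prediction error
        loglik -= 0.5 * (LOG_2PI + np.log(f) + v * v / f)
        k = p / f                     # Kalman gain
        a = a + k * v                 # prediction of the next state
        p = p * (1.0 - k) + var_eta   # and its variance
    return loglik

def joint_loglik(y, a1, p1, var_eps, var_eta):
    """The same log likelihood from the joint Gaussian density of all of y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n)
    # Cov(y_s, y_t) = p1 + min(s, t) * var_eta, plus var_eps on the diagonal
    cov = p1 + var_eta * np.minimum.outer(t, t) + var_eps * np.eye(n)
    r = y - a1
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (n * LOG_2PI + logdet + r @ np.linalg.solve(cov, r))

y = [1.2, 0.7, 1.9, 2.4, 2.1]
print(filter_loglik(y, 0.0, 4.0, 0.5, 0.3))
print(joint_loglik(y, 0.0, 4.0, 0.5, 0.3))
```

Both calls should return the same value; the filter needs only scalar work per time point, whereas the joint density requires an n-by-n covariance matrix.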
In terms of the quantities described in Table 5, diffuse, marginal, and profile likelihoods are defined as follows:
In the preceding formulas, the terms that are associated with missing response values are excluded, and the sample size is taken to be the total number of nonmissing response values. The formulas also involve the determinants of two matrices from Table 5 and the transpose of a column vector from that table. If the first of these matrices is not invertible, then a generalized inverse is used in place of its inverse, and both determinants are computed based on the nonzero eigenvalues of the corresponding matrices. When no diffuse quantities are present, the terms that involve these two matrices and the column vector are absent.
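The handling of a singular matrix described above (a generalized inverse in place of the inverse, and a determinant computed from the nonzero eigenvalues only) can be sketched in Python as follows. The function name is hypothetical, and using the Moore-Penrose inverse, built from the same eigendecomposition, as the generalized inverse is an assumption; the text does not say which generalized inverse is used.

```python
import numpy as np

def singular_safe_terms(S, s, tol=1e-10):
    """For a symmetric positive semidefinite S (possibly singular) and a vector s,
    return (log of the product of the nonzero eigenvalues of S, s' S^- s, rank of S).
    S^- is taken to be the Moore-Penrose inverse (an assumption)."""
    S = np.asarray(S, dtype=float)
    s = np.asarray(s, dtype=float)
    w, V = np.linalg.eigh(S)
    keep = w > tol * max(1.0, float(w.max()))     # treat tiny eigenvalues as zero
    log_pdet = float(np.sum(np.log(w[keep])))     # "determinant" from nonzero eigenvalues
    S_pinv = (V[:, keep] / w[keep]) @ V[:, keep].T
    quad = float(s @ S_pinv @ s)
    return log_pdet, quad, int(keep.sum())

S = np.array([[2.0, 0.0], [0.0, 0.0]])   # singular, rank 1
s = np.array([1.0, 0.0])
print(singular_safe_terms(S, s))
```

When S is invertible, the same function returns the ordinary log determinant and the ordinary quadratic form, so one code path covers both cases.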
The expression for the marginal likelihood is derived by treating the diffuse quantities as fixed but unknown parameters. It can be shown to be based on a linear transformation of the response vector $\mathbf{Y}$, which stacks the $N$ nonmissing response values. For a suitably chosen $N \times N$ matrix $\mathbf{M}$, let $\tilde{\mathbf{Y}} = \mathbf{M}\mathbf{Y}$. The matrix $\mathbf{M}$ is chosen so that $\tilde{\mathbf{Y}}$ partitions into two uncorrelated (and, because they are jointly Gaussian, independent) subvectors: an $(N-d)$-dimensional vector $\tilde{\mathbf{Y}}_1$ and a $d$-dimensional vector $\tilde{\mathbf{Y}}_2$, where $d$ is the total dimension of the diffuse vectors. Furthermore, the distribution of $\tilde{\mathbf{Y}}_1$ does not depend on the diffuse vectors, and $\tilde{\mathbf{Y}}_2$ stores the generalized least squares estimates of the diffuse vectors. It turns out that $\tilde{\mathbf{Y}}_1$ has a proper Gaussian distribution, and the marginal likelihood is the proper likelihood of $\tilde{\mathbf{Y}}_1$.
The diffuse likelihood is also based on $\tilde{\mathbf{Y}}_1$. However, rather than treating the diffuse quantities as unknown parameters, its expression is derived by assuming that the diffuse vectors are random vectors with diffuse priors. Even though the marginal and diffuse likelihoods are based on different interpretations of the diffuse quantities, their expressions differ by only one term, which is present in the marginal likelihood and absent from the diffuse likelihood. Because the marginal likelihood is the proper likelihood of $\tilde{\mathbf{Y}}_1$, the diffuse likelihood can be interpreted as a quasi-likelihood of $\tilde{\mathbf{Y}}_1$. Apart from being essential to make the marginal likelihood a proper likelihood, the extra term plays another useful role: it makes the marginal likelihood invariant to linear rescaling of the diffuse effects, which is a desirable property in a likelihood. The diffuse likelihood is not invariant to such rescaling.
The profile likelihood is the likelihood of the response vector $\mathbf{Y}$ evaluated at the generalized least squares estimates of the diffuse vectors. It is derived by treating the diffuse quantities as fixed but unknown parameters and, like the marginal likelihood, is invariant to linear rescaling of the diffuse effects. For an illustration of this invariance property, see Example 33.18.
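The relationships among the three likelihoods (a single extra term separating the marginal and diffuse likelihoods, and invariance of the marginal and profile likelihoods to rescaling of the diffuse effects) can be checked numerically in a static linear-model analogue, $y \sim N(Xb, V)$, in which the coefficient vector $b$ plays the role of the diffuse effects. This is a hypothetical illustration under that assumption, not the filtering-based computation that PROC SSM performs, and the function name `three_logliks` is also hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)

def three_logliks(y, X, V):
    """Diffuse, marginal, and profile log likelihoods for y ~ N(X b, V), where the
    d-dimensional coefficient vector b stands in for the diffuse effects."""
    y, X, V = (np.asarray(a, dtype=float) for a in (y, X, V))
    n, d = X.shape
    Vinv_X = np.linalg.solve(V, X)                    # V^{-1} X
    XtVinvX = X.T @ Vinv_X
    b_gls = np.linalg.solve(XtVinvX, Vinv_X.T @ y)    # GLS estimate of b
    r = y - X @ b_gls
    quad = r @ np.linalg.solve(V, r)
    logdet_V = np.linalg.slogdet(V)[1]
    # Profile: full-data likelihood evaluated at the GLS estimate of b.
    profile = -0.5 * (n * LOG_2PI + logdet_V + quad)
    # Diffuse: b integrated out against a flat (diffuse) prior.
    diffuse = -0.5 * ((n - d) * LOG_2PI + logdet_V
                      + np.linalg.slogdet(XtVinvX)[1] + quad)
    # Marginal: likelihood of the n - d error contrasts; exactly one extra term.
    marginal = diffuse + 0.5 * np.linalg.slogdet(X.T @ X)[1]
    return diffuse, marginal, profile

rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)
y = rng.normal(size=n)
A = np.diag([3.0, 0.5])              # linear rescaling of the two diffuse effects
print(three_logliks(y, X, V))
print(three_logliks(y, X @ A, V))    # marginal and profile unchanged
```

Replacing X by XA leaves the marginal and profile values unchanged and shifts the diffuse value by $-\log|\det A|$, which mirrors the invariance discussion above.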
In the literature, both the marginal likelihood and the diffuse likelihood are also called the restricted likelihood, and the estimate of the parameter vector that is obtained by maximizing either of them is called the restricted maximum likelihood (REML) estimate. In this section, the REML estimate that is based on the marginal likelihood is denoted by REML_M, and the REML estimate that is based on the diffuse likelihood is denoted by REML_D. In addition, the estimate of the parameter vector that is obtained by maximizing the profile likelihood is called the maximum likelihood (ML) estimate. In the absence of diffuse quantities, all three likelihoods are the same and the REML_M, REML_D, and ML estimates coincide. When diffuse quantities are present, there is some evidence to prefer REML_M, the estimate that is based on the marginal likelihood. For more information, see Francke, Koopman, and de Vos (2010) and the references therein. By default, the SSM procedure uses the diffuse likelihood for parameter estimation. You can switch to the marginal likelihood by using the LIKE=MARGINAL option in the PROC SSM statement. In the current release of the SSM procedure, you cannot request parameter estimation by using the profile likelihood.
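A familiar textbook case (not taken from this section) shows how ML and REML can differ when a diffuse effect is present: for independent $N(\mu, \sigma^2)$ observations with $\mu$ treated as a diffuse effect, maximizing the profile likelihood gives the variance estimate $SS/n$, which is biased downward, whereas maximizing the marginal likelihood gives the unbiased $SS/(n-1)$, where $SS$ is the sum of squared deviations from the sample mean. The grid search below is a deliberately simple sketch of that comparison; the function names and data are hypothetical.

```python
import numpy as np

def profile_loglik(sigma2, y):
    """ML (profile) log likelihood of y_i ~ N(mu, sigma2), with mu replaced by its
    GLS estimate (the sample mean). Works elementwise on an array of sigma2."""
    y = np.asarray(y, dtype=float)
    n, ss = y.size, np.sum((y - y.mean()) ** 2)
    return -0.5 * (n * np.log(2 * np.pi) + n * np.log(sigma2) + ss / sigma2)

def marginal_loglik(sigma2, y):
    """REML (marginal) log likelihood: the likelihood of the n - 1 error contrasts."""
    y = np.asarray(y, dtype=float)
    n, ss = y.size, np.sum((y - y.mean()) ** 2)
    return -0.5 * ((n - 1) * np.log(2 * np.pi) + (n - 1) * np.log(sigma2) + ss / sigma2)

y = np.array([2.1, 3.4, 1.8, 2.9, 3.3])
grid = np.linspace(0.05, 3.0, 59001)
ml = grid[np.argmax(profile_loglik(grid, y))]      # close to ss / n
reml = grid[np.argmax(marginal_loglik(grid, y))]   # close to ss / (n - 1)
print(ml, reml)
```

In practice a numerical optimizer would replace the grid; the grid is used here only to keep the comparison transparent.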
Interestingly, for many types of state space models, REML_M and REML_D coincide even when diffuse effects are present. This is because for these models the extra term in the marginal likelihood turns out not to depend on the parameter vector. Specifically, for models that are specified by using the PROC SSM syntax, REML_M and REML_D differ only if at least one of the following conditions holds:
The transition matrix that is implied by a STATE statement depends on at least one unknown parameter, and the diffuse dimension of the associated state subsection is nonzero.
The list of variables that is specified in a COMPONENT statement depends on at least one unknown parameter and the diffuse dimension of the associated state subsection is nonzero.
At least one lag term in a DEPLAG statement depends on an unknown parameter.
A TREND statement with GROWTH or GROWTH(OU) type is present, and the growth parameter is unknown.
In particular, if the parameter vector affects only the disturbance covariances in the state equation and the error variances in the observation equation (see Table 4), REML_M and REML_D coincide. These observations also imply that REML_M and REML_D coincide for the most commonly used univariate and multivariate unobserved component models and for ARIMA models, with or without regression effects.
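In a linear-model analogue $y \sim N(Xb, V(\theta))$ (an illustrative assumption, not the SSM computation), the term that separates the marginal and diffuse log likelihoods is $\tfrac{1}{2}\log|X'X|$, which involves only the design of the diffuse effects. The sketch below, with hypothetical function names, shows why parameters that enter only the variances cannot make REML_M and REML_D differ, while a parameter in a transition that carries a diffuse initial state forward does change the extra term.

```python
import numpy as np

def extra_term(X):
    """0.5 * log|X'X|: in the analogue y ~ N(X b, V(theta)), the amount by which the
    marginal log likelihood exceeds the diffuse one. It involves only the design X
    of the diffuse effects, never the covariance V(theta)."""
    X = np.asarray(X, dtype=float)
    return 0.5 * np.linalg.slogdet(X.T @ X)[1]

def design_with_transition(phi, n):
    """Hypothetical design: y_t = phi**t * delta + noise, that is, a diffuse initial
    level delta carried forward by the transition mu_{t+1} = phi * mu_t."""
    return (phi ** np.arange(n, dtype=float))[:, None]

n = 6
# theta only scales variances: X is fixed, so the extra term is a constant in theta
X_fixed = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
print(extra_term(X_fixed))
# theta = phi enters the transition: the extra term changes with phi
for phi in (0.5, 0.9):
    print(phi, extra_term(design_with_transition(phi, n)))
```

A constant extra term shifts the log likelihood without moving its maximizer, which is why the two REML estimates then coincide.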
Note: For many examples in the section Examples: SSM Procedure, at least one of the preceding conditions holds, and the REML_M and REML_D estimates do differ. However, in all these cases the differences between REML_M and REML_D are not large enough to change the overall conclusions of the analysis. To verify this, you can rerun the analyses that are described in Example 33.10, Example 33.13, and Example 33.14 by using the LIKE=MARGINAL option in the PROC SSM statement. This will not be true in general, however.
The REML_D and REML_M estimates of the unknown parameter vector $\theta$ (each denoted by $\hat{\theta}$) are computed by maximizing the diffuse (or marginal) likelihood. This is done by using a nonlinear optimization process that involves repeated evaluations of the diffuse (or marginal) likelihood at different values of $\theta$. Approximate standard errors of $\hat{\theta}$ are computed by taking the square roots of the diagonal elements of its (approximate) covariance matrix. This covariance matrix is computed as $-\mathbf{H}^{-1}$, where $\mathbf{H}$ is the Hessian (the matrix of second-order partial derivatives) of the diffuse (or marginal) log likelihood evaluated at the optimum $\hat{\theta}$. It is known that under mild regularity assumptions (as the number of distinct time points tends toward infinity), $\hat{\theta}$ is consistent and asymptotically efficient. For good discussions about REML_D, REML_M, and ML estimates, see Searle, Casella, and McCulloch (1992); Laird (2004); and Francke, Koopman, and de Vos (2010).
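The standard-error computation described above can be sketched as follows: a central-difference Hessian of the log likelihood at the optimum, negated and inverted, with standard errors taken from the square roots of its diagonal. For a checkable example, the sketch uses the log likelihood of independent $N(\mu, \sigma^2)$ observations at its closed-form maximizer, for which the standard errors are known to be $\sqrt{\hat{\sigma}^2/n}$ and $\sqrt{2\hat{\sigma}^4/n}$. The function names and the step size `h` are assumptions, and PROC SSM's own Hessian computation may differ.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference Hessian of a scalar function f at the point x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

def standard_errors(loglik, theta_hat):
    """Square roots of the diagonal of -H^{-1}, where H is the Hessian of the
    log likelihood evaluated at the optimum theta_hat."""
    cov = -np.linalg.inv(numerical_hessian(loglik, theta_hat))
    return np.sqrt(np.diag(cov))

y = np.array([2.1, 3.4, 1.8, 2.9, 3.3])

def normal_loglik(theta):
    mu, sigma2 = theta
    return -0.5 * (y.size * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / sigma2)

theta_hat = np.array([y.mean(), y.var()])   # closed-form maximizer (mu, sigma2)
print(standard_errors(normal_loglik, theta_hat))
```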
If the marginal likelihood is used for parameter estimation, the SSM procedure reports the values of all three likelihoods at the parameter estimate $\hat{\theta}$. Otherwise, PROC SSM reports the values of the diffuse and profile likelihoods at $\hat{\theta}$. Let $p$ denote the dimension of the parameter vector $\theta$. After PROC SSM completes the parameter estimation, it prints the "Likelihood Computation Summary" table, which summarizes the likelihood calculations at $\hat{\theta}$, as shown in Table 6.
Table 6: Likelihood Computation Summary
In addition to the likelihood computation summary, PROC SSM reports information criteria that are based on the diffuse and profile likelihoods. If the marginal likelihood is used for parameter estimation, it also reports information criteria that are based on the marginal likelihood. All these criteria are functions of twice the negative log likelihood, $-2\ell$ (where $\ell$ is the diffuse, marginal, or profile log likelihood), the effective sample size $n^*$, and the effective number of model parameters $p^*$. The values of $n^*$ and $p^*$ that are used for the criteria based on the diffuse and marginal likelihoods differ from those used for the criteria based on the profile likelihood. Table 7 summarizes the reported information criteria in smaller-is-better form.
Table 7: Information Criteria
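For reference, the following sketch computes several commonly used smaller-is-better criteria from $-2\ell$, the effective sample size, and the effective number of parameters. These are the standard textbook forms; whether PROC SSM reports exactly this set, in exactly these forms, should be confirmed against Table 7, and the function name and inputs are hypothetical.

```python
import math

def information_criteria(minus2loglik, n_eff, p_eff):
    """Common smaller-is-better information criteria, written in terms of
    -2 log L, the effective sample size, and the effective number of parameters."""
    return {
        "AIC": minus2loglik + 2 * p_eff,
        "AICC": minus2loglik + 2 * p_eff * n_eff / (n_eff - p_eff - 1),
        "HQIC": minus2loglik + 2 * p_eff * math.log(math.log(n_eff)),
        "BIC": minus2loglik + p_eff * math.log(n_eff),
        "CAIC": minus2loglik + p_eff * (math.log(n_eff) + 1),
    }

print(information_criteria(100.0, 50, 3))
```

Because every criterion adds a penalty that grows with $p^*$, the choice of $n^*$ and $p^*$ for each likelihood variant directly affects model comparisons.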