Consider the following alternate but equivalent specifications of a trend-plus-seasonal model for monthly data, where $y_t$ denotes the response (here, the log-transformed passenger series):

$$
\begin{aligned}
\text{Spec1:}\quad & y_t = \mu_t + \gamma_t + \epsilon_t \\
\text{Spec2:}\quad & y_t = \mu_t + \sum_{i=1}^{11} \beta_i m_{it} + \epsilon_t
\end{aligned}
$$

Here the trend ($\mu_t$, a random walk with drift) and the irregular component ($\epsilon_t$, white noise) are the same in both specifications. However, the seasonal component is specified differently: in Spec1 the seasonality is modeled as a deterministic trigonometric seasonal component ($\gamma_t$), whereas in Spec2 it is modeled by regression on the seasonal dummies ($m_{it}$, equal to 1 when observation $t$ falls in calendar month $i$ and 0 otherwise, with regression coefficients $\beta_i$, $i = 1, \ldots, 11$). Spec1 and Spec2 are statistically equivalent models from the perspective of the data generation process.

This example uses these two specifications to demonstrate a useful invariance property of the marginal and profile likelihoods, which is described in the section Likelihood Computation and Model-Fitting Phase. The airline passenger series, given as Series G in Box and Jenkins (1976), is used to illustrate the computations. The following DATA step prepares the log-transformed passenger series and the seasonal dummies that are needed for this example:
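```sas
data seriesG;
   set sashelp.air;
   logair = log(air);            /* log-transformed passenger counts */
   /* Seasonal dummies: m[i] = 1 in calendar month i; December is the reference month */
   array m{11} m1-m11;
   do i=1 to 11;
      m[i] = (month(date)=i);
   end;
   drop i;                       /* remove the loop index from the output data set */
run;
```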
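As a quick check of the dummy coding, the following sketch prints the first year of the prepared data. Only 11 dummies are needed because the level of the trend already plays the role of an intercept, so December serves as the reference month with all of m1-m11 equal to 0. The step uses only standard PROC PRINT options; the choice of variables and the MONYY7. display format are illustrative.

```sas
/* Print the first 12 months (January through December 1949) to inspect the dummy coding */
proc print data=seriesG(obs=12) noobs;
   var date logair m1-m11;
   format date monyy7.;
run;
```

In the printed rows, exactly one of m1-m11 equals 1 for January through November 1949, and all of them equal 0 for December 1949.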
The following statements fit the two models to the log-transformed passenger series. The first PROC SSM call fits Spec1, and the second call fits Spec2.
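```sas
/* Spec1: random walk with drift + deterministic seasonal state + white noise */
proc ssm data=seriesG plots=none like=marginal;
   id date interval=month;
   trend rwDrift(ll) slopevar=0;                /* local linear trend with slope variance fixed at 0 */
   irregular wn;                                /* white noise irregular component */
   state trigState(1) type=season(length=12);   /* seasonal state of length 12; no COV option given */
   comp season = trigState[1];                  /* seasonal contribution to the observation equation */
   model logair = rwDrift season wn;
run;

/* Spec2: same trend and irregular; seasonality through the dummies m1-m11 */
proc ssm data=seriesG plots=none like=marginal;
   id date interval=month;
   trend rwDrift(ll) slopevar=0;
   irregular wn;
   model logair = rwDrift m1-m11 wn;            /* m1-m11 enter as regression variables */
run;
```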
For these two models, the parameter estimates that are based on the diffuse likelihood (REML_D) and the marginal likelihood (REML_M) coincide, because the extra term in the marginal likelihood does not depend on these parameters. Nevertheless, it is useful to specify the LIKE=MARGINAL option in the PROC SSM statement, so that both the likelihood computation summary and the information criteria tables display the likelihood values and the information criteria for all three likelihoods (diffuse, marginal, and profile) at the estimated parameters. The parameter estimates for Spec1 and Spec2 are displayed in Output 33.18.1 and Output 33.18.2, respectively. As expected, the parameter estimates for the two specifications are the same because the models are statistically equivalent. Other aspects of the fit that are not shown here, such as model-based forecasts, also agree.
Output 33.18.1: Parameter Estimates For Spec1
| Model Parameter Estimates | | | | | |
|---|---|---|---|---|---|
| Component | Type | Parameter | Estimate | Standard Error | t Value |
| rwDrift | LL Trend | Level Variance | 0.000766 | 0.000219 | 3.49 |
| wn | Irregular | Variance | 0.000368 | 0.000141 | 2.60 |
Output 33.18.2: Parameter Estimates For Spec2
| Model Parameter Estimates | | | | | |
|---|---|---|---|---|---|
| Component | Type | Parameter | Estimate | Standard Error | t Value |
| rwDrift | LL Trend | Level Variance | 0.000766 | 0.000219 | 3.49 |
| wn | Irregular | Variance | 0.000368 | 0.000141 | 2.60 |
The fit summary tables in Output 33.18.3 (for Spec1) and Output 33.18.4 (for Spec2) show that the marginal and profile likelihoods (the last two lines in each table) also agree for the two specifications. However, the diffuse likelihood values differ: 215.45 for Spec1 and 226.90 for Spec2. This difference occurs because the diffuse likelihood is not invariant to different (but equivalent) formulations of the seasonal effects. Consequently, the information criteria that are based on the marginal and profile likelihoods, shown in Output 33.18.5 (for Spec1) and Output 33.18.6 (for Spec2), correctly indicate that the two specifications cannot be distinguished on the basis of these criteria, whereas the information criteria that are based on the diffuse likelihood erroneously suggest that Spec1 is inferior to Spec2 (for example, the diffuse-likelihood-based AIC is -426.90 for Spec1 and -449.79 for Spec2, where lower is better).
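Because both specifications have the same number of estimated parameters (2) and the same number of initialized diffuse state elements (13), the penalty terms of the diffuse-likelihood-based criteria are identical, so the gap between the two AIC values is simply twice the gap between the two diffuse log likelihoods. The following DATA step is a small arithmetic check of that relationship, using values copied from Output 33.18.3 through Output 33.18.6; the data set name `diffgap` and its variable names are hypothetical and used only for illustration.

```sas
/* Hypothetical check: the diffuse-based AIC gap equals twice the diffuse log-likelihood gap */
data diffgap;
   ll_spec1  = 215.4522;                 /* diffuse log likelihood, Output 33.18.3 */
   ll_spec2  = 226.8959;                 /* diffuse log likelihood, Output 33.18.4 */
   aic_spec1 = -426.9044;                /* diffuse-based AIC, Output 33.18.5 */
   aic_spec2 = -449.7918;                /* diffuse-based AIC, Output 33.18.6 */
   ll_gap    = ll_spec2 - ll_spec1;      /* 11.4437 */
   aic_gap   = aic_spec1 - aic_spec2;    /* 22.8874, that is, 2 * ll_gap */
run;

proc print data=diffgap noobs;
run;
```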
Output 33.18.3: Likelihood Computation Summary For Spec1
| Likelihood Computation Summary | |
|---|---|
| Statistic | Value |
| Nonmissing Response Values Used | 144 |
| Estimated Parameters | 2 |
| Initialized Diffuse State Elements | 13 |
| Normalized Residual Sum of Squares | 131 |
| Diffuse Log Likelihood | 215.4522 |
| Profile Log Likelihood | 265.63882 |
| Marginal Log Likelihood | 248.01412 |
Output 33.18.4: Likelihood Computation Summary For Spec2
| Likelihood Computation Summary | |
|---|---|
| Statistic | Value |
| Nonmissing Response Values Used | 144 |
| Estimated Parameters | 2 |
| Initialized Diffuse State Elements | 13 |
| Normalized Residual Sum of Squares | 131 |
| Diffuse Log Likelihood | 226.8959 |
| Profile Log Likelihood | 265.63882 |
| Marginal Log Likelihood | 248.01412 |
Output 33.18.5: Information Criteria For Spec1
| Information Criteria | | | |
|---|---|---|---|
| Statistic | Diffuse Likelihood Based | Profile Likelihood Based | Marginal Likelihood Based |
| AIC (lower is better) | -426.9044 | -501.2776 | -492.0282 |
| BIC (lower is better) | -421.1540 | -456.7304 | -486.2779 |
| AICC (lower is better) | -426.8106 | -497.5276 | -491.9345 |
| HQIC (lower is better) | -424.5678 | -483.1762 | -489.6916 |
| CAIC (lower is better) | -419.1540 | -441.7304 | -484.2779 |
Output 33.18.6: Information Criteria For Spec2
| Information Criteria | | | |
|---|---|---|---|
| Statistic | Diffuse Likelihood Based | Profile Likelihood Based | Marginal Likelihood Based |
| AIC (lower is better) | -449.7918 | -501.2776 | -492.0282 |
| BIC (lower is better) | -444.0414 | -456.7304 | -486.2779 |
| AICC (lower is better) | -449.6981 | -497.5276 | -491.9345 |
| HQIC (lower is better) | -447.4552 | -483.1762 | -489.6916 |
| CAIC (lower is better) | -442.0414 | -441.7304 | -484.2779 |
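The Spec1 values in Output 33.18.5 can be reproduced, to within one unit in the fourth decimal place, from the log likelihoods in Output 33.18.3 by using the usual definitions of the criteria, with $n$ the effective number of observations and $k$ the number of parameters counted in the penalty. The choices below (n = 131 = 144 - 13 and k = 2 for the diffuse and marginal likelihoods; n = 144 and k = 15 = 2 + 13 for the profile likelihood) were inferred by back-calculating from the displayed tables rather than taken from the PROC SSM documentation, so treat this DATA step as a sketch. The data set `infocrit` and its variables are hypothetical.

```sas
/* Sketch: recompute the Spec1 information criteria from the log likelihoods.    */
/* The n and k values are assumptions inferred from Outputs 33.18.3 and 33.18.5. */
data infocrit;
   length likelihood $8;
   input likelihood $ loglik n k;
   m2ll = -2*loglik;                      /* -2 log likelihood          */
   aic  = m2ll + 2*k;                     /* Akaike                     */
   aicc = aic + 2*k*(k+1)/(n-k-1);        /* small-sample corrected AIC */
   bic  = m2ll + k*log(n);                /* Schwarz                    */
   hqic = m2ll + 2*k*log(log(n));         /* Hannan-Quinn               */
   caic = m2ll + k*(log(n)+1);            /* consistent AIC             */
   datalines;
Diffuse  215.4522  131  2
Profile  265.63882 144 15
Marginal 248.01412 131  2
;
run;

proc print data=infocrit noobs;
   var likelihood aic bic aicc hqic caic;
   format aic bic aicc hqic caic 10.4;
run;
```

Substituting the Spec2 diffuse log likelihood (226.8959) in the first data line reproduces the diffuse column of Output 33.18.6 in the same way, while the profile and marginal lines are unchanged because those likelihoods agree across the two specifications.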
This example highlights the care that must be taken when doing model selection based on information criteria. It suggests that the information criteria that are based on the marginal and profile likelihoods should be preferred over those that are based on the diffuse likelihood.