SSM Procedure

Temporal Distribution

For the sake of simplicity, consider a simple case of distributing weekly observations of a flow variable, $y$, at a daily interval. Even though the values of $y$ are observed weekly (suppose they are recorded each Sunday), in this case it is necessary to treat the observations $y_t,\ t \geq 1,$ as a daily time series such that $y_t$ equals the weekly total when $t$ corresponds to the end of the week (Sunday), and $y_t$ is missing on other days of the week. In addition, suppose that $y_t^{\dagger}$ denotes the unobserved time series of daily values of $y$. In other words, if $t$ corresponds to a Sunday, then

$$ y_t = \sum_{s=t-6}^{t} y_s^{\dagger} $$

Suppose that the unobserved daily series $y_t^{\dagger}$ can be modeled by a state space model. For example, suppose the model for $y_t^{\dagger}$ is

$$
\begin{aligned}
y_t^{\dagger} &= \mathbf{Z}_t \boldsymbol{\alpha}_t + \epsilon_t \\
\boldsymbol{\alpha}_{t+1} &= \mathbf{T}_t \boldsymbol{\alpha}_t + \boldsymbol{\eta}_{t+1}
\end{aligned}
$$

Then it is easy to see that the aggregated series $y_t$ follows a state space model of the form

$$
\begin{aligned}
y_t &= \mathbf{Z}_t^{\dagger} \boldsymbol{\alpha}_t^{\dagger} \\
\boldsymbol{\alpha}_{t+1}^{\dagger} &= \mathbf{T}_t^{\dagger} \boldsymbol{\alpha}_t^{\dagger} + \boldsymbol{\eta}_{t+1}^{\dagger}
\end{aligned}
$$

where the following are true (both the row and column vectors are displayed horizontally to save space):

  • The new state vector ($\boldsymbol{\alpha}_t^{\dagger}$) is formed by augmenting the old state vector ($\boldsymbol{\alpha}_t$) with a latent variable, $y_t^f$. That is, $\boldsymbol{\alpha}_t^{\dagger} = [\boldsymbol{\alpha}_t \;\; y_t^f]$. In fact, $y_t^f$ represents the within-week running total of $y_t^{\dagger}$, so that when $t$ corresponds to a Sunday, $y_t^f = y_t$.

  • The new transition matrix $\mathbf{T}_t^{\dagger}$ is

    $$ \mathbf{T}_t^{\dagger} = \begin{bmatrix} \mathbf{T}_t & \mathbf{0} \\ \mathbf{Z}_{t+1} \mathbf{T}_t & \psi_{t+1} \end{bmatrix} $$

    where $\psi_t$ is a dummy variable that equals 1 when $t$ is not the start of the week (not Monday) and equals 0 when $t$ is the start of the week (Monday).

  • The new disturbance vector ($\boldsymbol{\eta}_t^{\dagger}$) is formed by augmenting the old disturbance vector ($\boldsymbol{\eta}_t$) by $(\mathbf{Z}_t \boldsymbol{\eta}_t + \epsilon_t)$. That is, $\boldsymbol{\eta}_t^{\dagger} = [\boldsymbol{\eta}_t \;\; \mathbf{Z}_t \boldsymbol{\eta}_t + \epsilon_t]$.

  • The new design matrix for the state effect ($\mathbf{Z}_t^{\dagger}$) is $\mathbf{Z}_t^{\dagger} = [\mathbf{0} \;\; 1]$, where $\mathbf{0}$ is a zero vector of the same size as the old state vector $\boldsymbol{\alpha}_t$.
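
The second row of $\mathbf{T}_t^{\dagger}$ can be checked numerically. The running total satisfies $y_{t+1}^f = \psi_{t+1} y_t^f + y_{t+1}^{\dagger}$, and substituting the daily model for $y_{t+1}^{\dagger}$ gives $y_{t+1}^f = \mathbf{Z}_{t+1} \mathbf{T}_t \boldsymbol{\alpha}_t + \psi_{t+1} y_t^f + (\mathbf{Z}_{t+1} \boldsymbol{\eta}_{t+1} + \epsilon_{t+1})$. The following Python sketch is separate from PROC SSM and only illustrates this recursion. It assumes a toy scalar local level model ($\mathbf{Z}_t = 1$, $\mathbf{T}_t = 1$), arbitrary disturbance variances, and a series whose first day is a Monday; the function name simulate_augmented is hypothetical.

```python
import random


def simulate_augmented(n_weeks=4, seed=0):
    """Simulate a daily local level model together with its running total.

    Assumptions (illustrative only): scalar state, Z_t = 1, T_t = 1,
    and day t = 1 is a Monday, so Sundays fall on t = 7, 14, ...
    """
    rng = random.Random(seed)
    n_days = 7 * n_weeks
    alpha = 10.0                       # alpha_1
    eps = rng.gauss(0.0, 1.0)          # epsilon_1
    y_dagger = [alpha + eps]           # unobserved daily values
    y_f = [y_dagger[0]]                # running total; the week starts at t = 1
    for t in range(1, n_days):         # each pass builds the values for day t + 1
        eta = rng.gauss(0.0, 0.5)      # eta_{t+1}
        eps = rng.gauss(0.0, 1.0)      # epsilon_{t+1}
        psi = 0.0 if t % 7 == 0 else 1.0   # psi_{t+1} is 0 when day t + 1 is a Monday
        # Second row of the augmented transition plus the second element of
        # the augmented disturbance, using alpha_t before it is updated.
        y_f.append(1.0 * 1.0 * alpha + psi * y_f[-1] + (1.0 * eta + eps))
        # First row: the original transition alpha_{t+1} = T alpha_t + eta_{t+1}.
        alpha = 1.0 * alpha + eta
        y_dagger.append(1.0 * alpha + eps)
    return y_dagger, y_f


y_dagger, y_f = simulate_augmented()
for week in range(4):
    sunday = 7 * week + 6              # zero-based list index of each Sunday
    weekly_total = sum(y_dagger[7 * week: 7 * week + 7])
    # On Sundays the running-total state equals the weekly total.
    assert abs(y_f[sunday] - weekly_total) < 1e-9
```

On every Sunday the simulated running total agrees with the sum of the seven daily values, which is the property that lets the observation equation read off the last element of the augmented state.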

This shows that you can do model-based distribution of y values by carrying out the following steps:

  1. Organize the y values as a daily time series.

  2. Define a dummy variable, startWeek, that flags the start of the week; that is, startWeek is 1 when the day is Monday and 0 otherwise. Note that $\psi_t = 1 - \text{startWeek}_t$.

  3. Specify a suitable state space model for the unobserved daily series $y_t^{\dagger}$. This specification in turn implies a state space model specification for $y$.

  4. Carry out the analysis (model fitting, component estimation, and forecasting) of $y$ in the usual fashion by using this implied model specification.

  5. The smoothed values of $y$ from the previous step provide the estimates of $y_t^f$. In addition, the estimates of $y_t^{\dagger}$ can be obtained as the smoothed estimates of an appropriate linear combination of the elements of $\boldsymbol{\alpha}_t$ and $\epsilon_t$.
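
Steps 1 and 2 are data preparation and can be done in any tool before the series reaches the SSM procedure. As a rough illustration, the following Python sketch expands a list of weekly totals into daily rows: the total is placed on each Sunday, the other days are left missing, and a Monday indicator is attached. The function name weekly_to_daily, its arguments, and the sample totals are hypothetical, and Python's None stands in for a missing value.

```python
from datetime import date, timedelta


def weekly_to_daily(first_sunday, weekly_totals):
    """Return rows (day, y, start_week) covering Monday through Sunday of each week.

    y is None (missing) except on Sundays; start_week is 1 on Mondays, else 0.
    """
    if first_sunday.weekday() != 6:        # date.weekday(): Monday = 0, ..., Sunday = 6
        raise ValueError("first_sunday must fall on a Sunday")
    rows = []
    for k, total in enumerate(weekly_totals):
        sunday = first_sunday + timedelta(weeks=k)
        for offset in range(6, -1, -1):    # from Monday (6 days earlier) up to Sunday
            day = sunday - timedelta(days=offset)
            y = total if offset == 0 else None
            start_week = 1 if day.weekday() == 0 else 0
            rows.append((day, y, start_week))
    return rows


# Two hypothetical weekly totals recorded on 2024-01-07 and 2024-01-14.
rows = weekly_to_daily(date(2024, 1, 7), [70.0, 84.0])
for day, y, start_week in rows[:7]:
    print(day.isoformat(), y, start_week)
```

The dummy $\psi_t$ used in the transition matrix is then 1 minus the start_week value in each row.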

The SSM procedure enables you to carry out the key steps (Step 3 to Step 5) in this process quite easily. The usual model specification syntax that uses the STATE, COMPONENT, and TREND statements to define the terms in a MODEL statement is used to define a model for the unobserved daily series $y_t^{\dagger}$ (the first part of Step 3). Then, the use of the DISTRIBUTE(START=startWeek) option in the MODEL statement causes the SSM procedure to use the implied model to analyze the observed $y$ values. As a brief illustration, suppose that a data set Test contains two variables: date, a SAS date variable that indexes the daily observations, and y, the values of the weekly variable arranged as a daily series. Then the following PROC SSM statements show you how to distribute y at the daily interval:

proc ssm data=test;
    id date interval=day;
    startWeek = (weekday(date) = 2); /* indicator of Monday */
    state ...;
    comp term1 = ...;
    ...;
    state noise(1) type=wn ...;
    comp wnoise = noise[1];
    model y = term1 term2 ... wnoise / distribute(start=startWeek);
    /* daily_Y = sum of all terms in the MODEL statement */
    eval daily_Y = term1 + term2 + ...+ wnoise;
    output out=...;
run;

Here are a few comments about this program:

  • The terms in the MODEL statement correspond to the observation equation for the unobserved daily series $y_t^{\dagger}$. However, the DISTRIBUTE(START=startWeek) option causes the SSM procedure to use the implied model (with the augmented state vector) to analyze y, the weekly variable arranged as a daily series.

  • Because wnoise, the white noise term ($\epsilon_t$) in the observation equation of $y_t^{\dagger}$, is subsequently to be used in an EVAL statement, this program specifies it by using the STATE statement rather than by using the IRREGULAR statement.

  • Because daily_Y (the component specified in the EVAL statement) is the sum of all the terms in the MODEL statement, it corresponds to the unobserved daily series $y_t^{\dagger}$. Therefore, the smoothed estimate of daily_Y (smoothed_daily_Y) provides the needed distribution of y at the daily interval.

In this release of the SSM procedure, the last element of the augmented state vector, $y^f$, is always initialized with a diffuse distribution. A more flexible specification of the initial distribution of $y^f$ might become possible in a future release.

To keep the explanation simple, the preceding discussion was confined to a single response variable. In fact, you can use the SSM procedure for temporal distribution in more general settings—for example, you can consider temporal distribution of one or more flow variables in a multivariate model that includes one or more response variables of stock type, one or more response variables of flow type, and one or more explanatory variables. An illustration of such modeling is shown in Example 33.16. The modeling of a response variable as a temporal aggregate of some unobserved latent variable is also needed in a process known as benchmarking; see Durbin and Koopman (2012, chap. 3, sec. 10.2) and Pelagatti (2015, chap. 9, sec. 2). You can use the SSM procedure in such benchmarking situations as well.

Note: Model specification in the temporal distribution setting requires some care. The model for the lower-frequency observed series, $y_t$, is based on the model specified for the unobserved higher-frequency series $y_t^{\dagger}$. Because aggregation from higher frequency to lower frequency involves loss of information, an otherwise identifiable model for $y_t^{\dagger}$ can lead to an unidentifiable model for $y_t$.
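
One way to see the loss of information is with a fixed day-of-week pattern whose effects sum to zero over a week: adding it to the daily series changes every daily value but leaves every weekly total unchanged, so weekly totals alone cannot reveal it. The following Python sketch shows this with made-up numbers. The helper weekly_totals and the data are hypothetical, and the example is an illustration of the general point rather than a statement about any particular SSM model.

```python
def weekly_totals(daily):
    """Sum consecutive blocks of seven daily values (Monday through Sunday)."""
    return [sum(daily[i:i + 7]) for i in range(0, len(daily), 7)]


# Hypothetical day-of-week effects that sum to zero over a week.
seasonal = [3, 1, 0, -1, -2, -2, 1]

base = [10] * 14                     # two weeks of a flat daily level
with_seasonal = [b + seasonal[i % 7] for i, b in enumerate(base)]

print(weekly_totals(base))           # [70, 70]
print(weekly_totals(with_seasonal))  # [70, 70]
```

Both daily series produce the same weekly totals even though they differ on most days.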

Last updated: June 19, 2025