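Consider the simple case of distributing weekly observations of a flow variable, y, at a daily interval. Even though the values of y are observed weekly (suppose they are recorded each Sunday), it is necessary to treat the observations as a daily time series such that $y_t$ equals the weekly total when t corresponds to the end of the week (Sunday), and $y_t$ is missing on the other days of the week. In addition, suppose that $y^{\dagger}_t$ denotes the unobserved time series of daily values of y. In other words, if t corresponds to a Sunday, then

$$ y_t = \sum_{i=0}^{6} y^{\dagger}_{t-i} $$

Suppose that the unobserved daily series can be modeled by a state space model. For example, suppose the model for $y^{\dagger}_t$ is

$$ y^{\dagger}_t = Z_t \alpha_t + \epsilon_t, \qquad \alpha_{t+1} = T_t \alpha_t + \eta_{t+1} $$

Then the aggregated series $y_t$ follows a state space model of the form

$$ y_t = Z^{a} \alpha^{a}_t, \qquad \alpha^{a}_{t+1} = T^{a}_t \alpha^{a}_t + \eta^{a}_{t+1} $$

where the following are true (both the row and column vectors are displayed horizontally to save space):

The new state vector ($\alpha^{a}_t$) is formed by augmenting the old state vector ($\alpha_t$) with a latent variable, $y^{c}_t$. That is, $\alpha^{a}_t = (\alpha_t, y^{c}_t)$. In fact, $y^{c}_t$ represents the within-week running total of $y^{\dagger}_t$, so that when t corresponds to a Sunday, $y_t = y^{c}_t$.

The running total satisfies the recursion $y^{c}_{t+1} = \psi_{t+1} y^{c}_t + y^{\dagger}_{t+1}$, where $\psi_t$ is a dummy variable that equals 1 when t is not the start of the week (not Monday) and equals 0 when t is the start of the week (Monday). Substituting the daily model into this recursion gives $y^{c}_{t+1} = Z_{t+1} T_t \alpha_t + \psi_{t+1} y^{c}_t + Z_{t+1} \eta_{t+1} + \epsilon_{t+1}$, so the new transition matrix is

$$ T^{a}_t = \begin{bmatrix} T_t & 0 \\ Z_{t+1} T_t & \psi_{t+1} \end{bmatrix} $$

The new disturbance vector ($\eta^{a}_{t+1}$) is formed by augmenting the old disturbance vector ($\eta_{t+1}$) by $Z_{t+1} \eta_{t+1} + \epsilon_{t+1}$. That is, $\eta^{a}_{t+1} = (\eta_{t+1}, Z_{t+1} \eta_{t+1} + \epsilon_{t+1})$.

The new design matrix for the state effect ($Z^{a}$) is $(0, 1)$, where $0$ is a zero vector of the same size as the old state vector $\alpha_t$.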
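The construction above can be checked numerically. The following Python sketch is an illustration outside of SAS, not part of the SSM procedure. It assumes a daily local level model (a random walk level plus white noise, so the design and transition coefficients both equal 1), arbitrary noise scales, and a sample that starts on a Monday. The function name `simulate_cumulator` and all variable names are hypothetical. It simulates the daily series directly, then runs the augmented recursion in which the extra state element is reset at the start of each week.

```python
import random


def simulate_cumulator(n_weeks=3, seed=0):
    """Simulate a daily local level model and its augmented (running total) form.

    Assumptions (illustrative only): day 0 is a Monday, the level is a random
    walk, and the daily value is the level plus white noise.
    Returns (daily, running_total), two lists with one entry per day.
    """
    rng = random.Random(seed)
    n_days = 7 * n_weeks
    # not_start[t] plays the role of the dummy that is 0 on Mondays, 1 otherwise.
    not_start = [0 if t % 7 == 0 else 1 for t in range(n_days)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_days)]  # observation noise
    eta = [rng.gauss(0.0, 0.5) for _ in range(n_days)]  # level disturbance (eta[0] unused)

    # Direct simulation of the unobserved daily series.
    level = [10.0]
    for t in range(1, n_days):
        level.append(level[t - 1] + eta[t])
    daily = [level[t] + eps[t] for t in range(n_days)]

    # Augmented recursion on the state (level, running total).
    # Next running total = level_t + not_start[t+1] * total_t + (eta[t+1] + eps[t+1]).
    a, c = level[0], daily[0]  # on the first Monday the running total is the daily value
    running_total = [c]
    for t in range(n_days - 1):
        a_next = a + eta[t + 1]
        c_next = a + not_start[t + 1] * c + (eta[t + 1] + eps[t + 1])
        a, c = a_next, c_next
        running_total.append(c)
    return daily, running_total
```

For each week `w`, comparing `running_total[7 * w + 6]` with `sum(daily[7 * w : 7 * w + 7])` shows that the extra state element on Sunday reproduces the weekly total, up to floating-point rounding.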
This shows that you can do model-based distribution of y values by carrying out the following steps:
1. Organize the y values as a daily time series.
2. Define a dummy variable, startWeek, that flags the start of the week; that is, startWeek is 1 when the day is Monday and 0 otherwise. Note that $\psi_t = 1 - \text{startWeek}_t$, where $\psi_t$ is the dummy variable that equals 0 at the start of the week and 1 otherwise.
3. Specify a suitable state space model for the unobserved daily series $y^{\dagger}_t$. This specification in turn implies a state space model specification for y.
4. Carry out the analysis (model fitting, component estimation, and forecasting) of y in the usual fashion by using this implied model specification.
5. The smoothed values of y from the previous step provide the estimates of the within-week running total $y^{c}_t$. In addition, the estimates of $y^{\dagger}_t$ can be obtained as the smoothed estimates of an appropriate linear combination of the elements of the augmented state vector $\alpha^{a}_t$ and the augmented disturbance vector $\eta^{a}_t$.
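The first two steps are plain data preparation. As a minimal sketch outside of SAS, the following Python function arranges weekly totals as a daily series and builds the start-of-week flag. The function name `weekly_to_daily` and the input layout (a dictionary keyed by the Sunday on which each total is recorded) are assumptions made for this illustration. It also assumes the weeks are consecutive, and it does not fill any gaps between them.

```python
from datetime import date, timedelta


def weekly_to_daily(sunday_totals):
    """Arrange weekly totals as a daily series with a start-of-week flag.

    sunday_totals: dict mapping a Sunday (datetime.date) to that week's total.
    Returns a list of (day, y, start_week) tuples, where y is None (missing)
    on every day except Sunday and start_week is 1 on Mondays, 0 otherwise.
    """
    rows = []
    for sunday in sorted(sunday_totals):
        # Python convention: Monday is 0 and Sunday is 6.
        if sunday.weekday() != 6:
            raise ValueError(f"{sunday} is not a Sunday")
        for offset in range(6, -1, -1):  # Monday first, Sunday last
            day = sunday - timedelta(days=offset)
            y = sunday_totals[sunday] if offset == 0 else None
            start_week = 1 if day.weekday() == 0 else 0
            rows.append((day, y, start_week))
    return rows
```

Note that the weekday numbering differs between the two languages: Python's `date.weekday()` returns 0 for Monday, whereas the SAS WEEKDAY function returns 2 for Monday.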
The SSM procedure enables you to carry out the key steps (Step 3 to Step 5) in this process quite easily. The usual model specification syntax, which uses the STATE, COMPONENT, and TREND statements to define the terms in a MODEL statement, is used to define a model for the unobserved daily series (the first part of Step 3). Then the DISTRIBUTE(START=startWeek) option in the MODEL statement causes the SSM procedure to use the implied model to analyze the observed y values. As a brief illustration, suppose that a data set Test contains two variables: date, a SAS date variable that indexes the daily observations, and y, the values of the weekly variable arranged as a daily series. Then the following PROC SSM statements show you how to distribute y at the daily interval:
proc ssm data=test;
id date interval=day;
startWeek = (weekday(date) = 2); /* indicator of Monday */
state ...;
comp term1 = ...;
...;
state noise(1) type=wn ...;
comp wnoise = noise[1];
model y = term1 term2 ... wnoise / distribute(start=startWeek);
/* daily_Y = sum of all terms in the MODEL statement */
eval daily_Y = term1 + term2 + ...+ wnoise;
output out=...;
run;
Here are a few comments about this program:
The terms in the MODEL statement correspond to the observation equation for the unobserved daily series $y^{\dagger}_t$. However, the DISTRIBUTE(START=startWeek) option causes the SSM procedure to use the implied model (with the augmented state vector) to analyze y, the weekly variable arranged as a daily series.
Because wnoise, the white noise term ($\epsilon_t$) in the observation equation of $y^{\dagger}_t$, is subsequently used in an EVAL statement, this program specifies it by using the STATE statement rather than the IRREGULAR statement.
Because daily_Y (the component specified in the EVAL statement) is the sum of all the terms in the MODEL statement, it corresponds to the unobserved daily series $y^{\dagger}_t$. Therefore, the smoothed estimate of daily_Y (smoothed_daily_Y) provides the needed distribution of y at the daily interval.
In this release of the SSM procedure, the last element of the augmented state vector, the running total $y^{c}_t$, is always initialized with a diffuse distribution. A more flexible specification of the initial distribution of $y^{c}_t$ might become possible in a future release.
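Because the implied model treats each weekly total as an exact sum of the daily values, the distributed daily estimates should, in principle, add back up to each observed weekly total. The following Python sketch is a hypothetical post-processing check that you might run on exported values of smoothed_daily_Y. The function name `weekly_sums_match`, the Monday-first layout, and the tolerance are assumptions for illustration, not features of the SSM procedure.

```python
def weekly_sums_match(daily_estimates, weekly_totals, tol=1e-6):
    """Check that daily estimates add up to the observed weekly totals.

    daily_estimates: list of daily values, ordered Monday through Sunday,
        covering whole weeks only.
    weekly_totals: list of observed weekly totals, one per week.
    Returns True if every weekly sum is within tol of its observed total.
    """
    if len(daily_estimates) != 7 * len(weekly_totals):
        raise ValueError("expected exactly 7 daily values per weekly total")
    for w, total in enumerate(weekly_totals):
        week_sum = sum(daily_estimates[7 * w:7 * w + 7])
        if abs(week_sum - total) > tol:
            return False
    return True
```

A result of False would suggest a problem in how the daily series or the startWeek flag was set up (for example, a week that does not begin on the flagged day) rather than in the model itself.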
To keep the explanation simple, the preceding discussion was confined to a single response variable. In fact, you can use the SSM procedure for temporal distribution in more general settings—for example, you can consider temporal distribution of one or more flow variables in a multivariate model that includes one or more response variables of stock type, one or more response variables of flow type, and one or more explanatory variables. An illustration of such modeling is shown in Example 33.16. The modeling of a response variable as a temporal aggregate of some unobserved latent variable is also needed in a process known as benchmarking; see Durbin and Koopman (2012, chap. 3, sec. 10.2) and Pelagatti (2015, chap. 9, sec. 2). You can use the SSM procedure in such benchmarking situations as well.
Note: Model specification in the temporal distribution setting requires some care. The model for the lower-frequency observed series, $y_t$, is based on the model specified for the unobserved higher-frequency series, $y^{\dagger}_t$. Because aggregation from higher frequency to lower frequency involves loss of information, an otherwise identifiable model for $y^{\dagger}_t$ can lead to an unidentifiable model for $y_t$.
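As a hypothetical illustration of this loss of information (an example constructed here, not one taken from the procedure documentation), suppose the daily model contains a level $\mu_t$, fixed day-of-week effects $s_1, \ldots, s_7$ that are constrained to sum to zero, and white noise $\epsilon_t$, where $j(t)$ denotes the day of the week of day t. Summing the daily values from Monday through Sunday gives

$$ y^{\dagger}_t = \mu_t + s_{j(t)} + \epsilon_t \quad \Longrightarrow \quad \sum_{i=0}^{6} y^{\dagger}_{t-i} = \sum_{i=0}^{6} \mu_{t-i} + \underbrace{\sum_{j=1}^{7} s_j}_{=\,0} + \sum_{i=0}^{6} \epsilon_{t-i} $$

The day-of-week effects cancel in every weekly total, so weekly observations alone carry no information about the individual $s_j$, even though the same effects would be estimable from daily data.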