This example illustrates how you can do model-based temporal aggregation of a response variable. The following DATA step creates a data set, Nile, from a well-known data set that contains annual recordings of the Nile water level for the years 1871 through 1970. The Nile water level is clearly a stock variable, and temporal aggregation of such variables is usually meaningless. However, for illustration purposes, assume that you are interested in forecasting triannual (three-year) totals of the water level.
data Nile;
input level @@;
year = intnx( 'year', '1jan1871'd, _n_-1 );
format year year4.;
startAggr = (mod(_n_, 3) = 1);
datalines;
1120 1160 963 1210 1160 1160 813 1230 1370 1140
995 935 1110 994 1020 960 1180 799 958 1140
1100 1210 1150 1250 1260 1220 1030 1100 774 840
874 694 940 833 701 916 692 1020 1050 969
831 726 456 824 702 1120 1100 832 764 821
768 845 864 862 698 845 744 796 1040 759
781 865 845 944 984 897 822 1010 771 676
649 846 812 742 801 1040 860 874 848 890
744 749 838 1050 918 986 797 923 975 815
1020 906 901 1170 912 746 919 718 714 740
. . . . . . .
;
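As a quick sanity check on the aggregation flag, you might print the first few rows of the data set. This is only a sketch; the choice of six rows is arbitrary. Because startAggr is defined as mod(_n_, 3) = 1, it should be 1 for 1871 and 1874 and 0 for the years in between.

```sas
/* Print the first six observations to confirm that startAggr */
/* equals 1 in the first year of each three-year interval.    */
proc print data=Nile(obs=6) noobs;
   var year level startAggr;
run;
```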
The Nile data set contains three variables: year indicates the observation year, level contains the yearly water level, and startAggr is a dummy variable that indicates the start of the triannual aggregation intervals. It is known that for the time span of the observations, the yearly water levels can be reasonably modeled as a sum of a random walk trend, a level shift in the year 1899, and the observation error. The following statements show you how to obtain forecasts of the triannual water level that are consistent with the model postulated for the yearly water levels:
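proc ssm data=Nile;
   id year interval=year;
   /* dummy regressor: 1 from 1899 onward, 0 before */
   shift1899 = ( year >= '1jan1899'd );
   /* random walk trend and white noise observation error */
   trend rw(rw);
   irregular wn;
   /* AGGREGATE accumulates level over intervals that start where startAggr = 1 */
   model level = shift1899 RW wn / aggregate(start=startAggr);
   output out=nileOut;
quit;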
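The model that these statements specify can be written out as follows. The symbols are notation chosen here for illustration and do not appear in the program: $y_t$ is the water level in year $t$, $x_t$ is the shift1899 dummy, $\mu_t$ is the random walk trend, and $\epsilon_t$ is the observation error. The Gaussian assumption on the disturbances is the usual one for this kind of state space model and is stated here as an assumption.

```latex
\begin{aligned}
y_t       &= \beta\, x_t + \mu_t + \epsilon_t,
            & \epsilon_t &\sim N(0, \sigma_\epsilon^2), \\
\mu_{t+1} &= \mu_t + \eta_{t+1},
            & \eta_t     &\sim N(0, \sigma_\eta^2), \\
x_t       &= \begin{cases} 1 & \text{if } t \ge 1899, \\ 0 & \text{otherwise.} \end{cases}
\end{aligned}
```

Here $\beta$ is the size of the level shift, and the two variances are the parameters of the irregular and trend components.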
As a result of running this program, you get the usual output that is associated with fitting the specified model to the yearly water level. In addition (as explained in the section Temporal Aggregation), the AGGREGATE option in the MODEL statement causes the triannual aggregates of the water level to be estimated and printed. Output 33.17.1 shows the last few rows of this output. The Aggregate column is a running total that restarts whenever Start_Flag is 1, so the value in the third year of each interval is the full triannual total. When all the summands (the response values) in an aggregate are known, the aggregate is computed without error; that is, its standard error is zero. However, when at least one summand is missing, the aggregate must be estimated from the model, and its standard error is nonzero.
Output 33.17.1: Triannual Aggregate Values of the Nile Water Levels (Partial Output)
| Time | Response | Start_Flag | Aggregate | StdErr | Lower | Upper |
|---|---|---|---|---|---|---|
| 1967 | 919 | 1 | 919 | 0 | 919 | 919 |
| 1968 | 718 | 0 | 1637 | 0 | 1637 | 1637 |
| 1969 | 714 | 0 | 2351 | 0 | 2351 | 2351 |
| 1970 | 740 | 1 | 740 | 0 | 740 | 740 |
| 1971 | . | 0 | 1590 | 128 | 1338 | 1842 |
| 1972 | . | 0 | 2440 | 183 | 2081 | 2798 |
| 1973 | . | 1 | 850 | 128 | 598 | 1102 |
| 1974 | . | 0 | 1700 | 183 | 1341 | 2058 |
| 1975 | . | 0 | 2550 | 226 | 2108 | 2992 |
| 1976 | . | 1 | 850 | 128 | 598 | 1102 |
| 1977 | . | 0 | 1700 | 183 | 1341 | 2058 |
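For the observed years, the Aggregate column can be reproduced outside PROC SSM with a simple running total that restarts whenever startAggr is 1. The following DATA step is a minimal sketch of that check; the names NileCheck and runTotal are hypothetical and are not part of the original example. It covers only the nonmissing observations, because the aggregates for 1971 and later depend on the model forecasts.

```sas
/* Running total of level within each three-year interval. */
/* NileCheck and runTotal are illustrative names only.     */
data NileCheck;
   set Nile(where=(level is not missing));
   retain runTotal;
   if startAggr then runTotal = level;     /* new interval: restart the total */
   else runTotal = runTotal + level;       /* same interval: accumulate       */
run;

/* Show the rows that overlap with the partial output above. */
proc print data=NileCheck noobs;
   where year >= '1jan1967'd;
   var year level startAggr runTotal;
run;
```

For 1967 through 1970, runTotal is 919, 1637, 2351, and 740, which matches the Aggregate column in the first four rows of the table.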