SSM Procedure

Example 33.12 Multivariate Modeling: Long-Term Temperature Trends

(View the complete code for this example.)

In a presentation by Ansley and de Jong (2015), three monthly time series are jointly modeled to obtain long-term—several decades long—temperature predictions for certain regions of the northern hemisphere. This example shows how you can specify and fit the final model that this presentation proposed. The following data set, Temp, contains four variables: date dates the monthly observations; UAH contains monthly satellite global temperature readings, starting in December 1978; CRU contains monthly temperature data, starting in January 1850 (from a different source); and GISS contains monthly temperature data, starting in January 1880 (from yet another source). All these temperature data are scaled suitably so that the numbers represent temperature readings in centigrade.

data Temp;
input UAH CRU GISS @@;
date = intnx('month', '01jan1850'd, _n_-1);
format date date.;
datalines;
.    8.243    .
.    9.733    .

   ... more lines ...   

The following statements produce scatter plots of these three series in a single graph:

 proc sgplot data=Temp;
       title "Scatter Plots of the Temperature Series";
       scatter x=date y=cru;
       scatter x=date y=uah ;
       scatter x=date y=giss;
  run;

Output 33.12.1 shows the resulting graph. As already noted, these three series start at different points in the past. However, they all end at the same time: they all have measurements until January 2012, which is the last month in the data set. The mean levels of these series are different: the GISS measurements are generally larger than CRU and UAH by about 4 degrees. In addition, the variability in the CRU values seems to decrease with time (this is more apparent when the series is plotted by itself). The goal of the analysis is to use these data to make long-term predictions about future temperature levels. The following statements append 1,200 missing measurements to Temp, so that the model fitted by using the SSM procedure can be extrapolated to obtain temperature forecasts 100 years in the future:

data append(keep=date cru giss uah);
    do i=1 to 1200;
        cru = .; giss=.; uah=.;
        date = intnx('month','01jan2012'D, i);
    format date monyy7.;
    output;
    end;
run;
 proc append base=Temp data=Append; run;

Output 33.12.1: Scatter Plots of CRU, UAH, and GISS

Scatter Plots of , , and


Ansley and de Jong propose a parsimonious model that links these three time series. It can be described as

StartLayout 1st Row 1st Column normal upper G normal upper I normal upper S normal upper S Subscript t 2nd Column equals 3rd Column mu Subscript t Baseline plus a zeta Subscript t plus a r 1 epsilon Subscript 1 t 2nd Row 1st Column normal upper C normal upper R normal upper U Subscript t 2nd Column equals 3rd Column beta Subscript c r u Baseline plus mu Subscript t Baseline plus a zeta Subscript t plus a epsilon Subscript 2 t 3rd Row 1st Column normal upper U normal upper A normal upper H Subscript t 2nd Column equals 3rd Column beta Subscript u a h Baseline plus mu Subscript t Baseline plus a zeta Subscript t plus a r 3 epsilon Subscript 3 t EndLayout

where beta Subscript c r u and beta Subscript u a h are intercepts that are associated with CRU and UAH, respectively; mu Subscript t is an integrated random walk trend; zeta Subscript t is a zero-mean, autoregressive noise term (which is scaled by an unknown scaling factor a); and epsilon Subscript i t (i equals 1 comma 2 comma 3) are independent observation errors with different variances that are also scaled suitably. Note that the trend mu Subscript t and the autoregressive noise term zeta Subscript t are shared by the models of the three series, and, for identification purposes, the intercept for GISS is taken to be zero. In addition, the model parameters are assumed to be interrelated and are parameterized in a particular way (which leads to fewer parameters to estimate, and their relative scaling helps in parameter estimation). This special parameterization can be expressed as a function of seven basic parameters: loga1, logr1, logr3, logsigma, b, c, and rhoParm (this naming convention is different from that used by Ansley and de Jong (2015)).

Let sigma squared equals exp left-parenthesis 2 logsigma right-parenthesis, and let the scaling factors a equals exp left-parenthesis loga 1 right-parenthesis, r 1 equals exp left-parenthesis logr 1 right-parenthesis, and r 3 equals exp left-parenthesis logr 3 right-parenthesis. Then the model parameters can be described as follows:

  • The parameters that are associated with the autoregression zeta Subscript t: the damping factor rho equals StartFraction exp left-parenthesis rhoParm right-parenthesis minus 1 Over exp left-parenthesis rhoParm right-parenthesis plus 1 EndFraction, the variance of the disturbance term = a squared sigma squared, and the variance of the initial state = a squared sigma squared slash left-parenthesis 1 minus rho squared right-parenthesis

  • The parameters that are associated with the integrated random walk trend: the variance of the disturbance term in the slope equation = sigma squared

  • The variance of the observation errors for GISS = a squared r 1 squared sigma squared

  • The variance of the observation errors for UAH = a squared r 3 squared sigma squared

  • The variance of the observation errors for CRU is taken to be time-varying with the following form: for t less-than-or-equal-to n o b s comma

    sigma Subscript 2 comma t Superscript 2 Baseline equals a squared sigma squared exp left-parenthesis 2 b plus 2 c t slash n o b s right-parenthesis

    where n o b s = 1,945 (the number of observations in the unappended data set). For t greater-than 1,945, it is fixed at its last value: sigma Subscript 2 comma t Superscript 2 Baseline equals a squared sigma squared exp left-parenthesis 2 b plus 2 c right-parenthesis.

The following DATA step adds an observation index, tindex, to Temp, which is used in the SSM procedure to define the time-varying observation error variance for CRU:

data temp;
   set temp;
   tindex = _n_;
run;

The following statements fit the preceding model to the Temp data:

 ods output RegressionEstimates=regEst;
 proc ssm data=Temp;
     id date interval=month;
     parms loga1 logr1 logr3  logsigma;
     parms b=0 c=0;
     parms rhoParm;
     rho = (exp(rhoParm)-1)/(exp(rhoParm)+1);
     sigmaSq = exp(2*logsigma);
     initSigmaSq = sigmaSq/(1-rho*rho);
     a1 = exp(loga1);
     a1Sq = a1*a1;
     r1sq  = exp(2*logr1);
     r3sq  = exp(2*logr3);
     giss_var = a1Sq*r1sq*sigmaSq;
     nobs=1945;
     if tindex <= nobs then
         cru_var = a1Sq*exp(2*b + 2*c*tindex/nobs)*sigmaSq;
     else cru_var = a1Sq*exp(2*b + 2*c)*sigmaSq;
     uah_var = a1Sq*r3sq*sigmaSq;
     UAH_Intercept=1.0;
     CRU_Intercept=1.0;
     trend level(ll) variance=0 slopevar=sigmaSq;
     state auto(1) T(g)=(rho) cov(g)=(sigmaSq)  cov1(g)=(initSigmaSq);
     comp auto_common = auto*(a1);
     state wn(3) type=wn cov(d)=(giss_var cru_var uah_var);
     comp wn_giss = wn[1];
     comp wn_cru = wn[2];
     comp wn_uah = wn[3];
     model GISS = level auto_common  wn_giss;
     model CRU = CRU_Intercept level auto_common  wn_cru;
     model UAH = UAH_Intercept level auto_common  wn_uah;
     comp slope = level_state_*(0 1);
     output out=For pdv press;
 run;

Output 33.12.2 shows the estimated intercepts ModifyingAbove beta With caret Subscript c r u and ModifyingAbove beta With caret Subscript u a h. As expected, they are quite close (see the scatter plots of CRU and UAH in Output 33.12.1).

Output 33.12.2: Estimated Intercepts for CRU and UAH

The SSM Procedure

Regression Parameter Estimates
Response Variable Regression Variable Estimate Standard
Error
t Value Pr > |t|
CRU CRU_Intercept -4.09 0.00359 -1139.0 <.0001
UAH UAH_Intercept -4.48 0.00671 -666.54 <.0001


The estimates of the basic parameters that underlie the model parameters are shown in Output 33.12.3.

Output 33.12.3: Estimates of Basic Model Parameters

Estimates of Named Parameters
Parameter Estimate Standard
Error
t Value
loga1 7.779 0.3899 19.95
logr1 -1.005 0.0884 -11.36
logr3 -0.230 0.0453 -5.09
logsigma -9.641 0.3885 -24.81
b 0.737 0.0502 14.68
c -1.402 0.0690 -20.34
rhoParm 1.429 0.0766 18.66


The following DATA steps add two variables (CRU_ADJ = CRUModifyingAbove beta With caret Subscript c r u and UAH_ADJ = UAHModifyingAbove beta With caret Subscript u a h) to the output data set For. These adjusted versions of CRU and UAH have the same mean level as GISS—estimated mu Subscript t.

data _NULL_;
   set regEst;
   if _n_ = 1 then call symput('intercept1',trim(left(estimate)));
   else call symput('intercept2',trim(left(estimate)));
run;
data for;
    set For;
    cru_adj = cru - &intercept1;
    uah_adj = uah - &intercept2;
run;

The following statements produce a graph that contains four plots: scatter plots of GISS, CRU_ADJ, and UAH_ADJ and a series plot of the estimated mu Subscript t.

 proc sgplot data=For;
    where date < '01feb2021'd;
    title "Fitted Trend for the Temp Series (Up to Year 2020) ";
    title2 "(CRU and UAH Adjusted by Their Estimated Intercepts) ";
    scatter x=date y=cru_adj / LEGENDLABEL="Adjusted CRU"
        MARKERATTRS=GraphData1(symbol=star size=3);
    scatter x=date y=uah_adj / LEGENDLABEL="Adjusted UAH"
        MARKERATTRS=GraphData2(symbol=plus size=3);
    scatter x=date y=giss /
        MARKERATTRS=GraphData3(symbol=triangle size=3);
    series x=date y=smoothed_level / MARKERATTRS=GraphData4;
 run;

Output 33.12.4 shows the resulting graph. It shows that the estimated mean level mu Subscript t tracks the observed data quite well.

Output 33.12.4: Fitted Trend mu Subscript t

Fitted Trend μt


The following statements produce the time series plot of the estimated mu Subscript t along with the 95% confidence band:

 proc sgplot data=For;
    title "Temperature Projections for the Next 100 Years";
    band x=date lower=smoothed_lower_level upper=smoothed_upper_level;
    series x=date y=smoothed_level;
    refline '01feb2012'd / axis=x lineattrs=(pattern=shortdash)
       LEGENDLABEL= "Start of Multistep Forecasts"
       name="Forecast Reference Line";
 run;

Output 33.12.5 shows the resulting graph.

Output 33.12.5: Long-Term Forecasts of mu Subscript t

Long-Term Forecasts of μt


The following statements produce a similar graph for the estimated slope of mu Subscript t:

 proc sgplot data=For;
    where date <=  '01feb2031'd;
    title "The Monthly Rate of Temperature Change  (Up to Year 2030)";
    band x=date lower=smoothed_lower_slope upper=smoothed_upper_slope;
    series x=date y=smoothed_slope;
    refline '01feb2012'd / axis=x lineattrs=(pattern=shortdash)
       LEGENDLABEL= "Start of Multistep Forecasts"
       name="Forecast Reference Line";
    refline 0 / axis=y lineattrs=(pattern=dash);
 run;

Output 33.12.6 shows the resulting plot of the estimated slope of mu Subscript t.

Output 33.12.6: Forecasts of the Slope of mu Subscript t

Forecasts of the Slope of μt


Based on the preceding analysis (see the plots of mu Subscript t and its slope in Output 33.12.5 and Output 33.12.6), it appears that there has been statistically significant warming over the last 10 years, but the warming does not appear to be accelerating.

Last updated: June 19, 2025