MODEL Procedure

Example 24.13 Switching Regression Example

Consider the usual linear regression problem

$$y = X\beta + u$$

where $y$ denotes the $n$ column vector of the dependent variable, $X$ denotes the $n \times k$ matrix of independent variables, $\beta$ denotes the $k$ column vector of coefficients to be estimated, $u$ denotes the $n$ column vector of errors, $n$ denotes the number of observations ($i = 1, 2, \ldots, n$), and $k$ denotes the number of independent variables.

You can take this basic equation and split it into two regimes, where the ith observation on y is generated by one regime or the other,

$$
\begin{aligned}
y_i &= \sum_{j=1}^{k} \beta_{1j} X_{ji} + u_{1i} = x_i'\beta_1 + u_{1i} \\
y_i &= \sum_{j=1}^{k} \beta_{2j} X_{ji} + u_{2i} = x_i'\beta_2 + u_{2i}
\end{aligned}
$$

where $X_{ji}$ is the $i$th observation on the $j$th independent variable and $x_i'$ is the $i$th row of $X$. The errors $u_{1i}$ and $u_{2i}$ are assumed to be normally and independently distributed with mean zero and constant variance: the variance is $\sigma_1^2$ in the first regime and $\sigma_2^2$ in the second. If $\sigma_1^2 \neq \sigma_2^2$ and $\beta_1 \neq \beta_2$, the regression system is said to switch between the two regimes.

The problem is to estimate $\beta_1$, $\beta_2$, $\sigma_1$, and $\sigma_2$ without knowing a priori which of the $n$ values of the dependent variable $y$ was generated by which regime. If it were known a priori which observations belong to which regime, a simple Chow test could be used to test $\beta_1 = \beta_2$, and a variance-ratio F test could be used to test $\sigma_1^2 = \sigma_2^2$.

Goldfeld and Quandt's D-method for switching regression solves this problem. Assume that observations exist on some exogenous variables $z_{1i}, z_{2i}, \ldots, z_{pi}$ that determine whether the $i$th observation is generated by one equation or the other. The equations are given as

$$
\begin{aligned}
y_i &= x_i'\beta_1 + u_{1i} \quad \text{if } \sum_{j=1}^{p} \pi_j z_{ji} \le 0 \\
y_i &= x_i'\beta_2 + u_{2i} \quad \text{if } \sum_{j=1}^{p} \pi_j z_{ji} > 0
\end{aligned}
$$

where the $\pi_j$ are unknown coefficients to be estimated. Define $d(z_i)$ as a continuous approximation to the unit step function that selects the regime. Replacing the step function with the cumulative normal integral yields a more practical method that produces consistent estimates:

$$d(z_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\sum_{j=1}^{p} \pi_j z_{ji}} \exp\left[-\frac{1}{2}\,\frac{\xi^2}{\sigma^2}\right] d\xi$$
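The integral above is the normal CDF evaluated at $\sum_j \pi_j z_{ji} / \sigma$, so $d(z_i) = \Phi\left(\sum_j \pi_j z_{ji} / \sigma\right)$. The following Python sketch shows how shrinking $\sigma$ pushes the weight toward the 0/1 step it replaces. The function name d_weight is hypothetical, and $\Phi$ is written with math.erf rather than a statistics library.

```python
import math


def d_weight(index: float, sigma: float = 1.0) -> float:
    """Smoothed switching weight d(z_i) = Phi(index / sigma).

    index is the linear combination sum_j pi_j * z_ji, and Phi is the
    standard normal CDF expressed through math.erf.
    """
    return 0.5 * (1.0 + math.erf(index / (sigma * math.sqrt(2.0))))


# Shrinking sigma moves d toward the unit step (0 below zero, 1 above it).
for sigma in (1.0, 0.1, 0.01):
    print(sigma, round(d_weight(-0.05, sigma), 4), round(d_weight(0.05, sigma), 4))
```

A small $\sigma$ gives a sharp switch; a large one means observations near the threshold are blended across both regimes.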

$D$ is the $n \times n$ diagonal matrix whose diagonal elements are the $d(z_i)$:

$$D = \begin{bmatrix} d(z_1) & 0 & \cdots & 0 \\ 0 & d(z_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d(z_n) \end{bmatrix}$$

The parameters to estimate are now the $k$ elements of $\beta_1$, the $k$ elements of $\beta_2$, $\sigma_1^2$, $\sigma_2^2$, the $p$ coefficients $\pi_j$, and the $\sigma$ introduced in the $d(z_i)$ equation. The $\sigma$ can be treated as given a priori, or it can be estimated, in which case its estimated magnitude indicates how successfully the model discriminates between the two regimes (Goldfeld and Quandt 1976). Given the preceding equations, the model can be written as

$$y = (I - D)X\beta_1 + DX\beta_2 + W$$

where $W = (I - D)u_1 + Du_2$ is a vector of unobservable, heteroscedastic error terms, and $u_1$ and $u_2$ are the $n$ column vectors of the regime errors $u_{1i}$ and $u_{2i}$. The covariance matrix of $W$ is denoted by $\Omega$, where $\Omega = (I - D)^2\sigma_1^2 + D^2\sigma_2^2$. The maximum likelihood parameter estimates maximize the following log-likelihood function:

$$
\begin{aligned}
\log L = {}& -\frac{n}{2}\log 2\pi - \frac{1}{2}\log\lvert\Omega\rvert \\
& - \frac{1}{2}\left[y - (I - D)X\beta_1 - DX\beta_2\right]'\,\Omega^{-1}\left[y - (I - D)X\beta_1 - DX\beta_2\right]
\end{aligned}
$$
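Because $D$ is diagonal and the errors are independent across observations (and, as assumed here, between the two regimes), $\Omega$ is diagonal with elements $\omega_i = (1 - d_i)^2\sigma_1^2 + d_i^2\sigma_2^2$, where $d_i = d(z_i)$. Hence $\log\lvert\Omega\rvert = \sum_i \log\omega_i$ and the quadratic form equals $\sum_i e_i^2/\omega_i$, where $e_i$ is the $i$th residual, so $\log L$ splits into per-observation terms. A minimal Python sketch under those assumptions; the names switching_loglik and dot are hypothetical, and plain lists stand in for the matrices.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product x_i' beta."""
    return sum(ai * bi for ai, bi in zip(a, b))


def switching_loglik(y, X, beta1, beta2, d, sig1, sig2):
    """Log-likelihood of the D-method model, exploiting a diagonal Omega.

    y: n observations; X: n rows of k regressors; d: n weights d(z_i);
    sig1, sig2: error standard deviations of regimes 1 and 2.
    """
    n = len(y)
    loglik = -0.5 * n * math.log(2.0 * math.pi)
    for yi, xi, di in zip(y, X, d):
        # Mean of y_i: the d-weighted blend of the two regime predictions.
        mean_i = (1.0 - di) * dot(xi, beta1) + di * dot(xi, beta2)
        # i-th diagonal element of Omega.
        omega_i = (1.0 - di) ** 2 * sig1 ** 2 + di ** 2 * sig2 ** 2
        resid_i = yi - mean_i
        loglik -= 0.5 * math.log(omega_i) + 0.5 * resid_i ** 2 / omega_i
    return loglik
```

With its sign reversed, each per-observation term is the quantity that the SAS program assigns to logL for one observation (the program approximates $\pi$ by 3.1415). Keeping both standard deviations strictly positive, as the BOUNDS statement does, guarantees $\omega_i > 0$.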

As an example, you can now use this switching regression likelihood to develop a model of housing starts as a function of changes in mortgage interest rates. The data for this example are from the U.S. Census Bureau and cover the period from January 1973 to March 1999. The hypothesis is that the model coefficients differ depending on whether interest rates are rising or falling.

The model for $z_i$ is therefore

$$z_i = p \times (\text{rate}_i - \text{rate}_{i-1})$$

where $\text{rate}_i$ is the mortgage interest rate at time $i$ and $p$ is a scale parameter to be estimated.

The regression model is

$$
\begin{aligned}
\text{starts}_i &= \text{intercept}_1 + \text{ar1} \times \text{starts}_{i-1} + \text{djf1} \times \text{decjanfeb} \quad \text{if } z_i < 0 \\
\text{starts}_i &= \text{intercept}_2 + \text{ar2} \times \text{starts}_{i-1} + \text{djf2} \times \text{decjanfeb} \quad \text{if } z_i \ge 0
\end{aligned}
$$

where $\text{starts}_i$ is the number of housing starts in month $i$ and decjanfeb is a dummy variable that equals 1 when the current month is December, January, or February and 0 otherwise. In the SAS program, the regime 1 coefficients are named int1, b11, and b13, and the regime 2 coefficients are named int2, b21, and b23.
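For a single month, the two regime equations and the smoothed switch combine into one expected value. The Python sketch below mirrors the mean equation of the SAS program that follows, under these assumptions: rate - prev_rate plays the role of dif(rate), prev_starts plays the role of zlag(starts), and the normal CDF uses $\sigma = 1$ (as probnorm does), so the scale parameter $p$ alone controls how sharp the switch is. The function names, the coefficient tuples, and the values used to exercise them are hypothetical.

```python
import math


def is_decjanfeb(month: int) -> int:
    """Dummy variable: 1 for December, January, or February (months 12, 1, 2)."""
    return int(month == 12 or month <= 2)


def composite_mean(prev_starts, rate, prev_rate, month, coef1, coef2, p):
    """Expected starts_i as the d-weighted blend of the two regimes.

    coef1, coef2: (intercept, ar, djf) for regime 1 and regime 2.
    """
    z = p * (rate - prev_rate)                       # z_i = p * (rate_i - rate_{i-1})
    d = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # Phi(z), sigma fixed at 1
    djf = is_decjanfeb(month)
    y1 = coef1[0] + coef1[1] * prev_starts + coef1[2] * djf  # regime 1 (falling rates)
    y2 = coef2[0] + coef2[1] * prev_starts + coef2[2] * djf  # regime 2 (rising rates)
    return (1.0 - d) * y1 + d * y2
```

An unchanged rate gives $z_i = 0$ and $d = 0.5$, an equal mix of both regimes; a large rise or fall drives $d$ to 1 or 0 and selects regime 2 or regime 1.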

This model is written by using the following SAS statements:

title1 'Switching Regression Example';

proc model data=switch;
   parms sig1=10 sig2=10 int1 b11 b13 int2 b21 b23 p;
   bounds 0.0001 < sig1 sig2;

   decjanfeb = ( month(date) = 12 | month(date) <= 2 );

   a = p*dif(rate);       /* Upper bound of integral */
   d = probnorm(a);       /* Normal CDF as an approx of switch */

                          /* Regime 1 */
   y1 = int1 + zlag(starts)*b11 + decjanfeb *b13 ;
                          /* Regime 2 */
   y2 = int2 + zlag(starts)*b21 + decjanfeb *b23 ;
                          /* Composite regression equation */
   starts  = (1 - d)*y1 +  d*y2;

                         /* Per-observation negative log-likelihood */
   logL = (1/2)*( (log(2*3.1415)) +
        log( (sig1**2)*((1-d)**2)+(sig2**2)*(d**2) )
       + (resid.starts*( 1/( (sig1**2)*((1-d)**2)+
        (sig2**2)*(d**2) ) )*resid.starts) ) ;

   errormodel starts ~ general(logL);

   fit starts / method=marquardt converge=1.0e-5;

     /* Test for significant differences in the parms */
   test int1 = int2 ,/ lm;
   test b11 = b21 ,/ lm;
   test b13 = b23 ,/ lm;
   test sig1 = sig2 ,/ lm;

run;

Four TEST statements are added to test the hypothesis that the parameters are the same in both regimes. The parameter estimates and ANOVA table from this run are shown in Output 24.13.1.

Output 24.13.1: Parameter Estimates from the Switching Regression

Switching Regression Example

The MODEL Procedure

Nonlinear Liklhood Summary of Residual Errors
Equation  DF Model  DF Error      SSE    MSE  Root MSE  R-Square  Adj R-Sq  Label
starts           9       304  85878.0  282.5   16.8075    0.7806    0.7748  Housing Starts

Nonlinear Liklhood Parameter Estimates
Parameter   Estimate  Approx Std Err  t Value  Approx Pr > |t|
sig1        15.47484          0.9476    16.33           <.0001
sig2        19.77808          1.2711    15.56           <.0001
int1        32.82221          5.9087     5.55           <.0001
b11          0.73952          0.0444    16.64           <.0001
b13         -15.4556          3.1912    -4.84           <.0001
int2        42.73348          6.8166     6.27           <.0001
b21         0.734117          0.0478    15.37           <.0001
b23         -22.5184          4.2989    -5.24           <.0001
p           25.94712          8.5201     3.05           0.0025
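The derived columns of Output 24.13.1 can be checked by hand. This sketch assumes that the number of observations is DF Model + DF Error = 313, that Adj R-Sq uses the usual correction $1 - (1 - R^2)(n - 1)/\text{DF Error}$, and that each t Value is the estimate divided by its approximate standard error. The function names fit_summary and t_value are hypothetical.

```python
import math


def fit_summary(sse: float, df_model: int, df_error: int, r_square: float):
    """Return (MSE, Root MSE, Adj R-Sq) computed from the summary table.

    Assumes n = df_model + df_error and the intercept-based adjustment
    1 - (1 - R^2) * (n - 1) / df_error.
    """
    n = df_model + df_error
    mse = sse / df_error
    root_mse = math.sqrt(mse)
    adj_r_square = 1.0 - (1.0 - r_square) * (n - 1) / df_error
    return mse, root_mse, adj_r_square


def t_value(estimate: float, std_err: float) -> float:
    """Approximate t statistic: estimate divided by its standard error."""
    return estimate / std_err


mse, root_mse, adj_r_square = fit_summary(85878.0, 9, 304, 0.7806)
print(round(mse, 1), round(root_mse, 4), round(adj_r_square, 4))
print(round(t_value(15.47484, 0.9476), 2), round(t_value(-15.4556, 3.1912), 2))
```

With the values from the output, the recomputed MSE, Root MSE, Adj R-Sq, and t values agree with the printed ones to the displayed precision.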


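The test results shown in Output 24.13.2 suggest that the error variances of housing starts differ between the two regimes: the hypothesis SIG1 = SIG2 is rejected at the 5% level (p = 0.0361). The tests also show a significant difference in the AR coefficient on lagged housing starts (B11 = B21, p < .0001), whereas the intercepts (INT1 = INT2) and the December-through-February coefficients (B13 = B23) do not differ significantly.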

Output 24.13.2: Test Results for Switching Regression

Test Results
Test   Type  Statistic  Pr > ChiSq  Label
Test0  L.M.       1.00      0.3185  int1 = int2
Test1  L.M.      15634      <.0001  b11 = b21
Test2  L.M.       1.45      0.2279  b13 = b23
Test3  L.M.       4.39      0.0361  sig1 = sig2
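Each TEST statement imposes a single restriction, so, assuming the LM statistic $s$ is referred to a chi-square distribution with one degree of freedom, its p-value is $\Pr(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$. A stdlib-only sketch (chisq1_pvalue is a hypothetical name):

```python
import math


def chisq1_pvalue(stat: float) -> float:
    """Upper-tail probability Pr(X > stat) for X ~ chi-square(1).

    With one degree of freedom X = Z**2 for standard normal Z, so
    Pr(X > s) = Pr(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))


for label, stat in [("int1 = int2", 1.00), ("b13 = b23", 1.45), ("sig1 = sig2", 4.39)]:
    print(label, round(chisq1_pvalue(stat), 4))
```

Because the statistics are printed rounded, the recomputed p-values can differ from the Pr > ChiSq column in the third or fourth decimal place.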


Last updated: June 19, 2025