SAS Macros and Functions

PROBDF Function for Dickey-Fuller Tests

The PROBDF function calculates significance probabilities for Dickey-Fuller tests for unit roots in time series. The PROBDF function can be used wherever SAS library functions can be used, including DATA step programs, SCL programs, and PROC MODEL programs.

Syntax

PROBDF( x, n < , d < , type > > )

x

is the test statistic.

n

is the sample size. The minimum value of n allowed depends on the value specified for the third argument, d. For d in the set (1,2,4,6,12), n must be an integer greater than or equal to $\max(2d, 5)$; for other values of d the minimum value of n is 24.

d

is an optional integer giving the degree of the unit root tested for. Specify d=1 for tests of a simple unit root $(1 - B)$. Specify d equal to the seasonal cycle length for tests for a seasonal unit root $(1 - B^d)$. The default value of d is 1; that is, a test for a simple unit root $(1 - B)$ is assumed if d is not specified. The maximum value of d allowed is 12.

type

is an optional character argument that specifies the type of test statistic used. The values of type are the following:

SZM

studentized test statistic for the zero mean (no intercept) case

RZM

regression test statistic for the zero mean (no intercept) case

SSM

studentized test statistic for the single mean (intercept) case

RSM

regression test statistic for the single mean (intercept) case

STR

studentized test statistic for the deterministic time trend case

RTR

regression test statistic for the deterministic time trend case

The values STR and RTR are allowed only when d=1. The default value of type is SZM.
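
The following DATA step is a minimal sketch of how the optional arguments behave. The statistic value and sample size are hypothetical numbers chosen only to satisfy the argument rules described above; they do not come from any data set.

```sas
data _null_;
   x = -2.86;                       /* hypothetical studentized test statistic */
   n = 100;                         /* hypothetical sample size */

   p1 = probdf(x, n);               /* d and type omitted: d=1, type="SZM" */
   p2 = probdf(x, n, 1, "SZM");     /* same call with the defaults spelled out */
   p3 = probdf(x, n, 1, "SSM");     /* single mean (intercept) case */
   p4 = probdf(x, n, 1, "STR");     /* trend case, allowed only when d=1 */
   p5 = probdf(x, n, 4, "SSM");     /* lag 4 seasonal root; n=100 >= max(2*4,5) */

   put p1= p2= p3= p4= p5=;
run;
```

Because the defaults are d=1 and type SZM, p1 and p2 should agree. The remaining calls would be expected to return different probabilities for the same x, since each type refers to a different null distribution.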

Details

Theoretical Background

When a time series has a unit root, the series is nonstationary and the ordinary least squares (OLS) estimator is not normally distributed. The limiting distribution of the OLS estimator of autoregressive models for time series with a simple unit root was studied by Dickey (1976) and Dickey and Fuller (1979). Dickey, Hasza, and Fuller (1984) obtained the limiting distribution for time series with seasonal unit roots. The following discussion concentrates on the nonseasonal tests and lists the relevant references.

Consider the Dickey-Fuller regression first. The null hypothesis is that there is an autoregressive unit root, $H_0\colon \alpha = 1$, and the alternative is $H_a\colon |\alpha| < 1$, where $\alpha$ is the autoregressive coefficient of the time series

\[ y_t = \alpha y_{t-1} + \epsilon_t \]

This is referred to as the zero mean (ZM) model. The standard Dickey-Fuller (DF) test assumes that errors $\epsilon_t$ are white noise. There are two other types of regression models that include a constant or a time trend as follows:

\[
\begin{aligned}
y_t &= \mu + \alpha y_{t-1} + \epsilon_t \\
y_t &= \mu + \beta t + \alpha y_{t-1} + \epsilon_t
\end{aligned}
\]

These two models are referred to as the constant mean model (SM) and the trend model (TR), respectively. The constant mean model includes a constant mean $\mu$ of the time series. However, the interpretation of $\mu$ depends on stationarity in the following sense: the parameter that determines the mean in the stationary case, when $\alpha < 1$, determines the trend in the integrated case, when $\alpha = 1$. Therefore, the null hypothesis should be the joint hypothesis that $\alpha = 1$ and $\mu = 0$. However, the unit root test statistics are concerned only with the null hypothesis $\alpha = 1$; the joint null hypothesis is not commonly used. This issue is addressed in Bhargava (1986) with a different nesting model.

Under the I(1) null hypothesis of the Dickey-Fuller test, the differenced process is not serially correlated. In practice, this specification often needs to be generalized. The augmented Dickey-Fuller (ADF) test, originally proposed in Dickey and Fuller (1979), adjusts for the serial correlation in the time series by adding lagged first differences to the autoregressive model as follows. Consider the $(p+1)$th-order autoregressive time series

\[ y_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_{p+1} y_{t-p-1} + e_t \]

and its characteristic equation

\[ m^{p+1} - \alpha_1 m^p - \alpha_2 m^{p-1} - \cdots - \alpha_{p+1} = 0 \]

If all the characteristic roots are less than 1 in absolute value, $y_t$ is stationary. $y_t$ is nonstationary if there is a unit root. If there is a unit root, the sum of the autoregressive parameters is 1, and hence you can test for a unit root by testing whether the sum of the autoregressive parameters is 1 or not. The no-intercept model is parameterized as

\[ \nabla y_t = \delta y_{t-1} + \theta_1 \nabla y_{t-1} + \cdots + \theta_p \nabla y_{t-p} + e_t \]

where $\nabla y_t = y_t - y_{t-1}$ and

\[
\begin{aligned}
\delta &= \alpha_1 + \cdots + \alpha_{p+1} - 1 \\
\theta_k &= -\alpha_{k+1} - \cdots - \alpha_{p+1}
\end{aligned}
\]

The estimators are obtained by regressing $\nabla y_t$ on $y_{t-1}, \nabla y_{t-1}, \ldots, \nabla y_{t-p}$. The t statistic of the ordinary least squares estimator of $\delta$ is the test statistic for the unit root test.
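
As a worked check of this reparameterization, take the smallest augmented case, p = 1 (a second-order autoregression). The derivation uses only the definitions given above; subtracting the lagged value from both sides and regrouping gives the differenced form.

```latex
\begin{aligned}
y_t &= \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + e_t \\
y_t - y_{t-1} &= (\alpha_1 - 1)\, y_{t-1} + \alpha_2 y_{t-2} + e_t \\
              &= (\alpha_1 + \alpha_2 - 1)\, y_{t-1} - \alpha_2 (y_{t-1} - y_{t-2}) + e_t \\
\nabla y_t    &= \delta\, y_{t-1} + \theta_1 \nabla y_{t-1} + e_t,
\qquad \delta = \alpha_1 + \alpha_2 - 1, \quad \theta_1 = -\alpha_2
\end{aligned}
```

A unit root means that the autoregressive parameters sum to 1, which is exactly the condition that the coefficient on the lagged level is zero. This is why the t statistic for that coefficient serves as the test statistic.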

If the type argument value specifies a test for a nonzero mean (intercept case), the autoregressive model includes a mean term $\alpha_0$. If the type argument value specifies a test for a time trend, the model also includes a time trend term and the model is as follows:

\[ \nabla y_t = \alpha_0 + \gamma t + \delta y_{t-1} + \theta_1 \nabla y_{t-1} + \cdots + \theta_p \nabla y_{t-p} + e_t \]

For testing for a seasonal unit root, consider the multiplicative model

\[ (1 - \alpha_d B^d)(1 - \theta_1 B - \cdots - \theta_p B^p)\, y_t = e_t \]

Let $\nabla^d y_t \equiv y_t - y_{t-d}$. The test statistic is calculated in the following steps:

  1. Regress $\nabla^d y_t$ on $\nabla^d y_{t-1}, \ldots, \nabla^d y_{t-p}$ to obtain the initial estimators $\hat{\theta}_i$ and compute residuals $\hat{e}_t$. Under the null hypothesis that $\alpha_d = 1$, $\hat{\theta}_i$ are consistent estimators of $\theta_i$.

  2. Regress $\hat{e}_t$ on $(1 - \hat{\theta}_1 B - \cdots - \hat{\theta}_p B^p)\, y_{t-d}, \nabla^d y_{t-1}, \ldots, \nabla^d y_{t-p}$ to obtain estimates of $\delta = \alpha_d - 1$ and $\theta_i - \hat{\theta}_i$.

The t ratio for the estimate of $\delta$ produced by the second step is used as a test statistic for testing for a seasonal unit root. The estimates of $\theta_i$ are obtained by adding the estimates of $\theta_i - \hat{\theta}_i$ from the second step to $\hat{\theta}_i$ from the first step.

The series $(1 - B^d)\, y_t$ is assumed to be stationary, where d is the value of the third argument to the PROBDF function.
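
The two-step procedure can be sketched with PROC REG for the simplest case, d = 4 and p = 1, in the zero mean setting. This is an illustrative sketch, not the implementation used by SAS: the data set TEST and variable Y are those of the Examples section, while the data set names STEP0, STEP1, STEP2, EST1, and EST2, the macro variables, and the choice n = 99 (the number of usable observations from a 104-observation series) are assumptions made for illustration.

```sas
/* Seasonal differences for d = 4 and their first lag (p = 1) */
data step0;
   set test;
   y4  = lag4(y);              /* y(t-4) */
   y5  = lag5(y);              /* y(t-5) */
   dy  = y - y4;               /* seasonal difference of y(t) */
   dy1 = lag(dy);              /* seasonal difference of y(t-1) */
run;

/* Step 1: initial estimate of theta_1 and the residuals */
proc reg data=step0 outest=est1;
   model dy = dy1 / noint noprint;
   output out=step1 r=ehat;
run;

data _null_;
   set est1;
   call symputx('theta1', dy1);    /* initial estimate of theta_1 */
run;

/* Step 2: regress the residuals on the filtered lagged level and dy1 */
data step2;
   set step1;
   z = y4 - (&theta1) * y5;        /* (1 - theta1hat*B) applied to y(t-4) */
run;

proc reg data=step2 outest=est2 tableout;
   model ehat = z dy1 / noint noprint;
run;

%let n = 99;   /* assumed count of usable observations for a 104-row series */

data _null_;
   set est2;
   if _type_ = 'T' then do;        /* row of t ratios written by TABLEOUT */
      p = probdf(z, &n, 4, "SZM"); /* z holds the t ratio for delta */
      put p= pvalue5.3;
   end;
run;
```

For larger p, the filtered regressor in the second step would need one additional lag of Y per estimated coefficient, and both regressions would include the additional lagged seasonal differences.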

If the series is an ARMA process, a large value of p might be desirable in order to obtain a reliable test statistic. To determine an appropriate value for p, see Said and Dickey (1984).

Test Statistics

The Dickey-Fuller test is used to test the null hypothesis that the time series exhibits a lag d unit root against the alternative of stationarity. The PROBDF function computes the probability of observing a test statistic more extreme than x under the assumption that the null hypothesis is true. You should reject the unit root hypothesis when PROBDF returns a small (significant) probability value.

There are several different versions of the Dickey-Fuller test. The PROBDF function supports six versions, as selected by the type argument. Specify the type value that corresponds to the way that you calculated the test statistic x.

The last two characters of the type value specify the kind of regression model used to compute the Dickey-Fuller test statistic. The meanings of the last two characters of the type value are as follows:

ZM

zero mean or no-intercept case. The test statistic x is assumed to be computed from the regression model

\[ y_t = \alpha y_{t-1} + e_t \]

SM

single mean or intercept case. The test statistic x is assumed to be computed from the regression model

\[ y_t = \mu + \alpha y_{t-1} + e_t \]

TR

intercept and deterministic time trend case. The test statistic x is assumed to be computed from the regression model

\[ y_t = \mu + \gamma t + \alpha y_{t-1} + e_t \]

The first character of the type value specifies whether the regression test statistic or the studentized test statistic is used. Let $\hat{\alpha}$ be the estimated regression coefficient for the lag of the series, and let $\mathrm{se}_{\hat{\alpha}}$ be the standard error of $\hat{\alpha}$. The meaning of the first character of the type value is as follows:

R

the regression-coefficient-based test statistic. The test statistic is

\[ \rho = n(\hat{\alpha} - 1) \]

S

the studentized test statistic. The test statistic is

\[ DF_\tau = \frac{\hat{\alpha} - 1}{\mathrm{se}_{\hat{\alpha}}} \]

The regression-coefficient-based statistic is also called the $\rho$ test, and the studentized statistic is called the $\tau$ test. For the zero mean model, the asymptotic distributions of the Dickey-Fuller test statistics are as follows, where $W(r)$ denotes a standard Brownian motion (Wiener process) on $[0, 1]$:

\[
\begin{aligned}
n(\hat{\alpha} - 1) &\Rightarrow \left( \int_0^1 W(r)\,dW(r) \right) \left( \int_0^1 W(r)^2\,dr \right)^{-1} \\
DF_\tau &\Rightarrow \left( \int_0^1 W(r)\,dW(r) \right) \left( \int_0^1 W(r)^2\,dr \right)^{-1/2}
\end{aligned}
\]

For the constant mean model, the asymptotic distributions are

\[
\begin{aligned}
n(\hat{\alpha} - 1) &\Rightarrow \left( \frac{W(1)^2 - 1}{2} - W(1) \int_0^1 W(r)\,dr \right) \left( \int_0^1 W(r)^2\,dr - \left( \int_0^1 W(r)\,dr \right)^2 \right)^{-1} \\
DF_\tau &\Rightarrow \left( \frac{W(1)^2 - 1}{2} - W(1) \int_0^1 W(r)\,dr \right) \left( \int_0^1 W(r)^2\,dr - \left( \int_0^1 W(r)\,dr \right)^2 \right)^{-1/2}
\end{aligned}
\]

For the trend model, the asymptotic distributions are

\[
\begin{aligned}
n(\hat{\alpha} - 1) &\Rightarrow \left[ \int_0^1 W(r)\,dW(r) + 12 \left( \int_0^1 r W(r)\,dr - \tfrac{1}{2} \int_0^1 W(r)\,dr \right) \left( \int_0^1 W(r)\,dr - \tfrac{1}{2} W(1) \right) - W(1) \int_0^1 W(r)\,dr \right] D^{-1} \\
DF_\tau &\Rightarrow \left[ \int_0^1 W(r)\,dW(r) + 12 \left( \int_0^1 r W(r)\,dr - \tfrac{1}{2} \int_0^1 W(r)\,dr \right) \left( \int_0^1 W(r)\,dr - \tfrac{1}{2} W(1) \right) - W(1) \int_0^1 W(r)\,dr \right] D^{-1/2}
\end{aligned}
\]

where

\[ D = \int_0^1 W(r)^2\,dr - 12 \left( \int_0^1 r W(r)\,dr \right)^2 + 12 \int_0^1 W(r)\,dr \int_0^1 r W(r)\,dr - 4 \left( \int_0^1 W(r)\,dr \right)^2 \]

For more information about the Dickey-Fuller test null distribution, see Dickey and Fuller (1979); Dickey, Hasza, and Fuller (1984); and Hamilton (1994). The preceding formulas are for the basic Dickey-Fuller test. The PROBDF function can also be used for the augmented Dickey-Fuller test, in which the error term $e_t$ is modeled as an autoregressive process; however, the test statistic is computed somewhat differently for the augmented Dickey-Fuller test. For the nonseasonal augmented Dickey-Fuller test, the test statistic can take one of two forms, analogous to those of the Dickey-Fuller test. One is the OLS t value

\[ \frac{\hat{\alpha} - 1}{\mathrm{sd}(\hat{\alpha})} \]

and the other is given by

\[ \frac{n(\hat{\alpha} - 1)}{1 - \hat{\alpha}_1 - \cdots - \hat{\alpha}_p} \]

The asymptotic distributions of the test statistics are the same as those of the standard Dickey-Fuller test statistics. For information about seasonal and nonseasonal augmented Dickey-Fuller tests see Dickey, Hasza, and Fuller (1984); Hamilton (1994).

The PROBDF function is calculated from approximating functions fit to empirical quantiles that are produced by a Monte Carlo simulation that employs $10^8$ replications for each simulation. Separate simulations were performed for selected values of n and for $d = 1, 2, 4, 6, 12$ (where n and d are the second and third arguments to the PROBDF function).

The maximum error of the PROBDF function is approximately $\pm 10^{-3}$ for d in the set (1,2,4,6,12) and can be slightly larger for other d values. Because the number of simulation replications used to produce the PROBDF function is much greater than the 60,000 replications used by Dickey and colleagues (Dickey and Fuller 1979; Dickey, Hasza, and Fuller 1984), the PROBDF function can be expected to produce results that are substantially more accurate than the critical values reported in those papers.
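
The idea behind the simulation can be illustrated on a much smaller scale. The following sketch generates random walks under the null hypothesis, computes the zero mean regression-coefficient statistic for each one, and compares an empirical lower-tail proportion with the PROBDF value. It is not the procedure used to build PROBDF: the seed, the 10,000 replications, the sample size, the cutoff value, and the data set name SIM are all assumptions made for illustration, and the sketch assumes that PROBDF reports a lower-tail probability.

```sas
/* Simulate n(alpha_hat - 1) for the no-intercept regression under alpha = 1 */
data sim(keep=rep rho);
   call streaminit(20250619);           /* arbitrary seed */
   n = 100;                             /* hypothetical sample size */
   do rep = 1 to 10000;                 /* far fewer than 10**8 replications */
      ylag = 0; sxy = 0; sxx = 0;
      do t = 1 to n;
         y    = ylag + rand('normal');  /* random walk step */
         sxy  = sxy + ylag * y;         /* cross product of y(t-1) and y(t) */
         sxx  = sxx + ylag * ylag;      /* sum of squares of y(t-1) */
         ylag = y;
      end;
      rho = n * (sxy / sxx - 1);        /* regression-coefficient statistic */
      output;
   end;
run;

/* Compare the empirical proportion below a cutoff with PROBDF */
data _null_;
   set sim end=last;
   x = -8;                              /* hypothetical cutoff */
   below + (rho <= x);                  /* running count of statistics <= x */
   if last then do;
      empirical = below / _n_;
      approx    = probdf(x, 100, 1, "RZM");
      put empirical= approx=;
   end;
run;
```

With only 10,000 replications, the empirical proportion carries noticeable sampling error, so only rough agreement with PROBDF should be expected.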

Examples

Suppose the data set TEST contains 104 observations of the time series variable Y, and you want to test the null hypothesis that there exists a lag 4 seasonal unit root in the Y series. The following statements illustrate how to perform the single-mean Dickey-Fuller regression coefficient test using PROC REG and PROBDF:

data test1;
   set test;
   y4 = lag4(y);
run;

proc reg data=test1 outest=alpha;
   model y = y4 / noprint;
run;

data _null_;
   set alpha;
   x = 100 * ( y4 - 1 );
   p = probdf( x, 100, 4, "RSM" );
   put p= pvalue5.3;
run;
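
As a variation on the preceding statements, the following sketch computes both the regression-coefficient statistic and the studentized statistic for a simple unit root (d=1) in the single mean case. It relies on the TABLEOUT option of PROC REG to write the standard errors to the OUTEST= data set. The data set names TEST2 and EST2, the variable name YLAG1, and the value n = 103 (104 observations minus one lag) are assumptions made for illustration.

```sas
data test2;
   set test;
   ylag1 = lag(y);                      /* y(t-1) */
run;

proc reg data=test2 outest=est2 tableout;
   model y = ylag1 / noprint;           /* intercept included: single mean case */
run;

data _null_;
   set est2 end=last;
   retain ahat se;
   if _type_ = 'PARMS'  then ahat = ylag1;   /* estimated coefficient */
   if _type_ = 'STDERR' then se   = ylag1;   /* its standard error */
   if last then do;
      n     = 103;                      /* assumed usable sample size */
      rho   = n * (ahat - 1);           /* regression-coefficient statistic */
      tau   = (ahat - 1) / se;          /* studentized statistic */
      p_rho = probdf(rho, n, 1, "RSM");
      p_tau = probdf(tau, n, 1, "SSM");
      put rho= p_rho= pvalue5.3 tau= p_tau= pvalue5.3;
   end;
run;
```

The type value passed to PROBDF must match how each statistic was computed: an R type for the regression-coefficient statistic and an S type for the studentized statistic.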

To perform the augmented Dickey-Fuller test, regress the differences of the series on lagged differences and on the lagged value of the series, and compute the test statistic from the regression coefficient for the lagged series. The following statements illustrate how to perform the single-mean augmented Dickey-Fuller studentized test for a simple unit root using PROC REG and PROBDF:

data test1;
   set test;
   yl  = lag(y);
   yd  = dif(y);
   yd1 = lag1(yd); yd2 = lag2(yd);
   yd3 = lag3(yd); yd4 = lag4(yd);
run;

proc reg data=test1 outest=alpha covout;
   model yd = yl yd1-yd4 / noprint;
run;

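data _null_;
   set alpha;
   retain a;
   /* PARMS row: save the coefficient on the lagged level yl */
   if _type_ = 'PARMS' then a = yl;
   /* COV row for yl: column yl holds the variance of that coefficient */
   if _type_ = 'COV' & upcase(_NAME_) = 'YL' then do;
      x = a / sqrt(yl);
      p = probdf( x, 99, 1, "SSM" );
      put p= pvalue5.3;
   end;
run;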

The %DFTEST macro provides an easier way to perform Dickey-Fuller tests. The following statements perform the same tests as the preceding example:

%dftest( test, y, ar=4 );
%put p=&dftest;
Last updated: June 19, 2025