VARMAX Procedure

VARFIMA and VARFIMAX Modeling

VAR and VARMA series are short-range dependent (SRD) in the sense that their autocovariance function dies out exponentially fast with the increasing lag. However, in many financial and macroeconomics applications, stationary yet persistent series arise, calling for models that have a slowly decaying autocovariance function and that are therefore more suitable to capture long-range dependence in the data.

The VARFIMA model captures both long-range and short-range dependence dynamics in a multivariate series. For a k-dimensional series bold y Subscript t Baseline equals left-parenthesis y Subscript 1 t Baseline comma ellipsis comma y Subscript k t Baseline right-parenthesis Superscript prime Baseline comma t equals 1 comma ellipsis comma upper T comma the VARFIMAleft-parenthesis p comma upper D comma q right-parenthesis model is defined as

normal upper Phi left-parenthesis upper B right-parenthesis bold y Subscript t Baseline equals left-parenthesis upper I minus upper B right-parenthesis Superscript negative upper D Baseline normal upper Theta left-parenthesis upper B right-parenthesis bold-italic epsilon Subscript t

where B and I are the backshift and identity operators; upper D equals normal d normal i normal a normal g left-parenthesis d Subscript j Baseline right-parenthesis d Subscript j Baseline element-of left-parenthesis negative 1 slash 2 comma 1 slash 2 right-parenthesis, are the LRD parameters of the component series StartSet y Subscript j t Baseline EndSet Subscript t element-of normal upper Z, j equals 1 comma ellipsis comma k; and StartSet bold-italic epsilon Subscript t Baseline EndSet Subscript t element-of normal upper Z is a k-dimensional white noise series with zero mean normal upper E bold-italic epsilon Subscript t Baseline equals 0 and covariance normal upper E bold-italic epsilon Subscript t Baseline bold-italic epsilon Subscript t Superscript prime Baseline equals normal upper Sigma.

The fractional integration operator left-parenthesis upper I minus upper B right-parenthesis Superscript negative upper D allows for long memory in the series. On the other hand, normal upper Phi left-parenthesis z right-parenthesis and normal upper Theta left-parenthesis z right-parenthesis, which are the typical autoregressive and moving average matrix polynomials of orders p and q, respectively, capture the short-range dependence.

The VARFIMAleft-parenthesis p comma upper D comma q right-parenthesis series satisfies the multivariate long-range dependence definitions given in Kechagias and Pipiras (2015). Moreover, each component series StartSet y Subscript j t Baseline EndSet Subscript t element-of normal upper Z, j equals 1 comma ellipsis comma k, satisfies the univariate time and frequency domain LRD definitions given in Beran et al. (2013). The following sections briefly review these definitions and show how you can detect long-range dependence in the data before fitting a VARFIMA model.

Autocorrelation and Spectral Density of VARFIMA Series

The diagonal components of the autocorrelation matrix function of a VARFIMAleft-parenthesis p comma upper D comma q right-parenthesis series satisfy the univariate LRD time domain definition

rho Subscript i Baseline left-parenthesis n right-parenthesis tilde c 1 n Superscript 2 d Super Subscript i Superscript minus 1 Baseline comma i equals 1 comma ellipsis comma k comma normal a normal s n right-arrow normal infinity

where a Subscript n Baseline tilde b Subscript n implies that limit Underscript n right-arrow normal infinity Endscripts a Subscript n Baseline slash b Subscript n Baseline equals 1 and c 1 greater-than 0. Similarly, the diagonal components of the spectral density matrix function of a VARFIMAleft-parenthesis p comma upper D comma q right-parenthesis series satisfy

f Subscript i Baseline left-parenthesis lamda right-parenthesis tilde c 2 lamda Superscript minus 2 d Super Subscript i Superscript Baseline comma i equals 1 comma ellipsis comma k comma normal a normal s lamda right-arrow 0 Superscript plus

for some c 2 greater-than 0.

To obtain preliminary estimates of the LRD parameters, you can plot the logged periodogram values against the log of the Fourier frequencies lamda Subscript j Baseline equals 2 pi j slash upper T, j equals 1 comma ellipsis comma upper T slash 2 comma and then fit a line for frequencies near 0. The slope of this line is expected to be equal to minus 2 d Subscript i (the exponent in the right-hand side of the preceding relation). The following statements demonstrate this procedure for a synthetic VARFIMAleft-parenthesis 1 comma upper D comma 1 right-parenthesis series with upper T equals 2,000 and true parameters d 1 equals 0.4, d 2 equals 0.3, normal upper Phi 11 equals normal upper Sigma 11 equals normal upper Sigma 22 equals 3 comma normal upper Sigma 12 equals 0.5, normal upper Phi 11 equals 0.8, normal upper Phi 12 equals 0.3, normal upper Phi 21 equals negative 0.2, normal upper Phi 22 equals 0.1, normal upper Theta 11 equals 0.2, normal upper Theta 12 equals 0.4, normal upper Theta 21 equals 0, and normal upper Theta 22 equals 0.3:

data VARFIMA1D1;
   time = _N_;
   input y1 y2;
datalines;
1.495250048 2.694910375
4.503081454 1.42319642

   ... more lines ...   

3.12049851 5.330308391
7.732287586 1.665071247
;
/* Compute the two periodograms */
proc spectra data = VARFIMA1D1 out = spectra;
   var y1 y2;
run;
/* Convert to log scale */
data logspectra;
   set spectra(firstobs=2);
   /* compute Fourier frequencies */
   j = _N_;
   pi = constant('pi');
   logfreq = log(2*pi*j/2000);

   logpdg1 = log(P_01);
   logpdg2 = log(P_02);

   /* Introduce weights where regression will be performed */
   wt = (1<= j <=100);
   keep wt logfreq logpdg1 logpdg2;
run;
/* Regression for log-periodogram of y1*/
proc autoreg data = logspectra(obs = 100);
   model logpdg1 = logfreq;
run;
/* Regression for log-periodogram of y1*/
proc autoreg data = logspectra(obs = 100);
         model logpdg2 = logfreq;
run;

The output from the two regressions is shown in Figure 86 and Figure 87.

Figure 86: Regression Estimates for y1

The AUTOREG Procedure

Parameter Estimates
Variable DF Estimate Standard
Error
t Value Approx
Pr > |t|
Intercept 1 4.3279 0.2885 15.00 <.0001
logfreq 1 -0.9051 0.1245 -7.27 <.0001


Figure 87: Regression Estimates for y2

The AUTOREG Procedure

Parameter Estimates
Variable DF Estimate Standard
Error
t Value Approx
Pr > |t|
Intercept 1 2.0811 0.3172 6.56 <.0001
logfreq 1 -0.5227 0.1369 -3.82 0.0002


The following statements produce log-log plots of the two periodograms along with the regression lines:

/*Plot the periodograms in log-log scale*/
ods graphics on;

proc sgplot data = logspectra;
         series x = logfreq y = logpdg1 / lineattrs = (pattern = solid);
         reg y = logpdg1 x = logfreq / nomarkers weight = wt lineattrs =
                                       (thickness = 1 color = 'red' );
         inset "Slope = -0.905" / position = topright textattrs = (color = 'red');
         xaxis label = 'log-frequency';
         yaxis label = 'log-periodogram';
         title 'Log-periodogram of y1';
run;
proc sgplot data = logspectra;
         series x = logfreq y = logpdg2 / lineattrs = (pattern = solid);
         reg y = logpdg2 x = logfreq / nomarkers weight = wt lineattrs =
                                       (thickness = 1 color = 'red' );
         inset "Slope = -0.523" / position = topright textattrs = (color = 'red');
         xaxis label = 'log-frequency';
         yaxis label = 'log-periodogram';
         title 'Log-periodogram of y2';
run;

The final plots are shown in Figure 88.

Figure 88: Log-Log Periodogram Plots for the Two Series

logpdg1 logpdg2


Dividing the slopes by 2 and removing the negative signs yields preliminary estimates for the LRD parameters, ModifyingAbove d With caret Subscript 1 Baseline equals 0.45 and ModifyingAbove d With caret Subscript 2 Baseline equals 0.26.

Estimation

Estimation of all the parameters in the VARFIMA model is performed using the conditional likelihood Durbin-Levinson (CLDL) algorithm of Tsay (2010). This method uses the multivariate Durbin-Levinson algorithm, whose order of complexity is upper O left-parenthesis upper T squared right-parenthesis, making it computationally feasible for small or medium sample sizes.

The initial values of the LRD parameters are obtained by the semiparametric estimator of Geweke and Porter-Hudak (1983). The initial values of the AR and MA parameters are obtained from least squares estimation on the fractionally differenced series left-parenthesis upper I minus upper B right-parenthesis Superscript upper D Baseline bold y Subscript t. The LRD parameters are restricted in the range left-parenthesis negative 1 slash 2 comma 1 slash 2 right-parenthesis. If an initial LRD parameter estimate is outside this range, then the chosen starting value is either negative 1 slash 2 plus 10 Superscript negative 6 or 1 slash 2 minus 10 Superscript negative 6 for negative or positive initial semiparametric estimates, respectively.

Forecasting

One-step-ahead and multi-step-ahead forecasts for the VARFIMA series are based on a finite past. However, the h-step-ahead forecast errors for h greater-than 1 are based on the infinite past except for VARFIMA series that have only MA components. In the latter case, the forecast errors are also based on a finite past.

The following statements plot the h-step-ahead forecasts, h equals 1 comma ellipsis comma 36, for a bivariate synthetic VARFIMAleft-parenthesis 1 comma upper D comma 1 right-parenthesis series with upper T equals 400 and true parameters d 1 equals 0.4, d 2 equals 0.3, normal upper Phi 11 equals normal upper Sigma 11 equals normal upper Sigma 22 equals 3 comma normal upper Sigma 12 equals 0.5, normal upper Phi 11 equals 0.8, normal upper Phi 12 equals 0.3, normal upper Phi 21 equals negative 0.2, normal upper Phi 22 equals 0.1, normal upper Theta 11 equals 0.2, normal upper Theta 12 equals 0.4, normal upper Theta 21 equals 0, and normal upper Theta 22 equals 0.3. The statements also specify initial values for d 1 and d 2 close to the true parameter values.

data VARFIMA1D1N4;
   time = _N_;
   input y1 y2;
datalines;
0.55596529 2.114409393
-1.842925215 3.415027987

   ... more lines ...   

-2.86707489 1.147627529
-0.195787414 0.820107072
;
proc varmax data = VARFIMA1D1N4 plots = (forecasts);
         model y1 y2 / noint fi p=1 q=1;
         initial d(1) = 0.45, d(2) = 0.25;
         output out = forec back = 36 lead = 36;
run;

Figure 89: Plot of the Two Series and h-Step-Ahead Forecasts, h equals 1 comma ellipsis comma 36

VarfimaEx3For1 VarfimaEx3For2


The BACK option in the preceding SAS statements is used to specify the point where the historical data ends and multi-step-ahead forecasting begins. Note that the BACK option does not affect estimation. The latter is performed using the whole data set, even when you specify the BACK option.

Impulse Response Functions

The impulse response functions of the VARFIMA series are calculated using the methodology of Chung (2001). The following statements produce the first 12 simple, accumulated and orthogonal impulse response functions and their corresponding standard errors for the VARFIMAleft-parenthesis 1 comma upper D comma 1 right-parenthesis series of the preceding example.

proc varmax data = VARFIMA1D1N4 plots = (impulse);
         model y1 y2 / noint fi p=1 q=1 print = (impulse = (all));
run;

VARFIMAX Modeling

The VARFIMAXleft-parenthesis p comma upper D comma q comma s right-parenthesis series is defined as

normal upper Phi left-parenthesis upper B right-parenthesis bold y Subscript t Baseline plus normal upper Theta Superscript asterisk Baseline left-parenthesis upper B right-parenthesis bold x Subscript t Baseline equals left-parenthesis upper I minus upper B right-parenthesis Superscript negative upper D Baseline normal upper Theta left-parenthesis upper B right-parenthesis bold-italic epsilon Subscript t

where bold x Subscript t Baseline equals left-parenthesis x Subscript 1 t Baseline comma ellipsis comma x Subscript r t Baseline right-parenthesis prime, t equals 1 comma ellipsis comma upper T comma is an r-dimensional time series vector of exogenous variables and normal upper Theta Superscript asterisk Baseline left-parenthesis z right-parenthesis is the order s matrix polynomial defined as normal upper Theta Superscript asterisk Baseline left-parenthesis z right-parenthesis equals normal upper Theta 0 Superscript asterisk Baseline plus normal upper Theta 1 Superscript asterisk Baseline z plus midline-horizontal-ellipsis plus normal upper Theta Subscript s Superscript asterisk Baseline z Superscript s for some k times r real matrices normal upper Theta Subscript i Superscript asterisk, i equals 1 comma ellipsis comma s.

The following statements estimate a bivariate VARFIMAXleft-parenthesis 1 comma upper D comma 1 comma 0 right-parenthesis model:

model y1 y2 = x1 / fi p=1 q=1;
Last updated: June 19, 2025