Suppose the correlation structure that is required for a normal copula function is already known. For example, the correlation structure can be estimated from the historical data on default times in some industries, but this estimation is not within the scope of this example. The correlation structure is saved in a SAS data set called Inparm. The following statements and their output in Output 17.1.1 show that the correlation parameter is set at 0.8:
proc print data = inparm;
run;
Output 17.1.1: Copula Correlation Matrix
| Obs | Y1 | Y2 |
|---|---|---|
| 1 | 1.0 | 0.8 |
| 2 | 0.8 | 1.0 |
The following statements use PROC HPCOPULA to simulate the data:
/* simulate the data from bivariate normal copula */
proc hpcopula;
   var Y1-Y2;
   define cop normal (corr=inparm);
   simulate cop /
      ndraws     = 1000000
      seed       = 1234
      outuniform = normal_unifdata;
   PERFORMANCE nthreads=4 details
               host="&GRIDHOST" install="&GRIDINSTALLLOC";
run;
The VAR statement names the variables that hold the simulated data. The DEFINE statement assigns the name COP and specifies a normal copula that reads the correlation matrix from the Inparm data set. The SIMULATE statement refers to the COP label that is defined in the DEFINE statement and specifies several options: the NDRAWS= option specifies the sample size (1,000,000 draws), the SEED= option specifies 1234 as the random number generator seed, and the OUTUNIFORM=NORMAL_UNIFDATA option names the output data set that contains the simulated data with uniform margins. The PERFORMANCE statement requests that the analytic computations be performed on four threads, and its DETAILS option requests the timing table. Output 17.1.2 shows the run time of this particular simulation experiment.
Output 17.1.2: Run-Time Performance
| Performance Information | |
|---|---|
| Execution Mode | Single-Machine |
| Number of Threads | 4 |

Procedure Task Timing

| Task | Seconds | Percent |
|---|---|---|
| Simulation of Model | 0.24 | 100.00% |
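To make the simulation step more concrete, the following DATA step is a minimal sketch of how draws from a bivariate normal copula with correlation 0.8 could be generated by hand. It is not how PROC HPCOPULA is implemented and it does not reproduce the same random number stream. The data set name MANUAL_UNIFDATA, the variables Z1, Z2, RHO, and DRAW, and the reduced sample size of 1,000 are all assumptions made for illustration.

```sas
/* Hypothetical sketch: draw from a bivariate normal copula by hand.   */
/* Names and sample size are illustrative, not part of the example.    */
data manual_unifdata;
   call streaminit(1234);              /* seed for the RAND function   */
   rho = 0.8;                          /* correlation, as in Inparm    */
   do draw = 1 to 1000;
      /* correlated standard normal pair */
      z1 = rand('normal');
      z2 = rho*z1 + sqrt(1 - rho**2)*rand('normal');
      /* the normal CDF maps each margin to uniform(0,1) */
      Y1 = cdf('normal', z1);
      Y2 = cdf('normal', z2);
      output;
   end;
   keep Y1 Y2;
run;
```

Each of Y1 and Y2 is uniform on (0,1), while their dependence is inherited from the correlated normal pair. That is the property the OUTUNIFORM= data set is expected to have.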
The following DATA step transforms each variable from a uniform distribution on (0,1) to an exponential distribution with rate parameter 0.5 and adds three indicator variables to the data set: SURVIVE1 and SURVIVE2 are equal to 1 if company 1 or company 2, respectively, has remained in business for more than three years, and SURVIVE is equal to 1 if both companies have remained in business for more than three years.
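/* default time has exponential marginal distribution with rate parameter 0.5 */
data default;
   set normal_unifdata;
   array arr{2} Y1-Y2;
   array time{2} time1-time2;
   array surv{2} survive1-survive2;
   lambda = 0.5;
   do i=1 to 2;
      /* inverse-CDF transform from uniform to exponential */
      time[i] = -log(1-arr[i])/lambda;
      /* indicator: company i is still in business after three years */
      surv[i] = 0;
      if (time[i] > 3) then surv[i] = 1;
   end;
   /* indicator: both companies are still in business after three years */
   survive = 0;
   if (time1 > 3) and (time2 > 3) then survive = 1;
run;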
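The transformation in the DATA step is the inverse-CDF method for the exponential distribution. As a short worked derivation, assuming that the parameter 0.5 is the rate λ (the symbols T, U, and F are introduced here for illustration):

```latex
\begin{align*}
F(t) &= 1 - e^{-\lambda t}, \qquad t \ge 0, \\
T &= F^{-1}(U) = -\frac{\ln(1-U)}{\lambda}, \qquad U \sim \mathrm{Uniform}(0,1), \\
P(T > 3) &= e^{-3\lambda} = e^{-1.5} \approx 0.2231 \qquad (\lambda = 0.5).
\end{align*}
```

Under this assumption the marginal mean is 1/λ = 2 and the median is ln(2)/λ ≈ 1.386, so each company taken alone should survive past three years in roughly 22% of the draws.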
The first analysis step is to look at correlations between survival times of the two companies. You can perform this step by using the CORR procedure as follows:
proc corr data = default pearson kendall;
var time1 time2;
run;
Output 17.1.3 shows the output of this code. The output contains some descriptive statistics and two measures of correlation: Pearson and Kendall. Both measures indicate high and statistically significant dependence between the life spans of the two companies.
Output 17.1.3: Default Time Descriptive Statistics and Correlations
2 Variables: time1 time2

Simple Statistics

| Variable | N | Mean | Std Dev | Median | Minimum | Maximum |
|---|---|---|---|---|---|---|
| time1 | 1000000 | 2.00023 | 1.99883 | 1.38633 | 3.85293E-6 | 24.18938 |
| time2 | 1000000 | 1.99897 | 1.99965 | 1.38476 | 6.43006E-6 | 25.85567 |

Pearson Correlation Coefficients, N = 1000000; Prob > |r| under H0: Rho=0

| | time1 | time2 |
|---|---|---|
| time1 | | |
| time2 | | |

Kendall Tau b Correlation Coefficients, N = 1000000; Prob > |tau| under H0: Tau=0

| | time1 | time2 |
|---|---|---|
| time1 | | |
| time2 | | |
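As a point of reference for the Kendall measure, a normal copula with correlation ρ has a known closed-form Kendall's tau, and tau is unchanged by the increasing transformation from uniforms to default times. The value below is this theoretical benchmark for ρ = 0.8, not a number read from the procedure output:

```latex
\begin{align*}
\tau &= \frac{2}{\pi}\arcsin(\rho)
      = \frac{2}{\pi}\arcsin(0.8)
      \approx \frac{2}{\pi}\,(0.9273)
      \approx 0.590 .
\end{align*}
```

The Pearson correlation of the default times has no such invariance, because it depends on the exponential margins as well as on the copula.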
The second and final step is to empirically estimate the default probabilities of the two companies. This is done by using the FREQ procedure as follows:
proc freq data=default;
table survive survive1-survive2;
run;
The results are shown in Output 17.1.4.
Output 17.1.4: Probabilities of Default
| survive | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 852317 | 85.23 | 852317 | 85.23 |
| 1 | 147683 | 14.77 | 1000000 | 100.00 |

| survive1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 776594 | 77.66 | 776594 | 77.66 |
| 1 | 223406 | 22.34 | 1000000 | 100.00 |

| survive2 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 0 | 777292 | 77.73 | 777292 | 77.73 |
| 1 | 222708 | 22.27 | 1000000 | 100.00 |
Output 17.1.4 shows that the empirical three-year default probabilities of the two companies are both approximately 78%. If you assume that the companies are independent, the estimated probability that both companies default within three years is 0.78*0.78=0.61 (61%). Because SURVIVE is 0 whenever at least one company defaults, the 85% in the first table is the probability of at least one default; the simulated probability that both companies default is 77.66% + 77.73% - 85.23% = 70.16%. Comparing the naive 61% estimate with this higher simulated value illustrates that neglecting the correlation between the two companies underestimates the probability of joint default.
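The joint probabilities can be recovered from the three one-way tables in Output 17.1.4 by inclusion-exclusion. In this sketch, D1 and D2 denote the events that company 1 and company 2 default within three years; this notation is introduced here and is not part of the original example. The event SURVIVE=0 is the union of D1 and D2.

```latex
\begin{align*}
P(D_1 \cup D_2) &= 0.852317, \qquad P(D_1) = 0.776594, \qquad P(D_2) = 0.777292, \\
P(D_1 \cap D_2) &= P(D_1) + P(D_2) - P(D_1 \cup D_2) \\
                &= 0.776594 + 0.777292 - 0.852317 = 0.701569, \\
P(D_1)\,P(D_2)  &= 0.776594 \times 0.777292 \approx 0.6036 \quad \text{(independence benchmark)}.
\end{align*}
```

The same comparison on the survival side gives 0.223406 × 0.222708 ≈ 0.0498 under independence, against a simulated joint survival frequency of 0.147683. Positive dependence therefore makes both joint default and joint survival more likely than the independence assumption suggests.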