The SURVEYSELECT Procedure

Example 124.3 PPS (Dollar-Unit) Sampling


A small company wants to audit employee travel expenses in an effort to improve the expense reporting procedure and possibly reduce expenses. The company does not have resources to examine all expense reports and wants to use statistical sampling to objectively select expense reports for audit.

The data set TravelExpense contains the dollar amount of all employee travel expense transactions during the past month:

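data TravelExpense;
   /* The trailing @@ holds the input line so that several ID/Amount pairs are read from it */
   input ID$ Amount @@;
   /* '1_Low ' and '2_Avg ' are padded to six characters so that Level is long enough for '3_High' */
   if (Amount < 500) then Level='1_Low ';
      else if (Amount > 1500) then Level='3_High';
      else Level='2_Avg ';
   datalines;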
110  237.18   002  567.89   234  118.50
743   74.38   411 1287.23   782  258.10
216  325.36   174  218.38   568 1670.80
302  134.71   285 2020.70   314   47.80
139 1183.45   775  330.54   425  780.10
506  895.80   239  620.10   011  420.18
672  979.66   142  810.25   738  670.85
192  314.58   243   87.50   263 1893.40
496  753.30   332  540.65   486 2580.35
614  230.56   654  185.60   308  688.43
784  505.14   017  205.48   162  650.42
289 1348.34   691   30.50   545 2214.80
517  940.35   382  217.85   024  142.90
478  806.90   107  560.72
;

In the SAS data set TravelExpense, the variable ID identifies the travel expense report. The variable Amount contains the dollar amount of the reported expense. The variable Level equals '1_Low' when Amount is less than 500, '3_High' when Amount is greater than 1500, and '2_Avg' otherwise.

In the sample design for this audit, expense reports are stratified by Level. Stratification ensures that each of these expense levels is included in the sample and also permits a disproportionate allocation of the sample, selecting proportionately more of the expense reports from the higher levels. Within strata, the sample of expense reports is selected with probability proportional to the amount of the expense, which gives larger expenses a greater chance of selection. In auditing terms, this is known as monetary-unit (or dollar-unit) sampling. For more information, see Wilburn (1984).

PROC SURVEYSELECT requires that the input data set be sorted by the STRATA variables. The following PROC SORT statements sort the TravelExpense data set by the stratification variable Level.

proc sort data=TravelExpense;
   by Level;
run;
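
Before choosing stratum sample sizes, it can help to see how many reports and how many dollars fall in each stratum. The following statements are a minimal sketch of such a check and are not part of the original example; they assume only the TravelExpense data set created above. With these data, the listing should show 18 reports in '1_Low', 18 in '2_Avg', and 5 in '3_High'.

```sas
/* Sketch: count and dollar total of expense reports in each stratum */
proc means data=TravelExpense n sum min max maxdec=2;
   class Level;     /* one summary row per stratum */
   var Amount;      /* the PPS size measure */
run;
```

The stratum totals reported here are the denominators that determine each report's chance of selection under PPS sampling.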

Output 124.3.1 displays the sampling frame data set TravelExpense, which contains 41 observations.

Output 124.3.1: Sampling Frame

Travel Expense Audit

Obs   ID     Amount   Level
  1   110    237.18   1_Low
  2   234    118.50   1_Low
  3   743     74.38   1_Low
  4   782    258.10   1_Low
  5   216    325.36   1_Low
  6   174    218.38   1_Low
  7   302    134.71   1_Low
  8   314     47.80   1_Low
  9   775    330.54   1_Low
 10   011    420.18   1_Low
 11   192    314.58   1_Low
 12   243     87.50   1_Low
 13   614    230.56   1_Low
 14   654    185.60   1_Low
 15   017    205.48   1_Low
 16   691     30.50   1_Low
 17   382    217.85   1_Low
 18   024    142.90   1_Low
 19   002    567.89   2_Avg
 20   411   1287.23   2_Avg
 21   139   1183.45   2_Avg
 22   425    780.10   2_Avg
 23   506    895.80   2_Avg
 24   239    620.10   2_Avg
 25   672    979.66   2_Avg
 26   142    810.25   2_Avg
 27   738    670.85   2_Avg
 28   496    753.30   2_Avg
 29   332    540.65   2_Avg
 30   308    688.43   2_Avg
 31   784    505.14   2_Avg
 32   162    650.42   2_Avg
 33   289   1348.34   2_Avg
 34   517    940.35   2_Avg
 35   478    806.90   2_Avg
 36   107    560.72   2_Avg
 37   568   1670.80   3_High
 38   285   2020.70   3_High
 39   263   1893.40   3_High
 40   486   2580.35   3_High
 41   545   2214.80   3_High
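
The statements that produced the sampling frame listing are not shown in this example. A listing of this kind could be produced with statements along the following lines; the title text is an assumption taken from the heading of Output 124.3.1.

```sas
/* Sketch: list the sorted sampling frame (title assumed from the output heading) */
title1 'Travel Expense Audit';
proc print data=TravelExpense;
run;
```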


The following PROC SURVEYSELECT statements select a probability sample of expense reports from the TravelExpense data set by using the stratified design with PPS selection within strata:

title1 'Travel Expense Audit';
title2 'Stratified PPS (Dollar-Unit) Sampling';
proc surveyselect data=TravelExpense method=pps n=(6 10 4)
                  seed=47279 out=AuditSample;
   size Amount;
   strata Level;
run;

The STRATA statement names the stratification variable Level. The SIZE statement specifies the size measure variable Amount. In the PROC SURVEYSELECT statement, the METHOD=PPS option requests sample selection with probability proportional to size and without replacement. The N=(6 10 4) option specifies the stratum sample sizes by listing the sample sizes in the same order as the strata appear in the TravelExpense data set. The sample size of 6 corresponds to the first stratum (Level = '1_Low'); the sample size of 10 corresponds to the second stratum (Level = '2_Avg'); and the sample size of 4 corresponds to the last stratum (Level = '3_High'). The SEED= option specifies 47279 as the initial seed for random number generation.
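
Listing sample sizes in the N= option depends on knowing the order of the strata in the input data set. As an alternative sketch, the stratum sample sizes can be supplied in a secondary data set that contains the STRATA variable and a sample size variable named _NSIZE_. The data set names AuditSizes and AuditSample2 are hypothetical and are not part of the original example; the exact requirements for the secondary data set (matching type and length of Level, same sort order) should be checked against the documentation for your release.

```sas
/* Hypothetical data set of stratum sample sizes, one row per value of Level */
data AuditSizes;
   length Level $6;            /* match the length of Level in TravelExpense */
   input Level $ _NSIZE_;      /* _NSIZE_ holds the stratum sample size */
   datalines;
1_Low  6
2_Avg  10
3_High 4
;

/* Same design as above, with the sample sizes read from AuditSizes */
proc surveyselect data=TravelExpense method=pps n=AuditSizes
                  seed=47279 out=AuditSample2;
   size Amount;
   strata Level;
run;
```

Keeping the allocation in a data set ties each sample size to its stratum by name, which is easier to review than a positional list when there are many strata.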

Output 124.3.2 displays the output from PROC SURVEYSELECT. A total of 20 expense reports are selected for audit. The data set AuditSample contains the sample of travel expense reports.

Output 124.3.2: Sample Selection Summary

Travel Expense Audit
Stratified PPS (Dollar-Unit) Sampling

The SURVEYSELECT Procedure

Selection Method      PPS, Without Replacement
Size Measure          Amount
Strata Variable       Level

Input Data Set        TRAVELEXPENSE
Random Number Seed    47279
Number of Strata      3
Total Sample Size     20
Output Data Set       AUDITSAMPLE


The following PROC PRINT statements display the audit sample, which is shown in Output 124.3.3:

title1 'Travel Expense Audit';
title2 'Sample Selected by Stratified PPS Design';
proc print data=AuditSample;
run;

Output 124.3.3: Audit Sample

Travel Expense Audit
Sample Selected by Stratified PPS Design

Obs   Level    ID     Amount   SelectionProb   SamplingWeight
  1   1_Low    024    142.90      0.23949          4.17553
  2   1_Low    614    230.56      0.38640          2.58797
  3   1_Low    110    237.18      0.39750          2.51574
  4   1_Low    782    258.10      0.43256          2.31183
  5   1_Low    192    314.58      0.52721          1.89676
  6   1_Low    216    325.36      0.54528          1.83392
  7   2_Avg    239    620.10      0.42503          2.35278
  8   2_Avg    308    688.43      0.47186          2.11925
  9   2_Avg    496    753.30      0.51633          1.93676
 10   2_Avg    478    806.90      0.55307          1.80810
 11   2_Avg    142    810.25      0.55536          1.80063
 12   2_Avg    517    940.35      0.64454          1.55151
 13   2_Avg    672    979.66      0.67148          1.48925
 14   2_Avg    139   1183.45      0.81116          1.23280
 15   2_Avg    411   1287.23      0.88229          1.13341
 16   2_Avg    289   1348.34      0.92418          1.08204
 17   3_High   568   1670.80      0.64385          1.55316
 18   3_High   263   1893.40      0.72963          1.37056
 19   3_High   545   2214.80      0.85348          1.17167
 20   3_High   486   2580.35      0.99435          1.00568
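
The SelectionProb values in the audit sample are consistent with the usual PPS inclusion probability: the stratum sample size times the report's Amount, divided by the stratum total of Amount. For example, the '1_Low' stratum total is 3580.10, so report 024 has probability 6 × 142.90 / 3580.10 ≈ 0.23949, and its sampling weight is the reciprocal, about 4.17553. The following statements are a sketch of that check for every sampled report; the table name ProbCheck and the columns ExpectedProb and ExpectedWeight are hypothetical names introduced here.

```sas
/* Sketch: recompute n_h * Amount / (stratum total) for each sampled report */
proc sql;
   create table ProbCheck as
   select s.Level, s.ID, s.Amount,
          s.SelectionProb, s.SamplingWeight,
          k.n * s.Amount / t.Total    as ExpectedProb,
          1 / calculated ExpectedProb as ExpectedWeight
   from AuditSample as s,
        /* stratum totals of the size measure in the sampling frame */
        (select Level, sum(Amount) as Total
            from TravelExpense group by Level) as t,
        /* stratum sample sizes, counted from the selected sample */
        (select Level, count(*) as n
            from AuditSample group by Level) as k
   where s.Level = t.Level and s.Level = k.Level
   order by s.Level, s.Amount;
quit;

proc print data=ProbCheck;
run;
```

If the computed columns agree with SelectionProb and SamplingWeight to the displayed precision, the sample reflects the intended design: larger expenses carry higher selection probabilities and correspondingly smaller weights.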


Last updated: February 21, 2025