Shared Concepts

Levelization of Classification Variables

This section applies to actions in the following action sets: gam, mixed, modelmatrix, nonlinear, pca, phreg, pls, quantreg, regression, and varReduce.

A classification variable enters the statistical analysis or model not through its values but through its levels. The process of associating values of a variable with levels is called levelization.

During the process of levelization, observations that share the same value are assigned to the same level. The manner in which values are grouped can be affected by the inclusion of formats. The sort order of the levels can be determined by specifying the order subparameter in the classGlobalOpts parameter. In actions in this book, you can also control the sorting order separately for each variable in the class parameter.

Consider the data on nine observations in Table 9. The variable A is integer-valued, and the variable X is a continuous variable that has a missing value for the fourth observation. The fourth and fifth columns of Table 9 apply two different formats to the variable X.

Table 9: Example Data for Levelization

Obs   A   X      FORMAT x 3.0   FORMAT x 3.1
1     2   1.09   1              1.1
2     2   1.13   1              1.1
3     2   1.27   1              1.3
4     3   .      .              .
5     3   2.26   2              2.3
6     3   2.48   2              2.5
7     4   3.34   3              3.3
8     4   3.34   3              3.3
9     4   3.14   3              3.1


By default, levelization groups the observations by the formatted value of each variable. The exception is numeric variables for which no explicit format is provided: their levels are formed and sorted by internal value. Levelizing the four variable columns in Table 9 produces the level assignments in Table 10.

Table 10: Values and Levels

       A                X                FORMAT x 3.0     FORMAT x 3.1
Obs    Value   Level    Value   Level    Value   Level    Value   Level
1      2       1        1.09    1        1       1        1.1     1
2      2       1        1.13    2        1       1        1.1     1
3      2       1        1.27    3        1       1        1.3     2
4      3       2        .       .        .       .        .       .
5      3       2        2.26    4        2       2        2.3     3
6      3       2        2.48    5        2       2        2.5     4
7      4       3        3.34    7        3       3        3.3     6
8      4       3        3.34    7        3       3        3.3     6
9      4       3        3.14    6        3       3        3.1     5
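The level assignments in Table 10 can be reproduced with a short Python sketch. This is only an illustration of the grouping rule, not the implementation that the actions use. The function levelize and the helpers fmt_3_0 and fmt_3_1 are hypothetical names; the helpers approximate the 3.0 and 3.1 formats with Python string formatting, and None stands in for the numeric missing value.

```python
# Data from Table 9; None stands in for the numeric missing value '.'.
A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, None, 2.26, 2.48, 3.34, 3.34, 3.14]


def fmt_3_0(v):
    """Approximate FORMAT x 3.0 (assumption: ordinary decimal rounding)."""
    return f"{v:3.0f}"


def fmt_3_1(v):
    """Approximate FORMAT x 3.1 (assumption: ordinary decimal rounding)."""
    return f"{v:3.1f}"


def levelize(values, fmt=None):
    """Return one level number per observation, or None for a missing value.

    With fmt=None the internal values define and order the levels.
    With an explicit fmt, the formatted strings define and order the levels.
    """
    keys = [None if v is None else (fmt(v) if fmt else v) for v in values]
    distinct = sorted({k for k in keys if k is not None})
    level_of = {k: i for i, k in enumerate(distinct, start=1)}
    return [None if k is None else level_of[k] for k in keys]


levels_A = levelize(A)              # [1, 1, 1, 2, 2, 2, 3, 3, 3]
levels_X = levelize(X)              # [1, 2, 3, None, 4, 5, 7, 7, 6]
levels_X_30 = levelize(X, fmt_3_0)  # [1, 1, 1, None, 2, 2, 3, 3, 3]
levels_X_31 = levelize(X, fmt_3_1)  # [1, 1, 2, None, 3, 4, 6, 6, 5]
```

Without an explicit format, the seven distinct values of X yield seven levels. The 3.1 format merges 1.09 and 1.13 into a single level, and the 3.0 format leaves only three levels.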


The sort order for the levels of classification variables can be specified in the order subparameter in the class parameter.

When order is FORMATTED (the default), the levels of numeric variables for which you have supplied no explicit format are ordered by their internal values. To order such levels by their BEST12. formatted values instead, specify the BEST12. format explicitly for those classification variables.

Table 11 shows how values of the order subparameter are interpreted.

Table 11: Interpretation of Values of the order Subparameter

Value of order   Levels Sorted By
FORMATTED        External formatted value, except for numeric variables that have no explicit format, which are sorted by their unformatted (internal) value. The sort order is machine-dependent.
FREQ             Descending frequency count (levels that have the most observations come first in the order)
INTERNAL         Unformatted value. The sort order is machine-dependent.
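The three orderings in Table 11 can be compared in a small Python sketch. It rests on several assumptions that the text above does not state: ties under FREQ are broken by the smallest internal value in each level, a level's position under INTERNAL is taken from the smallest internal value that maps to it, and formatted values are compared as Python strings. The names ordered_levels, fmt_3_1, and fmt_words are hypothetical, and fmt_words is an invented format that is used only to make FORMATTED and INTERNAL differ.

```python
from collections import Counter

A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, 2.26, 2.48, 3.34, 3.34, 3.14]  # nonmissing X values from Table 9


def fmt_3_1(v):
    """Approximate FORMAT x 3.1."""
    return f"{v:3.1f}"


def fmt_words(v):
    """Invented format for A (not in the source), used only for illustration."""
    return {2: "low", 3: "mid", 4: "high"}[v]


def ordered_levels(values, order="FORMATTED", fmt=None):
    """Return the distinct levels of values in the requested sort order."""
    label = fmt if fmt is not None else (lambda v: v)
    counts, smallest = Counter(), {}
    for v in values:
        k = label(v)
        counts[k] += 1
        smallest[k] = min(v, smallest.get(k, v))
    if order == "FREQ":
        # Most frequent level first; the tie-break rule is an assumption.
        return sorted(counts, key=lambda k: (-counts[k], smallest[k]))
    if order == "INTERNAL" or (order == "FORMATTED" and fmt is None):
        # No explicit format: FORMATTED falls back to the internal value.
        return sorted(counts, key=lambda k: smallest[k])
    if order == "FORMATTED":
        return sorted(counts)
    raise ValueError(f"unknown order: {order}")


by_formatted = ordered_levels(A, "FORMATTED", fmt_words)  # ['high', 'low', 'mid']
by_internal = ordered_levels(A, "INTERNAL", fmt_words)    # ['low', 'mid', 'high']
no_format = ordered_levels(A, "FORMATTED")                # [2, 3, 4]
by_freq = ordered_levels(X, "FREQ", fmt_3_1)
# by_freq == ['1.1', '3.3', '1.3', '2.3', '2.5', '3.1']
```

Under the 3.1 format, the levels 1.1 and 3.3 each contain two observations, so FREQ places them ahead of the four single-observation levels.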


For more information about sort order, see the chapter about the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Programmer's Guide: Essentials.

When the countMissing subparameter is specified in the class parameter, missing values are included in the levelization and are assigned a level. The missing values are '.' (and '.A' through '.Z' for some programming languages) for a numeric variable and blanks for a character variable. Table 12 displays the results of levelizing the values in Table 9 when the countMissing subparameter is in effect.

Table 12: Values and Levels When the countMissing Subparameter Is Specified

       A                X                FORMAT x 3.0     FORMAT x 3.1
Obs    Value   Level    Value   Level    Value   Level    Value   Level
1      2       1        1.09    2        1       2        1.1     2
2      2       1        1.13    3        1       2        1.1     2
3      2       1        1.27    4        1       2        1.3     3
4      3       2        .       1        .       1        .       1
5      3       2        2.26    5        2       3        2.3     4
6      3       2        2.48    6        2       3        2.5     5
7      4       3        3.34    8        3       4        3.3     7
8      4       3        3.34    8        3       4        3.3     7
9      4       3        3.14    7        3       4        3.1     6
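The effect shown in Table 12 can be sketched in Python as follows. This is an illustration under stated assumptions, not the actions' implementation: the function levelize_count_missing and the helper fmt_3_1 are hypothetical names, None stands in for the numeric missing value, and the missing value is placed first among the levels because that is what Table 12 shows.

```python
# X values from Table 9; None stands in for the numeric missing value '.'.
A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, None, 2.26, 2.48, 3.34, 3.34, 3.14]


def fmt_3_1(v):
    """Approximate FORMAT x 3.1."""
    return f"{v:3.1f}"


def levelize_count_missing(values, fmt=None):
    """Levelize values, giving the missing value its own level.

    The missing level, when present, is level 1 and shifts all
    nonmissing levels up by one, as in Table 12.
    """
    keys = [None if v is None else (fmt(v) if fmt else v) for v in values]
    distinct = sorted({k for k in keys if k is not None})
    has_missing = any(k is None for k in keys)
    offset = 1 if has_missing else 0
    level_of = {k: i + offset for i, k in enumerate(distinct, start=1)}
    if has_missing:
        level_of[None] = 1
    return [level_of[k] for k in keys]


cm_A = levelize_count_missing(A)              # [1, 1, 1, 2, 2, 2, 3, 3, 3]
cm_X = levelize_count_missing(X)              # [2, 3, 4, 1, 5, 6, 8, 8, 7]
cm_X_31 = levelize_count_missing(X, fmt_3_1)  # [2, 2, 3, 1, 4, 5, 7, 7, 6]
```

The variable A has no missing values, so its levels match those in Table 10, whereas every nonmissing level of X moves up by one.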


When the ignoreMissing subparameter is specified in the class parameter, missing values in the other variables are ignored while each variable is being levelized.

When the countMissing and ignoreMissing subparameters are not specified, it is important to understand the implications of missing values for your statistical analysis. When an action in this book levelizes the classification variables, an observation for which any classification variable has a missing value is excluded from the analysis.

Actions in this book print a "Number of Observations" table that shows the number of observations that are read from the data set and the number of observations that are used in the analysis. Pay careful attention to this table, especially when your data table contains missing values, to ensure that no observations are unintentionally excluded from the analysis.
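The default exclusion rule and the two counts can be illustrated with the data from Table 9. The following Python sketch assumes that both A and X are used as classification variables and that neither countMissing nor ignoreMissing is specified. The names n_read and n_used are hypothetical and only mimic the two counts that the "Number of Observations" table reports.

```python
# Data from Table 9; None stands in for the numeric missing value '.'.
A = [2, 2, 2, 3, 3, 3, 4, 4, 4]
X = [1.09, 1.13, 1.27, None, 2.26, 2.48, 3.34, 3.34, 3.14]

rows = list(zip(A, X))

# Default behavior: drop any observation in which a classification
# variable is missing.
used = [row for row in rows if all(v is not None for v in row)]

n_read = len(rows)   # observations read from the data table
n_used = len(used)   # observations that remain in the analysis

print(f"Number of Observations Read: {n_read}")
print(f"Number of Observations Used: {n_used}")
```

Here the fourth observation is dropped because X is missing, so nine observations are read and eight are used.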

Last updated: March 05, 2026