DATASOURCE Procedure

OUT= Data Set

The OUT= data set can contain the following variables:

  • the BY variables, which identify cross-sectional dimensions when the input data file contains time series replicated for different values of the BY variables. Use the BY variables in a WHERE statement to process the OUT= data set by cross sections. The order in which BY variables are defined in the OUT= data set corresponds to the order in which the data file is sorted.

  • DATE, a SAS date-, time-, or datetime-valued variable that reports the time period of each observation. The values of the DATE variable can span different time ranges for different BY groups. The format of the DATE variable depends on the INTERVAL= option.

  • the periodic time series variables, which are included in the OUT= data set only if they have data in at least one selected BY group and they are not discarded by a KEEP or DROP statement.

  • the event variables, which are included in the OUT= data set if they are not discarded by a KEEP or DROP statement. By default, these variables are not output to the OUT= data set.

The values of BY variables remain constant in each cross section. Observations within each BY group correspond to the sampling of the series variables at the time periods indicated by the DATE variable.
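As a minimal sketch of processing the OUT= data set by cross section, the following step subsets a toy stand-in data set with a WHERE statement. The data set name WORK.DSOUT, the BY variable COUNTRY, and the series variable GDP are hypothetical; an actual OUT= data set would have whatever BY and series variables the input data file defines.

```sas
/* Toy stand-in for an OUT= data set.                       */
/* COUNTRY (BY variable), DATE, and GDP are assumed names.  */
data work.dsout;
   input country $ date :date9. gdp;
   format date date9.;
   datalines;
US 01JAN2024 100
US 01APR2024 101
CA 01JAN2024 50
CA 01APR2024 51
;
run;

/* Process a single cross section by subsetting on the BY variable */
proc print data=work.dsout;
   where country = 'CA';
run;
```

Within the selected cross section, COUNTRY is constant and each observation corresponds to one time period of DATE.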

You can create a set of single indexes for the OUT= data set by using the INDEX option, provided there are BY variables. Under some circumstances, this might increase the efficiency of subsequent PROC and DATA steps that use BY and WHERE statements. However, there is a cost associated with the creation and maintenance of indexes. SAS Programmer's Guide: Essentials lists the conditions under which the benefits of indexes outweigh the cost.
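To illustrate what an index on a BY variable looks like, the following sketch builds one on a toy data set with PROC DATASETS and then lists it with PROC CONTENTS. This is an analogy under stated assumptions, not a description of what the INDEX option does internally; the names WORK.DSOUT, COUNTRY, and GDP are hypothetical.

```sas
/* Toy stand-in for an OUT= data set; all names are assumed */
data work.dsout;
   input country $ date :date9. gdp;
   format date date9.;
   datalines;
US 01JAN2024 100
US 01APR2024 101
CA 01JAN2024 50
CA 01APR2024 51
;
run;

/* Create an index on the single variable COUNTRY */
proc datasets library=work nolist;
   modify dsout;
   index create country;
quit;

/* The PROC CONTENTS output includes a list of the indexes */
proc contents data=work.dsout;
run;
```

Subsequent steps that subset on COUNTRY with a WHERE statement might then be able to use the index, subject to the cost considerations described above.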

With data files containing cross sections, there can be various degrees of overlap among the series variables. One extreme is when all the series variables contain data for all the cross sections. In this case, the output data set is very compact. In the other extreme case, however, the set of time series variables is unique for each cross section, making the output data set very sparse, as depicted in Table 4.

Table 4: The OUT= Data Set Containing Unique Series for Each BY Group

BY Variables    Series in          Series in          ...             Series in
                first BY group     second BY group                    last BY group
BY1 ... BYP     F1 F2 F3 ... FN    S1 S2 S3 ... SM    ...             T1 T2 T3 ... TK

BY group 1      DATA is here
BY group 2                         DATA is here
...                                                   DATA is here
BY group N                                                            DATA is here

Data is missing everywhere except on the diagonal.


The data in Table 4 can be represented more compactly if cross-sectional information is incorporated into series variable names.
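One way to sketch this idea is to post-process a sparse data set in a DATA step: split it by BY group, rename each series so that its name carries the cross-section identifier, and merge the pieces by DATE. The data set names SPARSE and COMPACT, the BY variable COUNTRY, the series F1 and S1, and the prefixes A_ and B_ are all hypothetical, and this approach is only an illustration, not a feature of the procedure.

```sas
/* Toy sparse data set: F1 exists only for country A, S1 only for B */
data work.sparse;
   input country $ date :date9. f1 s1;
   format date date9.;
   datalines;
A 01JAN2024 1.0 .
A 01FEB2024 1.1 .
B 01JAN2024 . 5.0
B 01FEB2024 . 5.2
;
run;

/* Fold the cross-section identifier into the series names and  */
/* merge the BY groups side by side on DATE.                    */
data work.compact;
   merge work.sparse(keep=country date f1
                     where=(country='A')
                     rename=(f1=a_f1))
         work.sparse(keep=country date s1
                     where=(country='B')
                     rename=(s1=b_s1));
   by date;
   drop country;
run;

proc print data=work.compact;
run;
```

The resulting data set has one observation per DATE value, with the variables A_F1 and B_S1 filled in, instead of two mostly missing blocks. The merge assumes that observations are sorted by DATE within each BY group.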

Last updated: June 19, 2025