This section applies to actions in the following action sets: factmac and svm.
Action sets in this book require the input data to reside on a CAS server. Some of these action sets are sensitive to the order of the data on the CAS server, and they might provide different results when the order of the data changes or the distribution of the data across workers changes, even when the parameters (including the seed) are the same. The listed action sets include the applyRowOrder parameter, which enables you to use a prespecified ordering of the data and a prespecified distribution of observations to threads when you perform an analysis of the data.
The applyRowOrder parameter requires you to first run the partition action in the table action set. The partition action distributes the data to threads and workers in your distributed system according to a partition key, which is a variable that you specify. All rows of data that have the same partition key value reside on the same worker and the same thread in your distributed system. In addition to the partition key, you specify an order-by key, which determines the order of the rows of data on each thread of the worker. The order-by key must be unique to each row in order to provide reproducible row ordering.
For example, you can use the CAS procedure as follows to run the partition action:
proc cas;
   action table.partition /
      table={name="inData",
             groupby={"partitionKey"},
             orderby={"orderbyKey"}
            },
      casout={name="inData2", replace=True};
run;
The partition action redistributes the data in the inData table according to the variables that are specified in the groupby and orderby parameters. The redistributed data are saved in the data table named inData2. The groupby parameter specifies the partition key, and the orderby parameter specifies the order-by key.
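Because the order-by key must be unique to each row, it can be useful to check that property before you run the partition action. The following statements are a minimal sketch of one way to do this, not a required step. The sketch assumes that the simple action set is available in your session, and it reuses the illustrative names inData and orderbyKey from the preceding example:

```sas
proc cas;
   /* Report the number of rows in the input table */
   action simple.numRows /
      table={name="inData"};
   /* Report the number of distinct values of the order-by key */
   action simple.distinct /
      table={name="inData"},
      inputs={"orderbyKey"};
run;
```

If the number of distinct values that the distinct action reports for orderbyKey equals the row count that the numRows action reports, then the key is unique to each row. If the number of distinct values is smaller, then some rows share a key value and their relative order is not guaranteed to be reproducible.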
Only actions that support the applyRowOrder parameter can use the data ordering that the partition action provides. If you want the action to use this data ordering, specify the applyRowOrder parameter in the action call. For example, the following forestTrain action call indicates that you want to use the ordered inData2 data table, making use of the row order that you previously applied:
proc cas;
   action decisionTree.forestTrain /
      table={name="inData2"},
      applyRowOrder=True
      ...;
run;
If you specify the applyRowOrder parameter without first running the partition action with both the groupby and orderby parameters, then the action issues a warning and the parameter has no other effect.
Finally, note that even if you have the same data with the same partition keys and order-by keys, the distribution of the data will be different between systems that have different numbers of threads or different numbers of workers. As a result, an analysis might not produce the same results on two such systems.
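To see how the rows of a partitioned table are distributed on a particular system, you can inspect the table after you run the partition action. The following statements are a sketch that assumes the tableDetails action in the table action set, and they assume that level="node" is the reporting level that you want. The table name inData2 is the illustrative name from the earlier example:

```sas
proc cas;
   /* Summarize how the rows of inData2 are spread across the nodes */
   action table.tableDetails /
      name="inData2",
      level="node";
run;
```

Comparing this output between two systems can help explain why an analysis that uses the applyRowOrder parameter gives different results on each of them.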