Provides actions for fitting Bayesian additive regression trees models.
Fits probit Bayesian additive regression trees (BART) models to binary distributed response data.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the input data table. |
| Parameter | Subparameter | Description |
|---|---|---|
| output | casOut (required) | creates a table on the server that contains observationwise statistics, which are computed after the model is fit. |
| outputMargins | — | |
| outputTables | names | lists the names of results tables to save as CAS tables on the server. |
| sampleSummary | casout (required) | creates a table on the server that contains a summary of the sum-of-trees ensemble samples. |
| store | — | stores the model in a binary table object that you can use for scoring. |
specifies the significance level to use for constructing equal-tail credible limits for predictive margins.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
| Default | FALSE |
|---|
changes the attributes of variables used in the action. Currently, attributes specified on the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | attribute |
|---|
names the classification variables to use as explanatory variables in the analysis.
| Alias | classVars |
|---|
The classStatement value can be one or more of the following:
when set to True, reverses the sort order that is imposed by the order parameter.
| Default | FALSE |
|---|
specifies the sort order for the levels of the classification variable. This ordering determines which parameters in the model correspond to each level in the data.
specifies the reference level to use when you specify a nonsingular parameterization in the param parameter. For an individual variable, you can specify the level of the variable to use as the reference level. If the action supports the global class options parameter, then you can specify FIRST or LAST.
specifies the classification variables.
| Alias | name |
|---|
specifies differences of predictive margins.
| Alias | diffs |
|---|
The bartScoreMargin_scoreDiff value can be one or more of the following:
specifies the event predictive margin by its name.
| Alias | evtScen |
|---|
labels the difference in predictive margins in output tables.
names the difference in predictive margins in output tables.
specifies the reference predictive margin by its name.
| Alias | refScen |
|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
specifies a distributed mode that divides the MCMC sampling in a grid environment. This mode distributes the training data to workers so that the specified number of workers have a full copy of the training data and run a separate chain. This parameter is not applicable when you are in single-machine mode. When you specify a value of 0, a single chain is run, and each worker node is assigned a portion of the training data.
| Minimum value | 0 |
|---|
names the numeric variable that contains the frequency of occurrence for each observation.
specifies the input variables to use in the analysis.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | input |
|---|
specifies the value used to determine the prior variance for the leaf parameter.
| Default | 2 |
|---|---|
| Minimum value (exclusive) | 0 |
specifies a predictive margin.
| Alias | scenarios |
|---|
The bartScoreMargin_evaluate value can be one or more of the following:
specifies the variables to modify in a predictive margin and the values they are set to.
| Alias | evaluate |
|---|
The bartScoreMargin_varValue value can be one or more of the following:
specifies the value a variable is set to in the predictive margin. For continuous variables, a numeric value is specified. For classification variables, the formatted level is specified.
names a variable to modify in a predictive margin.
| Alias | variable |
|---|
labels the predictive margin in output tables.
names the predictive margin in output tables.
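One way to read the margin parameters above: a predictive margin fixes the listed variables at the given values for every observation and averages the resulting predictions, and a difference compares an event margin with a reference margin. The sketch below illustrates that reading in plain Python. The `toy_predict` function, the column names, and the subtraction order (event minus reference) are assumptions made for illustration; this is not the action's implementation, which also works with posterior samples rather than a single prediction function.

```python
def predictive_margin(rows, predict, settings):
    """Average prediction after every row has the variables in `settings` overridden."""
    modified = [{**row, **settings} for row in rows]
    return sum(predict(row) for row in modified) / len(modified)


def margin_difference(rows, predict, event_settings, reference_settings):
    """Event margin minus reference margin (the subtraction order is an assumption)."""
    return (predictive_margin(rows, predict, event_settings)
            - predictive_margin(rows, predict, reference_settings))


# Hypothetical stand-in for a fitted model's predicted event probability.
def toy_predict(row):
    return 0.1 + 0.4 * row["treated"] + 0.01 * row["age"]


data = [{"treated": 0, "age": 10}, {"treated": 1, "age": 30}]
treated_margin = predictive_margin(data, toy_predict, {"treated": 1})
effect = margin_difference(data, toy_predict, {"treated": 1}, {"treated": 0})
```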
specifies an upper limit (in seconds) on the time for MCMC sampling.
| Alias | maxTime |
|---|---|
| Minimum value (exclusive) | 0 |
specifies the minimum number of observations that each child of a split must contain in the training data in order for the split to be considered.
| Alias | leafSize |
|---|---|
| Default | 5 |
| Minimum value | 1 |
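The minimum leaf size rule can be pictured as a simple eligibility check on a candidate split. The following is a minimal sketch, assuming a numeric predictor and a "less than or equal goes left" convention; the function name and that convention are hypothetical and are not taken from the action.

```python
def split_is_eligible(values, threshold, min_leaf_size=5):
    """Return True when both children would hold at least `min_leaf_size` observations."""
    left_count = sum(1 for value in values if value <= threshold)  # assumed routing rule
    right_count = len(values) - left_count
    return left_count >= min_leaf_size and right_count >= min_leaf_size


observations = list(range(1, 13))  # 12 hypothetical predictor values: 1..12
balanced = split_is_eligible(observations, threshold=6)    # children of size 6 and 6
lopsided = split_is_eligible(observations, threshold=3)    # children of size 3 and 9
```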
specifies how to handle missing values in predictor variables.
| Default | SEPARATE |
|---|
during the training phase, treats missing values for continuous predictors as the largest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the right branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as the smallest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the left branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, excludes all observations that have a missing predictor value. In the scoring phase, observations that have missing values or observations whose categorical predictor level is unknown are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as a separate group and treats missing values for categorical predictors as a separate level. In the training phase, when a split operation is sampled for a continuous predictor and there are observations that have a missing value of the splitting variable on the node, a primary rule for routing missing values is sampled before the primary splitting rule for nonmissing values is sampled. If a continuous predictor does not have a missing value on the node that you are splitting, a primary rule for routing missing values is not sampled. In the scoring phase, observations that have an unknown categorical predictor level or have a missing continuous predictor value for a node without a primary rule for routing missing values are assigned to the larger branch of the split.
names the dependent variable and explanatory effects.
The bartProbitModel value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar, target |
|---|---|
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
specifies the number of burn-in iterations to perform before the action starts to save samples for prediction.
| Alias | burnin |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the number of bins to use for binning continuous input variables.
| Default | 50 |
|---|---|
| Minimum value | 2 |
limits the display of class levels. The value 0 suppresses all levels.
| Minimum value | 0 |
|---|
specifies the number of MCMC iterations, excluding the burn-in iterations. This is the MCMC sample size if the thinning rate is 1. This option is ignored if you specify the nMCDist parameter and you run distributed chains.
| Default | 1000 |
|---|---|
| Minimum value | 1 |
specifies the nominal input variables to use in the analysis.
For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | nominal |
|---|
specifies the thinning rate of the simulation.
| Alias | thin |
|---|---|
| Default | 1 |
| Minimum value | 1 |
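The burn-in, iteration, and thinning settings interact as follows: burn-in iterations are discarded, and of the remaining iterations only one in every thinning-rate iterations is kept. The sketch below counts the kept iterations under the assumption that the first post-burn-in iteration is kept and every `n_thin`-th one after it; the exact offset the action uses is not stated in this page, and the function and argument names are hypothetical.

```python
def saved_iterations(n_bi=100, n_mc=1000, n_thin=1):
    """Return the 0-based indices of iterations whose samples are kept."""
    kept = []
    for iteration in range(n_bi + n_mc):
        if iteration < n_bi:
            continue  # burn-in: the sample is discarded
        if (iteration - n_bi) % n_thin == 0:
            kept.append(iteration)
    return kept


default_run = saved_iterations()            # thinning rate 1 keeps every post-burn-in sample
thinned_run = saved_iterations(n_thin=10)   # thinning rate 10 keeps one sample in ten
```

With a thinning rate of 1 the number of kept samples equals the number of MCMC iterations, which matches the description of the iteration count above.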
specifies the number of trees in a sample of the sum-of-trees ensemble.
| Default | 200 |
|---|---|
| Minimum value | 1 |
when set to True, stores a mapping of each observation to terminal nodes in memory when the model is trained.
| Default | FALSE |
|---|
specifies a numeric offset variable. This variable cannot be a classification variable, a response variable, or one of the explanatory variables.
specifies the minimum cardinality for which a categorical input uses splitting rules according to level ordering.
| Default | 50 |
|---|---|
| Minimum value (exclusive) | 0 |
creates a table on the server that contains observationwise statistics, which are computed after the model is fit.
The bartBinOutputStatement value can be one or more of the following:
specifies the significance level to use for the construction of all equal-tail credible limits.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
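An equal-tail credible interval at significance level alpha leaves probability alpha/2 in each tail of the posterior sample. The sketch below computes such limits from a list of draws with a linear-interpolation quantile; the quantile convention and the names are assumptions for illustration, and the action may use a different quantile definition.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, for 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def equal_tail_limits(samples, alpha=0.05):
    """Return (lower, upper) limits that leave alpha/2 of the draws in each tail."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    ordered = sorted(samples)
    return quantile(ordered, alpha / 2), quantile(ordered, 1 - alpha / 2)


# Hypothetical posterior draws of one observation's event probability.
draws = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
lower_limit, upper_limit = equal_tail_limits(draws, alpha=0.05)
```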
when set to FALSE, predictions from each MCMC sample are included in the output table in addition to the sample average predictions.
| Alias | averageOnly |
|---|---|
| Default | TRUE |
specifies the settings for an output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.
names the predicted response level. The default name is Into.
specifies the predicted event probability that determines the predicted binary response level.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
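The cutoff turns a predicted event probability into a predicted response level. A minimal sketch, assuming that a probability equal to the cutoff is classified as the event (the tie rule is not stated here) and using hypothetical level labels:

```python
def predicted_level(event_probability, cutoff=0.5, event="1", nonevent="0"):
    """Classify as the event level when the probability reaches the cutoff."""
    if not 0 < cutoff < 1:
        raise ValueError("cutoff must lie in the open interval (0, 1)")
    # Assumption: ties at the cutoff go to the event level.
    return event if event_probability >= cutoff else nonevent


levels = [predicted_level(p) for p in (0.12, 0.50, 0.93)]
strict_levels = [predicted_level(p, cutoff=0.9) for p in (0.12, 0.50, 0.93)]
```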
names the equal-tail lower credible limit.
names the predicted value. If you do not specify any output statistics, then the predicted value is named Pred by default.
| Aliases | p, predicted |
|---|---|
names the residual.
| Aliases | r, residual |
|---|---|
identifies the training and test roles for observations.
names the equal-tail upper credible limit.
For more information about specifying the outputMargins parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|
specifies the fraction of the data to be used for testing.
The partByFracStatement value can be one or more of the following:
specifies the seed to use in the random number generator that is used for partitioning the data.
| Default | 0 |
|---|
randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.
| Range | 0–1 |
|---|
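Fraction-based partitioning can be pictured as an independent random draw per observation. The sketch below uses Python's own generator, so it will not reproduce the assignments that the server makes for the same seed; the role labels and function name are hypothetical.

```python
import random


def assign_roles(n_obs, test_fraction, seed=0):
    """Assign each observation to TEST with probability `test_fraction`, else TRAIN."""
    if not 0 <= test_fraction < 1:
        raise ValueError("test_fraction must be at least 0 and less than 1")
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    return ["TEST" if rng.random() < test_fraction else "TRAIN" for _ in range(n_obs)]


roles = assign_roles(n_obs=1000, test_fraction=0.3, seed=42)
same_roles = assign_roles(n_obs=1000, test_fraction=0.3, seed=42)
```

Because each observation is drawn independently, the realized share of test observations is close to, but generally not exactly, the requested fraction.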
names the variable and its values used to partition the data into training and testing roles.
| Long form | partByVar={name="variable-name"} |
|---|---|
| Shortcut form | partByVar="variable-name" |
The partByVarStatement value can be one or more of the following:
names the variable in the input table whose values are used to assign roles to each observation.
specifies the formatted value of the variable that is used to assign observations to the testing role.
specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.
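The long and shortcut forms shown above carry the same information when only the variable name is given. As an illustration, a client-side helper could normalize both forms into one dictionary before building the action call. The helper name and the `role` column are hypothetical; the `name`, `train`, and `test` keys are the subparameters described above.

```python
def normalize_part_by_var(value):
    """Return the long (dictionary) form of a partByVar specification."""
    if isinstance(value, str):
        return {"name": value}  # shortcut form: just the variable name
    if isinstance(value, dict) and "name" in value:
        return dict(value)      # long form: copy so the caller's dict is untouched
    raise TypeError("partByVar must be a variable name or a dict with a 'name' key")


shortcut = normalize_part_by_var("role")
long_form = normalize_part_by_var({"name": "role", "train": "1", "test": "0"})
```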
when set to True, specifies that bin boundaries are set at quantiles of numeric inputs instead of bins of equal width.
| Aliases | qbin, qtbin |
|---|---|
| Default | TRUE |
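The difference between quantile bins and equal-width bins shows up most clearly on skewed data. The sketch below computes interior bin boundaries both ways for a small hypothetical sample; the simple empirical quantile rule used here is an assumption and may differ from the one the action applies.

```python
def equal_width_boundaries(values, n_bins):
    """Interior boundaries that cut the range [min, max] into bins of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + width * k for k in range(1, n_bins)]


def quantile_boundaries(values, n_bins):
    """Interior boundaries placed at empirical quantiles (roughly equal counts per bin)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


skewed = [1, 2, 3, 4, 5, 6, 7, 100]  # one large value stretches the range
width_cuts = equal_width_boundaries(skewed, n_bins=4)
quantile_cuts = quantile_boundaries(skewed, n_bins=4)
```

Here the equal-width boundaries all fall above 7, so seven of the eight values share one bin, whereas the quantile boundaries spread the values evenly across bins.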
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
The bartProbit_sampleSummary value can be one or more of the following:
names the variable that contains the average number of nodes per tree in the sample.
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
For more information about specifying the casout parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
names the variable that contains the proportion of accepted tree modifications.
names the variable that contains an indicator for whether the sample is saved for prediction.
specifies a seed for starting the pseudorandom number generator.
| Default | 0 |
|---|---|
| Range | 0–4294967295 |
stores the model in a binary table object that you can use for scoring.
For more information about specifying the store parameter, see the common casouttablebasic parameter (Appendix A: Common Parameters).
| Aliases | savemodel, save, savestate |
|---|---|
specifies the input data table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the target variable.
when set to True, stores data in memory when the model is trained.
| Default | FALSE |
|---|
specifies the regularization prior for the sum-of-trees ensemble.
The bart_treePrior value can be one or more of the following:
specifies the base probability for splitting an internal node as a function of its depth from the root. A larger base probability value makes splitting a node more likely.
| Default | 0.95 |
|---|---|
| Range | (0, 1) |
specifies the power parameter used to compute the probability of splitting an internal node as a function of its depth from the root. A larger depth power value decreases the probability of splitting a node.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the probability of sampling the operation of pruning a pair of terminal nodes for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
specifies the probability of sampling the operation of splitting a terminal node for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
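The tree prior settings above can be illustrated with the depth-dependent split probability that is commonly used in the BART literature, base × (1 + depth)^(−power). This page does not state the formula, so treat it as an assumption; it does agree with the descriptions above, since a larger base raises the probability and a larger power lowers it at every depth below the root. The second helper checks the constraint that the split and prune proposal probabilities sum to 1. All names are hypothetical.

```python
def split_probability(depth, base=0.95, power=2.0):
    """Prior probability of splitting a node at `depth` (root = 0), assumed form."""
    if not 0 < base < 1:
        raise ValueError("base must lie in the open interval (0, 1)")
    if power < 0:
        raise ValueError("power must be nonnegative")
    return base * (1 + depth) ** (-power)


def check_proposal_probabilities(p_split=0.5, p_prune=0.5, tol=1e-12):
    """Validate that both probabilities lie in (0, 1) and sum to 1."""
    for p in (p_split, p_prune):
        if not 0 < p < 1:
            raise ValueError("each probability must lie in the open interval (0, 1)")
    if abs(p_split + p_prune - 1.0) > tol:
        raise ValueError("p_split and p_prune must sum to 1")
    return p_split, p_prune


by_depth = [split_probability(d) for d in range(4)]  # shrinks quickly with depth
proposals = check_proposal_probabilities(0.4, 0.6)
```

With the defaults, the probability drops from 0.95 at the root to under 0.06 at depth 3, which is how the prior favors shallow trees.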
Fits probit Bayesian additive regression trees (BART) models to binary distributed response data..
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametertable |
— |
specifies the input data table. |
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametercasOut |
creates a table on the server that contains observationwise statistics, which are computed after the model is fit. |
|
|
— |
||
|
names |
lists the names of results tables to save as CAS tables on the server. |
|
|
required parametercasout |
creates a table on the server that contains a summary of the sum-of-trees ensemble samples. |
|
|
— |
stores the model in a binary table object that you can use for scoring. |
specifies the significance level to use for constructing equal-tail credible limits for predictive margins.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
| Default | False |
|---|
changes the attributes of variables used in the action. Currently, attributes specified on the inputs and nominal parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | attribute |
|---|
names the classification variables to use as explanatory variables in the analysis.
| Alias | classVars |
|---|
The classStatement value can be one or more of the following:
when set to True, reverses the sort order that is imposed by the order parameter.
| Default | False |
|---|
specifies the sort order for the levels of the classification variable. This ordering determines which parameters in the model correspond to each level in the data.
specifies the reference level to use when you specify a nonsingular parameterization in the param parameter. For an individual variable, you can specify the level of the variable to use as the reference level. If the action supports the global class options parameter, then you can specify FIRST or LAST.
specifies the classification variables.
| Alias | name |
|---|
specifies differences of predictive margins.
| Alias | diffs |
|---|
The bartScoreMargin_scoreDiff value can be one or more of the following:
specifies the event predictive margin by its name.
| Alias | evtScen |
|---|---|
labels the difference in predictive margins in output tables.
names the difference in predictive margins in output tables.
specifies the reference predictive margin by its name.
| Alias | refScen |
|---|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
specifies a distributed mode that divides the MCMC sampling in a grid environment. This mode distributes the training data to workers so that the specified number of workers have a full copy of the training data and run a separate chain. This parameter is not applicable when you are in single-machine mode. When you specify a value of 0, a single chain is run, and each worker node is assigned a portion of the training data.
| Minimum value | 0 |
|---|---|
names the numeric variable that contains the frequency of occurrence for each observation.
specifies the input variables to use in the analysis.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | input |
|---|---|
specifies the value used to determine the prior variance for the leaf parameter.
| Default | 2 |
|---|---|
| Minimum value (exclusive) | 0 |
specifies a predictive margin.
| Alias | scenarios |
|---|---|
The bartScoreMargin_evaluate value can be one or more of the following:
specifies the variables to modify in a predictive margin and the values they are set to.
| Alias | evaluate |
|---|---|
The bartScoreMargin_varValue value can be one or more of the following:
specifies the value a variable is set to in the predictive margin. For continuous variables, a numeric value is specified. For classification variables, the formatted level is specified.
names a variable to modify in a predictive margin.
| Alias | variable |
|---|---|
labels the predictive margin in output tables.
names the predictive margin in output tables.
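The predictive margin and difference parameters described above can be pictured with a small calculation. The following is a minimal sketch, not the action's implementation: it assumes that a predictive margin is the average predicted event probability after the listed variables are set to fixed values in every observation, and that a difference is the event margin minus the reference margin. The function `toy_predict`, the column names, and the data are hypothetical stand-ins for a fitted model and an input table.

```python
from statistics import mean


def predictive_margin(rows, predict, overrides):
    """Average prediction after forcing the overridden variables in every row."""
    return mean(predict({**row, **overrides}) for row in rows)


def toy_predict(row):
    # Hypothetical stand-in for a fitted model's predicted event probability.
    return 0.2 + 0.5 * row["treated"] + 0.01 * row["age"]


rows = [
    {"treated": 0, "age": 10},
    {"treated": 1, "age": 20},
    {"treated": 0, "age": 30},
]

# Event scenario: every observation is treated.
event_margin = predictive_margin(rows, toy_predict, {"treated": 1})
# Reference scenario: no observation is treated.
reference_margin = predictive_margin(rows, toy_predict, {"treated": 0})
difference = event_margin - reference_margin
```

Credible limits for these quantities would require repeating the calculation for each saved MCMC sample; that step is omitted from the sketch.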
specifies an upper limit (in seconds) on the time for MCMC sampling.
| Alias | maxTime |
|---|---|
| Minimum value (exclusive) | 0 |
specifies the minimum number of observations that each child of a split must contain in the training data in order for the split to be considered.
| Alias | leafSize |
|---|---|
| Default | 5 |
| Minimum value | 1 |
specifies how to handle missing values in predictor variables.
| Default | SEPARATE |
|---|---|
during the training phase, treats missing values for continuous predictors as the largest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the right branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as the smallest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the left branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, excludes all observations that have a missing predictor value. In the scoring phase, observations that have missing values or an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as a separate group and treats missing values for categorical predictors as a separate level. In the training phase, when a split operation is sampled for a continuous predictor and there are observations that have a missing value of the splitting variable on the node, a primary rule for routing missing values is sampled before the primary splitting rule for nonmissing values is sampled. If a continuous predictor does not have a missing value on the node that you are splitting, a primary rule for routing missing values is not sampled. In the scoring phase, observations that have an unknown categorical predictor level or have a missing continuous predictor value for a node without a primary rule for routing missing values are assigned to the larger branch of the split.
names the dependent variable and explanatory effects.
The bartProbitModel value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar, target |
|---|---|
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
specifies the number of burn-in iterations to perform before the action starts to save samples for prediction.
| Alias | burnin |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the number of bins to use for binning continuous input variables.
| Default | 50 |
|---|---|
| Minimum value | 2 |
limits the display of class levels. The value 0 suppresses all levels.
| Minimum value | 0 |
|---|---|
specifies the number of MCMC iterations, excluding the burn-in iterations. This is the MCMC sample size if the thinning rate is 1. This option is ignored if you specify the nMCDist parameter and you run distributed chains.
| Default | 1000 |
|---|---|
| Minimum value | 1 |
specifies the nominal input variables to use in the analysis.
For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | nominal |
|---|---|
specifies the thinning rate of the simulation.
| Alias | thin |
|---|---|
| Default | 1 |
| Minimum value | 1 |
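The interaction of the burn-in count, the number of MCMC iterations, and the thinning rate can be sketched as follows. This is an illustration under stated assumptions rather than the action's actual bookkeeping: iterations are numbered from 1, burn-in iterations are discarded, and every `thin`-th iteration after burn-in is saved. The function name `saved_iterations` is hypothetical.

```python
def saved_iterations(n_burn_in=100, n_mc=1000, thin=1):
    """Return the 1-based iteration numbers that would be kept for prediction.

    Assumption: the last iteration of each block of `thin` post-burn-in
    iterations is the one that is saved.
    """
    if n_burn_in < 1 or n_mc < 1 or thin < 1:
        raise ValueError("all three settings must be at least 1")
    first = n_burn_in + thin
    last = n_burn_in + n_mc
    return list(range(first, last + 1, thin))


# With a thinning rate of 1 the saved sample size equals the iteration count.
full = saved_iterations(100, 1000, 1)
# With a thinning rate of 10 only one iteration in ten is retained.
thinned = saved_iterations(100, 1000, 10)
```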
specifies the number of trees in a sample of the sum-of-trees ensemble.
| Default | 200 |
|---|---|
| Minimum value | 1 |
when set to True, stores a mapping of each observation to terminal nodes in memory when the model is trained.
| Default | False |
|---|---|
specifies a numeric offset variable. This variable cannot be a classification variable, a response variable, or one of the explanatory variables.
specifies the minimum cardinality for which a categorical input uses splitting rules according to level ordering.
| Default | 50 |
|---|---|
| Minimum value (exclusive) | 0 |
creates a table on the server that contains observationwise statistics, which are computed after the model is fit.
The bartBinOutputStatement value can be one or more of the following:
specifies the significance level to use for the construction of all equal-tail credible limits.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
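As an illustration of what an equal-tail credible limit is, the sketch below takes a list of posterior samples and returns the alpha/2 and 1 - alpha/2 quantiles. It is a generic calculation that uses linear interpolation between order statistics; the quantile rule that the action itself uses is not stated in this document, and the function name `equal_tail_limits` is hypothetical.

```python
def equal_tail_limits(samples, alpha=0.05):
    """Return (lower, upper) equal-tail limits that leave alpha/2 in each tail."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in the open interval (0, 1)")
    ordered = sorted(samples)

    def quantile(q):
        # Linear interpolation between neighboring order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(alpha / 2), quantile(1 - alpha / 2)


# Toy posterior sample: 101 evenly spaced values between 0 and 1.
posterior = [i / 100 for i in range(101)]
lower, upper = equal_tail_limits(posterior, alpha=0.10)
```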
when set to False, predictions from each MCMC sample are included in the output table in addition to the sample average predictions.
| Alias | averageOnly |
|---|---|
| Default | True |
specifies the settings for an output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.
names the predicted response level. The default name is Into.
specifies the predicted event probability that determines the predicted binary response level.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
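The role of the cutoff can be shown in a few lines. This is a sketch only: the level labels are hypothetical, and the treatment of a probability exactly equal to the cutoff (assigned to the event level here) is an assumption that this document does not confirm.

```python
def predicted_level(event_probability, cutoff=0.5, event="event", nonevent="nonevent"):
    """Map a predicted event probability to a binary response level."""
    if not 0 < cutoff < 1:
        raise ValueError("cutoff must be in the open interval (0, 1)")
    # Assumption: ties at the cutoff go to the event level.
    return event if event_probability >= cutoff else nonevent


levels = [predicted_level(p) for p in (0.10, 0.49, 0.50, 0.93)]
```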
names the equal-tail lower credible limit.
names the predicted value. If you do not specify any output statistics, then the predicted value is named Pred by default.
| Aliases | p, predicted |
|---|---|
names the residual.
| Aliases | r, residual |
|---|---|
identifies the training and test roles for observations.
names the equal-tail upper credible limit.
For more information about specifying the outputMargins parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|---|
specifies the fraction of the data to be used for testing.
The partByFracStatement value can be one or more of the following:
specifies the seed to use in the random number generator that is used for partitioning the data.
| Default | 0 |
|---|---|
randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.
| Range | 0–1 |
|---|---|
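Partitioning by fraction can be pictured as a seeded random draw per observation. The sketch below uses Python's own `random` module, so it will not reproduce the assignments that the action makes for the same seed; the role labels and the function name `partition_by_fraction` are hypothetical.

```python
import random


def partition_by_fraction(n_obs, test_fraction, seed):
    """Assign each observation to TEST with probability test_fraction, else TRAIN."""
    if not 0 <= test_fraction < 1:
        raise ValueError("test_fraction must be at least 0 and less than 1")
    rng = random.Random(seed)  # a fixed seed makes the partition repeatable
    return ["TEST" if rng.random() < test_fraction else "TRAIN" for _ in range(n_obs)]


roles = partition_by_fraction(n_obs=1000, test_fraction=0.3, seed=12345)
test_share = roles.count("TEST") / len(roles)
```

Because each observation is drawn independently, the realized share of test observations is close to, but usually not exactly, the requested fraction.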
names the variable and its values used to partition the data into training and testing roles.
| Long form | partByVar={"name":"variable-name"} |
|---|---|
| Shortcut form | partByVar="variable-name" |
The partByVarStatement value can be one or more of the following:
names the variable in the input table whose values are used to assign roles to each observation.
specifies the formatted value of the variable that is used to assign observations to the testing role.
specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.
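The long and shortcut forms, and the fallback to the training role, can be sketched in plain Python. The subparameter names `name`, `test`, and `train` come from the descriptions above; the data, the role labels, and the choice to leave an observation unassigned when `train` is specified but does not match are assumptions made for illustration.

```python
def assign_roles(rows, part_by_var):
    """Return one role per row: "TRAIN", "TEST", or None when no rule applies."""
    # Accept the shortcut form (a bare variable name) as well as the long form.
    spec = {"name": part_by_var} if isinstance(part_by_var, str) else dict(part_by_var)
    name = spec["name"]
    test_value = spec.get("test")
    train_value = spec.get("train")

    roles = []
    for row in rows:
        value = str(row[name])  # stand-in for the formatted value
        if test_value is not None and value == test_value:
            roles.append("TEST")
        elif train_value is None or value == train_value:
            # Without a train value, everything not claimed by test is training.
            roles.append("TRAIN")
        else:
            roles.append(None)  # assumption: unmatched rows stay unassigned
    return roles


rows = [{"part": 0}, {"part": 1}, {"part": 2}]
without_train = assign_roles(rows, {"name": "part", "test": "1"})
with_train = assign_roles(rows, {"name": "part", "test": "1", "train": "0"})
shortcut = assign_roles(rows, "part")
```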
when set to True, specifies that bin boundaries are set at quantiles of numeric inputs instead of bins of equal width.
| Aliases | qbin, qtbin |
|---|---|
| Default | True |
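The difference between quantile bins and equal-width bins shows up most clearly on skewed data. The following sketch computes interior bin boundaries both ways for a small list of values; it uses a simple order-statistic rule for the quantiles, which is an assumption and may differ from the rule that the action applies. Both function names are hypothetical.

```python
def equal_width_boundaries(values, n_bins):
    """Interior boundaries that cut the range [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_boundaries(values, n_bins):
    """Interior boundaries placed at order statistics so bins hold similar counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // n_bins)] for i in range(1, n_bins)]


# One large value stretches the range and distorts the equal-width bins.
skewed = [1, 2, 3, 4, 5, 6, 7, 100]
by_width = equal_width_boundaries(skewed, n_bins=4)
by_quantile = quantile_boundaries(skewed, n_bins=4)
```

With equal-width bins, seven of the eight values fall into the first bin; with quantile boundaries, the values are spread evenly across the bins.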
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
The bartProbit_sampleSummary value can be one or more of the following:
names the variable that contains the average number of nodes per tree in the sample.
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
For more information about specifying the casout parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
names the variable that contains the proportion of accepted tree modifications.
names the variable that contains an indicator for whether the sample is saved for prediction.
specifies a seed for starting the pseudorandom number generator.
| Default | 0 |
|---|---|
| Range | 0–4294967295 |
stores the model in a binary table object that you can use for scoring.
For more information about specifying the store parameter, see the common casouttablebasic parameter (Appendix A: Common Parameters).
| Aliases | savemodel, save, savestate |
|---|---|
specifies the input data table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the target variable.
when set to True, stores data in memory when the model is trained.
| Default | False |
|---|---|
specifies the regularization prior for the sum-of-trees ensemble.
The bart_treePrior value can be one or more of the following:
specifies the base probability for splitting an internal node as a function of its depth from the root. A larger base probability value makes splitting a node more likely.
| Default | 0.95 |
|---|---|
| Range | (0, 1) |
specifies the power parameter used to compute the probability of splitting an internal node as a function of its depth from the root. A larger depth power value decreases the probability of splitting a node.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the probability of sampling the operation of pruning a pair of terminal nodes for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
specifies the probability of sampling the operation of splitting a terminal node for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
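The effect of the base probability and the depth power can be sketched numerically. This document does not give the formula, so the sketch assumes the form that is standard in the BART literature, in which the probability of splitting a node at depth d is base × (1 + d)^(−power). It also checks the stated constraint that the split and prune probabilities sum to 1. Both function names are hypothetical.

```python
import math


def split_probability(depth, base=0.95, power=2.0):
    """Prior probability that a node at the given depth (root = 0) is split.

    Assumption: the standard BART prior, base * (1 + depth) ** (-power).
    """
    if not 0 < base < 1 or power < 0 or depth < 0:
        raise ValueError("need 0 < base < 1, power >= 0, and depth >= 0")
    return base * (1 + depth) ** (-power)


def moves_are_valid(p_split, p_prune):
    """Check that both move probabilities lie in (0, 1) and sum to 1."""
    return 0 < p_split < 1 and 0 < p_prune < 1 and math.isclose(p_split + p_prune, 1.0)


# Split probabilities shrink quickly with depth under the default settings.
by_depth = [split_probability(d) for d in range(4)]
```

Under this assumed form and the default values, the root splits with probability 0.95, and the probability falls to 0.2375 one level down, which keeps individual trees shallow.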
Fits probit Bayesian additive regression trees (BART) models to binary distributed response data.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required parameter) | — | specifies the input data table. |

| Parameter | Subparameter | Description |
|---|---|---|
| output | casOut (required parameter) | creates a table on the server that contains observationwise statistics, which are computed after the model is fit. |
| outputMargins | — | |
| outputTables | names | lists the names of results tables to save as CAS tables on the server. |
| sampleSummary | casout (required parameter) | creates a table on the server that contains a summary of the sum-of-trees ensemble samples. |
| store | — | stores the model in a binary table object that you can use for scoring. |
specifies the significance level to use for constructing equal-tail credible limits for predictive margins.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
| Default | FALSE |
|---|---|
changes the attributes of variables used in the action. Currently, attributes specified on the inputs and nominal parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | attribute |
|---|---|
names the classification variables to use as explanatory variables in the analysis.
| Alias | classVars |
|---|---|
The classStatement value can be one or more of the following:
when set to True, reverses the sort order that is imposed by the order parameter.
| Default | FALSE |
|---|---|
specifies the sort order for the levels of the classification variable. This ordering determines which parameters in the model correspond to each level in the data.
specifies the reference level to use when you specify a nonsingular parameterization in the param parameter. For an individual variable, you can specify the level of the variable to use as the reference level. If the action supports the global class options parameter, then you can specify FIRST or LAST.
specifies the classification variables.
| Alias | name |
|---|---|
specifies differences of predictive margins.
| Alias | diffs |
|---|---|
The bartScoreMargin_scoreDiff value can be one or more of the following:
specifies the event predictive margin by its name.
| Alias | evtScen |
|---|---|
labels the difference in predictive margins in output tables.
names the difference in predictive margins in output tables.
specifies the reference predictive margin by its name.
| Alias | refScen |
|---|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
specifies a distributed mode that divides the MCMC sampling in a grid environment. This mode distributes the training data to workers so that the specified number of workers have a full copy of the training data and run a separate chain. This parameter is not applicable when you are in single-machine mode. When you specify a value of 0, a single chain is run, and each worker node is assigned a portion of the training data.
| Minimum value | 0 |
|---|---|
names the numeric variable that contains the frequency of occurrence for each observation.
specifies the input variables to use in the analysis.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | input |
|---|---|
specifies the value used to determine the prior variance for the leaf parameter.
| Default | 2 |
|---|---|
| Minimum value (exclusive) | 0 |
specifies a predictive margin.
| Alias | scenarios |
|---|---|
The bartScoreMargin_evaluate value can be one or more of the following:
specifies the variables to modify in a predictive margin and the values they are set to.
| Alias | evaluate |
|---|---|
The bartScoreMargin_varValue value can be one or more of the following:
specifies the value a variable is set to in the predictive margin. For continuous variables, a numeric value is specified. For classification variables, the formatted level is specified.
names a variable to modify in a predictive margin.
| Alias | variable |
|---|---|
labels the predictive margin in output tables.
names the predictive margin in output tables.
specifies an upper limit (in seconds) on the time for MCMC sampling.
| Alias | maxTime |
|---|---|
| Minimum value (exclusive) | 0 |
specifies the minimum number of observations that each child of a split must contain in the training data in order for the split to be considered.
| Alias | leafSize |
|---|---|
| Default | 5 |
| Minimum value | 1 |
specifies how to handle missing values in predictor variables.
| Default | SEPARATE |
|---|---|
during the training phase, treats missing values for continuous predictors as the largest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the right branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as the smallest machine value and treats missing values for categorical predictors as a separate level. In the scoring phase, observations that have missing continuous predictor values are assigned to the left branch of the split, and observations that have an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, excludes all observations that have a missing predictor value. In the scoring phase, observations that have missing values or an unknown categorical predictor level are assigned to the larger branch of the split.
during the training phase, treats missing values for continuous predictors as a separate group and treats missing values for categorical predictors as a separate level. In the training phase, when a split operation is sampled for a continuous predictor and there are observations that have a missing value of the splitting variable on the node, a primary rule for routing missing values is sampled before the primary splitting rule for nonmissing values is sampled. If a continuous predictor does not have a missing value on the node that you are splitting, a primary rule for routing missing values is not sampled. In the scoring phase, observations that have an unknown categorical predictor level or have a missing continuous predictor value for a node without a primary rule for routing missing values are assigned to the larger branch of the split.
names the dependent variable and explanatory effects.
The bartProbitModel value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar, target |
|---|---|
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
specifies the number of burn-in iterations to perform before the action starts to save samples for prediction.
| Alias | burnin |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the number of bins to use for binning continuous input variables.
| Default | 50 |
|---|---|
| Minimum value | 2 |
limits the display of class levels. The value 0 suppresses all levels.
| Minimum value | 0 |
|---|---|
specifies the number of MCMC iterations, excluding the burn-in iterations. This is the MCMC sample size if the thinning rate is 1. This option is ignored if you specify the nMCDist parameter and you run distributed chains.
| Default | 1000 |
|---|---|
| Minimum value | 1 |
specifies the nominal input variables to use in the analysis.
For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | nominal |
|---|---|
specifies the thinning rate of the simulation.
| Alias | thin |
|---|---|
| Default | 1 |
| Minimum value | 1 |
specifies the number of trees in a sample of the sum-of-trees ensemble.
| Default | 200 |
|---|---|
| Minimum value | 1 |
when set to True, stores a mapping of each observation to terminal nodes in memory when the model is trained.
| Default | FALSE |
|---|---|
specifies a numeric offset variable. This variable cannot be a classification variable, a response variable, or one of the explanatory variables.
specifies the minimum cardinality for which a categorical input uses splitting rules according to level ordering.
| Default | 50 |
|---|---|
| Minimum value (exclusive) | 0 |
creates a table on the server that contains observationwise statistics, which are computed after the model is fit.
The bartBinOutputStatement value can be one or more of the following:
specifies the significance level to use for the construction of all equal-tail credible limits.
| Default | 0.05 |
|---|---|
| Range | (0, 1) |
when set to FALSE, predictions from each MCMC sample are included in the output table in addition to the sample average predictions.
| Alias | averageOnly |
|---|---|
| Default | TRUE |
specifies the settings for an output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.
names the predicted response level. The default name is Into.
specifies the predicted event probability that determines the predicted binary response level.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
names the equal-tail lower credible limit.
names the predicted value. If you do not specify any output statistics, then the predicted value is named Pred by default.
| Aliases | p, predicted |
|---|---|
names the residual.
| Aliases | r, residual |
|---|---|
identifies the training and test roles for observations.
names the equal-tail upper credible limit.
For more information about specifying the outputMargins parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|---|
specifies the fraction of the data to be used for testing.
The partByFracStatement value can be one or more of the following:
specifies the seed to use in the random number generator that is used for partitioning the data.
| Default | 0 |
|---|---|
randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.
| Range | 0–1 |
|---|---|
names the variable and its values used to partition the data into training and testing roles.
| Long form | partByVar=list(name="variable-name") |
|---|---|
| Shortcut form | partByVar="variable-name" |
The partByVarStatement value can be one or more of the following:
names the variable in the input table whose values are used to assign roles to each observation.
specifies the formatted value of the variable that is used to assign observations to the testing role.
specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.
when set to True, specifies that bin boundaries are set at quantiles of numeric inputs instead of bins of equal width.
| Aliases | qbin, qtbin |
|---|---|
| Default | TRUE |
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
The bartProbit_sampleSummary value can be one or more of the following:
names the variable that contains the average number of nodes per tree in the sample.
creates a table on the server that contains a summary of the sum-of-trees ensemble samples.
For more information about specifying the casout parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
names the variable that contains the proportion of accepted tree modifications.
names the variable that contains an indicator for whether the sample is saved for prediction.
specifies a seed for starting the pseudorandom number generator.
| Default | 0 |
|---|---|
| Range | 0–4294967295 |
stores the model in a binary table object that you can use for scoring.
For more information about specifying the store parameter, see the common casouttablebasic parameter (Appendix A: Common Parameters).
| Aliases | savemodel, save, savestate |
|---|---|
specifies the input data table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the target variable.
when set to True, stores data in memory when the model is trained.
| Default | FALSE |
|---|---|
specifies the regularization prior for the sum-of-trees ensemble.
The bart_treePrior value can be one or more of the following:
specifies the base probability for splitting an internal node as a function of its depth from the root. A larger base probability value makes splitting a node more likely.
| Default | 0.95 |
|---|---|
| Range | (0, 1) |
specifies the power parameter used to compute the probability of splitting an internal node as a function of its depth from the root. A larger depth power value decreases the probability of splitting a node.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the probability of sampling the operation of pruning a pair of terminal nodes for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |
specifies the probability of sampling the operation of splitting a terminal node for the tree sampling algorithm. If you specify the pSplit and pPrune parameters, their values must sum to 1.
| Default | 0.5 |
|---|---|
| Range | (0, 1) |