Super Learner Action Set

Provides actions for training and scoring super learner models.

slTrain Action

Trains super learner models.

CASL Syntax

superLearner.slTrain <result=results> <status=rc> /
applyRowOrder=TRUE | FALSE,
attributes={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
cvStratVar="variable-name",
display={
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
},
inputs={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
k=integer,
* library={{
* name="string",
trainOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}
}, {...}},
nominals={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
outputTables={
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=TRUE | FALSE,
replace=TRUE | FALSE
},
seed=64-bit-integer,
store={
caslib="string",
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
name="table-name",
onDemand=TRUE | FALSE,
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer
},
* table={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
onDemand=TRUE | FALSE,
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
* target="variable-name"
;
* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
* table | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores the model in a binary table object that you can use for scoring.

Parameter Descriptions

applyRowOrder=TRUE | FALSE

when set to True, uses the available groupBy and orderBy information to group and order the data.

Default FALSE

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables that this action uses. Currently, attributes that are specified for the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

cvStratVar="variable-name"

specifies a categorical variable to use in the k-fold partitioning process for stratified cross-validation. This ensures that each fold contains a similar proportion of samples from each class of the specified variable.
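
As a rough illustration of what stratified k-fold partitioning means, the following Python sketch assigns row indexes to folds so that each class of a stratification variable is spread evenly across the folds. It is not the server's implementation. The function name stratified_folds and the sample labels are hypothetical, and the seed argument only mirrors the idea behind the seed parameter.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return a fold number (0..k-1) for every row, balancing each class."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        rows_by_class[label].append(index)

    folds = [None] * len(labels)
    for label in sorted(rows_by_class, key=str):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal the rows of this class round-robin, so the per-fold
        # counts for the class differ by at most one.
        for position, index in enumerate(rows):
            folds[index] = position % k
    return folds


# Hypothetical stratification variable with a 1:2 class ratio.
labels = ["yes"] * 6 + ["no"] * 12
folds = stratified_folds(labels, k=3, seed=42)
```

With these sample labels, every fold keeps the same 1:2 ratio of "yes" to "no" rows as the full data.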

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the input variables to use in the analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input, vars, var

k=integer

specifies the number (k) to use in the k-fold partitioning process for cross-validation.

Alias kFolds

* library={{slTrain_library-1} <, {slTrain_library-2}, ...>}

specifies the base learner library.

The slTrain_library value can be one or more of the following:

* modelType="ANN" | "BARTGAUSS" | "BARTPROBIT" | "BNET" | "DTREE" | "FACTMAC" | "FOREST" | "GAMPL" | "GAMSELECT" | "GBTREE" | "GLM" | "GPCLASS" | "GPREG" | "LGBM" | "LOGISTIC" | "SVM"

specifies the base learner model type.

ANN

specifies an artificial neural network model.

BARTGAUSS

specifies a Bayesian additive regression trees model.

BARTPROBIT

specifies a probit Bayesian additive regression trees model.

BNET

specifies a Bayesian network model.

DTREE

specifies a decision tree model.

FACTMAC

specifies a factorization machine model.

FOREST

specifies a forest model.

GAMPL

specifies a generalized additive model by penalized likelihood.

GAMSELECT

specifies a generalized additive model with model selection.

GBTREE

specifies a gradient boosting tree model.

GLM

specifies an ordinary linear least squares model.

GPCLASS

specifies a Gaussian process classification model.

GPREG

specifies a Gaussian process regression model.

LGBM

specifies a light gradient boosting tree model.

LOGISTIC

specifies a logistic regression model.

SVM

specifies a support vector machine model.

* name="string"

specifies the name of the base learner model.

trainOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies a list of parameters for the base learner model training action to use.

method="CCLS" | "CCNLL" | "CVSELECTOR"

specifies the meta-learning method. By default, the convex-constrained binomial log-likelihood maximization method is used for a binary target variable, and the convex-constrained least squares method is used for an interval target variable.

CCLS

specifies the convex-constrained least squares method. If you specify this method, the target variable must be continuous.

CCNLL

specifies the convex-constrained binomial log-likelihood maximization method. If you specify this method, the target variable must be binary.

CVSELECTOR

specifies the cross-validated selector.
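
To make the difference between the methods concrete, here is a small Python sketch under simplifying assumptions. It treats the special case of two base learners, where the convex-constrained least squares weight has a closed form, and it assumes that the cross-validated selector picks the single learner with the smallest cross-validated squared error. The function names, learner names, and data are hypothetical, and the action's actual solver is not described in this reference.

```python
def ccls_two_learners(y, p1, p2):
    """Weight w in [0, 1] that minimizes sum((y - (w*p1 + (1-w)*p2))**2)."""
    numerator = sum((yi - b) * (a - b) for yi, a, b in zip(y, p1, p2))
    denominator = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if denominator == 0:
        return 0.5  # identical predictions: every weight gives the same fit
    # The objective is a one-dimensional quadratic in w, so clipping the
    # unconstrained minimizer to [0, 1] gives the constrained minimizer.
    return min(1.0, max(0.0, numerator / denominator))


def cv_selector(y, predictions):
    """Name of the learner with the smallest squared error on held-out folds."""
    def squared_error(pred):
        return sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return min(predictions, key=lambda name: squared_error(predictions[name]))


# Hypothetical cross-validated predictions for an interval target.
y = [1.0, 2.0, 3.0, 4.0]
cv_predictions = {
    "glm": [1.2, 2.2, 3.2, 4.2],     # biased upward by 0.2
    "forest": [0.6, 1.6, 2.6, 3.6],  # biased downward by 0.4
}

weight_glm = ccls_two_learners(y, cv_predictions["glm"], cv_predictions["forest"])
selected = cv_selector(y, cv_predictions)
```

In this example the weighted combination cancels the two opposite biases, whereas the selector keeps only the better single learner.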

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the nominal variables to use in the analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

seed=64-bit-integer

specifies the seed to use in the pseudorandom number generator that is used for k-fold partitioning.

Default 0
Range 0–4294967295

store={casouttablebasic}

stores the model in a binary table object that you can use for scoring.

Aliases savemodel, save, savestate
Long form store={name="table-name"}
Shortcut form store="table-name"

The casouttablebasic value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars={"variable-name-1" <, "variable-name-2", ...>}

specifies the list of variables to create indexes for in the output data.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

onDemand=TRUE | FALSE

This parameter is deprecated.

Default TRUE

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default FALSE

replication=integer

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.

Default 1
Minimum value 0

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

Long form table={name="table-name"}
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default FALSE

computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

onDemand=TRUE | FALSE

This parameter is deprecated.

Default TRUE

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default FALSE

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.
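
The matching rule described for whereTable can be pictured as a semi-join. The following Python sketch mimics it on plain lists of dictionaries, assuming that a row is kept when its values for the matching variables equal those of at least one row in the filter table. The function name where_table_filter, the match_vars argument (standing in for the vars subparameter), and the sample rows are all hypothetical, and the sketch says nothing about how the server performs the filtering.

```python
def where_table_filter(rows, filter_rows, match_vars=None):
    """Keep the rows whose matching variables appear in the filter table."""
    if not rows or not filter_rows:
        return []
    if match_vars is None:
        # Default: use every variable name that both tables have in common.
        match_vars = sorted(set(rows[0]) & set(filter_rows[0]))
    wanted = {tuple(f[v] for v in match_vars) for f in filter_rows}
    return [r for r in rows if tuple(r[v] for v in match_vars) in wanted]


# Hypothetical input table and filter table.
input_rows = [
    {"id": 1, "region": "east", "y": 3.0},
    {"id": 2, "region": "west", "y": 5.0},
    {"id": 3, "region": "east", "y": 7.0},
]
filter_table = [{"region": "east"}]

kept = where_table_filter(input_rows, filter_table)
```

Here the only shared variable name is region, so only the rows from the east region are kept.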

* target="variable-name"

specifies the target variable.
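
As a minimal sketch of how the required and optional parameters fit together, the following Python helper assembles a parameter dictionary by using only names that are documented above. The helper build_sltrain_params is hypothetical, as are the table name, variable names, and learner names. The sketch relies on the parameter descriptions, which mark name and modelType as required for each library entry. It does not connect to a server.

```python
def build_sltrain_params(table, target, inputs, library,
                         nominals=None, k=None, seed=None, store=None):
    """Assemble keyword arguments for slTrain from plain Python values."""
    if not library:
        raise ValueError("library must list at least one base learner")
    for learner in library:
        missing = {"name", "modelType"} - set(learner)
        if missing:
            raise ValueError(f"base learner is missing {sorted(missing)}")

    params = {
        "table": {"name": table},                 # long form of table=
        "target": target,
        "inputs": [{"name": v} for v in inputs],
        "library": list(library),
    }
    if nominals:
        params["nominals"] = [{"name": v} for v in nominals]
    if k is not None:
        params["k"] = k
    if seed is not None:
        if not 0 <= seed <= 4294967295:
            raise ValueError("seed must be in the range 0 to 4294967295")
        params["seed"] = seed
    if store is not None:
        params["store"] = {"name": store, "replace": True}
    return params


# Hypothetical table, variables, and learner names.
params = build_sltrain_params(
    table="hmeq",
    target="BAD",
    inputs=["LOAN", "VALUE", "JOB"],
    nominals=["JOB"],
    library=[
        {"name": "logit", "modelType": "LOGISTIC"},
        {"name": "forest", "modelType": "FOREST"},
    ],
    k=5,
    seed=12345,
    store="sl_model",
)

# With a connected CAS session object s (not created here), the call
# would take the form: results = s.superLearner.slTrain(**params)
```

Keeping the parameters in a dictionary makes it easy to inspect or log the request before it is sent.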

Lua Syntax

results, info = s:superLearner_slTrain{
applyRowOrder=true | false,
attributes={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
cvStratVar="variable-name",
display={
caseSensitive=true | false,
exclude=true | false,
excludeAll=true | false,
keyIsPath=true | false,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=true | false
},
inputs={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
k=integer,
* library={{
* name="string",
trainOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}
}, {...}},
nominals={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
outputTables={
groupByVarsRaw=true | false,
includeAll=true | false,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=true | false,
replace=true | false
},
seed=64-bit-integer,
store={
caslib="string",
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
name="table-name",
onDemand=true | false,
promote=true | false,
replace=true | false,
replication=integer,
},
* table={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
onDemand=true | false,
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
* target="variable-name"
}
* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
* table | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores the model in a binary table object that you can use for scoring.

Parameter Descriptions

applyRowOrder=true | false

when set to True, uses the available groupBy and orderBy information to group and order the data.

Default false

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables that this action uses. Currently, attributes that are specified for the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

cvStratVar="variable-name"

specifies a categorical variable to use in the k-fold partitioning process for stratified cross-validation. This ensures that each fold contains a similar proportion of samples from each class of the specified variable.

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the input variables to use in the analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input, vars, var

k=integer

specifies the number (k) to use in the k-fold partitioning process for cross-validation.

Alias kFolds

* library={{slTrain_library-1} <, {slTrain_library-2}, ...>}

specifies the base learner library.

The slTrain_library value can be one or more of the following:

* modelType="ANN" | "BARTGAUSS" | "BARTPROBIT" | "BNET" | "DTREE" | "FACTMAC" | "FOREST" | "GAMPL" | "GAMSELECT" | "GBTREE" | "GLM" | "GPCLASS" | "GPREG" | "LGBM" | "LOGISTIC" | "SVM"

specifies the base learner model type.

ANN

specifies an artificial neural network model.

BARTGAUSS

specifies a Bayesian additive regression trees model.

BARTPROBIT

specifies a probit Bayesian additive regression trees model.

BNET

specifies a Bayesian network model.

DTREE

specifies a decision tree model.

FACTMAC

specifies a factorization machine model.

FOREST

specifies a forest model.

GAMPL

specifies a generalized additive model by penalized likelihood.

GAMSELECT

specifies a generalized additive model with model selection.

GBTREE

specifies a gradient boosting tree model.

GLM

specifies an ordinary linear least squares model.

GPCLASS

specifies a Gaussian process classification model.

GPREG

specifies a Gaussian process regression model.

LGBM

specifies a light gradient boosting tree model.

LOGISTIC

specifies a logistic regression model.

SVM

specifies a support vector machine model.

* name="string"

specifies the name of the base learner model.

trainOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies a list of parameters for the base learner model training action to use.

method="CCLS" | "CCNLL" | "CVSELECTOR"

specifies the meta-learning method. By default, the convex-constrained binomial log-likelihood maximization method is used for a binary target variable, and the convex-constrained least squares method is used for an interval target variable.

CCLS

specifies the convex-constrained least squares method. If you specify this method, the target variable must be continuous.

CCNLL

specifies the convex-constrained binomial log-likelihood maximization method. If you specify this method, the target variable must be binary.

CVSELECTOR

specifies the cross-validated selector.

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the nominal variables to use in the analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

seed=64-bit-integer

specifies the seed to use in the pseudorandom number generator that is used for k-fold partitioning.

Default 0
Range 0–4294967295

store={casouttablebasic}

stores the model in a binary table object that you can use for scoring.

Aliases savemodel, save, savestate
Long form store={name="table-name"}
Shortcut form store="table-name"

The casouttablebasic value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars={"variable-name-1" <, "variable-name-2", ...>}

specifies the list of variables to create indexes for in the output data.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

onDemand=true | false

This parameter is deprecated.

Default true

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default false

replace=true | false

when set to True, overwrites an existing table that has the same name.

Default false

replication=integer

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.

Default 1
Minimum value 0

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

Long form table={name="table-name"}
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=true | false

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default false

computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

onDemand=true | false

This parameter is deprecated.

Default true

singlePass=true | false

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default false

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

* target="variable-name"

specifies the target variable.

slTrain Action

Trains super learner models.

Python Syntax

results=s.superLearner.slTrain(
applyRowOrder=True | False,
attributes=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
cvStratVar="variable-name",
display={
"caseSensitive":True | False,
"exclude":True | False,
"excludeAll":True | False,
"keyIsPath":True | False,
"names":["string-1" <, "string-2", ...>],
"pathType":"LABEL" | "NAME",
"traceNames":True | False
},
inputs=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
k=integer,
required parameter library=[{
required parameter "name":"string",
"trainOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}
}<, {...}>],
nominals=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
outputTables={
"groupByVarsRaw":True | False,
"includeAll":True | False,
"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},
"repeated":True | False,
"replace":True | False
},
seed=64-bit-integer,
store={
"caslib":"string",
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"name":"table-name",
"onDemand":True | False,
"promote":True | False,
"replace":True | False,
"replication":integer,
},
required parameter table={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"onDemand":True | False,
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string",
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression"
}
},
required parameter target="variable-name"
)
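In the syntax above, the prefix "required parameter" marks a required parameter. In the parameter descriptions, an asterisk (*) marks a required parameter.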
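
As a rough illustration of how the syntax above maps onto a call, the following sketch assembles the arguments as ordinary Python keyword arguments. The table name, caslib, variable names, and base learner names are hypothetical. The session object s is assumed to be an existing CAS connection created with the SWAT package, so the call itself is left as a comment because it needs a running server.

```python
# Hypothetical table, caslib, and variable names, used only for illustration.
sl_train_args = {
    "table": {"name": "hmeq", "caslib": "casuser"},   # required
    "target": "bad",                                   # required
    "inputs": [{"name": "loan"}, {"name": "mortdue"}, {"name": "job"}],
    "nominals": [{"name": "job"}],
    "k": 5,
    "seed": 12345,
    "library": [                                       # required
        {"name": "logit_1", "modelType": "LOGISTIC"},
        {"name": "forest_1", "modelType": "FOREST"},
    ],
    "store": {"name": "sl_model", "replace": True},
}

# The three parameters that the syntax marks as required.
required = ("table", "target", "library")
missing = [p for p in required if p not in sl_train_args]

# With an existing SWAT session `s` (assumed, not created here):
# s.loadactionset("superLearner")
# results = s.superLearner.slTrain(**sl_train_args)
```

Keeping the arguments in a dictionary makes it easy to check them on the client before the action is submitted.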

Summary: Input and Output Tables

If a row includes a subparameter, specify the name, caslib, and so on in the subparameter. Otherwise, specify them in the parameter itself.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores the model in a binary table object that you can use for scoring.

Parameter Descriptions

applyRowOrder=True | False

when set to True, uses the available groupBy and orderBy information to group and order the data.

Default False

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables that this action uses. Currently, attributes that are specified for the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

cvStratVar="variable-name"

specifies a categorical variable to use in the k-fold partitioning process for stratified cross-validation. This ensures that each fold contains a similar proportion of samples from each class of the specified variable.
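
The idea of stratified partitioning can be sketched in plain Python. This is only an illustration of the concept, not the algorithm that the server uses; the function name stratified_folds and the sample labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each row a fold index 0..k-1, spreading every class evenly."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    folds = [None] * len(labels)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal the shuffled rows of this class round-robin across the folds.
        for position, row in enumerate(rows):
            folds[row] = position % k
    return folds

# Hypothetical stratification variable: 4 "yes" rows and 8 "no" rows.
labels = ["yes"] * 4 + ["no"] * 8
folds = stratified_folds(labels, k=4, seed=1)

def count_per_fold(value):
    """Number of rows with the given label in each of the 4 folds."""
    return [sum(1 for f, l in zip(folds, labels) if f == fold and l == value)
            for fold in range(4)]

yes_per_fold = count_per_fold("yes")
no_per_fold = count_per_fold("no")
```

With these labels, every fold receives one "yes" row and two "no" rows, so each fold keeps the 1:2 class ratio of the full data.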

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the input variables to use in the analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

k=integer

specifies the number of folds (k) to use in the k-fold partitioning process for cross-validation.

Alias kFolds

* library=[{slTrain_library-1} <, {slTrain_library-2}, ...>]

specifies the base learner library.

The slTrain_library value can be one or more of the following:

* "modelType":"ANN" | "BARTGAUSS" | "BARTPROBIT" | "BNET" | "DTREE" | "FACTMAC" | "FOREST" | "GAMPL" | "GAMSELECT" | "GBTREE" | "GLM" | "GPCLASS" | "GPREG" | "LGBM" | "LOGISTIC" | "SVM"

specifies the base learner model type.

ANN

specifies an artificial neural network model.

BARTGAUSS

specifies a Bayesian additive regression trees model.

BARTPROBIT

specifies a probit Bayesian additive regression trees model.

BNET

specifies a Bayesian network model.

DTREE

specifies a decision tree model.

FACTMAC

specifies a factorization machine model.

FOREST

specifies a forest model.

GAMPL

specifies a generalized additive model by penalized likelihood.

GAMSELECT

specifies a generalized additive model with model selection.

GBTREE

specifies a gradient boosting tree model.

GLM

specifies an ordinary linear least squares model.

GPCLASS

specifies a Gaussian process classification model.

GPREG

specifies a Gaussian process regression model.

LGBM

specifies a light gradient boosting tree model.

LOGISTIC

specifies a logistic regression model.

SVM

specifies a support vector machine model.

* "name":"string"

specifies the name of the base learner model.

"trainOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}

specifies a list of parameters for the base learner model training action to use.
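
A base learner library might be written as follows. This is a sketch only: the learner names are hypothetical, and the nTree key is an assumed option of the forest training action, so check that action's documentation before relying on it.

```python
# Hypothetical library with one GLM and two differently tuned forests.
library = [
    {"name": "glm_base", "modelType": "GLM"},
    {
        "name": "forest_small",
        "modelType": "FOREST",
        # Assumed option name for the forest training action; verify before use.
        "trainOptions": {"nTree": 50},
    },
    {
        "name": "forest_large",
        "modelType": "FOREST",
        "trainOptions": {"nTree": 200},
    },
]

# Distinct names keep learners of the same model type distinguishable.
names = [learner["name"] for learner in library]
names_are_distinct = len(names) == len(set(names))
```

Listing the same model type more than once with different trainOptions is one way to let the meta-learner compare several configurations of a single algorithm.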

method="CCLS" | "CCNLL" | "CVSELECTOR"

specifies the meta-learning method. By default, the convex-constrained binomial log-likelihood maximization method is used for a binary target variable, and the convex-constrained least squares method is used for an interval target variable.

CCLS

specifies the convex-constrained least squares method. If you specify this method, the target variable must be continuous.

CCNLL

specifies the convex-constrained binomial log-likelihood maximization method. If you specify this method, the target variable must be binary.

CVSELECTOR

specifies the cross-validated selector.
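
To make the difference between the methods concrete, the following sketch works through the special case of two base learners with an interval target. It assumes that the meta-learner is fit on cross-validated predictions, as is usual for super learners. The function names and data are hypothetical, and the code is not the server's implementation.

```python
def ccls_two_learners(y, p1, p2):
    """Weight w in [0, 1] minimizing sum((y - (w*p1 + (1-w)*p2))**2)."""
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        return 0.5  # identical predictions: every weight gives the same fit
    # The objective is a convex quadratic in w, so clipping the
    # unconstrained minimizer to [0, 1] gives the constrained minimizer.
    return min(1.0, max(0.0, num / den))

def cv_selector(y, predictions):
    """Name of the single learner with the smallest mean squared error."""
    def mse(p):
        return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)
    return min(predictions, key=lambda name: mse(predictions[name]))

# Hypothetical cross-validated predictions from two base learners.
y = [1.0, 2.0, 3.0, 4.0]
cv_pred = {
    "glm": [1.5, 2.5, 3.5, 4.5],     # always 0.5 too high
    "forest": [0.0, 1.0, 2.0, 3.0],  # always 1.0 too low
}

w_glm = ccls_two_learners(y, cv_pred["glm"], cv_pred["forest"])
best_single = cv_selector(y, cv_pred)
blended = [w_glm * a + (1 - w_glm) * b
           for a, b in zip(cv_pred["glm"], cv_pred["forest"])]
```

In this example the selector keeps only the better learner, whereas the convex combination uses both learners and reproduces the target values.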

nominals=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the nominal variables to use in the analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

seed=64-bit-integer

specifies the seed to use in the pseudorandom number generator that is used for k-fold partitioning.

Default 0
Range 0–4294967295
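
The role of a seed in fold assignment can be illustrated with Python's own pseudorandom number generator. The sketch below is not the generator or the partitioning scheme that the server uses; the function name kfold_assignment is hypothetical.

```python
import random

def kfold_assignment(n_rows, k, seed):
    """Return a shuffled list of fold indexes with nearly equal fold sizes."""
    rng = random.Random(seed)
    folds = [row % k for row in range(n_rows)]
    rng.shuffle(folds)
    return folds

# The same seed reproduces the same partition.
first = kfold_assignment(10, k=5, seed=42)
second = kfold_assignment(10, k=5, seed=42)
same_partition = first == second
```

Fixing the seed in this way is what makes the cross-validated results of two runs comparable.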

store={casouttablebasic}

stores the model in a binary table object that you can use for scoring.

Aliases savemodel
save
savestate
Long form store={"name":"table-name"}
Shortcut form store="table-name"

The casouttablebasic value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"indexVars":["variable-name-1" <, "variable-name-2", ...>]

specifies the list of variables to create indexes for in the output data.

"label":"string"

specifies the descriptive label to associate with the table.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

"name":"table-name"

specifies the name for the output table.

"onDemand":True | False

This parameter is deprecated.

Default True
"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default False
"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default False
"replication":integer

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.

Default 1
Minimum value 0
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

Long form table={"name":"table-name"}
Shortcut form table="table-name"
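
The relationship between the two forms can be sketched with a small helper that expands the shortcut form into the long form. The helper to_long_form and the table and caslib names are hypothetical and are not part of any client package.

```python
def to_long_form(table):
    """Return the long (dictionary) form of a table specification."""
    if isinstance(table, str):
        # Shortcut form: the string is the table name.
        return {"name": table}
    # Long form: copy it so that the caller's dictionary is not modified.
    return dict(table)

from_shortcut = to_long_form("hmeq")
from_long = to_long_form({"name": "hmeq", "caslib": "casuser"})
```

The shortcut form is enough when only the table name is needed. Any other subparameter, such as caslib or where, requires the long form.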

The castable value can be one or more of the following:

"caslib":"string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

"computedOnDemand":True | False

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default False
"computedVars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"computedVarsProgram":"string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
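
The computedVars and computedVarsProgram subparameters are used together, as in the following sketch. The table and column names are hypothetical, and the consistency check is a crude client-side illustration rather than anything that the server performs.

```python
# Hypothetical input table with one computed variable.
table = {
    "name": "hmeq",
    "computedVars": [{"name": "debt_ratio"}],
    # One assignment statement for each computed variable.
    "computedVarsProgram": "debt_ratio = mortdue / value;",
}

declared = {var["name"] for var in table["computedVars"]}

# Crude check: take the left-hand side of every assignment in the program.
assigned = {
    statement.split("=")[0].strip()
    for statement in table["computedVarsProgram"].split(";")
    if "=" in statement
}
unassigned = declared - assigned
```

An empty unassigned set means that every name listed in computedVars has an expression in the program.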
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}

specifies data source options.

Aliases options
dataSource
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the input table.

"onDemand":True | False

This parameter is deprecated.

Default True
"singlePass":True | False

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default False
"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the input data.

"whereTable":{groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

"casLib":"string"

specifies the caslib for the filter table. By default, the active caslib is used.

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the filter table.

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the data from the filter table.
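
The matching rule that the whereTable parameter describes can be modeled in plain Python. This sketch covers only the rule itself, namely that rows are kept when their values match a row of the filter table on the chosen variables, and that the common variables are used when vars is not specified. The function name and data are hypothetical.

```python
def where_table_filter(rows, filter_rows, var_names=None):
    """Keep the rows whose values on var_names match some filter row."""
    if not rows or not filter_rows:
        return []
    if var_names is None:
        # Default: all variable names common to both tables.
        var_names = sorted(set(rows[0]) & set(filter_rows[0]))
    keys = {tuple(f[v] for v in var_names) for f in filter_rows}
    return [r for r in rows if tuple(r[v] for v in var_names) in keys]

# Hypothetical input table and filter table.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 20},
    {"region": "east", "sales": 30},
]
filter_rows = [{"region": "east"}]

kept = where_table_filter(rows, filter_rows)
kept_sales = [r["sales"] for r in kept]
```

Here region is the only variable that the two tables share, so only the rows for the east region remain.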

* target="variable-name"

specifies the target variable.

slTrain Action

Trains super learner models.

R Syntax

results <- cas.superLearner.slTrain(s,
applyRowOrder=TRUE | FALSE,
attributes=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
cvStratVar="variable-name",
display=list(
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>),
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
),
inputs=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
k=integer,
required parameter library=list( list(
required parameter name="string",
trainOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)
) <, list(...)>),
nominals=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
outputTables=list(
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),
repeated=TRUE | FALSE,
replace=TRUE | FALSE
),
seed=64-bit-integer,
store=list(
caslib="string",
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
name="table-name",
onDemand=TRUE | FALSE,
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer
),
required parameter table=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
onDemand=TRUE | FALSE,
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
required parameter target="variable-name"
)
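In the syntax above, the prefix "required parameter" marks a required parameter. In the parameter descriptions, an asterisk (*) marks a required parameter.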

Summary: Input and Output Tables

If a row includes a subparameter, specify the name, caslib, and so on in the subparameter. Otherwise, specify them in the parameter itself.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores the model in a binary table object that you can use for scoring.

Parameter Descriptions

applyRowOrder=TRUE | FALSE

when set to True, uses the available groupBy and orderBy information to group and order the data.

Default FALSE

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables that this action uses. Currently, attributes that are specified for the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

cvStratVar="variable-name"

specifies a categorical variable to use in the k-fold partitioning process for stratified cross-validation. This ensures that each fold contains a similar proportion of samples from each class of the specified variable.

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the input variables to use in the analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

k=integer

specifies the number of folds (k) to use in the k-fold partitioning process for cross-validation.

Alias kFolds

* library=list( list(slTrain_library-1) <, list(slTrain_library-2), ...>)

specifies the base learner library.

The slTrain_library value can be one or more of the following:

* modelType="ANN" | "BARTGAUSS" | "BARTPROBIT" | "BNET" | "DTREE" | "FACTMAC" | "FOREST" | "GAMPL" | "GAMSELECT" | "GBTREE" | "GLM" | "GPCLASS" | "GPREG" | "LGBM" | "LOGISTIC" | "SVM"

specifies the base learner model type.

ANN

specifies an artificial neural network model.

BARTGAUSS

specifies a Bayesian additive regression trees model.

BARTPROBIT

specifies a probit Bayesian additive regression trees model.

BNET

specifies a Bayesian network model.

DTREE

specifies a decision tree model.

FACTMAC

specifies a factorization machine model.

FOREST

specifies a forest model.

GAMPL

specifies a generalized additive model by penalized likelihood.

GAMSELECT

specifies a generalized additive model with model selection.

GBTREE

specifies a gradient boosting tree model.

GLM

specifies an ordinary linear least squares model.

GPCLASS

specifies a Gaussian process classification model.

GPREG

specifies a Gaussian process regression model.

LGBM

specifies a light gradient boosting tree model.

LOGISTIC

specifies a logistic regression model.

SVM

specifies a support vector machine model.

* name="string"

specifies the name of the base learner model.

trainOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)

specifies a list of parameters for the base learner model training action to use.

method="CCLS" | "CCNLL" | "CVSELECTOR"

specifies the meta-learning method. By default, the convex-constrained binomial log-likelihood maximization method is used for a binary target variable, and the convex-constrained least squares method is used for an interval target variable.

CCLS

specifies the convex-constrained least squares method. If you specify this method, the target variable must be continuous.

CCNLL

specifies the convex-constrained binomial log-likelihood maximization method. If you specify this method, the target variable must be binary.

CVSELECTOR

specifies the cross-validated selector.

nominals=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the nominal variables to use in the analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

seed=64-bit-integer

specifies the seed to use in the pseudorandom number generator that is used for k-fold partitioning.

Default 0
Range 0–4294967295

store=list(casouttablebasic)

stores the model in a binary table object that you can use for scoring.

Aliases savemodel
save
savestate
Long form store=list(name="table-name")
Shortcut form store="table-name"

The casouttablebasic value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars=list("variable-name-1" <, "variable-name-2", ...>)

specifies the list of variables to create indexes for in the output data.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

onDemand=TRUE | FALSE

This parameter is deprecated.

Default TRUE
promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE
replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default FALSE
replication=integer

specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.

Default 1
Minimum value 0
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table=list(castable)

specifies the input data table.

Long form table=list(name="table-name")
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default FALSE
computedVars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)

specifies data source options.

Aliases options
dataSource
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

onDemand=TRUE | FALSE

This parameter is deprecated.

Default TRUE
singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default FALSE

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.
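
A vars value is a list of casinvardesc entries, each of which requires only name. The sketch below writes two such entries as Python dictionaries. The helper name var_desc, the table and column names, the label text, and the choice of the DOLLAR12.2 format are illustrative assumptions, not values taken from this action's documentation.

```python
# Optional casinvardesc keys listed in this section.
ALLOWED_KEYS = {"format", "formattedLength", "label", "nfd", "nfl"}


def var_desc(name, **attrs):
    """Return one casinvardesc entry; `name` is required, the rest are optional."""
    unknown = set(attrs) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unsupported casinvardesc keys: {sorted(unknown)}")
    return {"name": name, **attrs}


table = {
    "name": "loans",  # hypothetical table name
    "vars": [
        var_desc("income", format="DOLLAR12.2", label="Annual income"),
        var_desc("age"),  # name only, no formatting attributes
    ],
}
```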

where="where-expression"

specifies an expression for subsetting the input data.

whereTable=list(groupbytable)

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)

specifies data source options.

Aliases options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.
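
As a sketch of how where and whereTable combine, the dictionary below filters a hypothetical input table loans by a hypothetical filter table eligible_customers, matching on customer_id. As described above, the filter table is applied first and the input table's own where expression afterward. All table, column, and helper names are assumptions for illustration.

```python
def with_where_table(table, filter_name, match_vars=None, filter_where=None):
    """Return a copy of `table` with a whereTable (groupbytable) filter attached."""
    where_table = {"name": filter_name}  # `name` is the only required key
    if match_vars is not None:
        # Without vars, the variable names common to both tables are used.
        where_table["vars"] = [{"name": v} for v in match_vars]
    if filter_where is not None:
        # Subsets the rows of the filter table itself.
        where_table["where"] = filter_where
    return {**table, "whereTable": where_table}


table = with_where_table(
    {"name": "loans", "where": "income > 0"},  # where runs after the filter table
    "eligible_customers",
    match_vars=["customer_id"],
    filter_where="status = 'active'",
)
```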

* target="variable-name"

specifies the target variable.
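
Putting the required pieces together, a minimal sketch of the slTrain parameters in Python dictionary form might look like the following. The helper name sl_train_params and the table and variable names are hypothetical, the library entry is a placeholder that must be replaced with a real learner specification, and the commented-out call assumes an existing CAS connection object named conn with the superLearner action set loaded.

```python
def sl_train_params(table_name, target, library, single_pass=False):
    """Assemble the required slTrain parameters: table, target, and library."""
    if not library:
        raise ValueError("library is a required parameter")
    return {
        "table": {
            "name": table_name,
            # False keeps the transient table, which favors stable row
            # ordering across repeated runs at some cost in efficiency.
            "singlePass": single_pass,
        },
        "target": target,
        "library": library,
    }


params = sl_train_params(
    "loans",         # hypothetical input table
    "default_flag",  # hypothetical target variable
    library=[{"name": "<learner-name>"}],  # placeholder, not a real learner name
)
# With a live session (assumed, not shown): conn.superLearner.slTrain(**params)
```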

Last updated: March 05, 2026