Synthetic Minority Oversampling Technique Action Set

Provides an action that uses the synthetic minority oversampling technique (SMOTE)

smoteSample Action

Uses the synthetic minority oversampling technique (SMOTE).

CASL Syntax

smote.smoteSample <result=results> <status=rc> /
required parameter casOut={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
classColumn="string",
classToAugment="string" | 64-bit-integer | double,
inputs={"variable-name-1" <, "variable-name-2", ...>},
k=integer,
nominals={"variable-name-1" <, "variable-name-2", ...>},
nThreads=integer,
required parameter numSamples=integer,
seed=double,
required parameter table={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
}
;
indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

required parametertable

specifies the input table of samples.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

required parametercasOut

specifies the output table to contain the synthetic samples.

Parameter Descriptions

* casOut={casouttable}

specifies the output table to contain the synthetic samples.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

classColumn="string"

specifies the name of the input table column that contains the class of each sample.

classToAugment="string" | 64-bit-integer | double

specifies the class to generate synthetic samples for.

extrapolationFactor=double

specifies the standard deviation to use in adding Gaussian perturbation noise to the boundaries of the input data.

Default 0
Minimum value 0

inputs={"variable-name-1" <, "variable-name-2", ...>}

specifies which variables from the input table to use.

Alias input

k=integer

specifies the number of nearest neighbors to use in generating the new instance.

Default 5
Range 1–1000

nominals={"variable-name-1" <, "variable-name-2", ...>}

specifies the nominal variables in the input table.

nThreads=integer

specifies the number of CPU threads to use.

Default 0
Range 0–1024

* numSamples=integer

specifies the number of synthetic samples to generate.

Range 1–100000000

seed=double

specifies the seed to use for the random number generator.

Default 0
Minimum value 0

* table={castable}

specifies the input table of samples.

Long form table={name="table-name"}
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default FALSE
computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases options
dataSource
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

orderBy={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default FALSE
vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

smoteSample Action

Uses the synthetic minority oversampling technique (SMOTE).

Lua Syntax

results, info = s:smote_smoteSample{
required parameter casOut={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
classColumn="string",
classToAugment="string" | 64-bit-integer | double,
inputs={"variable-name-1" <, "variable-name-2", ...>},
k=integer,
nominals={"variable-name-1" <, "variable-name-2", ...>},
nThreads=integer,
required parameter numSamples=integer,
seed=double,
required parameter table={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
}
}
indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

required parametertable

specifies the input table of samples.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

required parametercasOut

specifies the output table to contain the synthetic samples.

Parameter Descriptions

* casOut={casouttable}

specifies the output table to contain the synthetic samples.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

classColumn="string"

specifies the name of the input table column that contains the class of each sample.

classToAugment="string" | 64-bit-integer | double

specifies the class to generate synthetic samples for.

extrapolationFactor=double

specifies the standard deviation to use in adding Gaussian perturbation noise to the boundaries of the input data.

Default 0
Minimum value 0

inputs={"variable-name-1" <, "variable-name-2", ...>}

specifies which variables from the input table to use.

Alias input

k=integer

specifies the number of nearest neighbors to use in generating the new instance.

Default 5
Range 1–1000

nominals={"variable-name-1" <, "variable-name-2", ...>}

specifies the nominal variables in the input table.

nThreads=integer

specifies the number of CPU threads to use.

Default 0
Range 0–1024

* numSamples=integer

specifies the number of synthetic samples to generate.

Range 1–100000000

seed=double

specifies the seed to use for the random number generator.

Default 0
Minimum value 0

* table={castable}

specifies the input table of samples.

Long form table={name="table-name"}
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=true | false

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default false
computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases options
dataSource
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

orderBy={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

singlePass=true | false

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default false
vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

smoteSample Action

Uses the synthetic minority oversampling technique (SMOTE).

Python Syntax

results=s.smote.smoteSample(
required parameter casOut={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
classColumn="string",
classToAugment="string" | 64-bit-integer | double,
inputs=["variable-name-1" <, "variable-name-2", ...>],
k=integer,
nominals=["variable-name-1" <, "variable-name-2", ...>],
nThreads=integer,
required parameter numSamples=integer,
seed=double,
required parameter table={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string"
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter "name":"table-name"
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>]
"where":"where-expression"
}
}
)
indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

required parametertable

specifies the input table of samples.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

required parametercasOut

specifies the output table to contain the synthetic samples.

Parameter Descriptions

* casOut={casouttable}

specifies the output table to contain the synthetic samples.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

classColumn="string"

specifies the name of the input table column that contains the class of each sample.

classToAugment="string" | 64-bit-integer | double

specifies the class to generate synthetic samples for.

extrapolationFactor=double

specifies the standard deviation to use in adding Gaussian perturbation noise to the boundaries of the input data.

Default 0
Minimum value 0

inputs=["variable-name-1" <, "variable-name-2", ...>]

specifies which variables from the input table to use.

Alias input

k=integer

specifies the number of nearest neighbors to use in generating the new instance.

Default 5
Range 1–1000

nominals=["variable-name-1" <, "variable-name-2", ...>]

specifies the nominal variables in the input table.

nThreads=integer

specifies the number of CPU threads to use.

Default 0
Range 0–1024

* numSamples=integer

specifies the number of synthetic samples to generate.

Range 1–100000000

seed=double

specifies the seed to use for the random number generator.

Default 0
Minimum value 0

* table={castable}

specifies the input table of samples.

Long form table={"name":"table-name"}
Shortcut form table="table-name"

The castable value can be one or more of the following:

"caslib":"string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

"computedOnDemand":True | False

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default False
"computedVars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"computedVarsProgram":"string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}

specifies data source options.

Aliases options
dataSource
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the input table.

"orderBy":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"singlePass":True | False

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default False
"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the input data.

"whereTable":{groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

"casLib":"string"

specifies the caslib for the filter table. By default, the active caslib is used.

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the filter table.

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the data from the filter table.

smoteSample Action

Uses the synthetic minority oversampling technique (SMOTE).

R Syntax

results <– cas.smote.smoteSample(s,
required parameter casOut=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
classColumn="string",
classToAugment="string" | 64-bit-integer | double,
inputs=list("variable-name-1" <, "variable-name-2", ...>),
k=integer,
nominals=list("variable-name-1" <, "variable-name-2", ...>),
nThreads=integer,
required parameter numSamples=integer,
seed=double,
required parameter table=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string"
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)
required parameter name="table-name"
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>)
where="where-expression"
)
)
)
indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

required parametertable

specifies the input table of samples.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

required parametercasOut

specifies the output table to contain the synthetic samples.

Parameter Descriptions

* casOut=list(casouttable)

specifies the output table to contain the synthetic samples.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

classColumn="string"

specifies the name of the input table column that contains the class of each sample.

classToAugment="string" | 64-bit-integer | double

specifies the class to generate synthetic samples for.

extrapolationFactor=double

specifies the standard deviation to use in adding Gaussian perturbation noise to the boundaries of the input data.

Default 0
Minimum value 0

inputs=list("variable-name-1" <, "variable-name-2", ...>)

specifies which variables from the input table to use.

Alias input

k=integer

specifies the number of nearest neighbors to use in generating the new instance.

Default 5
Range 1–1000

nominals=list("variable-name-1" <, "variable-name-2", ...>)

specifies the nominal variables in the input table.

nThreads=integer

specifies the number of CPU threads to use.

Default 0
Range 0–1024

* numSamples=integer

specifies the number of synthetic samples to generate.

Range 1–100000000

seed=double

specifies the seed to use for the random number generator.

Default 0
Minimum value 0

* table=list(castable)

specifies the input table of samples.

Long form table=list(name="table-name")
Shortcut form table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias compOnDemand
Default FALSE
computedVars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias compPgm
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)

specifies data source options.

Aliases options
dataSource
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

orderBy=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default FALSE
vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable=list(groupbytable)

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)

specifies data source options.

Aliases options
dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

Last updated: November 23, 2025