importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

name="table-name",

singlePass=TRUE | FALSE,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

target="variable-name",

weight="variable-name"

;

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the table name, caslib, and other common parameters.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
* casOut	—	specifies the CAS table to store the analysis results.

Parameter Descriptions

* casOut={casouttable}

specifies the CAS table to store the analysis results.

Long form	casOut={name="table-name"}
Shortcut form	casOut="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars={"variable-name-1" <, "variable-name-2", ...>}

specifies the list of variables to create indexes for in the output data.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defer the choice of redistribution policy to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

distinctCountLimit=integer

specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.

Default	10000
Minimum value	256
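The interaction between distinctCountLimit and misraGries can be sketched in plain Python. This is a minimal illustration of the textbook Misra-Gries summary, not the server's implementation: the function names, the `k` counter budget, and the choice to reuse the limit as that budget are assumptions made for the example, and the tiny limits in the usage note are below the documented minimum of 256.

```python
from collections import Counter


def misra_gries(stream, k):
    """Approximate counts using at most k - 1 counters.

    Any item whose true frequency exceeds len(stream) / k survives in the
    summary, and each stored count underestimates the true count by at
    most len(stream) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequency_distribution(values, distinct_count_limit=10000,
                           misra_gries_enabled=True, k=None):
    """Return (counts, is_exact) for a list of values.

    Counts exactly while the number of distinct values stays within the
    limit. Once the limit is exceeded, either falls back to the sketch or
    aborts, mirroring the documented behavior of the two parameters.
    """
    exact = Counter()
    for value in values:
        exact[value] += 1
        if len(exact) > distinct_count_limit:
            if not misra_gries_enabled:
                raise RuntimeError("distinct count limit exceeded")
            # Assumption: reuse the limit as the counter budget unless
            # the caller supplies k.
            return misra_gries(values, k or distinct_count_limit), False
    return dict(exact), True
```

For example, `frequency_distribution(list("aabbc"), distinct_count_limit=10)` counts exactly, while the same call with `distinct_count_limit=2` and `misra_gries_enabled=False` raises an error instead of estimating.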

ecdfTolerance=double

specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.

Default	0.001
Range	1E-06–0.1
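One way to read an ECDF tolerance is as a bound on how far the empirical cumulative probability of an estimated quantile may sit from the requested probability. The sketch below illustrates that reading only. Whether the action's quantile sketch applies ecdfTolerance in exactly this way is an assumption, and the helper names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of observations that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def within_tolerance(sorted_values, estimate, p, tol=0.001):
    """True if the ECDF at `estimate` is within `tol` of probability p."""
    return abs(ecdf(sorted_values, estimate) - p) <= tol


data = list(range(1, 1001))  # already sorted
median_ok = within_tolerance(data, 500, 0.5)    # ECDF(500) is exactly 0.5
median_off = within_tolerance(data, 510, 0.5)   # ECDF(510) is 0.51
```

A looser tolerance accepts coarser estimates: with `tol=0.1`, the estimate 510 would also pass.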

freq="variable-name"

specifies the frequency variable.

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	vars

misraGries=TRUE | FALSE

when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.

Default	TRUE

nominals={"variable-name-1" <, "variable-name-2", ...>}

specifies the nominal variables.

screenPolicy={sweeperPolicy}

specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.

Alias	sweeperPolicy

The sweeperPolicy value can be one or more of the following:

constant=TRUE | FALSE

when set to True, uses the variable screening policy to identify variables that have constant values.

Alias	unique
Default	TRUE

groupRareLevels=TRUE | FALSE

when set to True, uses the variable screening policy to identify nominal variables that have rare levels.

Alias	groupRare
Default	TRUE

leakagePercentThreshold=double

specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.

Alias	leakagePercentageThreshold
Default	90
Range	(0–100]

lowCv=TRUE | FALSE

when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).

Alias	lowCoefficientVariation
Default	TRUE

lowMutualInformation=double

specifies the variable screening policy for variables that have a low level of information about the target.

Alias	lowInformation
Default	0.05
Minimum value	0

missingIndicatorPercent=double

specifies the variable screening policy for generating missing indicator variables.

Alias	missingIndicatorPercentage
Default	75
Range	[10–100)

missingPercentThreshold=double

specifies the variable screening policy for identifying variables that have a very high missing rate.

Alias	missingPercentageThreshold
Default	90
Range	[10–100)

redundant=double

specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.

Default	1
Range	(0–1]
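The entropy-based thresholds above (leakagePercentThreshold, lowMutualInformation, and redundant) can be illustrated with the standard information-theoretic definitions. This is a sketch for discrete values only. The function names are hypothetical, and how the action bins continuous variables or normalizes these measures internally is not stated here, so treat the formulas as an assumption about what the thresholds compare against.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def entropy_reduction_percent(x, target):
    """Percentage of the target's entropy that is removed by knowing x."""
    h_target = entropy(target)
    if h_target == 0:
        return 0.0
    return 100.0 * mutual_information(x, target) / h_target


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), a value between 0 and 1."""
    denom = entropy(x) + entropy(y)
    if denom == 0:
        return 0.0
    return 2.0 * mutual_information(x, y) / denom


target = [0, 0, 1, 1]
copy_of_target = [0, 0, 1, 1]   # carries all information about the target
noise = [0, 1, 0, 1]            # carries none

# A variable would be flagged as leakage when its reduction exceeds the
# threshold (default 90).
is_leakage = entropy_reduction_percent(copy_of_target, target) > 90
```

Under these definitions a perfect copy of the target has a 100 percent entropy reduction and a symmetric uncertainty of 1 with the target, while the unrelated variable scores 0 on both.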

* table={castable}

specifies the table name, caslib, and other common parameters.

Long form	table={name="table-name"}
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	FALSE

computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias	compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases	options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	FALSE

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.
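The whereTable matching rule can be pictured as a semi-join followed by the ordinary where filter. The sketch below models tables as lists of dictionaries in plain Python. The function name, the `var_names` argument, and the use of a Python callable in place of a where-expression are all assumptions made for illustration, not server behavior.

```python
def apply_where_table(rows, filter_rows, var_names=None, where=None):
    """Keep rows that match a filter-table row, then apply `where`.

    rows, filter_rows: lists of dicts that stand in for CAS tables.
    var_names: matching variables; defaults to the names common to both
               tables, as described for the vars subparameter.
    where: optional predicate standing in for the table's where-expression.
    """
    if not rows or not filter_rows:
        return []
    if var_names is None:
        var_names = sorted(set(rows[0]) & set(filter_rows[0]))
    # The filter table is applied first.
    keys = {tuple(f[v] for v in var_names) for f in filter_rows}
    matched = [r for r in rows if tuple(r[v] for v in var_names) in keys]
    # The where filter for the input table is applied second.
    return [r for r in matched if where is None or where(r)]


rows = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 50},
    {"id": 3, "region": "east", "amount": 70},
]
filter_rows = [{"region": "east"}]

east_only = apply_where_table(rows, filter_rows)
east_large = apply_where_table(rows, filter_rows,
                               where=lambda r: r["amount"] > 20)
```

Here `region` is the only variable name shared by the two tables, so it alone decides which rows match.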

* target="variable-name"

specifies the target variable.

Alias	evalVar

weight="variable-name"

specifies the weight variable.

screenVariables Action

Screens out noise variables and identifies variables that need special transformations to be useful in downstream analytics.

Lua Syntax

results, info = s:dataSciencePilot_screenVariables{

casOut={

caslib="string",

indexVars={"variable-name-1" <, "variable-name-2", ...>},

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=true | false,

replace=true | false,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

},

distinctCountLimit=integer,

ecdfTolerance=double,

freq="variable-name",

inputs={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

misraGries=true | false,

nominals={"variable-name-1" <, "variable-name-2", ...>},

screenPolicy={

constant=true | false,

groupRareLevels=true | false,

leakagePercentThreshold=double,

lowCv=true | false,

lowMutualInformation=double,

missingIndicatorPercent=double,

missingPercentThreshold=double,

redundant=double

},

table={

caslib="string",

computedOnDemand=true | false,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

name="table-name",

singlePass=true | false,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

target="variable-name",

weight="variable-name"

}

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the table name, caslib, and other common parameters.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
* casOut	—	specifies the CAS table to store the analysis results.

Parameter Descriptions

* casOut={casouttable}

specifies the CAS table to store the analysis results.

Long form	casOut={name="table-name"}
Shortcut form	casOut="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars={"variable-name-1" <, "variable-name-2", ...>}

specifies the list of variables to create indexes for in the output data.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

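INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.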

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	false

replace=true | false

when set to True, overwrites an existing table that has the same name.

Default	false

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defer the choice of redistribution policy to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

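distinctCountLimit=integer

specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.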

Default	10000
Minimum value	256

ecdfTolerance=double

specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.

Default	0.001
Range	1E-06–0.1

freq="variable-name"

specifies the frequency variable.

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	vars

misraGries=true | false

when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.

Default	true

nominals={"variable-name-1" <, "variable-name-2", ...>}

specifies the nominal variables.

screenPolicy={sweeperPolicy}

specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.

Alias	sweeperPolicy

The sweeperPolicy value can be one or more of the following:

constant=true | false

when set to True, uses the variable screening policy to identify variables that have constant values.

Alias	unique
Default	true

groupRareLevels=true | false

when set to True, uses the variable screening policy to identify nominal variables that have rare levels.

Alias	groupRare
Default	true

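leakagePercentThreshold=double

specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.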

Alias	leakagePercentageThreshold
Default	90
Range	(0–100]

lowCv=true | false

when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).

Alias	lowCoefficientVariation
Default	true

lowMutualInformation=double

specifies the variable screening policy for variables that have a low level of information about the target.

Alias	lowInformation
Default	0.05
Minimum value	0

missingIndicatorPercent=double

specifies the variable screening policy for generating missing indicator variables.

Alias	missingIndicatorPercentage
Default	75
Range	[10–100)

missingPercentThreshold=double

specifies the variable screening policy for identifying variables that have a very high missing rate.

Alias	missingPercentageThreshold
Default	90
Range	[10–100)

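redundant=double

specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.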

Default	1
Range	(0–1]

* table={castable}

specifies the table name, caslib, and other common parameters.

Long form	table={name="table-name"}
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=true | false

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	false

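computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.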

Alias	compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases	options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=true | false

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	false

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

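whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.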

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

* target="variable-name"

specifies the target variable.

Alias	evalVar

weight="variable-name"

specifies the weight variable.

screenVariables Action

Screens out noise variables and identifies variables that need special transformations to be useful in downstream analytics.

Python Syntax

results=s.dataSciencePilot.screenVariables(

casOut={

"caslib":"string",

"indexVars":["variable-name-1" <, "variable-name-2", ...>],

"lifetime":64-bit-integer,

"memoryFormat":"DVR" | "INHERIT" | "STANDARD",

"name":"table-name",

"promote":True | False,

"replace":True | False,

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

},

distinctCountLimit=integer,

ecdfTolerance=double,

freq="variable-name",

inputs=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

misraGries=True | False,

nominals=["variable-name-1" <, "variable-name-2", ...>],

screenPolicy={

"constant":True | False,

"groupRareLevels":True | False,

"leakagePercentThreshold":double,

"lowCv":True | False,

"lowMutualInformation":double,

"missingIndicatorPercent":double,

"missingPercentThreshold":double,

"redundant":double

},

table={

"caslib":"string",

"computedOnDemand":True | False,

"computedVars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"computedVarsProgram":"string",

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

"name":"table-name",

"singlePass":True | False,

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression",

"whereTable":{

"casLib":"string",

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},

"name":"table-name",

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression"

}

},

target="variable-name",

weight="variable-name"

)

* indicates a required parameter
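As a concrete reading of the syntax template, the sketch below assembles the three required parameters (table, casOut, and target) plus a few optional ones into a keyword dictionary. The helper name is hypothetical, and so are the table, column, and output names (hmeq, BAD, REASON, JOB, LOAN, hmeq_screened). The commented-out call assumes an existing SWAT connection object named `s`.

```python
def screen_variables_params(table, target, out_table, nominals=None,
                            screen_policy=None, replace=True):
    """Assemble keyword arguments for dataSciencePilot.screenVariables."""
    if isinstance(table, str):
        table = {"name": table}  # expand the shortcut form to the long form
    if "name" not in table:
        raise ValueError("table requires a name")
    params = {
        "table": table,
        "target": target,
        "casOut": {"name": out_table, "replace": replace},
    }
    if nominals:
        params["nominals"] = list(nominals)
    if screen_policy:
        params["screenPolicy"] = dict(screen_policy)
    return params


# Hypothetical table and column names, for illustration only.
params = screen_variables_params(
    table={"name": "hmeq", "where": "LOAN > 5000"},
    target="BAD",
    out_table="hmeq_screened",
    nominals=["REASON", "JOB"],
    screen_policy={"missingPercentThreshold": 80, "lowCv": False},
)

# With a SWAT connection `s` (assumed), the call might look like this:
#     s.loadactionset("dataSciencePilot")
#     results = s.dataSciencePilot.screenVariables(**params)
```

Building the dictionary separately keeps the nested casOut, table, and screenPolicy structures easy to inspect before anything is sent to the server.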

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the table name, caslib, and other common parameters.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
* casOut	—	specifies the CAS table to store the analysis results.

Parameter Descriptions

* casOut={casouttable}

specifies the CAS table to store the analysis results.

Long form	casOut={"name":"table-name"}
Shortcut form	casOut="table-name"

The casouttable value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"indexVars":["variable-name-1" <, "variable-name-2", ...>]

specifies the list of variables to create indexes for in the output data.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

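INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.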

STANDARD

use the standard memory format.

"name":"table-name"

specifies the name for the output table.

"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	False

"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default	False

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defer the choice of redistribution policy to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

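distinctCountLimit=integer

specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.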

Default	10000
Minimum value	256

ecdfTolerance=double

specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.

Default	0.001
Range	1E-06–0.1

freq="variable-name"

specifies the frequency variable.

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	vars

misraGries=True | False

when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.

Default	True

nominals=["variable-name-1" <, "variable-name-2", ...>]

specifies the nominal variables.

screenPolicy={sweeperPolicy}

specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.

Alias	sweeperPolicy

The sweeperPolicy value can be one or more of the following:

"constant":True | False

when set to True, uses the variable screening policy to identify variables that have constant values.

Alias	unique
Default	True

"groupRareLevels":True | False

when set to True, uses the variable screening policy to identify nominal variables that have rare levels.

Alias	groupRare
Default	True

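"leakagePercentThreshold":double

specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.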

Alias	leakagePercentageThreshold
Default	90
Range	(0–100]

"lowCv":True | False

when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).

Alias	lowCoefficientVariation
Default	True

"lowMutualInformation":double

specifies the variable screening policy for variables that have a low level of information about the target.

Alias	lowInformation
Default	0.05
Minimum value	0

"missingIndicatorPercent":double

specifies the variable screening policy for generating missing indicator variables.

Alias	missingIndicatorPercentage
Default	75
Range	[10–100)

"missingPercentThreshold":double

specifies the variable screening policy for identifying variables that have a very high missing rate.

Alias	missingPercentageThreshold
Default	90
Range	[10–100)

"redundant":double

Default	1
Range	(0–1]
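
As an illustration of the ranges listed above, the following sketch checks a screenPolicy dictionary on the client before it is sent to the server. The helper `validate_screen_policy` is hypothetical and not part of any SAS package; it covers only the primary key names, not the aliases.

```python
def validate_screen_policy(policy):
    """Return a list of problems found in a screenPolicy dict (empty if none)."""
    # (low, high, low_inclusive, high_inclusive); a high of None means unbounded.
    ranges = {
        "leakagePercentThreshold": (0, 100, False, True),   # (0-100]
        "lowMutualInformation": (0, None, True, True),      # minimum value 0
        "missingIndicatorPercent": (10, 100, True, False),  # [10-100)
        "missingPercentThreshold": (10, 100, True, False),  # [10-100)
        "redundant": (0, 1, False, True),                   # (0-1]
    }
    flags = {"constant", "groupRareLevels", "lowCv"}
    problems = []
    for key, value in policy.items():
        if key in flags:
            if not isinstance(value, bool):
                problems.append(f"{key} must be True or False")
        elif key in ranges:
            low, high, low_inc, high_inc = ranges[key]
            too_low = value < low or (value == low and not low_inc)
            too_high = high is not None and (
                value > high or (value == high and not high_inc)
            )
            if too_low or too_high:
                problems.append(f"{key}={value} is outside the documented range")
        else:
            problems.append(f"{key} is not a documented screenPolicy key")
    return problems


screen_policy = {
    "constant": True,
    "lowMutualInformation": 0.05,
    "missingPercentThreshold": 90,
    "redundant": 1,
}
problems = validate_screen_policy(screen_policy)
bad = validate_screen_policy({"missingIndicatorPercent": 100})
print(problems, bad)
```

The second call reports a problem because the upper bound of missingIndicatorPercent is exclusive.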

* table={castable}

specifies the table name, caslib, and other common parameters.

Long form	table={"name":"table-name"}
Shortcut form	table="table-name"
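
The two forms are interchangeable when only the table name is needed. The sketch below shows one way client code might treat them uniformly; `normalize_table` is a hypothetical helper, and the table and caslib names are placeholders.

```python
def normalize_table(table):
    """Expand the shortcut form (a bare table name) into the long dict form."""
    if isinstance(table, str):
        return {"name": table}
    return dict(table)  # copy so the caller's dict is not modified


short_form = normalize_table("hmeq")
long_form = normalize_table({"name": "hmeq", "caslib": "public"})
print(short_form, long_form)
```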

The castable value can be one or more of the following:

"caslib":"string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

"computedOnDemand":True | False

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	False

"computedVars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

Alias	compVars

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"computedVarsProgram":"string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}

specifies data source options.

Aliases	options, dataSource

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the input table.

"singlePass":True | False

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	False

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the input data.

"whereTable":{groupbytable}

The groupbytable value can be one or more of the following:

"casLib":"string"

specifies the caslib for the filter table. By default, the active caslib is used.

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the filter table.

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the data from the filter table.

* target="variable-name"

specifies the target variable.

Alias	evalVar

weight="variable-name"

specifies the weight variable.
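
The parameters described above can be assembled into a single Python dictionary. The following is a sketch only: the table name, column names, and output table name are hypothetical, and the commented call assumes an open SWAT session named `conn`, which is not created here.

```python
# Hypothetical table and column names; replace them with your own.
params = {
    "table": {
        "name": "hmeq",
        "where": "LOAN > 0",
        "computedVars": [{"name": "LOAN_K"}],
        "computedVarsProgram": "LOAN_K = LOAN / 1000;",
    },
    "target": "BAD",
    "nominals": ["REASON", "JOB"],
    "screenPolicy": {"missingPercentThreshold": 80},
    "casOut": {"name": "screen_out", "replace": True},
}

# Parameters marked as required in this reference.
required = ("table", "target", "casOut")
missing = [name for name in required if name not in params]
print(missing)

# With an open SWAT session `conn` (an assumption), the call would look like:
#   conn.loadactionset("dataSciencePilot")
#   result = conn.dataSciencePilot.screenVariables(**params)
```

Checking for the required parameters on the client gives an earlier and clearer error than waiting for the server to reject the request.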

screenVariables Action

Screens noise variables and variables that need special transformations to be useful in downstream analytics.

R Syntax

results <- cas.dataSciencePilot.screenVariables(s,

casOut=list(

caslib="string",

indexVars=list("variable-name-1" <, "variable-name-2", ...>),

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

),

distinctCountLimit=integer,

ecdfTolerance=double,

freq="variable-name",

inputs=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

misraGries=TRUE | FALSE,

nominals=list("variable-name-1" <, "variable-name-2", ...>),

screenPolicy=list(

constant=TRUE | FALSE,

groupRareLevels=TRUE | FALSE,

leakagePercentThreshold=double,

lowCv=TRUE | FALSE,

lowMutualInformation=double,

missingIndicatorPercent=double,

missingPercentThreshold=double,

redundant=double

),

table=list(

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

computedVarsProgram="string",

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),

name="table-name",

singlePass=TRUE | FALSE,

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression",

whereTable=list(

casLib="string",

name="table-name",

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression"

)

),

target="variable-name",

weight="variable-name"

)

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the table name, caslib, and other common parameters.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
* casOut	—	specifies the CAS table to store the analysis results.

Parameter Descriptions

* casOut=list(casouttable)

specifies the CAS table to store the analysis results.

Long form	casOut=list(name="table-name")
Shortcut form	casOut="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

indexVars=list("variable-name-1" <, "variable-name-2", ...>)

specifies the list of variables to create indexes for in the output data.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

Defer redistribution policy selection to a higher-level entity.

NOREDIST

Do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

Rebalance table data when the number of worker pods changes on a running CAS server.

distinctCountLimit=integer

Default	10000
Minimum value	256

ecdfTolerance=double

specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.

Default	0.001
Range	1E-06–0.1

freq="variable-name"

specifies the frequency variable.

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	vars

misraGries=TRUE | FALSE

when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.

Default	TRUE

nominals=list("variable-name-1" <, "variable-name-2", ...>)

specifies the nominal variables.

screenPolicy=list(sweeperPolicy)

specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.

Alias	sweeperPolicy

The sweeperPolicy value can be one or more of the following:

constant=TRUE | FALSE

when set to True, uses the variable screening policy to identify variables that have constant values.

Alias	unique
Default	TRUE

groupRareLevels=TRUE | FALSE

when set to True, uses the variable screening policy to identify nominal variables that have rare levels.

Alias	groupRare
Default	TRUE

leakagePercentThreshold=double

Alias	leakagePercentageThreshold
Default	90
Range	(0–100]

lowCv=TRUE | FALSE

when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).

Alias	lowCoefficientVariation
Default	TRUE

lowMutualInformation=double

specifies the variable screening policy for variables that have a low level of information about the target.

Alias	lowInformation
Default	0.05
Minimum value	0

missingIndicatorPercent=double

specifies the variable screening policy for generating missing indicator variables.

Alias	missingIndicatorPercentage
Default	75
Range	[10–100)

missingPercentThreshold=double

specifies the variable screening policy for identifying variables that have a very high missing rate.

Alias	missingPercentageThreshold
Default	90
Range	[10–100)

redundant=double

Default	1
Range	(0–1]

* table=list(castable)

specifies the table name, caslib, and other common parameters.

Long form	table=list(name="table-name")
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	FALSE

computedVars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

Alias	compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)

specifies data source options.

Aliases	options, dataSource

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	FALSE

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable=list(groupbytable)

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

* target="variable-name"

specifies the target variable.

Alias	evalVar

weight="variable-name"

specifies the weight variable.

Last updated: November 23, 2025