Provides actions for automating data science workflows, including automatic machine learning pipeline exploration, execution, and ranking.
Automated feature transformation and generation engine.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the table name, caslib, and other common parameters. |

| Parameter | Subparameter | Description |
|---|---|---|
| casout | — | specifies the CAS table to store the analysis results. |
| featureOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |
| saveState | — | specifies the CAS table to store the feature transformation and generation model. |
| transformationOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |
specifies the CAS table to store the analysis results.
| Long form | casout={name="table-name"} |
|---|---|
| Shortcut form | casout="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
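The long form and the shortcut form shown for casout differ only in whether the table name is wrapped in a parameter list. The following is a minimal Python sketch of that relationship. The helper `normalize_casout` and the caslib value `"casuser"` are hypothetical and used for illustration only; they are not part of any SAS client library.

```python
def normalize_casout(value):
    """Return the long form ({"name": ...}) for either input form.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, str):
        # Shortcut form: casout="table-name"
        return {"name": value}
    if isinstance(value, dict) and "name" in value:
        # Long form: casout={name="table-name", ...}
        # Copy so the caller's dictionary is left untouched.
        return dict(value)
    raise ValueError("casout must be a table name or a dict with a 'name' key")


print(normalize_casout("results"))
print(normalize_casout({"name": "results", "caslib": "casuser"}))
```

The same pattern would apply to the other output-table parameters that offer both forms (featureOut, saveState, and transformationOut).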
specifies the names of variables to be copied to the output table.
specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.
| Default | 10000 |
|---|---|
| Minimum value | 256 |
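The fallback described for the distinct count limit can be sketched in plain Python. This is only an illustration of the textbook Misra-Gries summary and of the "estimate or abort" decision; how the action implements it internally is not documented here, and the function names and the choice of `limit + 1` as the sketch size are assumptions. The demonstration uses a tiny limit, well below the documented minimum of 256, purely to keep the example short.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary that keeps at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequency_distribution(values, distinct_count_limit=10000, use_misra_gries=True):
    """Count exactly while within the limit; otherwise estimate or abort.

    `values` must be a re-iterable sequence such as a list.
    """
    exact = {}
    for value in values:
        exact[value] = exact.get(value, 0) + 1
        if len(exact) > distinct_count_limit:
            if not use_misra_gries:
                raise RuntimeError("distinct count limit exceeded")
            # Assumption: size the sketch so it keeps `limit` counters.
            return misra_gries(values, distinct_count_limit + 1)
    return exact


data = ["a", "a", "a", "b", "c"]
print(frequency_distribution(data, distinct_count_limit=5))  # exact counts
print(frequency_distribution(data, distinct_count_limit=2))  # sketch estimate
```

Misra-Gries counts are lower bounds on the true frequencies, so the estimated distribution favors the most frequent levels.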
specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.
| Default | 0.001 |
|---|---|
| Range | 1E-06–0.1 |
specifies the target variable level that you want to model. Multilevel classification problems are cast into a one-versus-all binary classification problem, where the value of the event parameter denotes the level that you are modeling.
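As a small illustration of the one-versus-all casting described for the event parameter, the following Python sketch maps a multilevel target to a binary one. The function name and the sample levels are hypothetical.

```python
def one_versus_all(target_values, event):
    """Map a multilevel target to 1 (the event level) or 0 (any other level)."""
    return [1 if value == event else 0 for value in target_values]


levels = ["low", "high", "medium", "high"]
print(one_versus_all(levels, event="high"))  # [0, 1, 0, 1]
```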
specifies the automatic variable analysis and grouping (AVAPT) policy.
| Alias | avaptPolicy |
|---|
The avaptPolicy value can be one or more of the following:
specifies the automatic variable analysis and grouping (AVAPT) cardinality policy.
The cardinalityAvaptPolicy value can be one or more of the following:
specifies the cardinality threshold for the low-medium cutoff.
| Default | 32 |
|---|---|
| Range | 2–256 |
specifies the cardinality threshold for the medium-high cutoff.
| Default | 64 |
|---|---|
| Range | 2–1024 |
specifies the minimum number of observations for each target level.
| Default | 10 |
|---|---|
| Range | 5–100 |
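The two cardinality cutoffs split variables into low, medium, and high groups. The sketch below shows one plausible reading using the documented defaults of 32 and 64. Whether a value exactly equal to a cutoff belongs to the lower or the upper group is not stated in this reference, so the inclusive comparison is an assumption, as is the function name.

```python
def cardinality_group(n_distinct, low_medium_cutoff=32, medium_high_cutoff=64):
    """Classify a variable by its number of distinct levels.

    Assumption: a value equal to a cutoff falls into the lower group.
    """
    if n_distinct <= low_medium_cutoff:
        return "low"
    if n_distinct <= medium_high_cutoff:
        return "medium"
    return "high"


for n in (10, 40, 500):
    print(n, cardinality_group(n))
```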
specifies the automatic variable analysis and grouping (AVAPT) coefficient of variation policy.
| Alias | coefficientVariation |
|---|
The cvAvaptPolicy value can be one or more of the following:
specifies the absolute value of the low-high percentage threshold for the moment coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the low-high percentage threshold for the robust coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the datetime variables.
| Alias | dateTime |
|---|
specifies the date variables.
| Alias | date |
|---|
specifies the automatic variable analysis and grouping (AVAPT) entropy policy.
The entropyAvaptPolicy value can be one or more of the following:
specifies the Gini entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Gini entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
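Because the entropy thresholds are restricted to the range 0–1, they are presumably compared against normalized entropy measures. The following sketch computes a normalized Shannon entropy and a normalized Gini index for a nominal variable. The normalization (dividing by the maximum attainable value for the observed number of levels) is an assumption inferred from the documented range, not a statement of how the action computes these statistics.

```python
import math
from collections import Counter


def normalized_shannon_entropy(values):
    """Shannon entropy divided by log(k), where k is the number of levels."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def normalized_gini(values):
    """Gini index (1 - sum of squared proportions) divided by its maximum, 1 - 1/k."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    gini = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return gini / (1.0 - 1.0 / k)


balanced = ["a", "b", "a", "b"]
skewed = ["a"] * 9 + ["b"]
print(normalized_shannon_entropy(balanced), normalized_gini(balanced))
print(normalized_shannon_entropy(skewed), normalized_gini(skewed))
```

Under this reading, a value near 0 indicates that one level dominates, and a value near 1 indicates evenly spread levels.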
specifies the automatic variable analysis and grouping (AVAPT) index of qualitative variation policy.
| Alias | qualitativeVariationIndex |
|---|
The iqvAvaptPolicy value can be one or more of the following:
specifies the low-high cutoff frequency ratio threshold between the most frequent and least frequent levels of a nominal variable.
| Alias | highTop1Bottom1 |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the low-high cutoff frequency ratio threshold between the most frequent and second most frequent levels of a nominal variable.
| Alias | highTop1Top2 |
|---|---|
| Default | 10 |
| Minimum value | 1 |
specifies the variation ratio threshold for the low-high cutoff.
| Alias | highModVr |
|---|---|
| Default | 0.5 |
| Range | (0–1] |
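The three quantities that the index of qualitative variation thresholds refer to can be computed from the level frequencies of a nominal variable. The sketch below uses the standard definition of the variation ratio (one minus the relative frequency of the mode); the function name and the dictionary keys are hypothetical, and the exact formulas that the action uses are not given in this reference.

```python
from collections import Counter


def iqv_statistics(values):
    """Frequency ratios and variation ratio for a non-empty nominal variable."""
    if not values:
        raise ValueError("values must not be empty")
    counts = sorted(Counter(values).values(), reverse=True)
    n = sum(counts)
    top1 = counts[0]
    top2 = counts[1] if len(counts) > 1 else None
    bottom1 = counts[-1]
    return {
        # Most frequent level versus least frequent level
        "top1_bottom1": top1 / bottom1,
        # Most frequent level versus second most frequent level
        "top1_top2": top1 / top2 if top2 else None,
        # Share of observations that are not in the modal level
        "variation_ratio": 1.0 - top1 / n,
    }


print(iqv_statistics(["a"] * 6 + ["b"] * 3 + ["c"]))
```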
specifies the automatic variable analysis and grouping (AVAPT) kurtosis policy.
The kurtosisAvaptPolicy value can be one or more of the following:
specifies the absolute value of the moment kurtosis threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the moment kurtosis threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the medium-high cutoff.
| Default | 3 |
|---|---|
| Minimum value | 0 |
specifies the automatic variable analysis and grouping (AVAPT) missing grouping policy.
The missingAvaptPolicy value can be one or more of the following:
specifies the missing percentage threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Range | 0–100 |
specifies the missing percentage threshold for the medium-high cutoff.
| Default | 25 |
|---|---|
| Range | 0–100 |
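A short sketch of how a missing percentage could be computed and placed into a group with the documented default cutoffs of 5 and 25. Treating `None` and NaN as missing, and treating a value equal to a cutoff as belonging to the lower group, are both assumptions made for this example.

```python
import bisect
import math


def missing_percentage(values):
    """Percentage of entries that are None or NaN."""
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * missing / len(values)


def missing_group(pct, low_medium_cutoff=5.0, medium_high_cutoff=25.0):
    """Assumption: a percentage equal to a cutoff falls into the lower group."""
    cutoffs = [low_medium_cutoff, medium_high_cutoff]
    return ("low", "medium", "high")[bisect.bisect_left(cutoffs, pct)]


column = [1.0, None, 3.0, float("nan")]
pct = missing_percentage(column)
print(pct, missing_group(pct))
```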
specifies the automatic variable analysis and grouping (AVAPT) nominal policy.
The nominalAvaptPolicy value can be one or more of the following:
specifies the AVAPT nominal policy cardinality ratio threshold.
| Default | 0.25 |
|---|---|
| Range | (0–1] |
specifies the AVAPT nominal policy cardinality threshold.
| Default | 1024 |
|---|---|
| Minimum value | 32 |
when set to True, includes numeric variables with some negative values in the nominal analysis.
| Default | FALSE |
|---|
when set to True, includes numeric variables with some nonintegral values in the nominal analysis.
| Default | FALSE |
|---|
specifies variables to consider as intervals.
specifies variables to consider as nominals.
specifies the automatic variable analysis and grouping (AVAPT) outlier policy.
The outlierAvaptPolicy value can be one or more of the following:
specifies the z-score outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the z-score outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
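The z-score outlier percentage that these thresholds apply to can be illustrated as follows. This reference does not state which z-score magnitude marks an observation as an outlier, so the cutoff of 3 and the use of the population standard deviation are assumptions. The modified interquartile range method is not sketched here because its definition is not given.

```python
import statistics


def zscore_outlier_percentage(values, z_cutoff=3.0):
    """Percentage of observations whose absolute z-score exceeds z_cutoff."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant variable has no z-score outliers.
        return 0.0
    outliers = sum(1 for v in values if abs(v - mean) / std > z_cutoff)
    return 100.0 * outliers / len(values)


data = [0] * 99 + [100]
print(zscore_outlier_percentage(data))
```

With the documented defaults, a result at or below 1 would be at the low end of the scale and a result above 2.5 at the high end.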
specifies the automatic variable analysis and grouping (AVAPT) skewness policy.
The skewnessAvaptPolicy value can be one or more of the following:
specifies the moment skewness threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Range | 0–100 |
specifies the moment skewness threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Range | 0–100 |
specifies the robust skewness threshold for the low-medium cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–3 |
specifies the robust skewness threshold for the medium-high cutoff.
| Default | 2 |
|---|---|
| Range | 0–3 |
specifies the time variables.
| Alias | time |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | featureOut={name="table-name"} |
|---|---|
| Shortcut form | featureOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the frequency variable.
specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | vars |
|---|
when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.
| Default | TRUE |
|---|
specifies the feature ranking policy.
| Alias | rank |
|---|
| Long form | rankPolicy={intervalStat="AVGQUANKURT" \| "AVGQUANSKEW" \| "CLASSICALKURT" \| "CLASSICALSKEW" \| "ENTROPY" \| "MI" \| "NORMMI" \| "PEARSON" \| "SU"} |
|---|---|
| Shortcut form | rankPolicy="AVGQUANKURT" \| "AVGQUANSKEW" \| "CLASSICALKURT" \| "CLASSICALSKEW" \| "ENTROPY" \| "MI" \| "NORMMI" \| "PEARSON" \| "SU" |
The rankPolicy value can be one or more of the following:
specifies the interval variable ranking statistic.
| Alias | interval |
|---|---|
| Default | SU |
specifies the nominal variable ranking statistic.
| Alias | nominal |
|---|---|
| Default | SU |
when set to True, includes missing indicator features in the feature ranking. Otherwise, they are excluded from the ranking and therefore always appear in the final feature set.
| Default | TRUE |
|---|
when set to True, performs a separate feature ranking for interval and nominal features.
| Alias | separate |
|---|---|
| Default | TRUE |
specifies the number of top-ranked interaction features to generate and save.
| Alias | topKInteract |
|---|---|
| Minimum value | 1 |
specifies the number of features per variable to save.
| Alias | topK |
|---|---|
| Default | 1 |
| Minimum value | 1 |
specifies the CAS table to store the feature transformation and generation model.
| Alias | saveModel |
|---|
| Long form | saveState={name="table-name"} |
|---|---|
| Shortcut form | saveState="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.
| Alias | sweeperPolicy |
|---|
The sweeperPolicy value can be one or more of the following:
when set to True, uses the variable screening policy to identify variables that have constant values.
| Alias | unique |
|---|---|
| Default | TRUE |
when set to True, uses the variable screening policy to identify nominal variables that have rare levels.
| Alias | groupRare |
|---|---|
| Default | TRUE |
specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.
| Alias | leakagePercentageThreshold |
|---|---|
| Default | 90 |
| Range | (0–100] |
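The leakage check compares a target entropy percentage reduction against the threshold. One common way to express such a reduction is 100 × (H(Y) − H(Y|X)) / H(Y), where Y is the target and X is the candidate variable. The sketch below uses that formula for nominal data; it is an assumption about what the statistic means, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def target_entropy_reduction_pct(x, y):
    """100 * (H(Y) - H(Y|X)) / H(Y) for discrete x and target y."""
    h_y = entropy(y)
    if h_y == 0:
        return 0.0
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n = len(y)
    # Conditional entropy: weighted average of the target entropy within each x level
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return 100.0 * (h_y - h_y_given_x) / h_y


def is_leakage(x, y, threshold=90.0):
    """Flag x when the reduction is greater than the threshold (default 90)."""
    return target_entropy_reduction_pct(x, y) > threshold


target = [0, 1, 0, 1]
print(is_leakage([0, 1, 0, 1], target))  # a copy of the target
print(is_leakage([0, 0, 1, 1], target))  # unrelated to the target
```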
when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).
| Alias | lowCoefficientVariation |
|---|---|
| Default | TRUE |
specifies the variable screening policy for variables that have a low level of information about the target.
| Alias | lowInformation |
|---|---|
| Default | 0.05 |
| Minimum value | 0 |
specifies the variable screening policy for generating missing indicator variables.
| Alias | missingIndicatorPercentage |
|---|---|
| Default | 75 |
| Range | [10–100) |
specifies the variable screening policy for identifying variables that have a very high missing rate.
| Alias | missingPercentageThreshold |
|---|---|
| Default | 90 |
| Range | [10–100) |
specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.
| Default | 1 |
|---|---|
| Range | (0–1] |
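Symmetric uncertainty is conventionally defined as SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), which ranges from 0 (independent) to 1 (each variable fully determines the other). The following sketch computes it for two discrete variables using that standard definition; any binning that the action applies to interval variables beforehand is not covered, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """2 * I(X; Y) / (H(X) + H(Y)) for two discrete variables."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


a = [0, 1, 0, 1]
print(symmetric_uncertainty(a, a))             # identical variables
print(symmetric_uncertainty(a, [0, 0, 1, 1]))  # independent variables
```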
specifies a seed value for random number generation. This value is used for repeatable random number generation in some scenarios.
| Default | 0 |
|---|
specifies the table name, caslib, and other common parameters.
| Long form | table={name="table-name"} |
|---|---|
| Shortcut form | table="table-name" |
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | FALSE |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | FALSE |
|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
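The filter-table behavior described above (match on the variables common to both tables unless vars is specified, and apply the filter table before the input table's where expression) resembles a semi-join. The following sketch models tables as lists of dictionaries. The function name, the `key_vars` argument, and the sample data are hypothetical; this is not how the server implements the filter.

```python
def apply_filter_table(rows, filter_rows, key_vars=None):
    """Keep the input rows that match some filter row on the key variables."""
    if not rows or not filter_rows:
        return []
    if key_vars is None:
        # Default: every variable name common to both tables
        key_vars = sorted(set(rows[0]) & set(filter_rows[0]))
    keys = {tuple(f[v] for v in key_vars) for f in filter_rows}
    return [r for r in rows if tuple(r[v] for v in key_vars) in keys]


table = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 50},
    {"id": 3, "region": "east", "amount": 70},
]
filter_table = [{"region": "east"}]

# The filter table is applied first, then the where expression.
filtered = apply_filter_table(table, filter_table)
result = [r for r in filtered if r["amount"] > 20]
print(result)
```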
specifies the target variable.
| Alias | evalVar |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | transformationOut={name="table-name"} |
|---|---|
| Shortcut form | transformationOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the feature transformation and generation space in which the feature machine operates.
| Alias | ftgPolicy |
|---|
The transformationSpace value can be one or more of the following:
when set to True, includes cardinality-reducing transformations.
| Default | TRUE |
|---|
when set to True, includes transformations for the treatment of low entropy.
| Default | FALSE |
|---|
when set to True, detects and generates interaction features.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of low indices of qualitative variation (IQV).
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of high kurtosis.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of missing values.
| Default | TRUE |
|---|
when set to True, includes transformations for the treatment of outliers.
| Default | FALSE |
|---|
when set to True, includes up to third-order polynomial transformations.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of high skewness.
| Default | TRUE |
|---|
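To make the polynomial option in the transformation space concrete, the sketch below generates the second- and third-order powers of one interval variable. The feature naming scheme (`x_pow2`, `x_pow3`) and the function name are hypothetical; the names and the exact set of features that the action generates are not described in this reference.

```python
def polynomial_features(name, values, max_order=3):
    """Return {feature_name: values} for powers 2 through max_order.

    Order 1 is the input variable itself, so it is not repeated.
    """
    return {
        f"{name}_pow{p}": [v ** p for v in values]
        for p in range(2, max_order + 1)
    }


print(polynomial_features("x", [1, 2, 3]))
```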
specifies the weight variable.
|---|---|
| Default | true |
specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.
| Alias | leakagePercentageThreshold |
|---|---|
| Default | 90 |
| Range | (0–100] |
when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).
| Alias | lowCoefficientVariation |
|---|---|
| Default | true |
specifies the variable screening policy for variables that have a low level of information about the target.
| Alias | lowInformation |
|---|---|
| Default | 0.05 |
| Minimum value | 0 |
specifies the variable screening policy for generating missing indicator variables.
| Alias | missingIndicatorPercentage |
|---|---|
| Default | 75 |
| Range | [10–100) |
specifies the variable screening policy for identifying variables that have a very high missing rate.
| Alias | missingPercentageThreshold |
|---|---|
| Default | 90 |
| Range | [10–100) |
specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.
| Default | 1 |
|---|---|
| Range | (0–1] |
specifies a seed value for random number generation. This value is used for repeatable random number generation in some scenarios.
| Default | 0 |
|---|
specifies the table name, caslib, and other common parameters.
| Long form | table={name="table-name"} |
|---|---|
| Shortcut form | table="table-name" |
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | false |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | false |
|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
specifies the target variable.
| Alias | evalVar |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | transformationOut={name="table-name"} |
|---|---|
| Shortcut form | transformationOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | false |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | false |
|---|
specifies the feature transformation and generation space in which the feature machine operates.
| Alias | ftgPolicy |
|---|
The transformationSpace value can be one or more of the following:
when set to True, includes cardinality-reducing transformations.
| Default | true |
|---|
when set to True, includes transformations for the treatment of low entropy.
| Default | false |
|---|
when set to True, detects and generates interaction features.
| Default | false |
|---|
when set to True, includes transformations for the treatment of low indices of qualitative variation (IQV).
| Default | false |
|---|
when set to True, includes transformations for the treatment of high kurtosis.
| Default | false |
|---|
when set to True, includes transformations for the treatment of missing values.
| Default | true |
|---|
when set to True, includes transformations for the treatment of outliers.
| Default | false |
|---|
when set to True, includes up to third-order polynomial transformations.
| Default | false |
|---|
when set to True, includes transformations for the treatment of high skewness.
| Default | true |
|---|
specifies the weight variable.
Automated feature transformation and generation engine.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the table name, caslib, and other common parameters. |

| Parameter | Subparameter | Description |
|---|---|---|
| casout | — | specifies the CAS table to store the analysis results. |
| featureOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |
| saveState | — | specifies the CAS table to store the feature transformation and generation model. |
| transformationOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |

specifies the CAS table to store the analysis results.
| Long form | casout={"name":"table-name"} |
|---|---|
| Shortcut form | casout="table-name" |
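The two forms above are interchangeable when only a table name is needed. As a minimal sketch (the helper `as_casout` is hypothetical and not part of any SAS client library), a shortcut string can be expanded into the long dictionary form before other subparameters are added. Requiring a `name` key is an assumption of this sketch.

```python
def as_casout(spec):
    """Normalize an output table specification to the long (dict) form."""
    if isinstance(spec, str):
        # Shortcut form: casout="table-name"
        return {"name": spec}
    if isinstance(spec, dict) and "name" in spec:
        # Long form: casout={"name": "table-name", ...}; return a copy
        return dict(spec)
    raise TypeError("expected a table name or a dict with a 'name' key")


short_form = as_casout("results")
long_form = as_casout({"name": "results"})
```

Both calls produce the same dictionary, so downstream code only has to handle one shape.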
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|
specifies the names of variables to be copied to the output table.
specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.
| Default | 10000 |
|---|---|
| Minimum value | 256 |
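For readers unfamiliar with the Misra-Gries fallback mentioned above, the following is a minimal sketch of the textbook frequency sketch algorithm. It is not the action's internal implementation. The function name `misra_gries` and the counter budget `k` are hypothetical, and how the action sizes its sketch relative to the distinct count limit is not stated here.

```python
from collections import Counter


def misra_gries(stream, k):
    """Textbook Misra-Gries summary that keeps at most k - 1 counters.

    Each returned count underestimates the true frequency by at most
    n / k, where n is the number of items in the stream.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


data = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
estimates = misra_gries(data, k=4)
exact = Counter(data)
```

Any level whose true frequency exceeds `len(data) / k` is guaranteed to survive in `estimates`, which is why the summary is useful for approximating a frequency distribution when an exact distinct count is too expensive.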
specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.
| Default | 0.001 |
|---|---|
| Range | 1E-06–0.1 |
specifies the target variable level that you want to model. Multilevel classification problems are cast into a one-versus-all binary classification problem, where the value of the event parameter denotes the level that you are modeling.
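The one-versus-all recasting described above can be pictured with a short sketch. The function name `one_versus_all` and the sample data are hypothetical, and the handling of missing target values is not covered here.

```python
def one_versus_all(target_values, event):
    """Return 1 where the target equals the event level, otherwise 0."""
    return [1 if value == event else 0 for value in target_values]


species = ["setosa", "virginica", "versicolor", "virginica"]
binary_target = one_versus_all(species, event="virginica")
```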
specifies the automatic variable analysis and grouping (AVAPT) policy.
| Alias | avaptPolicy |
|---|
The avaptPolicy value can be one or more of the following:
specifies the automatic variable analysis and grouping (AVAPT) cardinality policy.
The cardinalityAvaptPolicy value can be one or more of the following:
specifies the cardinality threshold for the low-medium cutoff.
| Default | 32 |
|---|---|
| Range | 2–256 |
specifies the cardinality threshold for the medium-high cutoff.
| Default | 64 |
|---|---|
| Range | 2–1024 |
specifies the minimum number of observations for each target level.
| Default | 10 |
|---|---|
| Range | 5–100 |
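The two cardinality cutoffs above split variables into low, medium, and high groups. The following sketch illustrates that grouping with the documented defaults of 32 and 64. The function name `cardinality_group` is hypothetical, and treating each cutoff as inclusive and ignoring missing values are assumptions of this sketch.

```python
def cardinality_group(values, low_medium=32, medium_high=64):
    """Bucket a variable by its number of distinct non-missing values."""
    if low_medium > medium_high:
        raise ValueError("low-medium cutoff must not exceed medium-high cutoff")
    # Assumption: None marks a missing value and is not counted as a level.
    cardinality = len({v for v in values if v is not None})
    if cardinality <= low_medium:
        return cardinality, "low"
    if cardinality <= medium_high:
        return cardinality, "medium"
    return cardinality, "high"


low = cardinality_group(range(10))
medium = cardinality_group(range(40))
high = cardinality_group(range(500))
```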
specifies the automatic variable analysis and grouping (AVAPT) coefficient of variation policy.
| Alias | coefficientVariation |
|---|
The cvAvaptPolicy value can be one or more of the following:
specifies the absolute value of the low-high percentage threshold for the moment coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the low-high percentage threshold for the robust coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the datetime variables.
| Alias | dateTime |
|---|
specifies the date variables.
| Alias | date |
|---|
specifies the automatic variable analysis and grouping (AVAPT) entropy policy.
The entropyAvaptPolicy value can be one or more of the following:
specifies the Gini entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Gini entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
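Because the entropy thresholds above are bounded by 0 and 1, they presumably apply to normalized measures. The following sketch shows one common normalization of Shannon entropy and of the Gini index for a nominal variable. The function names are hypothetical, and whether the action normalizes in exactly this way is an assumption.

```python
import math
from collections import Counter


def level_proportions(values):
    """Return the relative frequency of each distinct level."""
    counts = Counter(values)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def normalized_shannon_entropy(values):
    """Shannon entropy divided by its maximum, log(number of levels)."""
    p = level_proportions(values)
    if len(p) < 2:
        return 0.0
    entropy = -sum(pi * math.log(pi) for pi in p)
    return entropy / math.log(len(p))


def normalized_gini(values):
    """Gini index 1 - sum(p^2) divided by its maximum, 1 - 1/levels."""
    p = level_proportions(values)
    if len(p) < 2:
        return 0.0
    gini = 1.0 - sum(pi * pi for pi in p)
    return gini / (1.0 - 1.0 / len(p))


balanced = ["a", "b"] * 50
skewed = ["a"] * 99 + ["b"]
```

With these definitions a balanced variable scores 1 and a nearly constant variable scores close to 0, so the skewed example falls below the default low-medium cutoff of 0.25.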
specifies the automatic variable analysis and grouping (AVAPT) index of qualitative variation policy.
| Alias | qualitativeVariationIndex |
|---|
The iqvAvaptPolicy value can be one or more of the following:
specifies the low-high cutoff frequency ratio threshold between the most frequent and least frequent levels of a nominal variable.
| Alias | highTop1Bottom1 |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the low-high cutoff frequency ratio threshold between the most frequent and second most frequent levels of a nominal variable.
| Alias | highTop1Top2 |
|---|---|
| Default | 10 |
| Minimum value | 1 |
specifies the variation ratio threshold for the low-high cutoff.
| Alias | highModVr |
|---|---|
| Default | 0.5 |
| Range | (0–1] |
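The three statistics that these thresholds apply to can be computed from level counts. The sketch below uses the standard definition of the variation ratio, one minus the relative frequency of the modal level. The function name and the result keys are hypothetical, and the exact formulas that the action uses are not stated here.

```python
from collections import Counter


def frequency_ratios(values):
    """Compute level-frequency statistics for a nominal variable."""
    counts = sorted(Counter(values).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct levels")
    total = sum(counts)
    return {
        # Most frequent level versus least frequent level
        "top1_bottom1": counts[0] / counts[-1],
        # Most frequent level versus second most frequent level
        "top1_top2": counts[0] / counts[1],
        # Share of observations that are not in the modal level
        "variation_ratio": 1 - counts[0] / total,
    }


levels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
stats = frequency_ratios(levels)
```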
specifies the automatic variable analysis and grouping (AVAPT) kurtosis policy.
The kurtosisAvaptPolicy value can be one or more of the following:
specifies the absolute value of the moment kurtosis threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the moment kurtosis threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the medium-high cutoff.
| Default | 3 |
|---|---|
| Minimum value | 0 |
specifies the automatic variable analysis and grouping (AVAPT) missing grouping policy.
The missingAvaptPolicy value can be one or more of the following:
specifies the missing percentage threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Range | 0–100 |
specifies the missing percentage threshold for the medium-high cutoff.
| Default | 25 |
|---|---|
| Range | 0–100 |
specifies the automatic variable analysis and grouping (AVAPT) nominal policy.
The nominalAvaptPolicy value can be one or more of the following:
specifies the AVAPT nominal policy cardinality ratio threshold.
| Default | 0.25 |
|---|---|
| Range | (0–1] |
specifies the AVAPT nominal policy cardinality threshold.
| Default | 1024 |
|---|---|
| Minimum value | 32 |
when set to True, includes numeric variables with some negative values in the nominal analysis.
| Default | False |
|---|
when set to True, includes numeric variables with some nonintegral values in the nominal analysis.
| Default | False |
|---|
specifies variables to consider as intervals.
specifies variables to consider as nominals.
specifies the automatic variable analysis and grouping (AVAPT) outlier policy.
The outlierAvaptPolicy value can be one or more of the following:
specifies the z-score outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the z-score outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
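The outlier thresholds above are percentages of observations. As a rough illustration of the z-score variant only, the sketch below counts observations whose absolute z-score exceeds a cutoff. The cutoff of 3, the use of the population standard deviation, and the function name are all assumptions. The modified interquartile range variant is not shown.

```python
import statistics


def zscore_outlier_percentage(values, cutoff=3.0):
    """Percentage of values whose absolute z-score exceeds the cutoff."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant variable has no z-score outliers.
        return 0.0
    outliers = sum(1 for v in values if abs(v - mean) / std > cutoff)
    return 100.0 * outliers / len(values)


sample = [10] * 98 + [1000, -1000]
pct = zscore_outlier_percentage(sample)
```

Under the documented defaults of 1 and 2.5, the resulting percentage for this sample would fall between the low-medium and medium-high cutoffs.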
specifies the automatic variable analysis and grouping (AVAPT) skewness policy.
The skewnessAvaptPolicy value can be one or more of the following:
specifies the moment skewness threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Range | 0–100 |
specifies the moment skewness threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Range | 0–100 |
specifies the robust skewness threshold for the low-medium cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–3 |
specifies the robust skewness threshold for the medium-high cutoff.
| Default | 2 |
|---|---|
| Range | 0–3 |
specifies the time variables.
| Alias | time |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | featureOut={"name":"table-name"} |
|---|---|
| Shortcut form | featureOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|
specifies the frequency variable.
specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | vars |
|---|
when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.
| Default | True |
|---|
specifies the feature ranking policy.
| Alias | rank |
|---|
| Long form | rankPolicy={"intervalStat":"AVGQUANKURT" \| "AVGQUANSKEW" \| "CLASSICALKURT" \| "CLASSICALSKEW" \| "ENTROPY" \| "MI" \| "NORMMI" \| "PEARSON" \| "SU"} |
|---|---|
| Shortcut form | rankPolicy="AVGQUANKURT" \| "AVGQUANSKEW" \| "CLASSICALKURT" \| "CLASSICALSKEW" \| "ENTROPY" \| "MI" \| "NORMMI" \| "PEARSON" \| "SU" |
The rankPolicy value can be one or more of the following:
specifies the interval variable ranking statistic.
| Alias | interval |
|---|---|
| Default | SU |
specifies the nominal variable ranking statistic.
| Alias | nominal |
|---|---|
| Default | SU |
when set to True, includes missing indicator features in the feature ranking. Otherwise, they are excluded from the ranking and therefore always appear in the final feature set.
| Default | True |
|---|
when set to True, performs a separate feature ranking for interval and nominal features.
| Alias | separate |
|---|---|
| Default | True |
specifies the number of top-ranked interaction features to generate and save.
| Alias | topKInteract |
|---|---|
| Minimum value | 1 |
specifies the number of features per variable to save.
| Alias | topK |
|---|---|
| Default | 1 |
| Minimum value | 1 |
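A ranking policy can be assembled as a plain dictionary before it is passed to the action. The sketch below uses the aliases listed in the tables above as keys (`interval`, `separate`, `topK`, `topKInteract`) and checks values against the documented statistic list and minimums. The helper `build_rank_policy` is hypothetical, and the sketch does not call the server.

```python
INTERVAL_STATS = {
    "AVGQUANKURT", "AVGQUANSKEW", "CLASSICALKURT", "CLASSICALSKEW",
    "ENTROPY", "MI", "NORMMI", "PEARSON", "SU",
}


def build_rank_policy(interval="SU", separate=True, top_k=1,
                      top_k_interact=None):
    """Build a rankPolicy dict keyed by the documented aliases."""
    stat = interval.upper()
    if stat not in INTERVAL_STATS:
        raise ValueError(f"unknown interval ranking statistic: {interval}")
    if top_k < 1:
        raise ValueError("topK must be at least 1")
    policy = {"interval": stat, "separate": separate, "topK": top_k}
    if top_k_interact is not None:
        if top_k_interact < 1:
            raise ValueError("topKInteract must be at least 1")
        policy["topKInteract"] = top_k_interact
    return policy


rank_policy = build_rank_policy("mi", top_k=2)
```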
specifies the CAS table to store the feature transformation and generation model.
| Alias | saveModel |
|---|
| Long form | saveState={"name":"table-name"} |
|---|---|
| Shortcut form | saveState="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|
specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.
| Alias | sweeperPolicy |
|---|
The sweeperPolicy value can be one or more of the following:
when set to True, uses the variable screening policy to identify variables that have constant values.
| Alias | unique |
|---|---|
| Default | True |
when set to True, uses the variable screening policy to identify nominal variables that have rare levels.
| Alias | groupRare |
|---|---|
| Default | True |
specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.
| Alias | leakagePercentageThreshold |
|---|---|
| Default | 90 |
| Range | (0–100] |
when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).
| Alias | lowCoefficientVariation |
|---|---|
| Default | True |
specifies the variable screening policy for variables that have a low level of information about the target.
| Alias | lowInformation |
|---|---|
| Default | 0.05 |
| Minimum value | 0 |
specifies the variable screening policy for generating missing indicator variables.
| Alias | missingIndicatorPercentage |
|---|---|
| Default | 75 |
| Range | [10–100) |
specifies the variable screening policy for identifying variables that have a very high missing rate.
| Alias | missingPercentageThreshold |
|---|---|
| Default | 90 |
| Range | [10–100) |
specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.
| Default | 1 |
|---|---|
| Range | (0–1] |
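The screening thresholds above mix open and closed interval endpoints, which is easy to get wrong. The sketch below validates a few of them against the documented ranges and defaults before building a policy dictionary keyed by the listed aliases. The helpers `in_range` and `build_screen_policy` are hypothetical, and the sketch does not call the server.

```python
def in_range(value, low, high, low_open=False, high_open=False):
    """Check a value against an interval with open or closed endpoints."""
    above = value > low if low_open else value >= low
    below = value < high if high_open else value <= high
    return above and below


def build_screen_policy(leakage=90, low_information=0.05,
                        missing_indicator=75, missing_threshold=90):
    """Build a screening-policy dict keyed by the documented aliases."""
    if not in_range(leakage, 0, 100, low_open=True):
        raise ValueError("leakagePercentageThreshold must be in (0, 100]")
    if low_information < 0:
        raise ValueError("lowInformation must be at least 0")
    if not in_range(missing_indicator, 10, 100, high_open=True):
        raise ValueError("missingIndicatorPercentage must be in [10, 100)")
    if not in_range(missing_threshold, 10, 100, high_open=True):
        raise ValueError("missingPercentageThreshold must be in [10, 100)")
    return {
        "leakagePercentageThreshold": leakage,
        "lowInformation": low_information,
        "missingIndicatorPercentage": missing_indicator,
        "missingPercentageThreshold": missing_threshold,
    }


screen_policy = build_screen_policy(missing_threshold=80)
```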
specifies a seed value for random number generation. This value is used for repeatable random number generation in some scenarios.
| Default | 0 |
|---|
specifies the table name, caslib, and other common parameters.
| Long form | table={"name":"table-name"} |
|---|---|
| Shortcut form | table="table-name" |
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | False |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import_ |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | False |
|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import_ |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
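The filter table behavior described above resembles a semi-join followed by a row filter. The following sketch models it on lists of dictionaries: rows are matched on the variable names common to both tables unless names are given, and the where condition is applied afterward. The function name, argument names, and sample data are hypothetical, and server-side details such as formats and missing values are ignored.

```python
def apply_filter_table(rows, filter_rows, var_names=None, where=None):
    """Keep rows that match the filter table, then apply a where predicate."""
    if not rows or not filter_rows:
        return []
    if var_names is None:
        # Default: match on every variable name that both tables share.
        var_names = sorted(set(rows[0]) & set(filter_rows[0]))
    if not var_names:
        raise ValueError("no common variable names to match on")
    keys = {tuple(f[name] for name in var_names) for f in filter_rows}
    matched = [r for r in rows
               if tuple(r[name] for name in var_names) in keys]
    if where is not None:
        # The filter table is applied first, the where condition second.
        matched = [r for r in matched if where(r)]
    return matched


rows = [
    {"id": 1, "region": "east", "sales": 10},
    {"id": 2, "region": "west", "sales": 25},
    {"id": 3, "region": "east", "sales": 40},
]
filter_rows = [{"region": "east"}]
filtered = apply_filter_table(rows, filter_rows,
                              where=lambda r: r["sales"] > 20)
```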
specifies the target variable.
| Alias | evalVar |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | transformationOut={"name":"table-name"} |
|---|---|
| Shortcut form | transformationOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|
specifies the feature transformation and generation space in which the feature machine operates.
| Alias | ftgPolicy |
|---|
The transformationSpace value can be one or more of the following:
when set to True, includes cardinality-reducing transformations.
| Default | True |
|---|
when set to True, includes transformations for the treatment of low entropy.
| Default | False |
|---|
when set to True, detects and generates interaction features.
| Default | False |
|---|
when set to True, includes transformations for the treatment of low indices of qualitative variation (IQV).
| Default | False |
|---|
when set to True, includes transformations for the treatment of high kurtosis.
| Default | False |
|---|
when set to True, includes transformations for the treatment of missing values.
| Default | True |
|---|
when set to True, includes transformations for the treatment of outliers.
| Default | False |
|---|
when set to True, includes up to third-order polynomial transformations.
| Default | False |
|---|
when set to True, includes transformations for the treatment of high skewness.
| Default | True |
|---|
specifies the weight variable.
Automated feature transformation and generation engine.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the table name, caslib, and other common parameters. |

| Parameter | Subparameter | Description |
|---|---|---|
| casout | — | specifies the CAS table to store the analysis results. |
| featureOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |
| saveState | — | specifies the CAS table to store the feature transformation and generation model. |
| transformationOut (required) | — | specifies the CAS table to store the feature transformation and generation pipelines. |

specifies the CAS table to store the analysis results.
| Long form | casout=list(name="table-name") |
|---|---|
| Shortcut form | casout="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the names of variables to be copied to the output table.
specifies the distinct count limit. If the limit is exceeded, and the misraGries parameter is set to True, the Misra-Gries frequency sketch algorithm is used to estimate the frequency distribution. Otherwise, the distinct count operation is aborted.
| Default | 10000 |
|---|---|
| Minimum value | 256 |
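For readers unfamiliar with the Misra-Gries frequency sketch, the following is a minimal Python sketch of the textbook algorithm. It is not the action's implementation. The function name, the counter budget `k`, and the sample data are assumptions made for illustration, and how the action sizes its counters is not documented here.

```python
def misra_gries(stream, k):
    """Return approximate counts using at most k - 1 counters (Misra-Gries summary).

    Each reported count underestimates the true count by at most len(stream) / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


levels = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(levels, k=3)
```

The summary keeps only the heavy hitters, so memory stays bounded no matter how many distinct levels the variable has.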
specifies the tolerance value for the empirical cumulative distribution function. This value is used by the quantile sketch algorithm.
| Default | 0.001 |
|---|---|
| Range | 1E-06–0.1 |
specifies the target variable level that you want to model. Multilevel classification problems are cast into a one-versus-all binary classification problem, where the value of the event parameter denotes the level that you are modeling.
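The one-versus-all recoding described above can be pictured with a short Python sketch. The function name and the sample levels are hypothetical, and the action performs this step internally, so the code only illustrates the idea.

```python
def one_versus_all(target_values, event):
    """Recode a multilevel target as 1 for the event level and 0 for every other level."""
    return [1 if value == event else 0 for value in target_values]


species = ["setosa", "virginica", "versicolor", "virginica"]
binary_target = one_versus_all(species, event="virginica")
```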
specifies the automatic variable analysis and grouping (AVAPT) policy.
| Alias | avaptPolicy |
|---|
The avaptPolicy value can be one or more of the following:
specifies the automatic variable analysis and grouping (AVAPT) cardinality policy.
The cardinalityAvaptPolicy value can be one or more of the following:
specifies the cardinality threshold for the low-medium cutoff.
| Default | 32 |
|---|---|
| Range | 2–256 |
specifies the cardinality threshold for the medium-high cutoff.
| Default | 64 |
|---|---|
| Range | 2–1024 |
specifies the minimum number of observations for each target level.
| Default | 10 |
|---|---|
| Range | 5–100 |
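The two cardinality cutoffs above split variables into low, medium, and high groups, and the other policies in this section use the same pair-of-cutoffs pattern. A minimal Python sketch of that grouping follows. The function name is hypothetical, and treating a value equal to a cutoff as belonging to the lower group is an assumption, because the boundary handling is not stated here.

```python
def group_by_cutoffs(value, low_medium, medium_high):
    """Assign 'low', 'medium', or 'high' from two cutoffs.

    Assumption: a value equal to a cutoff falls into the lower group.
    """
    if low_medium > medium_high:
        raise ValueError("low_medium must not exceed medium_high")
    if value <= low_medium:
        return "low"
    if value <= medium_high:
        return "medium"
    return "high"


# Documented defaults for the cardinality cutoffs: 32 and 64.
groups = [group_by_cutoffs(n, 32, 64) for n in (10, 40, 500)]
```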
specifies the automatic variable analysis and grouping (AVAPT) coefficient of variation policy.
| Alias | coefficientVariation |
|---|
The cvAvaptPolicy value can be one or more of the following:
specifies the absolute value of the low-high percentage threshold for the moment coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the low-high percentage threshold for the robust coefficient of variation (CV).
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the datetime variables.
| Alias | dateTime |
|---|
specifies the date variables.
| Alias | date |
|---|
specifies the automatic variable analysis and grouping (AVAPT) entropy policy.
The entropyAvaptPolicy value can be one or more of the following:
specifies the Gini entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Gini entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the low-medium cutoff.
| Default | 0.25 |
|---|---|
| Range | 0–1 |
specifies the Shannon entropy threshold for the medium-high cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–1 |
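Because both entropy thresholds range from 0 to 1, the statistics are presumably normalized. The Python sketch below shows one common normalization: Shannon entropy divided by its maximum, and the Gini index divided by its maximum. Whether the action uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def normalized_shannon_entropy(values):
    """Shannon entropy divided by log(number of levels), so the result lies in [0, 1]."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


def normalized_gini_index(values):
    """Gini index (1 - sum of squared proportions) divided by its maximum, 1 - 1/k."""
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts.values())
    gini = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return gini / (1.0 - 1.0 / k)


balanced = ["a", "b", "a", "b"]
skewed = ["a"] * 9 + ["b"]
```

A balanced variable scores 1 on both measures, while a variable dominated by one level scores closer to 0.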
specifies the automatic variable analysis and grouping (AVAPT) index of qualitative variation policy.
| Alias | qualitativeVariationIndex |
|---|
The iqvAvaptPolicy value can be one or more of the following:
specifies the low-high cutoff frequency ratio threshold between the most frequent and least frequent levels of a nominal variable.
| Alias | highTop1Bottom1 |
|---|---|
| Default | 100 |
| Minimum value | 1 |
specifies the low-high cutoff frequency ratio threshold between the most frequent and second most frequent levels of a nominal variable.
| Alias | highTop1Top2 |
|---|---|
| Default | 10 |
| Minimum value | 1 |
specifies the variation ratio threshold for the low-high cutoff.
| Alias | highModVr |
|---|---|
| Default | 0.5 |
| Range | (0–1] |
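The three statistics behind these thresholds can be computed from level frequencies. The Python sketch below uses the standard definition of the variation ratio (one minus the proportion of the modal level). The function name and dictionary keys are hypothetical, and how the action compares the values against the thresholds is not specified here.

```python
from collections import Counter


def iqv_statistics(values):
    """Frequency ratios and variation ratio for a nominal variable."""
    if not values:
        raise ValueError("values must not be empty")
    counts = sorted(Counter(values).values(), reverse=True)
    n = sum(counts)
    top1 = counts[0]
    return {
        # Most frequent level versus least frequent level.
        "top1_bottom1": top1 / counts[-1],
        # Most frequent level versus second most frequent level (None if only one level).
        "top1_top2": top1 / counts[1] if len(counts) > 1 else None,
        # Share of observations that are not in the modal level.
        "variation_ratio": 1.0 - top1 / n,
    }


stats = iqv_statistics(["a"] * 8 + ["b"] * 4 + ["c"] * 2)
```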
specifies the automatic variable analysis and grouping (AVAPT) kurtosis policy.
The kurtosisAvaptPolicy value can be one or more of the following:
specifies the absolute value of the moment kurtosis threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the moment kurtosis threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Minimum value | 0 |
specifies the absolute value of the robust kurtosis threshold for the medium-high cutoff.
| Default | 3 |
|---|---|
| Minimum value | 0 |
specifies the automatic variable analysis and grouping (AVAPT) missing grouping policy.
The missingAvaptPolicy value can be one or more of the following:
specifies the missing percentage threshold for the low-medium cutoff.
| Default | 5 |
|---|---|
| Range | 0–100 |
specifies the missing percentage threshold for the medium-high cutoff.
| Default | 25 |
|---|---|
| Range | 0–100 |
specifies the automatic variable analysis and grouping (AVAPT) nominal policy.
The nominalAvaptPolicy value can be one or more of the following:
specifies the AVAPT nominal policy cardinality ratio threshold.
| Default | 0.25 |
|---|---|
| Range | (0–1] |
specifies the AVAPT nominal policy cardinality threshold.
| Default | 1024 |
|---|---|
| Minimum value | 32 |
when set to True, includes numeric variables with some negative values in the nominal analysis.
| Default | FALSE |
|---|
when set to True, includes numeric variables with some nonintegral values in the nominal analysis.
| Default | FALSE |
|---|
specifies variables to consider as intervals.
specifies variables to consider as nominals.
specifies the automatic variable analysis and grouping (AVAPT) outlier policy.
The outlierAvaptPolicy value can be one or more of the following:
specifies the z-score outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the z-score outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the low-medium cutoff.
| Default | 1 |
|---|---|
| Range | 0–100 |
specifies the modified interquartile range outlier percentage threshold for the medium-high cutoff.
| Default | 2.5 |
|---|---|
| Range | 0–100 |
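As a rough illustration of an outlier percentage, the Python sketch below counts observations beyond a z-score cutoff and beyond interquartile-range fences. The cutoff of 3, the fence multiplier of 1.5, and the plain (unmodified) IQR rule are all assumptions; the exact definitions that the action uses, including its modified interquartile range, are not given here.

```python
import statistics


def zscore_outlier_percentage(values, z_cutoff=3.0):
    """Percentage of values whose absolute z-score exceeds z_cutoff (assumed cutoff)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    flagged = sum(1 for v in values if abs(v - mean) / std > z_cutoff)
    return 100.0 * flagged / len(values)


def iqr_outlier_percentage(values, fence=1.5):
    """Percentage of values outside [Q1 - fence*IQR, Q3 + fence*IQR] (plain Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - fence * iqr, q3 + fence * iqr
    flagged = sum(1 for v in values if v < lower or v > upper)
    return 100.0 * flagged / len(values)


data = [10.0] * 19 + [1000.0]
z_pct = zscore_outlier_percentage(data)
iqr_pct = iqr_outlier_percentage(data)
```

In this sample, one observation in twenty is flagged by both rules, which is 5 percent and therefore above the default medium-high cutoff of 2.5.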
specifies the automatic variable analysis and grouping (AVAPT) skewness policy.
The skewnessAvaptPolicy value can be one or more of the following:
specifies the moment skewness threshold for the low-medium cutoff.
| Default | 2 |
|---|---|
| Range | 0–100 |
specifies the moment skewness threshold for the medium-high cutoff.
| Default | 10 |
|---|---|
| Range | 0–100 |
specifies the robust skewness threshold for the low-medium cutoff.
| Default | 0.75 |
|---|---|
| Range | 0–3 |
specifies the robust skewness threshold for the medium-high cutoff.
| Default | 2 |
|---|---|
| Range | 0–3 |
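For reference, moment skewness is conventionally the third central moment divided by the second central moment raised to the power 1.5. The Python sketch below implements that population formula. Whether the action applies a sample-size correction is not stated here, and the robust skewness statistic is not covered because its definition is not given.

```python
import statistics


def moment_skewness(values):
    """Population moment skewness: m3 / m2 ** 1.5, using central moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


symmetric = moment_skewness([1, 2, 3, 4, 5])
right_tailed = moment_skewness([1, 1, 1, 1, 10])
```

The right-tailed sample has a skewness of about 1.5, which is below the default low-medium cutoff of 2.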
specifies the time variables.
| Alias | time |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | featureOut=list(name="table-name") |
|---|---|
| Shortcut form | featureOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the frequency variable.
specifies the variables to use for the analysis. You can specify a subset of the variables from the input table.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Alias | vars |
|---|
when set to True, uses the Misra-Gries algorithm for the frequency distribution estimation, if the distinct count limit is exceeded.
| Default | TRUE |
|---|
specifies the feature ranking policy.
| Alias | rank |
|---|
| Long form | rankPolicy=list(intervalStat="AVGQUANKURT" | "AVGQUANSKEW" | "CLASSICALKURT" | "CLASSICALSKEW" | "ENTROPY" | "MI" | "NORMMI" | "PEARSON" | "SU") |
|---|---|
| Shortcut form | rankPolicy="AVGQUANKURT" | "AVGQUANSKEW" | "CLASSICALKURT" | "CLASSICALSKEW" | "ENTROPY" | "MI" | "NORMMI" | "PEARSON" | "SU" |
The rankPolicy value can be one or more of the following:
specifies the interval variable ranking statistic.
| Alias | interval |
|---|---|
| Default | SU |
specifies the nominal variable ranking statistic.
| Alias | nominal |
|---|---|
| Default | SU |
when set to True, missing indicator features take part in the feature ranking. Otherwise, they are excluded from the ranking, and hence will always appear in the final feature set.
| Default | TRUE |
|---|
when set to True, performs a separate feature ranking for interval and nominal features.
| Alias | separate |
|---|---|
| Default | TRUE |
specifies the number of top-ranked interaction features to generate and save.
| Alias | topKInteract |
|---|---|
| Minimum value | 1 |
specifies the number of features per variable to save.
| Alias | topK |
|---|---|
| Default | 1 |
| Minimum value | 1 |
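The effect of keeping a fixed number of features per variable can be pictured with the following Python sketch. It assumes that every candidate feature carries a ranking score and that a higher score is better. The function name, feature names, and scores are invented for illustration and are not output of the action.

```python
from collections import defaultdict


def top_k_per_variable(candidates, k=1):
    """Keep the k highest-scoring features for each source variable.

    candidates: iterable of (variable, feature_name, score) tuples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    by_variable = defaultdict(list)
    for variable, feature, score in candidates:
        by_variable[variable].append((score, feature))
    kept = {}
    for variable, scored in by_variable.items():
        # Sort by score only, highest first; ties keep their input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept[variable] = [feature for _, feature in scored[:k]]
    return kept


candidates = [
    ("age", "age_log", 0.30),
    ("age", "age_bin", 0.42),
    ("income", "income_sqrt", 0.10),
]
best = top_k_per_variable(candidates, k=1)
```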
specifies the CAS table to store the feature transformation and generation model.
| Alias | saveModel |
|---|
| Long form | saveState=list(name="table-name") |
|---|---|
| Shortcut form | saveState="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the variable screening policy to use for recommending that variables be screened out, transformed, or copied.
| Alias | sweeperPolicy |
|---|
The sweeperPolicy value can be one or more of the following:
when set to True, uses the variable screening policy to identify variables that have constant values.
| Alias | unique |
|---|---|
| Default | TRUE |
when set to True, uses the variable screening policy to identify nominal variables that have rare levels.
| Alias | groupRare |
|---|---|
| Default | TRUE |
specifies the variable screening policy for variables that have a very high level of information about the target. Variables that have a greater target entropy percentage reduction than the specified threshold are flagged as leakage variables.
| Alias | leakagePercentageThreshold |
|---|---|
| Default | 90 |
| Range | (0–100] |
when set to True, uses the variable screening policy to identify variables that have a low coefficient of variation (CV).
| Alias | lowCoefficientVariation |
|---|---|
| Default | TRUE |
specifies the variable screening policy for variables that have a low level of information about the target.
| Alias | lowInformation |
|---|---|
| Default | 0.05 |
| Minimum value | 0 |
specifies the variable screening policy for generating missing indicator variables.
| Alias | missingIndicatorPercentage |
|---|---|
| Default | 75 |
| Range | [10–100) |
specifies the variable screening policy for identifying variables that have a very high missing rate.
| Alias | missingPercentageThreshold |
|---|---|
| Default | 90 |
| Range | [10–100) |
specifies the symmetric uncertainty (SU) threshold for identifying redundant variables. If the SU for two variables exceeds the threshold, the variable that has less information about the target is flagged as redundant.
| Default | 1 |
|---|---|
| Range | (0–1] |
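Symmetric uncertainty is commonly defined as twice the mutual information of two variables divided by the sum of their entropies, which gives a value between 0 (independent) and 1 (each variable fully determines the other). The Python sketch below implements that textbook definition for discrete values. The function names are hypothetical, and any binning that the action applies to interval variables is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    # Mutual information from the joint entropy: I = H(X) + H(Y) - H(X, Y).
    mutual_information = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (h_x + h_y)


identical = symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1])
independent = symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1])
```

With the default threshold of 1, only variables that carry exactly the same information as another variable could reach the threshold.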
specifies a seed value for random number generation. This value is used for repeatable random number generation in some scenarios.
| Default | 0 |
|---|
specifies the table name, caslib, and other common parameters.
| Long form | table=list(name="table-name") |
|---|---|
| Shortcut form | table="table-name" |
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | FALSE |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | FALSE |
|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
specifies the target variable.
| Alias | evalVar |
|---|
specifies the CAS table to store the feature transformation and generation pipelines.
| Long form | transformationOut=list(name="table-name") |
|---|---|
| Shortcut form | transformationOut="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the list of variables to create indexes for in the output data.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the feature transformation and generation space in which the feature machine operates.
| Alias | ftgPolicy |
|---|
The transformationSpace value can be one or more of the following:
when set to True, includes cardinality-reducing transformations.
| Default | TRUE |
|---|
when set to True, includes transformations for the treatment of low entropy.
| Default | FALSE |
|---|
when set to True, detects and generates interaction features.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of low indices of qualitative variation (IQV).
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of high kurtosis.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of missing values.
| Default | TRUE |
|---|
when set to True, includes transformations for the treatment of outliers.
| Default | FALSE |
|---|
when set to True, includes up to third-order polynomial transformations.
| Default | FALSE |
|---|
when set to True, includes transformations for the treatment of high skewness.
| Default | TRUE |
|---|
specifies the weight variable.