Model-Based Clustering Action Set

Provides actions for performing model-based clustering.

mbcFit Action

Performs model-based clustering using the EM algorithm.

CASL Syntax

mbc.mbcFit <result=results> <status=rc> /
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
convergenceTest="AITKEN" | "LOGL",
covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"},
criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",
display={
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
},
emEpsilon=double,
factorDetails=TRUE | FALSE,
groupByLimit=64-bit-integer,
initMethod="KMEANS" | "RANDOM",
maxIter=integer,
required parameter model={
depVars={{
name="variable-name"
}, {...}},
effects={{
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest={"string-1" <, "string-2", ...>},
required parameter vars={"string-1" <, "string-2", ...>}
}, {...}}
},
nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},
nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},
noise="N" | "Y" | {"N", "Y"},
output={
allstats=TRUE | FALSE,
required parameter casOut={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},
currClus="string",
loglik="string",
maxpost="string",
nextClus="string",
pred="string",
role="string"
},
outputTables={
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=TRUE | FALSE,
replace=TRUE | FALSE
},
seed=integer,
store={
caslib="string",
label="string",
lifetime=64-bit-integer,
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE
},
required parameter table={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
technique="CEM" | "EM",
topModels=64-bit-integer
;
Parameters marked "required parameter" in the syntax above, or with * in the parameter descriptions, are required.

Summary: Input and Output Tables

If a row includes a subparameter, specify the table name, caslib, and other table settings in that subparameter. Otherwise, specify them in the parameter itself.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | casOut (required) | creates a table that contains observationwise cluster membership probability estimates.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores models in a blob (binary large object).

Parameter Descriptions

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"}

specifies the covariance model.

Aliases covModel, covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default BIC
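
The following is a minimal sketch of how the AIC, AICC, and BIC values compare for a single fitted model. It uses the textbook "smaller is better" definitions; the exact formulas and sign convention that the action uses are not stated here, so treat them as assumptions. The function name information_criteria and its arguments are hypothetical.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return textbook AIC, AICC, and BIC values (smaller is better).

    Assumption: these are the standard definitions, not necessarily
    the exact formulas that the mbcFit action applies.
    """
    aic = -2.0 * loglik + 2.0 * n_params
    # The small-sample correction is defined only when n_obs > n_params + 1.
    denom = n_obs - n_params - 1
    aicc = aic + (2.0 * n_params * (n_params + 1)) / denom if denom > 0 else math.inf
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}

# Hypothetical fit: log likelihood -120 with 5 free parameters and 100 rows.
crit = information_criteria(loglik=-120.0, n_params=5, n_obs=100)
for name, value in crit.items():
    print(name, round(value, 4))
```

BIC penalizes each additional parameter by log(n) rather than 2, so for all but very small tables it favors models with fewer clusters or more constrained covariance structures than AIC does.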

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases emEps, convergence, conv
Default 1E-05
Range 0–1
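
The two convergenceTest values can be contrasted with a short sketch. LOGL is taken here to mean "stop when the change in log likelihood falls below emEpsilon", and AITKEN to mean the common Aitken-acceleration test, which extrapolates the limiting log likelihood from the last three iterations. Both readings are assumptions; the action's exact stopping rules (for example, absolute versus relative change) are not documented in this section, and the function names are hypothetical.

```python
def logl_converged(history, eps=1e-5):
    """Stop when the latest change in log likelihood is smaller than eps."""
    return len(history) >= 2 and abs(history[-1] - history[-2]) < eps

def aitken_converged(history, eps=1e-5):
    """Stop when the Aitken estimate of the limiting log likelihood
    is within eps of the current log likelihood."""
    if len(history) < 3:
        return False
    l0, l1, l2 = history[-3:]
    if l1 == l0:
        # No previous change to form a ratio; fall back to the plain test.
        return abs(l2 - l1) < eps
    a = (l2 - l1) / (l1 - l0)          # estimated linear convergence rate
    if a >= 1.0:
        return False                    # increments are not shrinking
    l_inf = l1 + (l2 - l1) / (1.0 - a)  # extrapolated limit
    return abs(l_inf - l2) < eps

# A slowly converging sequence: tiny steps, but still far from its limit of -50.
slow = [-50.0 - 1e-4 * 0.99 ** k for k in range(3)]
print(logl_converged(slow), aitken_converged(slow))
```

The slow sequence illustrates why an Aitken-style test can be preferable: small steps alone do not prove that the EM iterations are near the limit.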

factorDetails=TRUE | FALSE

when set to True, adds the factor pattern and unique variances to the parameter estimates table.

Default FALSE

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value 1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default 500
Range 0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar, target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
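
As an illustration of how a BAR interaction and maxInteract work together, the sketch below expands a list of variables into main effects and crossed terms and drops terms above a given order. It follows the usual SAS bar-operator convention (A|B|C yields A, B, A*B, C, A*C, B*C, A*B*C); the ordering of the generated terms and the helper name expand_bar are assumptions made for this example, not part of the action.

```python
from itertools import combinations

def expand_bar(variables, max_interact=None):
    """Expand a BAR effect into its individual terms.

    Terms are listed by increasing order (main effects first), which may
    differ from the order that the server reports.
    """
    limit = len(variables) if max_interact is None else min(max_interact, len(variables))
    terms = []
    for order in range(1, limit + 1):
        terms.extend("*".join(combo) for combo in combinations(variables, order))
    return terms

# Full expansion of three variables, then the same effect limited to order 2.
full_terms = expand_bar(["x1", "x2", "x3"])
limited_terms = expand_bar(["x1", "x2", "x3"], max_interact=2)
print(full_terms)
print(limited_terms)
```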

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | {"N", "Y"}

specifies whether to include a noise cluster in the model.

Alias hasNoiseCluster
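
Because nClusters, covStruct, and noise each accept a list, one request can describe many candidate models. The sketch below enumerates such a grid, assuming that every combination of the listed values is a candidate that is then ranked by the criterion parameter. That behavior is inferred from the parameter descriptions, not stated outright, and candidate_models is a hypothetical helper.

```python
from itertools import product

def candidate_models(n_clusters, cov_structs, noise=("N",)):
    """List every combination of cluster count, covariance structure, and noise setting."""
    return [
        {"nClusters": k, "covStruct": c, "noise": z}
        for k, c, z in product(n_clusters, cov_structs, noise)
    ]

# 3 cluster counts x 2 covariance structures x 2 noise settings = 12 candidates.
grid = candidate_models([2, 3, 4], ["EEE", "VVV"], ["N", "Y"])
print(len(grid), grid[0])
```

A grid of this size is one reason to keep topModels small: only the best-ranked candidates appear in the summary table.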

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=TRUE | FALSE

when set to True, adds all statistics to the output table.

Default FALSE

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. Alternatively, you can specify ALL, ALL_MODEL, or ALL_NUMERIC to copy all variables, all variables used in the modeling, or all numeric variables, respectively, from the input table to the output table.

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default "NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.
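
To make the output columns concrete, the sketch below computes the quantities that the currClus or nextClus, maxpost, and loglik prefixes refer to for one observation: the cluster membership probabilities, the cluster with the largest posterior probability, and a log likelihood. It uses a one-dimensional Gaussian mixture for brevity. The 1-based cluster label, the reading of loglik as the mixture log density, and the function names are assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(x, weights, means, variances):
    """Return (posterior probabilities, max-posterior cluster, log density) for one value."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    post = [j / total for j in joint]
    # Assumption: clusters are labeled starting at 1.
    max_cluster = max(range(len(post)), key=post.__getitem__) + 1
    return post, max_cluster, math.log(total)

# Two equally weighted unit-variance clusters centered at 0 and 4.
post, label, loglik = e_step(0.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
print([round(p, 5) for p in post], label, round(loglik, 4))
```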

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias parmEps
Default 1E-08
Range 1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value 1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias singEps
Default 1E-08
Range 1E-15–1

store={casouttable}

stores models in a blob (binary large object).

Alias savestate
Long form store={name="table-name"}
Shortcut form store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the choice of redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.
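
The long and shortcut forms of the store parameter describe the same request. The sketch below shows one way a client-side helper might normalize both forms into the long form before building the action call. The helper normalize_store and the table and caslib names are hypothetical; only the equivalence of store="table-name" and store={name="table-name"} comes from the description above.

```python
def normalize_store(store):
    """Return the long (dictionary) form of a store specification."""
    if isinstance(store, str):
        # Shortcut form: a bare table name.
        return {"name": store}
    # Long form: copy so that the caller's dictionary is not modified.
    return dict(store)

short_form = normalize_store("mbc_model")
long_form = normalize_store({"name": "mbc_model", "caslib": "casuser", "replace": True})
print(short_form)
print(long_form)
```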

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default EM
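
The difference between the two techniques can be sketched in a few lines. In standard EM, the maximization step uses the posterior probabilities as fractional weights. In classification EM, a classification step first assigns each observation entirely to its most probable cluster. The code below shows that step and its effect on the mixture weight update; it is an illustration of the general technique under these textbook definitions, not the action's implementation, and the function names are hypothetical.

```python
def classification_step(posteriors):
    """Replace each row of posterior probabilities with a hard 0/1 assignment."""
    hard = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        hard.append([1.0 if k == best else 0.0 for k in range(len(row))])
    return hard

def update_weights(responsibilities):
    """Mixture weight update: the average responsibility per cluster."""
    n = len(responsibilities)
    n_clusters = len(responsibilities[0])
    return [sum(row[k] for row in responsibilities) / n for k in range(n_clusters)]

posteriors = [[0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
em_weights = update_weights(posteriors)                        # soft assignments
cem_weights = update_weights(classification_step(posteriors))  # hard assignments
print(em_weights, cem_weights)
```

With soft assignments the first cluster receives a weight of about 0.45; with hard assignments it receives two of the three observations, so its weight rises to about 0.67.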

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default 10
Minimum value 1

mbcFit Action

Performs model-based clustering using the EM algorithm.

Lua Syntax

results, info = s:mbc_mbcFit{
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
convergenceTest="AITKEN" | "LOGL",
covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"},
criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",
display={
caseSensitive=true | false,
exclude=true | false,
excludeAll=true | false,
keyIsPath=true | false,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=true | false
},
emEpsilon=double,
factorDetails=true | false,
groupByLimit=64-bit-integer,
initMethod="KMEANS" | "RANDOM",
maxIter=integer,
required parameter model={
depVars={{
name="variable-name"
}, {...}},
effects={{
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest={"string-1" <, "string-2", ...>},
required parameter vars={"string-1" <, "string-2", ...>}
}, {...}}
},
nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},
nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},
noise="N" | "Y" | {"N", "Y"},
output={
allstats=true | false,
required parameter casOut={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},
currClus="string",
loglik="string",
maxpost="string",
nextClus="string",
pred="string",
role="string"
},
outputTables={
groupByVarsRaw=true | false,
includeAll=true | false,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=true | false,
replace=true | false
},
seed=integer,
store={
caslib="string",
label="string",
lifetime=64-bit-integer,
name="table-name",
promote=true | false,
replace=true | false
},
required parameter table={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
technique="CEM" | "EM",
topModels=64-bit-integer
}
Parameters marked "required parameter" in the syntax above, or with * in the parameter descriptions, are required.

Summary: Input and Output Tables

If a row includes a subparameter, specify the table name, caslib, and other table settings in that subparameter. Otherwise, specify them in the parameter itself.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | (none) | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | casOut (required) | creates a table that contains observationwise cluster membership probability estimates.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | (none) | stores models in a blob (binary large object).

Parameter Descriptions

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"}

specifies the covariance model.

Aliases covModel, covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default BIC

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases emEps, convergence, conv
Default 1E-05
Range 0–1

factorDetails=true | false

when set to true, adds the factor pattern and unique variances to the parameter estimates table.

Default false

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value 1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default 500
Range 0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar, target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | {"N", "Y"}

specifies whether to include a noise cluster in the model.

Alias hasNoiseCluster

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=true | false

when set to True, adds all statistics to the output table.

Default false

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. Alternatively, you can specify ALL, ALL_MODEL, or ALL_NUMERIC to copy all variables, all variables used in the modeling, or all numeric variables, respectively, from the input table to the output table.

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default "NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias parmEps
Default 1E-08
Range 1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value 1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias singEps
Default 1E-08
Range 1E-15–1

store={casouttable}

stores models in a blob (binary large object).

Alias savestate
Long form store={name="table-name"}
Shortcut form store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction (DVR) memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default false

replace=true | false

when set to True, overwrites an existing table that has the same name.

Default false

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the choice of redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default EM

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default 10
Minimum value 1

mbcFit Action

Performs model-based clustering using the EM algorithm.

Python Syntax

results=s.mbc.mbcFit(
attributes=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
convergenceTest="AITKEN" | "LOGL",
covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | ["ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"],
criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",
display={
"caseSensitive":True | False,
"exclude":True | False,
"excludeAll":True | False,
"keyIsPath":True | False,
"names":["string-1" <, "string-2", ...>],
"pathType":"LABEL" | "NAME",
"traceNames":True | False
},
emEpsilon=double,
factorDetails=True | False,
groupByLimit=64-bit-integer,
initMethod="KMEANS" | "RANDOM",
maxIter=integer,
required parameter model={
"depVars":[{
"name":"variable-name"
}<, {...}>],
"effects":[{
"interaction":"BAR" | "CROSS" | "NONE",
"maxInteract":integer,
"nest":["string-1" <, "string-2", ...>],
required parameter "vars":["string-1" <, "string-2", ...>]
}<, {...}>]
},
nClusters=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>],
nFactors=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>],
noise="N" | "Y" | ["N", "Y"],
output={
"allstats":True | False,
required parameter "casOut":{
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>],
"currClus":"string",
"loglik":"string",
"maxpost":"string",
"nextClus":"string",
"pred":"string",
"role":"string"
},
outputTables={
"groupByVarsRaw":True | False,
"includeAll":True | False,
"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},
"repeated":True | False,
"replace":True | False
},
seed=integer,
store={
"caslib":"string",
"label":"string",
"lifetime":64-bit-integer,
"name":"table-name",
"promote":True | False,
"replace":True | False,
},
required parameter table={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string"
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter "name":"table-name"
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>]
"where":"where-expression"
}
},
technique="CEM" | "EM",
topModels=64-bit-integer
)
The "required parameter" prefix (shown as * in the parameter descriptions) indicates a required parameter.
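
As a minimal sketch of how the syntax above maps onto a Python dictionary, the following builds a small parameter set that contains the two required parameters (table and model) plus a few optional ones. The helper name build_mbcfit_params, the table names, and the variable names are hypothetical. Passing the result to a SWAT connection is an assumption about the client setup and appears only as a comment.

```python
def build_mbcfit_params(table_name, variables, n_clusters,
                        cov_struct="EEE", out_name=None):
    """Assemble mbcFit keyword arguments as a plain dict (illustrative only)."""
    if not variables:
        # The vars subparameter of an effect requires at least one variable.
        raise ValueError("at least one analysis variable is required")
    params = {
        "table": {"name": table_name},                      # required
        "model": {"effects": [{"vars": list(variables)}]},  # required
        "nClusters": n_clusters,                            # integer or list
        "covStruct": cov_struct,                            # string or list
    }
    if out_name is not None:
        # casOut is required whenever the output parameter is specified.
        params["output"] = {
            "casOut": {"name": out_name, "replace": True},
            "copyVars": "ALL",
        }
    return params


params = build_mbcfit_params("mydata", ["x1", "x2"], [2, 3, 4],
                             out_name="mbc_scores")

# With a connected SWAT session `conn` (assumed, not shown here):
#   conn.loadactionset("mbc")
#   result = conn.mbc.mbcFit(**params)
```

Keeping the parameters in a dictionary makes it easy to inspect or log the request before it is sent to the server.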

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
required parameter table | | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | required parameter casOut | creates a table that contains observationwise cluster membership probability estimates.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | | stores models in a blob (binary large object).

Parameter Descriptions

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default LOGL
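
The two convergence tests can be illustrated with a short sketch. The source does not state the exact formulas that the action uses, so the code below is an assumption: the LOGL test is written as an absolute change in successive log likelihoods, and the AITKEN test uses the standard Aitken acceleration estimate of the limiting log likelihood. Published variants of the Aitken stopping rule differ in which value the estimate is compared with. All function names are hypothetical.

```python
def logl_converged(ll_prev, ll_curr, eps=1e-5):
    """Stop when successive log likelihoods differ by less than eps."""
    return abs(ll_curr - ll_prev) < eps


def aitken_limit(ll_prev, ll_curr, ll_next):
    """Aitken-accelerated estimate of the limiting log likelihood.

    Returns None when the acceleration factor is >= 1, because the
    sequence is not yet decelerating and no estimate is available.
    """
    denom = ll_curr - ll_prev
    if denom == 0.0:
        return ll_next  # the sequence is already flat
    a = (ll_next - ll_curr) / denom
    if a >= 1.0:
        return None
    return ll_curr + (ll_next - ll_curr) / (1.0 - a)


def aitken_converged(ll_prev, ll_curr, ll_next, eps=1e-5):
    """Stop when the estimated limit is within eps of the latest value."""
    limit = aitken_limit(ll_prev, ll_curr, ll_next)
    return limit is not None and abs(limit - ll_next) < eps
```

For a log likelihood that approaches its limit geometrically, the Aitken estimate recovers the limit from three consecutive values, which is why this test can be stricter than a plain difference test when EM converges slowly.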

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | ["ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"]

specifies the covariance model.

Aliases covModel, covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default BIC
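
For orientation, the information criteria named above can be computed from a fitted model's log likelihood as follows. These are the textbook "smaller is better" definitions. The source does not give the formulas or the sign convention that the action reports, so treat this as a sketch under that assumption. The function and argument names are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return textbook AIC, AICC, and BIC values for one fitted model."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    denom = n_obs - n_params - 1
    # The small-sample correction is undefined when n_obs <= n_params + 1.
    if denom > 0:
        aicc = aic + 2.0 * n_params * (n_params + 1) / denom
    else:
        aicc = float("inf")
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


ic = information_criteria(loglik=-100.0, n_params=5, n_obs=50)
```

BIC penalizes each extra parameter by log(n) rather than 2, so for more than about seven observations it favors smaller models than AIC does.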

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases emEps, convergence, conv
Default 1E-05
Range 0–1

factorDetails=True | False

when set to True, adds the factor pattern and unique variances to the parameter estimates table.

Default False

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value 1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default 500
Range 0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

"depVars":[{responsevar-1} <, {responsevar-2}, ...>]

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar, target

"name":"variable-name"

names the response variable.

"effects":[{effect-1} <, {effect-2}, ...>]

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

"interaction":"BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

"maxInteract":integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

"nest":["string-1" <, "string-2", ...>]

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* "vars":["string-1" <, "string-2", ...>]

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
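
The effect of the interaction and maxInteract subparameters can be sketched by expanding one effect into its terms. The sketch assumes the usual bar-operator reading, in which BAR produces every main effect and every interaction among the listed variables, CROSS produces the single full crossing, and NONE keeps the variables as separate terms. The order of the returned terms is illustrative, nesting is not modeled, and the function name expand_effect is hypothetical.

```python
from itertools import combinations


def expand_effect(variables, interaction="NONE", max_interact=None):
    """Expand one effect specification into a list of terms (tuples)."""
    variables = list(variables)
    if not variables:
        raise ValueError("vars must list at least one variable")
    if interaction == "NONE":
        return [(v,) for v in variables]
    if interaction == "CROSS":
        return [tuple(variables)]
    if interaction == "BAR":
        limit = len(variables)
        if max_interact is not None:
            # maxInteract drops terms whose order exceeds the given value.
            limit = min(limit, max_interact)
        return [term
                for order in range(1, limit + 1)
                for term in combinations(variables, order)]
    raise ValueError("interaction must be BAR, CROSS, or NONE")


terms = expand_effect(["a", "b", "c"], interaction="BAR", max_interact=2)
```

With three variables, BAR yields seven terms in total, and maxInteract=2 removes only the three-way term.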

nClusters=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>]

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>]

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | ["N", "Y"]

specifies whether to include a noise cluster in the model.

Alias hasNoiseCluster
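
Because nClusters, covStruct, and noise each accept either a single value or a list, a request can describe a grid of candidate models. The sketch below enumerates such a grid. It assumes that every combination is a candidate, which this section does not state explicitly. It also does not expand the shortcut values ALL, ALLGMIX, and ALLPGMIX, ignores nFactors, and uses "N" as a placeholder default for noise. The helper names are hypothetical.

```python
from itertools import product


def as_list(value):
    """Wrap a scalar in a list and copy list-like values."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def candidate_models(n_clusters, cov_struct, noise="N"):
    """Enumerate (nClusters, covStruct, noise) combinations."""
    return [
        {"nClusters": k, "covStruct": c, "noise": z}
        for k, c, z in product(as_list(n_clusters),
                               as_list(cov_struct),
                               as_list(noise))
    ]


grid = candidate_models([2, 3, 4], ["EEE", "VVV"], ["N", "Y"])
```

Counting the grid before submitting a request gives a rough idea of how many fits the model selection step compares.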

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

"allstats":True | False

when set to True, adds all statistics to the output table.

Default False

* "casOut":{casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>]

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.

"currClus":"string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

"loglik":"string"

specifies a prefix for naming the cluster log likelihoods.

"maxpost":"string"

specifies a prefix for naming the maximum posterior probability cluster.

"nextClus":"string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default "NEXT"

"pred":"string"

specifies a prefix for naming the predicted values.

"role":"string"

specifies the name for the column that contains the observation role.
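
The membership probabilities and the maximum posterior probability cluster described above are the kind of quantities that an expectation (E) step produces. As a sketch only, the following computes them for a one-dimensional Gaussian mixture with known weights, means, and variances. The action works with multivariate data and its internal computation is not documented here, so the function names, the one-dimensional setting, and the 1-based cluster label are all assumptions.

```python
import math


def posterior_probabilities(x, weights, means, variances):
    """Return P(cluster j | x) for a 1-D Gaussian mixture.

    All weights and variances must be strictly positive.
    """
    log_terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v)
        - 0.5 * (x - m) ** 2 / v
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the largest term before exponentiating to avoid underflow.
    top = max(log_terms)
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def max_posterior_cluster(probs):
    """Return the 1-based index of the most probable cluster."""
    return max(range(len(probs)), key=lambda j: probs[j]) + 1


probs = posterior_probabilities(0.5, [0.6, 0.4], [0.0, 4.0], [1.0, 1.0])
label = max_posterior_cluster(probs)
```

Running the same function with the estimates from the final maximization (M) step corresponds to the extra E step that the nextClus description mentions.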

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias parmEps
Default 1E-08
Range 1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value 1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias singEps
Default 1E-08
Range 1E-15–1

store={casouttable}

stores models in a blob (binary large object).

Alias savestate
Long form store={"name":"table-name"}
Shortcut form store="table-name"

The casouttable value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"label":"string"

specifies the descriptive label to associate with the table.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

"name":"table-name"

specifies the name for the output table.

"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default False

"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default False

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default EM
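
The difference between the two techniques can be sketched in one function. In classification EM, a classification step sits between the E step and the M step and replaces each observation's membership probabilities with a hard assignment to its most probable cluster. The function name and the tie-breaking rule (first maximum wins) are assumptions for illustration, not a description of the action's internals.

```python
def classification_step(probs):
    """Turn soft membership probabilities into a one-hot assignment."""
    best = max(range(len(probs)), key=lambda j: probs[j])
    return [1.0 if j == best else 0.0 for j in range(len(probs))]


soft = [0.2, 0.7, 0.1]
hard = classification_step(soft)  # weights that a CEM-style M step would use
```

Standard EM passes the soft probabilities to the M step unchanged, so every observation contributes to every cluster's estimates in proportion to its membership probability.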

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default 10
Minimum value 1

mbcFit Action

Performs model-based clustering using the EM algorithm.

R Syntax

results <- cas.mbc.mbcFit(s,
attributes=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
convergenceTest="AITKEN" | "LOGL",
covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | list("ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"),
criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",
display=list(
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>),
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
),
emEpsilon=double,
factorDetails=TRUE | FALSE,
groupByLimit=64-bit-integer,
initMethod="KMEANS" | "RANDOM",
maxIter=integer,
required parameter model=list(
depVars=list( list(
name="variable-name"
) <, list(...)>),
effects=list( list(
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest=list("string-1" <, "string-2", ...>),
required parameter vars=list("string-1" <, "string-2", ...>)
) <, list(...)>)
),
nClusters=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>),
nFactors=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>),
noise="N" | "Y" | list("N", "Y"),
output=list(
allstats=TRUE | FALSE,
required parameter casOut=list(
caslib="string"
compress=TRUE | FALSE
indexVars=list("variable-name-1" <, "variable-name-2", ...>)
label="string"
lifetime=64-bit-integer
maxMemSize=64-bit-integer
memoryFormat="DVR" | "INHERIT" | "STANDARD"
name="table-name"
promote=TRUE | FALSE
replace=TRUE | FALSE
replication=integer
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"
threadBlockSize=64-bit-integer
timeStamp="string"
where=list("string-1" <, "string-2", ...>)
),
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>),
currClus="string",
loglik="string",
maxpost="string",
nextClus="string",
pred="string",
role="string"
),
outputTables=list(
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),
repeated=TRUE | FALSE,
replace=TRUE | FALSE
),
seed=integer,
store=list(
caslib="string",
label="string",
lifetime=64-bit-integer,
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE
),
required parameter table=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string"
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)
required parameter name="table-name"
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>)
where="where-expression"
)
),
technique="CEM" | "EM",
topModels=64-bit-integer
)
The "required parameter" prefix (shown as * in the parameter descriptions) indicates a required parameter.

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
required parameter table | | specifies the input data table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | required parameter casOut | creates a table that contains observationwise cluster membership probability estimates.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
store | | stores models in a blob (binary large object).

Parameter Descriptions

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute, attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | list("ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV")

specifies the covariance model.

Aliases covModel, covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default BIC

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases emEps, convergence, conv
Default 1E-05
Range 0–1

factorDetails=TRUE | FALSE

when set to TRUE, adds the factor pattern and unique variances to the parameter estimates table.

Default FALSE

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value 1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default 500
Range 0–MACINT

* model=list(modelStatement)

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars=list( list(responsevar-1) <, list(responsevar-2), ...>)

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar, target

name="variable-name"

names the response variable.

effects=list( list(effect-1) <, list(effect-2), ...>)

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest=list("string-1" <, "string-2", ...>)

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars=list("string-1" <, "string-2", ...>)

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nClusters=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>)

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>)

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | list("N", "Y")

specifies whether to include a noise cluster in the model.

Alias hasNoiseCluster

output=list(mbcOutput)

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=TRUE | FALSE

when set to TRUE, adds all statistics to the output table.

Default FALSE

* casOut=list(casouttable)

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>)

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default "NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias parmEps
Default 1E-08
Range 1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value 1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias singEps
Default 1E-08
Range 1E-15–1

store=list(casouttable)

stores models in a blob (binary large object).

Alias savestate
Long form store=list(name="table-name")
Shortcut form store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT

DVR

uses the duplicate value reduction memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to TRUE, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE

replace=TRUE | FALSE

when set to TRUE, overwrites an existing table that has the same name.

Default FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table=list(castable)

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default EM

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default 10
Minimum value 1
Last updated: March 05, 2026