Model-Based Clustering Action Set

Provides actions for performing model-based clustering.

mbcFit Action

Performs model-based clustering using the EM algorithm.


CASL Syntax

mbc.mbcFit <result=results> <status=rc> /

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

convergenceTest="AITKEN" | "LOGL",

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"},

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",

display={

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

},

emEpsilon=double,

factorDetails=TRUE | FALSE,

groupByLimit=64-bit-integer,

initMethod="KMEANS" | "RANDOM",

itHist="DETAILS" | "NONE" | "SUMMARY",

maxIter=integer,

model={

depVars={{

name="variable-name"

}, {...}},

effects={{

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest={"string-1" <, "string-2", ...>},

vars={"string-1" <, "string-2", ...>}

}, {...}}

},

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},

noise="N" | "Y" | {"N", "Y"},

output={

allstats=TRUE | FALSE,

casOut={

caslib="string",

compress=TRUE | FALSE,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},

currClus="string",

loglik="string",

maxpost="string",

nextClus="string",

pred="string",

role="string"

},

outputTables={

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=TRUE | FALSE,

replace=TRUE | FALSE

},

parameterEpsilon=double,

seed=integer,

singularEpsilon=double,

store={

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

},

table={

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

groupBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

groupByMode="NOSORT" | "REDISTRIBUTE",

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

name="table-name",

orderBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

singlePass=TRUE | FALSE,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

technique="CEM" | "EM",

topModels=64-bit-integer

;

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates a table that contains observationwise cluster membership probability estimates.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
store	—	stores models in a blob (binary large object).

Parameter Descriptions

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default	LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"}

specifies the covariance model.

Aliases	covModel
	covType
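This page does not define the individual covStruct codes. The Python sketch below decodes them under a stated assumption: the codes follow the naming used in the mixture-modeling literature. Under that naming, the Gaussian mixture codes give the volume, shape, and orientation of each cluster's covariance matrix (E = equal across clusters, V = variable, I = identity). The parsimonious Gaussian mixture codes state three things: whether the factor loadings are constrained (C) or unconstrained (U) across clusters, whether the error variances are constrained or unconstrained, and whether the error variance is isotropic. The helper describe_cov_struct, the GMIX and PGMIX family lists, and the GROUPS mapping for ALL, ALLGMIX, and ALLPGMIX are hypothetical and are not part of the action.

```python
# Hypothetical decoder for covStruct codes. The meanings below are ASSUMED
# from the usual mixture-model naming conventions and are not taken from SAS.
GMIX = ["EEE", "EEI", "EEV", "EII", "EVI", "EVV", "VII", "VVI", "VVV"]
PGMIX = ["CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU"]
GROUPS = {"ALLGMIX": GMIX, "ALLPGMIX": PGMIX, "ALL": GMIX + PGMIX}  # assumed

EV = {"E": "equal", "V": "variable"}
CU = {"C": "constrained", "U": "unconstrained"}


def describe_cov_struct(code: str) -> str:
    """Return a readable description of one three-letter covariance code."""
    code = code.upper()
    if code in GMIX:
        volume, shape, orient = code
        if shape == "I":
            # Spherical clusters have no meaningful orientation.
            return f"{EV[volume]} volume, spherical"
        orient_txt = "axis-aligned" if orient == "I" else f"{EV[orient]} orientation"
        return f"{EV[volume]} volume, {EV[shape]} shape, {orient_txt}"
    if code in PGMIX:
        loadings, err_var, iso = code
        iso_txt = "isotropic" if iso == "C" else "non-isotropic"
        return (f"loadings {CU[loadings]} across clusters, "
                f"error variance {CU[err_var]} across clusters, "
                f"{iso_txt} error variance")
    raise ValueError(f"unknown covariance code: {code}")
```

For example, describe_cov_struct("VVV") reports a fully variable ellipsoidal structure, and GROUPS["ALL"] lists the 17 individual codes that appear in the syntax above.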

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default	BIC
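These criteria are standard model-selection measures, but this page does not give the exact formulas the action uses. The sketch assumes the conventional definitions, where logL is the maximized log likelihood, k is the number of estimated parameters, and n is the number of observations. Under those definitions, smaller values of AIC, AICC, and BIC are better, and a larger LOGL is better. The hypothetical helpers below rank fitted models under a chosen criterion, which is roughly the kind of list a topModels-style summary reports. The model names and values in the usage are made up.

```python
import math


def info_criteria(loglik: float, k: int, n: int) -> dict:
    """Conventional information criteria (assumed definitions)."""
    aic = -2.0 * loglik + 2.0 * k
    # Small-sample correction; undefined (treated as infinite) when n <= k + 1.
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1) if n > k + 1 else math.inf
    bic = -2.0 * loglik + k * math.log(n)
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "LOGL": loglik}


def rank_models(fits: dict, n: int, criterion: str = "BIC", top: int = 10) -> list:
    """Return up to `top` model names, best first, under `criterion`.

    `fits` maps a model name to (log likelihood, number of parameters).
    """
    scores = {name: info_criteria(ll, k, n)[criterion]
              for name, (ll, k) in fits.items()}
    larger_is_better = criterion == "LOGL"
    ordered = sorted(scores, key=scores.get, reverse=larger_is_better)
    return ordered[:top]
```

Consider the hypothetical fits {"2 clusters": (-1200.0, 11), "3 clusters": (-1150.0, 17)} with n=500. Here rank_models puts "3 clusters" first under BIC, because the 100-point drop in -2 logL outweighs the extra BIC penalty of 6 ln(500), about 37.3.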

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases	emEps
	convergence
	conv
Default	1E-05
Range	0–1
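The convergenceTest and emEpsilon parameters work together, but this page does not spell out either test. The Python sketch below assumes two conventional forms, with eps playing the role of emEpsilon:

* A LOGL-style test stops when the change in log likelihood between successive EM iterations falls below eps.
* An Aitken-acceleration test, as used in the mixture-model literature, stops when the Aitken estimate of the limiting log likelihood is within eps of the current value.

The page does not say whether the action uses absolute or relative changes; this sketch uses absolute changes. The function names are hypothetical.

```python
def logl_converged(history: list, eps: float = 1e-5) -> bool:
    """LOGL-style test: absolute change in log likelihood below eps."""
    if len(history) < 2:
        return False
    return abs(history[-1] - history[-2]) < eps


def aitken_converged(history: list, eps: float = 1e-5) -> bool:
    """Aitken-acceleration test on the last three log-likelihood values."""
    if len(history) < 3:
        return False
    l_prev, l_curr, l_next = history[-3], history[-2], history[-1]
    denom = l_curr - l_prev
    if denom == 0.0:
        # The log likelihood has stopped changing; fall back to a plain check.
        return abs(l_next - l_curr) < eps
    a = (l_next - l_curr) / denom  # Aitken acceleration factor
    if a >= 1.0:
        return False  # no reliable extrapolation yet
    l_inf = l_curr + (l_next - l_curr) / (1.0 - a)  # estimated limit
    return abs(l_inf - l_curr) < eps
```

Both functions take the full log-likelihood history, so an EM loop can call the selected test after each iteration and stop when it returns True, or when maxIter is reached.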

factorDetails=TRUE | FALSE

when set to True, adds the factor pattern and unique variances to the parameter estimates table.

Default	FALSE

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value	1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default	RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default	NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default	500
Range	0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
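The interaction types resemble SAS effect notation, where BAR corresponds to A|B|C and maxInteract plays the role of the @n order limit. This page does not spell out the expansion, so the Python sketch below assumes that correspondence to show which terms a single effects entry would generate:

* NONE is treated as one main-effect term per variable.
* CROSS is treated as a single product term.
* BAR is treated as all main effects and interactions, in the order the bar operator produces them.

The nest parameter is not modeled, and expand_effect is a hypothetical helper.

```python
def expand_effect(vars_, interaction="NONE", max_interact=None):
    """Expand one hypothetical effects entry into model terms such as "A*B"."""
    if not vars_:
        raise ValueError("at least one variable is required")
    interaction = interaction.upper()
    if interaction == "NONE":
        return list(vars_)  # one main-effect term per variable (assumed)
    if interaction == "CROSS":
        return ["*".join(vars_)]
    if interaction == "BAR":
        terms = []  # each term is a tuple of variable names
        for v in vars_:
            # The right-hand side is built from the old list before extending:
            # v on its own, then v crossed with every earlier term.
            terms += [(v,)] + [t + (v,) for t in terms]
        if max_interact is not None:
            # Drop terms whose order exceeds the limit (like @n).
            terms = [t for t in terms if len(t) <= max_interact]
        return ["*".join(t) for t in terms]
    raise ValueError(f"unknown interaction: {interaction}")
```

For example, vars A, B, and C with BAR and maxInteract=2 would yield every main effect and two-way interaction, but not A*B*C.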

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | {"N", "Y"}

specifies whether to include a noise cluster in the model.

Alias	hasNoiseCluster
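The nClusters, nFactors, covStruct, and noise parameters each accept a list, and criterion and topModels then select and report among the fitted models. This page does not say exactly how the lists are combined. The Python sketch below enumerates a candidate grid under two assumptions:

* Every combination is fitted.
* nFactors applies only to the parsimonious Gaussian mixture structures, as its description above suggests.

The split of codes into families, the default argument values, and the helper name are assumptions for illustration, not the action's defaults.

```python
from itertools import product

# Assumed set of parsimonious Gaussian mixture codes; other codes are treated
# as ordinary Gaussian mixture structures that do not use nFactors.
PGMIX = {"CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU"}


def candidate_models(n_clusters, cov_structs, n_factors=(1,), noise=("N",)):
    """Enumerate hypothetical (nClusters, covStruct, nFactors, noise) tuples.

    nFactors is None for structures that do not use factors.
    """
    grid = []
    for k, cov, nz in product(n_clusters, cov_structs, noise):
        if cov.upper() in PGMIX:
            grid.extend((k, cov, q, nz) for q in n_factors)
        else:
            grid.append((k, cov, None, nz))
    return grid
```

Under these assumptions, nClusters={2, 3}, covStruct={"VVV", "UUU"}, nFactors={1, 2}, and noise={"N", "Y"} would give 2 × 2 × (1 + 2) = 12 candidate models.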

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=TRUE | FALSE

when set to True, adds all statistics to the output table.

Default	FALSE

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default	"NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.
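The loglik, currClus, nextClus, and maxpost columns all come from E-step quantities. The Python sketch below assumes a plain Gaussian mixture with no noise cluster. Under that assumption, an observation's posterior probability for cluster j is proportional to the mixture weight of j times the observation's density under j. The maximum posterior cluster is the cluster with the largest such probability.

Which estimates you use determines which column you approximate:

* Estimates from before the final M step give currClus-style probabilities.
* The final estimates give nextClus-style probabilities.

The helper names, the inputs, and the 1-based cluster numbering are hypothetical. The sketch does not reproduce the output table's column names.

```python
import math


def posterior_probs(log_weights, log_densities):
    """E-step posteriors for one observation, using the log-sum-exp trick.

    log_weights[j]   : log mixture weight of cluster j
    log_densities[j] : log density of the observation under cluster j
    """
    joint = [w + d for w, d in zip(log_weights, log_densities)]
    m = max(joint)  # subtract the maximum so exp() cannot overflow or underflow
    log_total = m + math.log(sum(math.exp(x - m) for x in joint))
    return [math.exp(x - log_total) for x in joint]


def max_posterior_cluster(probs):
    """1-based index (assumed numbering) of the most probable cluster."""
    return max(range(len(probs)), key=probs.__getitem__) + 1
```

Working on the log scale matters for high-dimensional data, where individual densities such as exp(-1000) would underflow to zero.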

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias	parmEps
Default	1E-08
Range	1E-15–1
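The parameterEpsilon parameter sets the cutoff below which a mixture weight is treated as zero, but this page does not say what happens to such a cluster afterward. The Python sketch below assumes that the small weight is set to zero and the remaining weights are renormalized to sum to 1. That is one common way to handle a component that has effectively vanished. The function name is hypothetical.

```python
def prune_weights(weights, parm_eps=1e-8):
    """Zero out mixture weights below parm_eps and renormalize the rest.

    The renormalization step is an assumption for illustration.
    """
    kept = [w if w >= parm_eps else 0.0 for w in weights]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("all mixture weights fell below parm_eps")
    return [w / total for w in kept]
```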

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value	1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias	singEps
Default	1E-08
Range	1E-15–1

store={casouttable}

stores models in a blob (binary large object).

Alias	savestate

Long form	store={name="table-name"}
Shortcut form	store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy when the number of worker pods increases on a running CAS server.

DEFER

defer redistribution policy selection to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default	EM
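The two techniques differ in what the M step receives. In standard EM, every observation contributes to every cluster in proportion to its posterior probability. In classification EM (CEM), a classification step first converts each row of posteriors into a hard assignment to the most probable cluster. The Python sketch below shows that classification step and the resulting mixture-weight update for one iteration. It is a textbook illustration with hypothetical helper names, not the action's implementation.

```python
def classify(posteriors):
    """CEM classification step: one-hot row for the most probable cluster."""
    hard = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        hard.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return hard


def update_weights(memberships):
    """M-step mixture weights: average membership in each cluster."""
    n = len(memberships)
    k = len(memberships[0])
    return [sum(row[j] for row in memberships) / n for j in range(k)]
```

With the posteriors [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]], EM updates the weights to about [0.6, 0.4]. CEM, which first classifies, updates them to [2/3, 1/3]. The same contrast carries over to the mean and covariance updates.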

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default	10
Minimum value	1

mbcFit Action

Performs model-based clustering using the EM algorithm.


Lua Syntax

results, info = s:mbc_mbcFit{

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

convergenceTest="AITKEN" | "LOGL",

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"},

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",

display={

caseSensitive=true | false,

exclude=true | false,

excludeAll=true | false,

keyIsPath=true | false,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=true | false

},

emEpsilon=double,

factorDetails=true | false,

groupByLimit=64-bit-integer,

initMethod="KMEANS" | "RANDOM",

itHist="DETAILS" | "NONE" | "SUMMARY",

maxIter=integer,

model={

depVars={{

name="variable-name"

}, {...}},

effects={{

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest={"string-1" <, "string-2", ...>},

vars={"string-1" <, "string-2", ...>}

}, {...}}

},

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>},

noise="N" | "Y" | {"N", "Y"},

output={

allstats=true | false,

casOut={

caslib="string",

compress=true | false,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=true | false,

replace=true | false,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},

currClus="string",

loglik="string",

maxpost="string",

nextClus="string",

pred="string",

role="string"

},

outputTables={

groupByVarsRaw=true | false,

includeAll=true | false,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=true | false,

replace=true | false

},

parameterEpsilon=double,

seed=integer,

singularEpsilon=double,

store={

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=true | false,

replace=true | false,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

},

table={

caslib="string",

computedOnDemand=true | false,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

groupBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

groupByMode="NOSORT" | "REDISTRIBUTE",

name="table-name",

orderBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

singlePass=true | false,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

technique="CEM" | "EM",

topModels=64-bit-integer

}

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates a table that contains observationwise cluster membership probability estimates.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
store	—	stores models in a blob (binary large object).

Parameter Descriptions

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default	LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | {"ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"}

specifies the covariance model.

Aliases	covModel
	covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default	BIC

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases	emEps
	convergence
	conv
Default	1E-05
Range	0–1

factorDetails=true | false

when set to True, adds the factor pattern and unique variances to the parameter estimates table.

Default	false

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value	1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables are specified.

Default	RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default	NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default	500
Range	0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

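effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.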

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

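nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.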

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nClusters=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | {64-bit-integer-1 <, 64-bit-integer-2, ...>}

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | {"N", "Y"}

specifies whether to include a noise cluster in the model.

Alias	hasNoiseCluster

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=true | false

when set to True, adds all statistics to the output table.

Default	false

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

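copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.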

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default	"NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias	parmEps
Default	1E-08
Range	1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value	1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias	singEps
Default	1E-08
Range	1E-15–1

store={casouttable}

stores models in a blob (binary large object).

Alias	savestate

Long form	store={name="table-name"}
Shortcut form	store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

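INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.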

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	false

replace=true | false

when set to True, overwrites an existing table that has the same name.

Default	false

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy when the number of worker pods increases on a running CAS server.

DEFER

defer redistribution policy selection to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default	EM

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default	10
Minimum value	1

mbcFit Action

Performs model-based clustering using the EM algorithm.


Python Syntax

results=s.mbc.mbcFit(

attributes=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

convergenceTest="AITKEN" | "LOGL",

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | ["ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"],

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",

display={

"caseSensitive":True | False,

"exclude":True | False,

"excludeAll":True | False,

"keyIsPath":True | False,

"names":["string-1" <, "string-2", ...>],

"pathType":"LABEL" | "NAME",

"traceNames":True | False

},

emEpsilon=double,

factorDetails=True | False,

groupByLimit=64-bit-integer,

initMethod="KMEANS" | "RANDOM",

itHist="DETAILS" | "NONE" | "SUMMARY",

maxIter=integer,

model={

"depVars":[{

"name":"variable-name"

}<, {...}>],

"effects":[{

"interaction":"BAR" | "CROSS" | "NONE",

"maxInteract":integer,

"nest":["string-1" <, "string-2", ...>],

"vars":["string-1" <, "string-2", ...>]

}<, {...}>]

},

nClusters=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>],

nFactors=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>],

noise="N" | "Y" | ["N", "Y"],

output={

"allstats":True | False,

"casOut":{

"caslib":"string",

"compress":True | False,

"indexVars":["variable-name-1" <, "variable-name-2", ...>],

"label":"string",

"lifetime":64-bit-integer,

"maxMemSize":64-bit-integer,

"memoryFormat":"DVR" | "INHERIT" | "STANDARD",

"name":"table-name",

"promote":True | False,

"replace":True | False,

"replication":integer,

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",

"threadBlockSize":64-bit-integer,

"timeStamp":"string",

"where":["string-1" <, "string-2", ...>]

},

"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>],

"currClus":"string",

"loglik":"string",

"maxpost":"string",

"nextClus":"string",

"pred":"string",

"role":"string"

},

outputTables={

"groupByVarsRaw":True | False,

"includeAll":True | False,

"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},

"repeated":True | False,

"replace":True | False

},

parameterEpsilon=double,

seed=integer,

singularEpsilon=double,

store={

"caslib":"string",

"label":"string",

"lifetime":64-bit-integer,

"memoryFormat":"DVR" | "INHERIT" | "STANDARD",

"name":"table-name",

"promote":True | False,

"replace":True | False,

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

},

table={

"caslib":"string",

"computedOnDemand":True | False,

"computedVars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"computedVarsProgram":"string",

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},

"groupBy":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"groupByMode":"NOSORT" | "REDISTRIBUTE",

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

"name":"table-name",

"orderBy":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"singlePass":True | False,

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression",

"whereTable":{

"casLib":"string",

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},

"name":"table-name",

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression"

}

},

technique="CEM" | "EM",

topModels=64-bit-integer

)

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates a table that contains observationwise cluster membership probability estimates.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
store	—	stores models in a blob (binary large object).

Parameter Descriptions

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default	LOGL
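The two convergence tests are not defined in detail here. As a hedged sketch, assuming that LOGL compares the relative change in the log likelihood between successive EM iterations with emEpsilon (default 1E-05) and that AITKEN uses the Aitken-acceleration stopping rule common in the mixture-model literature, the tests could look like this. The function names and the exact form of the relative change are assumptions, not the action's documented implementation.

```python
# Illustrative stopping rules for an EM loop; the exact formulas used by
# mbcFit are assumptions here.


def logl_converged(prev_ll, curr_ll, eps=1e-5):
    """Relative change in the log likelihood is at most eps (assumed form)."""
    return abs(curr_ll - prev_ll) <= eps * max(abs(prev_ll), 1.0)


def aitken_converged(ll_hist, eps=1e-5):
    """Aitken-accelerated rule on the last three log likelihoods."""
    if len(ll_hist) < 3:
        return False
    l0, l1, l2 = ll_hist[-3:]
    denom = l1 - l0
    if denom == 0:
        return abs(l2 - l1) < eps      # log likelihood has stopped moving
    a = (l2 - l1) / denom              # Aitken acceleration factor
    if a >= 1:
        return False                   # increments not shrinking; no finite limit yet
    l_inf = l1 + (l2 - l1) / (1 - a)   # projected asymptotic log likelihood
    return abs(l_inf - l1) < eps
```

The Aitken rule can stop earlier than a plain change-in-log-likelihood test when EM converges slowly, because it compares the current value with a projected limit rather than with the previous iteration.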

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | ["ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"]

specifies the covariance model.

Aliases	covModel
	covType
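The Gaussian-mixture covStruct codes (EII through VVV) appear to follow the volume/shape/orientation naming used by the R package mclust: E means equal across clusters, V means variable across clusters, and I means identity (spherical shape or axis-aligned orientation). Assuming that convention, the following sketch counts the free covariance parameters that each structure uses for G clusters in p dimensions; these counts enter the penalty terms of the criterion parameter. The function name is hypothetical, and the factor-analytic structures (CCC through UUU), whose counts also depend on nFactors, are not covered.

```python
# Free covariance parameters per structure, assuming mclust-style naming.


def cov_param_count(structure, G, p):
    """Number of free covariance parameters for G clusters in p dimensions."""
    full = p * (p + 1) // 2                 # one unrestricted covariance matrix
    counts = {
        "EII": 1,                           # lambda * I, shared by all clusters
        "VII": G,                           # lambda_g * I
        "EEI": p,                           # one shared diagonal matrix
        "VVI": G * p,                       # a diagonal matrix per cluster
        "EVI": 1 + G * (p - 1),             # shared volume, per-cluster diagonal shape
        "EEE": full,                        # one shared full matrix
        "EEV": p + G * p * (p - 1) // 2,    # shared volume and shape, per-cluster orientation
        "EVV": 1 + G * (full - 1),          # only the volume is shared
        "VVV": G * full,                    # unrestricted matrix per cluster
    }
    return counts[structure]
```

For example, with G=3 and p=4, VVV uses 30 covariance parameters whereas EEE uses 10, which is why information criteria often favor the more constrained structures on small samples.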

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default	BIC
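For reference, the following sketch computes the standard smaller-is-better forms of AIC, AICC, and BIC from a fitted model's log likelihood, its number of free parameters k, and the number of observations n. That mbcFit uses exactly these forms is an assumption; the function name is hypothetical. Selecting with LOGL would instead favor the largest log likelihood, with no penalty for model size.

```python
import math


def information_criteria(loglik, k, n):
    """Standard AIC, AICC, and BIC (smaller is better)."""
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction; needs n > k + 1
    bic = -2.0 * loglik + k * math.log(n)          # penalty grows with log(n)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}
```

Because the BIC penalty per parameter is log(n) rather than 2, BIC (the default) tends to choose fewer clusters or simpler covariance structures than AIC once n exceeds about 8.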

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases	emEps
	convergence
	conv
Default	1E-05
Range	0–1

factorDetails=True | False

when set to True, adds the factor pattern and unique variances to the parameter estimates table.

Default	False

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value	1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables (depVars in the model parameter) are specified.

Default	RANDOM
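The exact RANDOM scheme is not described here. As a hypothetical sketch, assuming that random initialization assigns each observation random cluster membership probabilities that sum to 1, and that the seed parameter makes the draw reproducible, it could look like this. The function name and the use of soft (rather than hard) random assignments are assumptions.

```python
import random


def random_memberships(n_obs, n_clusters, seed=1):
    """Hypothetical RANDOM initialization: each row holds random cluster
    membership probabilities that sum to 1."""
    rng = random.Random(seed)                 # seeded, so runs are reproducible
    rows = []
    for _ in range(n_obs):
        # 1.0 - random() lies in (0, 1], so no weight is exactly zero.
        w = [1.0 - rng.random() for _ in range(n_clusters)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows
```

These starting probabilities would then feed the first M step. KMEANS initialization instead starts from a k-means partition, which is usually closer to a good solution but depends on the k-means result.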

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default	NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default	500
Range	0–MACINT

* model={modelStatement}

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

"depVars":[{responsevar-1} <, {responsevar-2}, ...>]

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

"name":"variable-name"

names the response variable.

"effects":[{effect-1} <, {effect-2}, ...>]

specifies the effects to use for analysis. Each effect value can be one or more of the following:

"interaction":"BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

"maxInteract":integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified value.

"nest":["string-1" <, "string-2", ...>]

* "vars":["string-1" <, "string-2", ...>]

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
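To show how the interaction and maxInteract subparameters shape the terms of an effect, the following sketch expands a vars list by using the usual SAS bar-operator semantics: a BAR effect on a, b, and c produces all main effects and interactions, and maxInteract drops terms above the given order. The function name is hypothetical, and treating NONE as a list of separate main effects is an assumption.

```python
def expand_effect(vars_, interaction="NONE", max_interact=None):
    """Expand one effect specification into model terms (SAS-style bar/cross
    semantics; the NONE handling is an assumption)."""
    if interaction == "CROSS":
        return ["*".join(vars_)]                  # one crossed term
    if interaction == "BAR":
        terms = []
        for v in vars_:
            # Add v itself, then v crossed with every term built so far.
            terms = terms + [[v]] + [t + [v] for t in terms]
        if max_interact is not None:
            terms = [t for t in terms if len(t) <= max_interact]
        return ["*".join(t) for t in terms]
    return list(vars_)                            # NONE: main effects only
```

For example, `expand_effect(["a", "b", "c"], "BAR", 2)` yields a, b, a*b, c, a*c, and b*c; the three-way term a*b*c is removed because its order exceeds 2.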

nClusters=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>]

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | [64-bit-integer-1 <, 64-bit-integer-2, ...>]

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | ["N", "Y"]

specifies whether to include a noise cluster in the model.

Alias	hasNoiseCluster
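Because nClusters, covStruct, and noise all accept lists, a single call can describe many candidate models. As a hypothetical sketch, assuming the action fits one model per combination of the list values and then ranks the fits by the criterion parameter (smaller is better), keeping topModels of them for the summary table, the bookkeeping could look like this. Both function names are hypothetical, and nFactors is left out because it applies only to the parsimonious structures.

```python
from itertools import product


def candidate_grid(n_clusters, cov_structs, noise=("N",)):
    """One candidate per combination of the list-valued parameters (assumed)."""
    return list(product(n_clusters, cov_structs, noise))


def top_models(scores, top=10):
    """Rank candidates by a smaller-is-better criterion and keep `top` of them."""
    return sorted(scores, key=scores.get)[:top]
```

For example, nClusters=[2, 3], covStruct=["EEE", "VVV"], and noise=["N", "Y"] describe eight candidates under this assumption.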

output={mbcOutput}

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

"allstats":True | False

when set to True, adds all statistics to the output table.

Default	False

* "casOut":{casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>]

specifies the variables to copy from the input table to the output table.

"currClus":"string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

"loglik":"string"

specifies a prefix for naming the cluster log likelihoods.

"maxpost":"string"

specifies a prefix for naming the maximum posterior probability cluster.

"nextClus":"string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default	"NEXT"

"pred":"string"

specifies a prefix for naming the predicted values.

"role":"string"

specifies the name for the column that contains the observation role.
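The loglik, nextClus, and maxpost columns are related through the E step. As a hedged sketch, assuming each cluster log likelihood for an observation includes the log mixture weight, the membership probabilities are the normalized exponentials of those values, and the maximum posterior cluster is the one with the largest probability. The log-sum-exp shift avoids underflow when the log likelihoods are very negative. The function names and the 1-based cluster numbering are assumptions.

```python
import math


def posteriors_from_logliks(logliks):
    """Membership probabilities for one observation from its per-cluster
    log likelihoods (assumed to include the log mixture weight)."""
    m = max(logliks)                          # shift for numerical stability
    exps = [math.exp(l - m) for l in logliks]
    total = sum(exps)
    return [e / total for e in exps]


def max_posterior(probs):
    """1-based index of the most probable cluster."""
    return max(range(len(probs)), key=probs.__getitem__) + 1
```

Without the shift, `math.exp(-1000.0)` underflows to 0.0 and the division by the total fails.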

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias	parmEps
Default	1E-08
Range	1E-15–1
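As a hypothetical sketch of what treating a small mixture weight as zero can mean in practice, the following zeroes every weight below parameterEpsilon and renormalizes the remaining weights so that they still sum to 1. The renormalization step and the function name are assumptions, not documented behavior.

```python
def drop_small_weights(weights, eps=1e-8):
    """Treat mixture weights below eps as zero and renormalize the rest
    (the renormalization is an assumption)."""
    kept = [w if w >= eps else 0.0 for w in weights]
    total = sum(kept)
    if total == 0:
        raise ValueError("all mixture weights fell below eps")
    return [w / total for w in kept]
```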

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value	1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias	singEps
Default	1E-08
Range	1E-15–1
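The singularity criterion itself is not spelled out. One common way to apply such a tolerance, shown here only as an assumed illustration, is to run a Cholesky factorization and flag the matrix when a pivot becomes tiny relative to the corresponding diagonal entry. The function name and the relative-pivot rule are hypothetical.

```python
import math


def is_near_singular(cov, eps=1e-8):
    """Flag a covariance matrix as singular when a Cholesky pivot is tiny
    relative to the matching diagonal entry (assumed rule)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = cov[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= eps * cov[j][j]:
            return True                       # pivot collapsed: (near) singular
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return False
```

Two perfectly correlated variables, such as `[[1, 1], [1, 1]]`, produce a zero second pivot and are flagged.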

store={casouttable}

stores models in a blob (binary large object).

Alias	savestate

Long form	store={"name":"table-name"}
Shortcut form	store="table-name"

The casouttable value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"label":"string"

specifies the descriptive label to associate with the table.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

uses the duplicate value reduction memory format, which can reduce memory consumption and file size when the input data contain duplicate values.

INHERIT

inherits the memory format of the input table.

STANDARD

uses the standard memory format.

"name":"table-name"

specifies the name for the output table.

"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	False

"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default	False

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

defers redistribution policy selection to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table={castable}

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default	EM
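The difference between EM and CEM lies in a classification (C) step. In classification EM as usually described, the soft membership probabilities from the E step are replaced by hard 0/1 assignments to the most probable cluster before the M step. The following sketch shows that step and how it changes the mixture-weight estimate, which in both variants is the column mean of the membership matrix. The function names are hypothetical, and this is an illustration of the general technique rather than the action's implementation.

```python
def c_step(posteriors):
    """Classification step of CEM: replace each row of soft membership
    probabilities with a hard 0/1 assignment to its most probable cluster."""
    hard = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)   # first maximum wins ties
        hard.append([1.0 if g == best else 0.0 for g in range(len(row))])
    return hard


def weights_from(memberships):
    """Mixture weights as the column means of the membership matrix,
    as estimated in the M step of both EM and CEM."""
    n = len(memberships)
    return [sum(col) / n for col in zip(*memberships)]
```

With soft memberships, an observation split 0.55/0.45 contributes to both clusters; after the C step it counts only toward the first, which typically makes CEM converge faster but can bias the estimates when clusters overlap.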

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default	10
Minimum value	1

R Syntax

results <- cas.mbc.mbcFit(s,

attributes=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

convergenceTest="AITKEN" | "LOGL",

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | list("ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV"),

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE",

display=list(

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>),

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

),

emEpsilon=double,

factorDetails=TRUE | FALSE,

groupByLimit=64-bit-integer,

initMethod="KMEANS" | "RANDOM",

itHist="DETAILS" | "NONE" | "SUMMARY",

maxIter=integer,

model=list(

depVars=list( list(

name="variable-name"

) <, list(...)>),

effects=list( list(

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest=list("string-1" <, "string-2", ...>),

vars=list("string-1" <, "string-2", ...>)

) <, list(...)>)

),

nClusters=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>),

nFactors=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>),

noise="N" | "Y" | list("N", "Y"),

output=list(

allstats=TRUE | FALSE,

casOut=list(

caslib="string",

compress=TRUE | FALSE,

indexVars=list("variable-name-1" <, "variable-name-2", ...>),

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where=list("string-1" <, "string-2", ...>)

),

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>),

currClus="string",

loglik="string",

maxpost="string",

nextClus="string",

pred="string",

role="string"

),

outputTables=list(

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),

repeated=TRUE | FALSE,

replace=TRUE | FALSE

),

parameterEpsilon=double,

seed=integer,

singularEpsilon=double,

store=list(

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

),

table=list(

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

computedVarsProgram="string",

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),

groupBy=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

groupByMode="NOSORT" | "REDISTRIBUTE",

name="table-name",

orderBy=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

singlePass=TRUE | FALSE,

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression",

whereTable=list(

casLib="string",

name="table-name",

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression"

)

),

technique="CEM" | "EM",

topModels=64-bit-integer

)

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates a table that contains observationwise cluster membership probability estimates.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
store	—	stores models in a blob (binary large object).

Parameter Descriptions

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

convergenceTest="AITKEN" | "LOGL"

specifies the convergence test to use.

Default	LOGL

covStruct="ALL" | "ALLGMIX" | "ALLPGMIX" | "CCC" | "CCU" | "CUC" | "CUU" | "EEE" | "EEI" | "EEV" | "EII" | "EVI" | "EVV" | "UCC" | "UCU" | "UUC" | "UUU" | "VII" | "VVI" | "VVV" | list("ALL", "ALLGMIX", "ALLPGMIX", "CCC", "CCU", "CUC", "CUU", "EEE", "EEI", "EEV", "EII", "EVI", "EVV", "UCC", "UCU", "UUC", "UUU", "VII", "VVI", "VVV")

specifies the covariance model.

Aliases	covModel
	covType

criterion="AIC" | "AICC" | "BIC" | "LOGL" | "NONE"

specifies the model selection criterion.

Default	BIC

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

emEpsilon=double

specifies the convergence criterion for the log likelihood in the expectation-maximization (EM) algorithm.

Aliases	emEps
	convergence
	conv
Default	1E-05
Range	0–1

factorDetails=TRUE | FALSE

when set to TRUE, adds the factor pattern and unique variances to the parameter estimates table.

Default	FALSE

groupByLimit=64-bit-integer

suppresses the analysis if the number of BY groups exceeds the specified value.

Minimum value	1

initMethod="KMEANS" | "RANDOM"

specifies the initialization method to use if no initialization variables (depVars in the model parameter) are specified.

Default	RANDOM

itHist="DETAILS" | "NONE" | "SUMMARY"

specifies the level of iteration history detail to include.

Default	NONE

DETAILS

includes detailed iteration history.

NONE

produces no iteration history.

SUMMARY

includes summary iteration history.

maxIter=integer

specifies the maximum number of iterations for the expectation-maximization (EM) algorithm.

Default	500
Range	0–MACINT

* model=list(modelStatement)

specifies the variables to use for analysis (effects) and the initial cluster membership probability variables (dependents).

The modelStatement value can be one or more of the following:

depVars=list( list(responsevar-1) <, list(responsevar-2), ...>)

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

effects=list( list(effect-1) <, list(effect-2), ...>)

specifies the effects to use for analysis. Each effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified value.

nest=list("string-1" <, "string-2", ...>)

* vars=list("string-1" <, "string-2", ...>)

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nClusters=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>)

specifies the number of Gaussian clusters.

nFactors=64-bit-integer | list(64-bit-integer-1 <, 64-bit-integer-2, ...>)

specifies the number of factors to use in parsimonious Gaussian mixture models.

noise="N" | "Y" | list("N", "Y")

specifies whether to include a noise cluster in the model.

Alias	hasNoiseCluster

output=list(mbcOutput)

creates a table that contains observationwise cluster membership probability estimates.

The mbcOutput value can be one or more of the following:

allstats=TRUE | FALSE

when set to TRUE, adds all statistics to the output table.

Default	FALSE

* casOut=list(casouttable)

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>)

specifies the variables to copy from the input table to the output table.

currClus="string"

specifies a prefix for naming the cluster membership probability estimates from the expectation (E) step that produced the mean and covariance estimates in the final maximization (M) step.

loglik="string"

specifies a prefix for naming the cluster log likelihoods.

maxpost="string"

specifies a prefix for naming the maximum posterior probability cluster.

nextClus="string"

specifies a prefix for naming the cluster membership probability estimates from an extra expectation (E) step that uses the mean and covariance estimates from the final maximization (M) step.

Default	"NEXT"

pred="string"

specifies a prefix for naming the predicted values.

role="string"

specifies the name for the column that contains the observation role.

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

parameterEpsilon=double

specifies the bound below which a mixture weight is treated as zero.

Alias	parmEps
Default	1E-08
Range	1E-15–1

seed=integer

specifies the seed to use for generating initial cluster memberships when initial cluster memberships are not provided.

Minimum value	1

singularEpsilon=double

specifies the singularity criterion for the covariance matrices.

Alias	singEps
Default	1E-08
Range	1E-15–1

store=list(casouttable)

stores models in a blob (binary large object).

Alias	savestate

Long form	store=list(name="table-name")
Shortcut form	store="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

uses the duplicate value reduction memory format, which can reduce memory consumption and file size when the input data contain duplicate values.

INHERIT

inherits the memory format of the input table.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to TRUE, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to TRUE, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

defers redistribution policy selection to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

* table=list(castable)

specifies the input data table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

technique="CEM" | "EM"

specifies the expectation-maximization (EM) technique to use. CEM refers to the classification EM technique.

Default	EM

topModels=64-bit-integer

specifies the number of fitted models to show in the summary table after model selection.

Default	10
Minimum value	1

Last updated: March 05, 2026