Clustering Action Set

Provides actions for clustering.

kClus Action

Provides k-means clustering.


CASL Syntax

clustering.kClus <result=results> <status=rc> /

applyRowOrder=TRUE | FALSE,

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

code={

casOut={

caslib="string",

compress=TRUE | FALSE,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

onDemand=TRUE | FALSE,

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

comment=TRUE | FALSE,

fmtWdth=integer,

indentSize=integer,

intoCutPt=double,

iProb=TRUE | FALSE,

labelId=integer,

lineSize=integer,

noTrim=TRUE | FALSE,

pCatAll=TRUE | FALSE,

tabForm=TRUE | FALSE

},

display={

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

},

distance="EUCLIDEAN" | "MANHATTAN",

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ",

estimateNClusters={

align="NONE" | "PCA",

B=integer,

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE",

method="ABC" | "NONE",

minClusters=integer

},

freq="variable-name",

impute="MEAN" | "NONE",

imputeNom="MODE" | "NONE",

init="FORGY" | "RAND",

inputs={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

kPrototypeParams={

gammaUserVal=double,

method="AUTOGAMMA" | "USERGAMMA"

},

maxClusters=integer,

maxIters=integer,

nominals={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

output={

casOut={

caslib="string",

compress=TRUE | FALSE,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

onDemand=TRUE | FALSE,

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

},

outputTables={

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=TRUE | FALSE,

replace=TRUE | FALSE

},

outStat={

caslib="string",

compress=TRUE | FALSE,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

printIter=TRUE | FALSE,

saveState={

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

},

seed=double,

standardize="NONE" | "RANGE" | "STD",

stopCriterion={

method="CLUSTER_CHANGE" | "WCSS_CHANGE",

value=double

},

table={

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

name="table-name",

singlePass=TRUE | FALSE,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

weight="variable-name"

;

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
table (required)	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
code	casOut	writes SAS DATA step code for computing the cluster assignments by using the cluster centers.
outStat	—	specifies the cluster centers table.
output	casOut (required)	creates a table on the server that contains observationwise clustering information, which is computed after clustering.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
saveState	—	specifies the table in which to save the model state for future model prediction.

Parameter Descriptions

applyRowOrder=TRUE | FALSE

specifies that the action uses a prespecified row ordering. This requires a preliminary call to the table.partition action that uses the orderby and groupby parameters.

Alias	reproducibleRowOrder
Default	FALSE

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	attribute

code={aircodegen}

writes SAS DATA step code for computing the cluster assignments by using the cluster centers.

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

distance="EUCLIDEAN" | "MANHATTAN"

specifies the distance measure for similarity that is used for interval input variables.

Default	EUCLIDEAN
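The following Python sketch shows the textbook definitions of the two interval distance measures for an observation x and a cluster center c. It is an illustration only, not the server's implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """EUCLIDEAN: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def manhattan(x: Sequence[float], c: Sequence[float]) -> float:
    """MANHATTAN: sum of absolute coordinate differences."""
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(manhattan([0.0, 0.0], [3.0, 4.0]))  # 7.0
```

Because Manhattan distance does not square the differences, a single large coordinate difference has less influence on it than on Euclidean distance.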

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ"

specifies the distance measure for similarity that is used for nominal input variables.

Default	BINARY
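A common reading of BINARY is the simple matching distance: each nominal variable contributes 0 when the observation's level equals the cluster's level and 1 otherwise. The Python sketch below assumes that reading. The frequency-based GLOBALFREQ and RELATIVEFREQ measures are not sketched, and the function name is hypothetical.

```python
from typing import Sequence


def binary_nominal_distance(x: Sequence[str], mode: Sequence[str]) -> int:
    """Assumed BINARY measure: count the nominal variables whose levels differ."""
    return sum(1 for xi, mi in zip(x, mode) if xi != mi)


print(binary_nominal_distance(["red", "small", "yes"], ["red", "large", "no"]))  # 2
```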

estimateNClusters={nClustersStmt}

specifies the method and the values for that method to be used for estimating the number of clusters.

Long form	estimateNClusters={method="ABC" | "NONE"}
Shortcut form	estimateNClusters="ABC" | "NONE"

The nClustersStmt value can be one or more of the following:

align="NONE" | "PCA"

specifies the method for aligning the reference data based on the input data.

Default	NONE

B=integer

specifies the number of reference data sets to create for each candidate number of clusters when the ABC method is used.

Default	1

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE"

specifies the criterion for estimating the number of clusters from the statistics that the ABC method computes.

Default	GLOBALPEAK

method="ABC" | "NONE"

specifies the method for estimating the number of clusters. ABC estimates the number of clusters by using the aligned box criterion (ABC) method. NONE does not estimate the number of clusters and instead uses the value of the maxClusters parameter (alias nClusters).

Default	NONE

minClusters=integer

specifies the minimum number of clusters to use in searching for the best number of clusters.

Default	2
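To illustrate how a criterion turns per-candidate statistics into a single estimate, the Python sketch below implements one plausible reading of GLOBALPEAK (the candidate with the largest statistic) and FIRSTPEAK (the smallest candidate whose statistic exceeds both of its neighbors). These definitions are assumptions; the exact rules that the server applies, and the FIRSTMAXWITHSTD criterion, are not shown. The statistics are made-up values for illustration only, and the function names are hypothetical.

```python
from typing import Dict, Optional


def global_peak(stats: Dict[int, float]) -> int:
    """Assumed GLOBALPEAK: the candidate number of clusters with the largest statistic."""
    return max(stats, key=stats.get)


def first_peak(stats: Dict[int, float]) -> Optional[int]:
    """Assumed FIRSTPEAK: the smallest k whose statistic exceeds both neighbors (None if no peak)."""
    ks = sorted(stats)
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if stats[k] > stats[prev_k] and stats[k] > stats[next_k]:
            return k
    return None


# Hypothetical ABC statistics for candidates k = 2..6 (minClusters=2, maxClusters=6).
abc = {2: 0.10, 3: 0.35, 4: 0.20, 5: 0.50, 6: 0.40}
print(global_peak(abc))  # 5
print(first_peak(abc))   # 3
```

Under this reading, the two criteria can disagree whenever an early local peak is lower than a later one, as in the made-up values above.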

freq="variable-name"

names the numeric variable that contains the frequency of occurrence for each observation.
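A frequency variable is conventionally interpreted as a replication count: a row with frequency 3 counts as three identical observations. Under that assumption, a cluster center becomes a frequency-weighted mean, as in this Python sketch. The function name is hypothetical, and the sketch is not the server's implementation.

```python
from typing import List, Sequence


def weighted_center(rows: Sequence[Sequence[float]], freq: Sequence[float]) -> List[float]:
    """Center of a cluster when row i counts freq[i] times (assumed freq semantics)."""
    total = sum(freq)
    dims = len(rows[0])
    return [sum(f * r[j] for r, f in zip(rows, freq)) / total for j in range(dims)]


# The first row has frequency 3, so it contributes as if it appeared three times.
print(weighted_center([[0.0, 0.0], [4.0, 8.0]], [3, 1]))  # [1.0, 2.0]
```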

impute="MEAN" | "NONE"

specifies the imputation method to be used when the input variables are interval.

Default	NONE

imputeNom="MODE" | "NONE"

specifies the imputation method to be used when the input variables are nominal.

Default	NONE
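The Python sketch below illustrates MEAN imputation for an interval variable and MODE imputation for a nominal variable, with missing values represented as None. It is an assumption-based illustration with hypothetical function names; how the server breaks ties between equally frequent levels is not stated on this page (this sketch keeps the level seen first).

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Sequence


def impute_mean(col: Sequence[Optional[float]]) -> List[float]:
    """Replace missing interval values (None) with the mean of the observed values."""
    m = mean(v for v in col if v is not None)
    return [m if v is None else v for v in col]


def impute_mode(col: Sequence[Optional[str]]) -> List[str]:
    """Replace missing nominal values (None) with the most frequent observed level."""
    mode = Counter(v for v in col if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in col]


print(impute_mean([1.0, None, 3.0]))       # [1.0, 2.0, 3.0]
print(impute_mode(["a", "b", None, "a"]))  # ['a', 'b', 'a', 'a']
```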

init="FORGY" | "RAND"

specifies the method for obtaining the initial estimate of cluster centers.

Default	FORGY
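The Python sketch below contrasts two initialization styles under stated assumptions: FORGY is taken to mean choosing k distinct observations at random as the initial centers, and RAND is assumed here to mean the random-partition method (assign observations to clusters at random, then average each cluster). This page does not define RAND, so treat that mapping as a guess. A fixed seed makes both functions reproducible, which mirrors the purpose of the seed parameter. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def forgy_init(rows: Sequence[Sequence[float]], k: int, seed: int) -> List[List[float]]:
    """Forgy: use k distinct, randomly chosen observations as the initial centers."""
    rng = random.Random(seed)
    return [list(rows[i]) for i in rng.sample(range(len(rows)), k)]


def random_partition_init(rows: Sequence[Sequence[float]], k: int, seed: int) -> List[List[float]]:
    """Random partition (assumed meaning of RAND): random cluster labels, then per-cluster means."""
    rng = random.Random(seed)
    labels = [i % k for i in range(len(rows))]  # every cluster gets at least one row (needs len(rows) >= k)
    rng.shuffle(labels)
    centers = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


rows = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]]
print(forgy_init(rows, 2, seed=1))
print(random_partition_init(rows, 2, seed=1))
```

Forgy starts from actual data points, which tends to spread the centers out; random partition starts every center near the overall mean of the data.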

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	input

kPrototypeParams={kProtoStmt}

specifies the parameters for the k-prototypes algorithm, which is used when the input variables include both interval and nominal variables.

Long form	kPrototypeParams={method="AUTOGAMMA" | "USERGAMMA"}
Shortcut form	kPrototypeParams="AUTOGAMMA" | "USERGAMMA"

The kProtoStmt value can be one or more of the following:

gammaUserVal=double

specifies the value of the gamma parameter in the k-prototypes algorithm.

Alias	value
Default	0.5

method="AUTOGAMMA" | "USERGAMMA"

specifies the method for generating the gamma parameter in the k-prototypes algorithm.
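In Huang's k-prototypes formulation, gamma weights the nominal part of the dissimilarity against the interval part. The Python sketch below assumes that formulation (squared Euclidean distance on the interval variables plus gamma times the number of nominal mismatches) and uses the documented default gamma of 0.5. The server's exact scaling and the AUTOGAMMA rule are not shown, and the function name is hypothetical.

```python
from typing import Sequence


def kproto_dissimilarity(
    num_x: Sequence[float], num_c: Sequence[float],
    nom_x: Sequence[str], nom_c: Sequence[str],
    gamma: float = 0.5,
) -> float:
    """Squared Euclidean distance on interval variables plus gamma times nominal mismatches."""
    interval_part = sum((a - b) ** 2 for a, b in zip(num_x, num_c))
    nominal_part = sum(1 for a, b in zip(nom_x, nom_c) if a != b)
    return interval_part + gamma * nominal_part


print(kproto_dissimilarity([1.0, 2.0], [1.0, 0.0], ["a", "x"], ["b", "x"]))  # 4.5
```

A larger gamma makes nominal mismatches count more heavily relative to interval differences; gamma=0 ignores the nominal variables entirely.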

maxClusters=integer

specifies either the number of clusters to use or the maximum number of clusters to search when you estimate the number of clusters.

Alias	nClusters
Default	6

maxIters=integer

specifies the maximum number of iterations for the algorithm to perform.

Default	10
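The roles of maxClusters (used as k when no estimation is requested) and maxIters can be illustrated with a plain Lloyd-style k-means loop. The Python sketch below is a generic textbook version under stated assumptions: Euclidean distance, a Forgy start, and a fixed number of iterations, which matches the documented behavior when stopCriterion is not specified. It is not the server's implementation, and the function name is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(rows: Sequence[Sequence[float]], k: int = 6, max_iters: int = 10,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Run exactly max_iters assign/update rounds and return (centers, labels)."""
    rng = random.Random(seed)
    # Forgy start: k distinct observations become the initial centers.
    centers = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each row joins the nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        # Update step: move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(rows, k=2)
print(labels)
```

The defaults k=6 and max_iters=10 mirror the documented defaults of maxClusters and maxIters.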

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	nominal

output={outputStatement}

creates a table on the server that contains observationwise clustering information, which is computed after clustering.

For more information about specifying the output parameter, see the common outputStatement parameter (Appendix A: Common Parameters).

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias	displayOut

outStat={casouttable}

specifies the cluster centers table.

For more information about specifying the outStat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

printIter=TRUE | FALSE

when set to True, writes the cluster centers from each iteration to the output cluster centers table.

Default	FALSE

saveState={casouttable}

specifies the table in which to save the model state for future model prediction.

Long form	saveState={name="table-name"}
Shortcut form	saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

use the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

Defer redistribution policy selection to a higher-level entity.

NOREDIST

Do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

Rebalance table data when the number of worker pods changes on a running CAS server.

seed=double

specifies an integer to be used to start the pseudorandom number generator for initialization.

Default	0

standardize="NONE" | "RANGE" | "STD"

specifies the method for standardizing the interval input variables.

Default	NONE
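The Python sketch below shows the usual definitions of range and standard-deviation scaling for one interval variable. Whether the server uses the sample or the population standard deviation is not stated here (this sketch uses the sample version), and constant columns, which would cause division by zero, are not handled. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def standardize_range(col: Sequence[float]) -> List[float]:
    """RANGE: rescale to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]


def standardize_std(col: Sequence[float]) -> List[float]:
    """STD: center at the mean and divide by the sample standard deviation."""
    m, s = mean(col), stdev(col)
    return [(x - m) / s for x in col]


print(standardize_range([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(standardize_std([2.0, 4.0, 6.0]))    # [-1.0, 0.0, 1.0]
```

Standardizing keeps a variable measured on a large scale from dominating the distance computation.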

stopCriterion={stopCriterionStmt}

specifies the method and the value for that method to be used for convergence. If you do not specify this parameter, the algorithm stops after it reaches the maximum number of iterations.

Long form	stopCriterion={method="CLUSTER_CHANGE" | "WCSS_CHANGE"}
Shortcut form	stopCriterion="CLUSTER_CHANGE" | "WCSS_CHANGE"

The stopCriterionStmt value can be one or more of the following:

method="CLUSTER_CHANGE" | "WCSS_CHANGE"

specifies the method to be used for convergence. CLUSTER_CHANGE uses the percentage of observations that do not change their cluster membership in an iteration. WCSS_CHANGE uses the change in the within-cluster distance as the convergence criterion.

Default	CLUSTER_CHANGE

value=double

specifies the value to be used with the specified convergence method. With CLUSTER_CHANGE, this value specifies the percentage of observations. With WCSS_CHANGE, it specifies the change in SSE for the k-means algorithm or the change in the sum of within-cluster distances for the k-modes algorithm.

Default	0
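The Python sketch below shows one way to compute the quantities behind the two convergence methods. It assumes that CLUSTER_CHANGE compares the percentage of observations that switched clusters in the latest iteration (100 minus the percentage that did not change) against value, and that WCSS_CHANGE compares the absolute change in the within-cluster sum of squares against value. The description above does not fix the direction or form of either comparison, and all function names are hypothetical.

```python
from typing import Sequence


def changed_pct(old: Sequence[int], new: Sequence[int]) -> float:
    """Percentage of observations whose cluster assignment changed between iterations."""
    changed = sum(1 for a, b in zip(old, new) if a != b)
    return 100.0 * changed / len(new)


def wcss(rows: Sequence[Sequence[float]], labels: Sequence[int],
         centers: Sequence[Sequence[float]]) -> float:
    """Within-cluster sum of squared Euclidean distances (the k-means SSE)."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(row, centers[lab]))
        for row, lab in zip(rows, labels)
    )


def stop_cluster_change(old: Sequence[int], new: Sequence[int], value: float = 0.0) -> bool:
    """Assumed CLUSTER_CHANGE rule: stop once at most value percent of rows switch clusters."""
    return changed_pct(old, new) <= value


def stop_wcss_change(prev_wcss: float, curr_wcss: float, value: float = 0.0) -> bool:
    """Assumed WCSS_CHANGE rule: stop once the SSE moves by at most value."""
    return abs(prev_wcss - curr_wcss) <= value


print(changed_pct([0, 0, 1, 1], [0, 1, 1, 1]))               # 25.0
print(wcss([[0.0, 0.0], [2.0, 0.0]], [0, 0], [[1.0, 0.0]]))  # 2.0
```

With the default value of 0, the assumed CLUSTER_CHANGE rule stops only when no observation changes clusters.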

* table={castable}

specifies the input data table.

Long form	table={name="table-name"}
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	FALSE

computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.

Alias	compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases	options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	FALSE

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.
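The whereTable filter behaves like a semi-join. The Python sketch below models tables as lists of dictionaries (an assumption for illustration) and keeps the input rows whose values on the key variables match at least one filter-table row, defaulting to the variables that the two tables share, as described above. It assumes exact equality matching; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, object]


def apply_where_table(rows: Sequence[Row], filter_rows: Sequence[Row],
                      key_vars: Optional[Sequence[str]] = None) -> List[Row]:
    """Keep input rows whose key-variable values appear in some filter-table row."""
    if not rows or not filter_rows:
        return []
    if key_vars is None:
        # Default: match on the variable names common to both tables.
        key_vars = sorted(set(rows[0]) & set(filter_rows[0]))
    keys = {tuple(f[v] for v in key_vars) for f in filter_rows}
    return [r for r in rows if tuple(r[v] for v in key_vars) in keys]


rows = [{"region": "east", "x": 1.0}, {"region": "west", "x": 2.0}]
print(apply_where_table(rows, [{"region": "west"}]))  # [{'region': 'west', 'x': 2.0}]
```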

weight="variable-name"

specifies the numeric variable to use to perform a weighted analysis of the data.


Lua Syntax

results, info = s:clustering_kClus{

applyRowOrder=true | false,

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

code={

casOut={

caslib="string",

compress=true | false,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

onDemand=true | false,

promote=true | false,

replace=true | false,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

comment=true | false,

fmtWdth=integer,

indentSize=integer,

intoCutPt=double,

iProb=true | false,

labelId=integer,

lineSize=integer,

noTrim=true | false,

pCatAll=true | false,

tabForm=true | false

},

display={

caseSensitive=true | false,

exclude=true | false,

excludeAll=true | false,

keyIsPath=true | false,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=true | false

},

distance="EUCLIDEAN" | "MANHATTAN",

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ",

estimateNClusters={

align="NONE" | "PCA",

B=integer,

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE",

method="ABC" | "NONE",

minClusters=integer

},

freq="variable-name",

impute="MEAN" | "NONE",

imputeNom="MODE" | "NONE",

init="FORGY" | "RAND",

inputs={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

kPrototypeParams={

gammaUserVal=double,

method="AUTOGAMMA" | "USERGAMMA"

},

maxClusters=integer,

maxIters=integer,

nominals={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

output={

casOut={

caslib="string",

compress=true | false,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

onDemand=true | false,

promote=true | false,

replace=true | false,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

},

outputTables={

groupByVarsRaw=true | false,

includeAll=true | false,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=true | false,

replace=true | false

},

outStat={

caslib="string",

compress=true | false,

indexVars={"variable-name-1" <, "variable-name-2", ...>},

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=true | false,

replace=true | false,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where={"string-1" <, "string-2", ...>}

},

printIter=true | false,

saveState={

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=true | false,

replace=true | false,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

},

seed=double,

standardize="NONE" | "RANGE" | "STD",

stopCriterion={

method="CLUSTER_CHANGE" | "WCSS_CHANGE",

value=double

},

table={

caslib="string",

computedOnDemand=true | false,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

name="table-name",

singlePass=true | false,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string",

name="table-name",

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression"

}

},

weight="variable-name"

}

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
table (required)	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
code	casOut	writes SAS DATA step code for computing the cluster assignments by using the cluster centers.
outStat	—	specifies the cluster centers table.
output	casOut (required)	creates a table on the server that contains observationwise clustering information, which is computed after clustering.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
saveState	—	specifies the table in which to save the model state for future model prediction.

Parameter Descriptions

applyRowOrder=true | false

specifies that the action uses a prespecified row ordering. This requires a preliminary call to the table.partition action that uses the orderby and groupby parameters.

Alias	reproducibleRowOrder
Default	false

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	attribute

code={aircodegen}

writes SAS DATA step code for computing the cluster assignments by using the cluster centers.

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

distance="EUCLIDEAN" | "MANHATTAN"

specifies the distance measure for similarity that is used for interval input variables.

Default	EUCLIDEAN

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ"

specifies the distance measure for similarity that is used for nominal input variables.

Default	BINARY

estimateNClusters={nClustersStmt}

specifies the method and the values for that method to be used for estimating the number of clusters.

Long form	estimateNClusters={method="ABC" | "NONE"}
Shortcut form	estimateNClusters="ABC" | "NONE"

The nClustersStmt value can be one or more of the following:

align="NONE" | "PCA"

specifies the method for aligning the reference data based on the input data.

Default	NONE

B=integer

specifies the number of reference data sets to create for each candidate number of clusters when the ABC method is used.

Default	1

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE"

specifies the criterion for estimating the number of clusters from the statistics that the ABC method computes.

Default	GLOBALPEAK

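method="ABC" | "NONE"

specifies the method for estimating the number of clusters. ABC estimates the number of clusters by using the aligned box criterion (ABC) method. NONE does not estimate the number of clusters and instead uses the value of the maxClusters parameter (alias nClusters).

Default	NONE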

minClusters=integer

specifies the minimum number of clusters to use in searching for the best number of clusters.

Default	2

freq="variable-name"

names the numeric variable that contains the frequency of occurrence for each observation.

impute="MEAN" | "NONE"

specifies the imputation method to be used when the input variables are interval.

Default	NONE

imputeNom="MODE" | "NONE"

specifies the imputation method to be used when the input variables are nominal.

Default	NONE

init="FORGY" | "RAND"

specifies the method for obtaining the initial estimate of cluster centers.

Default	FORGY

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	input

kPrototypeParams={kProtoStmt}

specifies the parameters for the k-prototypes algorithm, which is used when the input variables include both interval and nominal variables.

Long form	kPrototypeParams={method="AUTOGAMMA" | "USERGAMMA"}
Shortcut form	kPrototypeParams="AUTOGAMMA" | "USERGAMMA"

The kProtoStmt value can be one or more of the following:

gammaUserVal=double

specifies the value of the gamma parameter in the k-prototypes algorithm.

Alias	value
Default	0.5

method="AUTOGAMMA" | "USERGAMMA"

specifies the method for generating the gamma parameter in the k-prototypes algorithm.

maxClusters=integer

specifies either the number of clusters to use or the maximum number of clusters to search when you estimate the number of clusters.

Alias	nClusters
Default	6

maxIters=integer

specifies the maximum number of iterations for the algorithm to perform.

Default	10

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	nominal

output={outputStatement}

creates a table on the server that contains observationwise clustering information, which is computed after clustering.

For more information about specifying the output parameter, see the common outputStatement parameter (Appendix A: Common Parameters).

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias	displayOut

outStat={casouttable}

specifies the cluster centers table.

For more information about specifying the outStat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

printIter=true | false

when set to True, writes the cluster centers from each iteration to the output cluster centers table.

Default	false

saveState={casouttable}

specifies the table in which to save the model state for future model prediction.

Long form	saveState={name="table-name"}
Shortcut form	saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

inherit the memory format of the input table.

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	false

replace=true | false

when set to True, overwrites an existing table that has the same name.

Default	false

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy when the number of worker pods increases on a running CAS server.

DEFER

Defer the selection of the redistribution policy to a higher-level entity.

NOREDIST

Do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

Rebalance table data when the number of worker pods changes on a running CAS server.

seed=double

specifies the seed value for the pseudorandom number generator that is used for initialization.

Default	0

standardize="NONE" | "RANGE" | "STD"

specifies the method for standardizing the interval input variables.

Default	NONE

stopCriterion={stopCriterionStmt}

specifies the method and the value for that method to be used for convergence. If you do not specify this parameter, the algorithm stops after it reaches the maximum number of iterations.

Long form	stopCriterion={method="CLUSTER_CHANGE" | "WCSS_CHANGE"}
Shortcut form	stopCriterion="CLUSTER_CHANGE" | "WCSS_CHANGE"

The stopCriterionStmt value can be one or more of the following:

method="CLUSTER_CHANGE" | "WCSS_CHANGE"

specifies the method for testing convergence.

Default	CLUSTER_CHANGE

value=double

specifies the convergence threshold for the method.

Default	0

* table={castable}

specifies the input data table.

Long form	table={name="table-name"}
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=true | false

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	false

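computedVars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter.

Alias	compVars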

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>}

specifies data source options.

Aliases	options, dataSource

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=true | false

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	false

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable={groupbytable}

specifies a filter table that contains rows to use as a WHERE filter for the input data.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

weight="variable-name"

specifies the numeric variable to use to perform a weighted analysis of the data.

kClus Action

Provides k-means clustering.

Python Syntax

results=s.clustering.kClus(

applyRowOrder=True | False,

attributes=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

code={

"casOut":{

"caslib":"string"

"compress":True | False

"indexVars":["variable-name-1" <, "variable-name-2", ...>]

"label":"string"

"lifetime":64-bit-integer

"maxMemSize":64-bit-integer

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

"name":"table-name"

"onDemand":True | False

"promote":True | False

"replace":True | False

"replication":integer

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

"threadBlockSize":64-bit-integer

"timeStamp":"string"

"where":["string-1" <, "string-2", ...>]

"comment":True | False,

"fmtWdth":integer,

"indentSize":integer,

"intoCutPt":double,

"iProb":True | False,

"labelId":integer,

"lineSize":integer,

"noTrim":True | False,

"pCatAll":True | False,

"tabForm":True | False

display={

"caseSensitive":True | False,

"exclude":True | False,

"excludeAll":True | False,

"keyIsPath":True | False,

"names":["string-1" <, "string-2", ...>],

"pathType":"LABEL" | "NAME",

"traceNames":True | False

distance="EUCLIDEAN" | "MANHATTAN",

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ",

estimateNClusters={

"align":"NONE" | "PCA",

"B":integer,

"criterion":"ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE",

"method":"ABC" | "NONE",

"minClusters":integer

freq="variable-name",

impute="MEAN" | "NONE",

imputeNom="MODE" | "NONE",

init="FORGY" | "RAND",

inputs=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

kPrototypeParams={

"gammaUserVal":double,

"method":"AUTOGAMMA" | "USERGAMMA"

maxClusters=integer,

maxIters=integer,

nominals=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

output={

"casOut":{

"caslib":"string"

"compress":True | False

"indexVars":["variable-name-1" <, "variable-name-2", ...>]

"label":"string"

"lifetime":64-bit-integer

"maxMemSize":64-bit-integer

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

"name":"table-name"

"onDemand":True | False

"promote":True | False

"replace":True | False

"replication":integer

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

"threadBlockSize":64-bit-integer

"timeStamp":"string"

"where":["string-1" <, "string-2", ...>]

"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>]

outputTables={

"groupByVarsRaw":True | False,

"includeAll":True | False,

"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},

"repeated":True | False,

"replace":True | False

outStat={

"caslib":"string",

"compress":True | False,

"indexVars":["variable-name-1" <, "variable-name-2", ...>],

"label":"string",

"lifetime":64-bit-integer,

"maxMemSize":64-bit-integer,

"memoryFormat":"DVR" | "INHERIT" | "STANDARD",

"name":"table-name",

"promote":True | False,

"replace":True | False,

"replication":integer,

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",

"threadBlockSize":64-bit-integer,

"timeStamp":"string",

"where":["string-1" <, "string-2", ...>]

printIter=True | False,

saveState={

"caslib":"string",

"label":"string",

"lifetime":64-bit-integer,

"memoryFormat":"DVR" | "INHERIT" | "STANDARD",

"name":"table-name",

"promote":True | False,

"replace":True | False,

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

seed=double,

standardize="NONE" | "RANGE" | "STD",

stopCriterion={

"method":"CLUSTER_CHANGE" | "WCSS_CHANGE",

"value":double

table={

"caslib":"string",

"computedOnDemand":True | False,

"computedVars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"computedVarsProgram":"string",

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

"name":"table-name",

"singlePass":True | False,

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression",

"whereTable":{

"casLib":"string"

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

"name":"table-name"

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>]

"where":"where-expression"

}

weight="variable-name"

)

* indicates a required parameter
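As a rough illustration of the syntax above, the following sketch builds the keyword arguments for a basic kClus call as a plain Python dictionary. It assumes the SAS swat package and an open session s; the helper kclus_params, the table name, and the column names are hypothetical, and only parameters documented on this page are used.

```python
def kclus_params(table_name, interval_vars, nominal_vars=(), n_clusters=6, seed=0):
    """Return keyword arguments for s.clustering.kClus (hypothetical helper).

    Only parameters documented for the kClus action are used; the output
    table name is derived from the input table name for illustration.
    """
    params = {
        "table": {"name": table_name},
        "inputs": [{"name": v} for v in interval_vars],
        "maxClusters": n_clusters,  # alias: nClusters
        "seed": seed,
        "standardize": "STD",
        "output": {
            "casOut": {"name": table_name + "_clusters", "replace": True},
            "copyVars": "ALL",
        },
    }
    if nominal_vars:
        # Nominal inputs are clustered together with the interval inputs;
        # see the kPrototypeParams parameter.
        params["nominals"] = [{"name": v} for v in nominal_vars]
    return params
```

With a live session, the call might then look like s.loadactionset("clustering") followed by results = s.clustering.kClus(**kclus_params("customers", ["age", "income"], ["region"], n_clusters=4)).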

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
code	casOut	writes SAS DATA step code for computing the cluster assignments by using the cluster centers.
outStat	—	specifies the cluster centers table.
output	* casOut	creates a table on the server that contains observationwise clustering information, which is computed after clustering.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
saveState	—	specifies the table in which to save the model state for future model prediction.

Parameter Descriptions

applyRowOrder=True | False

when set to True, uses a prespecified row ordering. This requires a preliminary call to the table.partition action that uses the orderBy and groupBy parameters.

Alias	reproducibleRowOrder
Default	False

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	attribute

code={aircodegen}

writes SAS DATA step code for computing the cluster assignments by using the cluster centers.

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

distance="EUCLIDEAN" | "MANHATTAN"

specifies the distance measure for similarity that is used for interval input variables.

Default	EUCLIDEAN
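The two settings correspond to the standard distance measures sketched below in plain Python. This is a textbook illustration of the measures, not the server-side implementation.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two interval-valued observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

Because MANHATTAN does not square the differences, a single large coordinate difference influences it less than it influences EUCLIDEAN.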

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ"

specifies the distance measure for similarity that is used for nominal input variables.

Default	BINARY
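Assuming that BINARY treats each nominal variable as a simple match or mismatch, the distance between two observations is the number of nominal variables on which they disagree, as in this sketch. The frequency-based GLOBALFREQ and RELATIVEFREQ measures are not reproduced because their exact weighting is not described on this page.

```python
def binary_distance(x, y):
    """Assumed BINARY measure: count the nominal variables whose levels differ."""
    return sum(1 for a, b in zip(x, y) if a != b)
```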

estimateNClusters={nClustersStmt}

specifies the method and the values for that method to be used for estimating the number of clusters.

Long form	estimateNClusters={"method":"ABC" | "NONE"}
Shortcut form	estimateNClusters="ABC" | "NONE"

The nClustersStmt value can be one or more of the following:

"align":"NONE" | "PCA"

specifies the method for aligning the reference data based on the input data.

Default	NONE

"B":integer

specifies the number of reference data sets to create for each candidate number of clusters when the ABC method is used.

Default	1

"criterion":"ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE"

specifies the criterion for estimating the number of clusters from the statistics that the ABC method computes.

Default	GLOBALPEAK

"method":"ABC" | "NONE"

specifies the method for estimating the number of clusters.

Default	NONE

"minClusters":integer

specifies the minimum number of clusters to use in searching for the best number of clusters.

Default	2
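The names of the GLOBALPEAK and FIRSTPEAK criteria suggest how one number of clusters is chosen from the per-candidate ABC statistics. The sketch below assumes those readings: the largest value overall, or the first local peak when scanning upward from minClusters. The helper names are hypothetical, and FIRSTMAXWITHSTD, whose name suggests that it also uses a standard deviation, is not sketched.

```python
def global_peak(stats):
    """Assumed GLOBALPEAK: the candidate k with the largest statistic."""
    return max(stats, key=stats.get)


def first_peak(stats):
    """Assumed FIRSTPEAK: the first k whose statistic exceeds the next candidate's."""
    ks = sorted(stats)
    for k, nxt in zip(ks, ks[1:]):
        if stats[k] > stats[nxt]:
            return k
    return ks[-1]  # no peak found: fall back to the largest candidate
```

For made-up statistics {2: 0.1, 3: 0.5, 4: 0.3, 5: 0.6, 6: 0.2}, the two readings disagree: global_peak returns 5 and first_peak returns 3.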

freq="variable-name"

names the numeric variable that contains the frequency of occurrence for each observation.

impute="MEAN" | "NONE"

specifies the imputation method to be used when the input variables are interval.

Default	NONE

imputeNom="MODE" | "NONE"

specifies the imputation method to be used when the input variables are nominal.

Default	NONE
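MEAN and MODE imputation follow the usual definitions, sketched below with None standing in for a missing value. How the action breaks ties between equally frequent levels is not stated here, so treat this as an illustration only.

```python
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace missing (None) interval values with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing (None) nominal values with the most frequent observed level."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]
```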

init="FORGY" | "RAND"

specifies the method for obtaining the initial estimate of cluster centers.

Default	FORGY
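Forgy initialization is commonly defined as choosing k observations at random as the starting centers. The sketch below also shows random-partition initialization, on the assumption that RAND refers to that method; the action's exact behavior, including how the seed parameter drives it, may differ.

```python
import random


def forgy_init(points, k, seed=0):
    """Forgy: pick k observations at random (without replacement) as centers."""
    return random.Random(seed).sample(points, k)


def random_partition_init(points, k, seed=0):
    """Random partition (assumed meaning of RAND): shuffle, group, then average."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    groups = [[] for _ in range(k)]
    for rank, idx in enumerate(order):
        # Round-robin assignment keeps every cluster non-empty (a sketch choice).
        groups[rank % k].append(points[idx])
    return [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
```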

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	input

kPrototypeParams={kProtoStmt}

specifies the parameters to use when the inputs include both interval and nominal variables.

Long form	kPrototypeParams={"method":"AUTOGAMMA" | "USERGAMMA"}
Shortcut form	kPrototypeParams="AUTOGAMMA" | "USERGAMMA"

The kProtoStmt value can be one or more of the following:

"gammaUserVal":double

specifies the value of the gamma parameter in the k-prototypes algorithm.

Alias	value
Default	0.5

"method":"AUTOGAMMA" | "USERGAMMA"

specifies the method for generating the gamma parameter in the k-prototypes algorithm.
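In the k-prototypes algorithm as commonly described, gamma weights the nominal mismatches against the interval distance. The sketch below uses that textbook cost (squared Euclidean distance plus gamma times the number of mismatched levels) with the documented default of 0.5; the action's exact distance may differ, and how AUTOGAMMA derives a value is not covered.

```python
def kprototypes_dissimilarity(x_num, x_nom, c_num, c_nom, gamma=0.5):
    """Textbook k-prototypes cost between an observation and a cluster prototype.

    Squared Euclidean distance on the interval variables plus gamma times
    the number of nominal variables whose levels differ.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    nominal = sum(1 for a, b in zip(x_nom, c_nom) if a != b)
    return numeric + gamma * nominal
```

A larger gamma pushes the algorithm to keep observations with matching nominal levels together, even at some cost in interval distance.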

maxClusters=integer

specifies either the number of clusters to use or the maximum number of clusters to search when you estimate the number of clusters.

Alias	nClusters
Default	6

maxIters=integer

specifies the maximum number of iterations for the algorithm to perform.

Default	10
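The interplay of maxClusters (k) and maxIters can be seen in a minimal Lloyd-style k-means loop like the one below. It uses Euclidean distance, stops early once no assignment changes, and leaves the center of an empty cluster where it was; these are choices of the sketch, not documented behavior of kClus.

```python
import math


def kmeans(points, initial_centers, max_iters=10):
    """Lloyd-style k-means: k = len(initial_centers), at most max_iters passes."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each observation joins its nearest center (Euclidean).
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the centers are stable
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```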

nominals=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	nominal

output={outputStatement}

creates a table on the server that contains observationwise clustering information, which is computed after clustering.

For more information about specifying the output parameter, see the common outputStatement parameter (Appendix A: Common Parameters).

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias	displayOut

outStat={casouttable}

specifies the cluster centers table.

For more information about specifying the outStat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

printIter=True | False

when set to True, writes the cluster centers from each iteration to the output cluster centers table.

Default	False

saveState={casouttable}

specifies the table in which to save the model state for future model prediction.

Long form	saveState={"name":"table-name"}
Shortcut form	saveState="table-name"

The casouttable value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"label":"string"

specifies the descriptive label to associate with the table.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

inherit the memory format of the input table.

STANDARD

use the standard memory format.

"name":"table-name"

specifies the name for the output table.

"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	False

"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default	False

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy when the number of worker pods increases on a running CAS server.

DEFER

Defer the selection of the redistribution policy to a higher-level entity.

NOREDIST

Do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

Rebalance table data when the number of worker pods changes on a running CAS server.

seed=double

specifies the seed value for the pseudorandom number generator that is used for initialization.

Default	0

standardize="NONE" | "RANGE" | "STD"

specifies the method for standardizing the interval input variables.

Default	NONE
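The RANGE and STD options are assumed here to follow the usual definitions: range scaling to [0, 1] and z-scores. Whether the action uses the sample or the population standard deviation is not stated on this page; this sketch uses the sample version and maps constant variables to 0.

```python
from statistics import mean, stdev


def standardize_range(values):
    """RANGE (assumed): rescale to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standardize_std(values):
    """STD (assumed): center on the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]
```

Standardizing matters for k-means because an unscaled variable with a wide range would otherwise dominate the distance.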

stopCriterion={stopCriterionStmt}

specifies the method and the value for that method to be used for convergence. If you do not specify this parameter, the algorithm stops after it reaches the maximum number of iterations.

Long form	stopCriterion={"method":"CLUSTER_CHANGE" | "WCSS_CHANGE"}
Shortcut form	stopCriterion="CLUSTER_CHANGE" | "WCSS_CHANGE"

The stopCriterionStmt value can be one or more of the following:

"method":"CLUSTER_CHANGE" | "WCSS_CHANGE"

specifies the method for testing convergence.

Default	CLUSTER_CHANGE

"value":double

specifies the convergence threshold for the method.

Default	0
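This page does not define the two convergence measures precisely. A plausible reading, used in this sketch, is that CLUSTER_CHANGE compares the fraction of observations that switched clusters with value (the action might express it as a percentage), and WCSS_CHANGE compares the relative change in the within-cluster sum of squares with value. Under this reading, the default value of 0 stops only when nothing changes. The helper names are hypothetical.

```python
import math


def cluster_change_converged(old_labels, new_labels, value=0.0):
    """CLUSTER_CHANGE (assumed): share of observations that changed cluster <= value."""
    changed = sum(1 for a, b in zip(old_labels, new_labels) if a != b)
    return changed / len(new_labels) <= value


def wcss(points, labels, centers):
    """Within-cluster sum of squared Euclidean distances to the assigned centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def wcss_change_converged(old_wcss, new_wcss, value=0.0):
    """WCSS_CHANGE (assumed): relative change in WCSS <= value."""
    if old_wcss == 0:
        return new_wcss == 0
    return abs(old_wcss - new_wcss) / old_wcss <= value
```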

* table={castable}

specifies the input data table.

Long form	table={"name":"table-name"}
Shortcut form	table="table-name"

The castable value can be one or more of the following:

"caslib":"string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

"computedOnDemand":True | False

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
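Default	False

"computedVars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter.

Alias	compVars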

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"computedVarsProgram":"string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>}

specifies data source options.

Aliases	options, dataSource

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the input table.

"singlePass":True | False

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	False

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the input data.

"whereTable":{groupbytable}

specifies a filter table that contains rows to use as a WHERE filter for the input data.

The groupbytable value can be one or more of the following:

"casLib":"string"

specifies the caslib for the filter table. By default, the active caslib is used.

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}

specifies the settings for reading a table from a data source.

Alias	import_

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* "name":"table-name"

specifies the name of the filter table.

"vars":[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

"format":"string"

specifies the format to apply to the variable.

"formattedLength":integer

specifies the length of the format field plus the length of the format precision.

"label":"string"

specifies the descriptive label for the variable.

* "name":"variable-name"

specifies the name for the variable.

"nfd":integer

specifies the length of the format precision.

"nfl":integer

specifies the length of the format field.

"where":"where-expression"

specifies an expression for subsetting the data from the filter table.

weight="variable-name"

specifies the numeric variable to use to perform a weighted analysis of the data.
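A common way for a weight to enter k-means is through the cluster centers, each computed as the weighted mean of its members. The sketch below assumes that reading; it is not taken from the kClus documentation, and weighted_center is a hypothetical helper.

```python
def weighted_center(points, weights):
    """Weighted mean of the member observations of one cluster (assumed use of weight)."""
    total = sum(weights)
    return tuple(sum(w * x for w, x in zip(weights, col)) / total for col in zip(*points))
```

With weights 1 and 3 on the points (0, 0) and (4, 0), the center moves to (3.0, 0.0) instead of the unweighted (2.0, 0.0).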

kClus Action

Provides k-means clustering.

R Syntax

results <- cas.clustering.kClus(s,

applyRowOrder=TRUE | FALSE,

attributes=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

code=list(

casOut=list(

caslib="string"

compress=TRUE | FALSE

indexVars=list("variable-name-1" <, "variable-name-2", ...>)

label="string"

lifetime=64-bit-integer

maxMemSize=64-bit-integer

memoryFormat="DVR" | "INHERIT" | "STANDARD"

name="table-name"

onDemand=TRUE | FALSE

promote=TRUE | FALSE

replace=TRUE | FALSE

replication=integer

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

threadBlockSize=64-bit-integer

timeStamp="string"

where=list("string-1" <, "string-2", ...>)

comment=TRUE | FALSE,

fmtWdth=integer,

indentSize=integer,

intoCutPt=double,

iProb=TRUE | FALSE,

labelId=integer,

lineSize=integer,

noTrim=TRUE | FALSE,

pCatAll=TRUE | FALSE,

tabForm=TRUE | FALSE

display=list(

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>),

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

distance="EUCLIDEAN" | "MANHATTAN",

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ",

estimateNClusters=list(

align="NONE" | "PCA",

B=integer,

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE",

method="ABC" | "NONE",

minClusters=integer

freq="variable-name",

impute="MEAN" | "NONE",

imputeNom="MODE" | "NONE",

init="FORGY" | "RAND",

inputs=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

kPrototypeParams=list(

gammaUserVal=double,

method="AUTOGAMMA" | "USERGAMMA"

maxClusters=integer,

maxIters=integer,

nominals=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

output=list(

casOut=list(

caslib="string"

compress=TRUE | FALSE

indexVars=list("variable-name-1" <, "variable-name-2", ...>)

label="string"

lifetime=64-bit-integer

maxMemSize=64-bit-integer

memoryFormat="DVR" | "INHERIT" | "STANDARD"

name="table-name"

onDemand=TRUE | FALSE

promote=TRUE | FALSE

replace=TRUE | FALSE

replication=integer

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

threadBlockSize=64-bit-integer

timeStamp="string"

where=list("string-1" <, "string-2", ...>)

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>)

outputTables=list(

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),

repeated=TRUE | FALSE,

replace=TRUE | FALSE

outStat=list(

caslib="string",

compress=TRUE | FALSE,

indexVars=list("variable-name-1" <, "variable-name-2", ...>),

label="string",

lifetime=64-bit-integer,

maxMemSize=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

replication=integer,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",

threadBlockSize=64-bit-integer,

timeStamp="string",

where=list("string-1" <, "string-2", ...>)

printIter=TRUE | FALSE,

saveState=list(

caslib="string",

label="string",

lifetime=64-bit-integer,

memoryFormat="DVR" | "INHERIT" | "STANDARD",

name="table-name",

promote=TRUE | FALSE,

replace=TRUE | FALSE,

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

seed=double,

standardize="NONE" | "RANGE" | "STD",

stopCriterion=list(

method="CLUSTER_CHANGE" | "WCSS_CHANGE",

value=double

table=list(

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

computedVarsProgram="string",

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),

name="table-name",

singlePass=TRUE | FALSE,

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression",

whereTable=list(

casLib="string"

name="table-name"

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>)

where="where-expression"

)

weight="variable-name"

)

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the input data table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
code	casOut	writes SAS DATA step code for computing the cluster assignments by using the cluster centers.
outStat	—	specifies the cluster centers table.
output	* casOut	creates a table on the server that contains observationwise clustering information, which is computed after clustering.
outputTables	names	lists the names of results tables to save as CAS tables on the server.
saveState	—	specifies the table in which to save the model state for future model prediction.

Parameter Descriptions

applyRowOrder=TRUE | FALSE

when set to TRUE, uses a prespecified row ordering. This requires a preliminary call to the table.partition action that uses the orderBy and groupBy parameters.

Alias	reproducibleRowOrder
Default	FALSE

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	attribute

code=list(aircodegen)

writes SAS DATA step code for computing the cluster assignments by using the cluster centers.

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).
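The generated DATA step code is not reproduced in this section. As an illustration only, the scoring rule it is described as computing (assigning each observation to a cluster by using the cluster centers) can be sketched in Python as nearest-center assignment. The helper name `assign_cluster`, the 1-based cluster IDs, the tie rule, and the use of Euclidean distance are assumptions, not the generated code.

```python
import math


def assign_cluster(row, centers):
    """Return (cluster_id, distance) for the center nearest to row.

    Hypothetical helper: cluster IDs are 1-based and ties go to the
    lower-numbered cluster.
    """
    best_id, best_dist = None, math.inf
    for cluster_id, center in enumerate(centers, start=1):
        # Euclidean distance between the observation and this center
        dist = math.sqrt(sum((x - c) ** 2 for x, c in zip(row, center)))
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id, best_dist


# Centers as they might be read from an outStat table (illustrative values)
centers = [[0.0, 0.0], [10.0, 10.0]]
print(assign_cluster([1.0, 2.0], centers))
print(assign_cluster([9.0, 9.0], centers))
```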

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

distance="EUCLIDEAN" | "MANHATTAN"

specifies the distance measure for similarity that is used for interval input variables.

Default	EUCLIDEAN
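A minimal Python sketch of the two interval distance measures, assuming their textbook definitions. Whether the action works with squared Euclidean distance internally is not stated in this section.

```python
def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = [1.0, 2.0], [4.0, 6.0]
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

MANHATTAN is less dominated by a single large coordinate difference than EUCLIDEAN, which squares each difference.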

distanceNom="BINARY" | "GLOBALFREQ" | "RELATIVEFREQ"

specifies the distance measure for similarity that is used for nominal input variables.

Default	BINARY
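This section does not define the nominal distance measures. Assuming that BINARY means simple matching (each nominal variable contributes 0 when the levels agree and 1 when they differ), it can be sketched in Python as follows. GLOBALFREQ and RELATIVEFREQ are not sketched because their formulas are not given here.

```python
def binary_distance(a, b):
    """Simple-matching distance: number of nominal variables whose levels differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of nominal variables")
    return sum(1 for x, y in zip(a, b) if x != y)


print(binary_distance(["red", "small", "yes"], ["red", "large", "no"]))  # 2
```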

estimateNClusters=list(nClustersStmt)

specifies the method and the values for that method to be used for estimating the number of clusters.

Long form	estimateNClusters=list(method="ABC" | "NONE")
Shortcut form	estimateNClusters="ABC" | "NONE"

The nClustersStmt value can be one or more of the following:

align="NONE" | "PCA"

specifies the method for aligning the reference data based on the input data.

Default	NONE

B=integer

specifies the number of reference data sets to create for each candidate number of clusters when the ABC method is used.

Default	1

criterion="ALL" | "FIRSTMAXWITHSTD" | "FIRSTPEAK" | "GLOBALPEAK" | "NONE"

specifies the criterion for estimating the number of clusters from the statistics that the ABC method computes.

Default	GLOBALPEAK

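method="ABC" | "NONE"

specifies the method for estimating the number of clusters. ABC uses the aligned box criterion; NONE does not estimate the number of clusters.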

Default	NONE

minClusters=integer

specifies the minimum number of clusters to use in searching for the best number of clusters.

Default	2
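The criteria are named but not defined in this section. Under an assumed reading of the names, GLOBALPEAK selects the candidate number of clusters with the largest ABC statistic, and FIRSTPEAK selects the first candidate whose statistic is larger than that of the next candidate. The Python sketch below illustrates only that reading; the statistic values are made up, and FIRSTMAXWITHSTD is not sketched.

```python
def global_peak(stats):
    """stats maps candidate k -> statistic; return the k with the largest statistic."""
    return max(stats, key=stats.get)


def first_peak(stats):
    """Return the smallest k whose statistic exceeds that of the next candidate."""
    ks = sorted(stats)
    for k, k_next in zip(ks, ks[1:]):
        if stats[k] > stats[k_next]:
            return k
    return ks[-1]  # statistic never decreases: fall back to the largest candidate


# Illustrative statistics for k = 2 (minClusters) through k = 6 (maxClusters)
stats = {2: 0.10, 3: 0.25, 4: 0.18, 5: 0.30, 6: 0.22}
print(global_peak(stats))  # 5
print(first_peak(stats))   # 3
```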

freq="variable-name"

names the numeric variable that contains the frequency of occurrence for each observation.
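Assuming that a frequency value acts as a replication count (an observation with frequency 3 counts as three identical observations), the center of a cluster becomes a frequency-weighted mean. A minimal Python sketch; the helper name `weighted_center` is hypothetical.

```python
def weighted_center(rows, freqs):
    """Frequency-weighted mean of the rows, column by column."""
    total = sum(freqs)
    dim = len(rows[0])
    return [sum(f * r[j] for r, f in zip(rows, freqs)) / total for j in range(dim)]


rows = [[1.0, 2.0], [3.0, 4.0]]
# Same result as averaging [1, 2] three times together with [3, 4] once
print(weighted_center(rows, [3, 1]))  # [1.5, 2.5]
```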

impute="MEAN" | "NONE"

specifies the method for imputing missing values of interval input variables.

Default	NONE

imputeNom="MODE" | "NONE"

specifies the method for imputing missing values of nominal input variables.

Default	NONE
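A minimal Python sketch of mean imputation for an interval variable (impute="MEAN") and mode imputation for a nominal variable (imputeNom="MODE"). Representing missing values as `None` is an assumption of the sketch, and the action's tie-breaking rule for the mode is not documented here; `Counter.most_common` returns the level seen first among tied levels.

```python
from collections import Counter


def impute_mean(values):
    """Replace missing (None) interval values with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_mode(values):
    """Replace missing (None) nominal values with the most frequent observed level."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


print(impute_mean([1.0, None, 5.0]))       # [1.0, 3.0, 5.0]
print(impute_mode(["a", None, "b", "a"]))  # ['a', 'a', 'b', 'a']
```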

init="FORGY" | "RAND"

specifies the method for obtaining the initial estimate of cluster centers.

Default	FORGY
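This section does not describe the two initialization methods. The Python sketch below assumes the classic definitions: FORGY picks k distinct observations as the initial centers, and RAND is taken to mean random partition (each observation is randomly assigned to a cluster, and the centers are the cluster means). The `seed` argument mirrors the purpose of the seed parameter, but the generator is Python's, not the action's.

```python
import random


def init_forgy(rows, k, seed=0):
    """Forgy: use k distinct, randomly chosen observations as the initial centers."""
    rng = random.Random(seed)
    return [list(r) for r in rng.sample(rows, k)]


def init_random_partition(rows, k, seed=0):
    """Random partition: assign each row to a random cluster, then average each cluster."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in rows]
    centers = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if not members:  # assumption: an empty cluster starts at a random row
            members = [rng.choice(rows)]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


rows = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
print(init_forgy(rows, 2, seed=1))
print(init_random_partition(rows, 2, seed=1))
```

Random-partition centers tend to start near the overall mean, whereas Forgy centers start at actual observations.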

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	input

kPrototypeParams=list(kProtoStmt)

specifies the parameters for the k-prototypes algorithm, which is used when the inputs include both nominal and interval variables.

Long form	kPrototypeParams=list(method="AUTOGAMMA" | "USERGAMMA")
Shortcut form	kPrototypeParams="AUTOGAMMA" | "USERGAMMA"

The kProtoStmt value can be one or more of the following:

gammaUserVal=double

specifies the value of the gamma parameter in the k-prototypes algorithm.

Alias	value
Default	0.5

method="AUTOGAMMA" | "USERGAMMA"

specifies the method for generating the gamma parameter in the k-prototypes algorithm.
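In the standard k-prototypes formulation (Huang), the dissimilarity between an observation and a prototype is the squared Euclidean distance over the interval variables plus gamma times the number of nominal mismatches. The Python sketch below follows that textbook form with the documented gammaUserVal default of 0.5. It is an assumption that the action combines the two parts exactly this way, and the AUTOGAMMA computation is not shown.

```python
def kprototypes_distance(num_a, num_b, nom_a, nom_b, gamma=0.5):
    """Huang-style k-prototypes dissimilarity.

    Squared Euclidean distance on the interval parts plus gamma times the
    number of nominal variables whose levels differ.
    """
    interval_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    nominal_part = sum(1 for x, y in zip(nom_a, nom_b) if x != y)
    return interval_part + gamma * nominal_part


# Interval part: 1 + 4 = 5; one nominal mismatch adds 0.5
print(kprototypes_distance([1.0, 2.0], [2.0, 4.0], ["a", "x"], ["a", "y"]))  # 5.5
```

A larger gamma gives the nominal variables more influence on the cluster assignments relative to the interval variables.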

maxClusters=integer

specifies either the number of clusters to use or the maximum number of clusters to search when you estimate the number of clusters.

Alias	nClusters
Default	6

maxIters=integer

specifies the maximum number of iterations for the algorithm to perform.

Default	10
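The description of stopCriterion states that, when that parameter is not specified, the algorithm runs until it reaches maxIters. A Lloyd-style k-means loop that always performs maxIters assignment and update passes can be sketched in Python as follows. Keeping an empty cluster's previous center is an assumption, and the sketch covers interval inputs only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, centers, max_iters=10):
    """Run exactly max_iters passes of assignment followed by center update."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each row goes to its nearest center
        labels = [min(range(len(centers)), key=lambda k: sq_dist(r, centers[k]))
                  for r in rows]
        # Update step: each center moves to the mean of its members
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # assumption: an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


rows = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(rows, [[0.0, 0.0], [10.0, 10.0]], max_iters=10))
```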

nominals=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias	nominal

output=list(outputStatement)

creates a table on the server that contains observationwise clustering information, which is computed after clustering.

For more information about specifying the output parameter, see the common outputStatement parameter (Appendix A: Common Parameters).

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias	displayOut

outStat=list(casouttable)

specifies the cluster centers table.

For more information about specifying the outStat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

printIter=TRUE | FALSE

when set to True, writes the cluster centers from every iteration to the output cluster centers table.

Default	FALSE

saveState=list(casouttable)

specifies the table in which to save the model state for future model prediction.

Long form	saveState=list(name="table-name")
Shortcut form	saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default	0
Minimum value	0

memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default	INHERIT

DVR

use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

STANDARD

use the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default	FALSE

replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default	FALSE

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

defer the selection of a redistribution policy to a higher-level entity.

NOREDIST

do not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalance table data when the number of worker pods changes on a running CAS server.

seed=double

specifies an integer value that is used to seed the pseudorandom number generator for initialization.

Default	0

standardize="NONE" | "RANGE" | "STD"

specifies the method for standardizing the interval input variables.

Default	NONE
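A minimal Python sketch of the two standardization methods for one interval column, assuming the usual definitions: RANGE rescales to [0, 1] by using the minimum and maximum, and STD subtracts the mean and divides by the standard deviation. Using the sample standard deviation (n - 1 divisor) and mapping constant columns to 0 are assumptions of the sketch.

```python
import statistics


def standardize_range(col):
    """(x - min) / (max - min); a constant column maps to 0."""
    lo, hi = min(col), max(col)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in col]


def standardize_std(col):
    """(x - mean) / sd, using the sample standard deviation."""
    mean = statistics.fmean(col)
    sd = statistics.stdev(col)
    return [0.0 if sd == 0 else (x - mean) / sd for x in col]


print(standardize_range([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(standardize_std([2.0, 4.0, 6.0]))    # [-1.0, 0.0, 1.0]
```

Standardizing keeps a variable measured on a large scale from dominating the distance computation.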

stopCriterion=list(stopCriterionStmt)

specifies the method and the value for that method to be used for convergence. If you do not specify this parameter, the algorithm stops after it reaches the maximum number of iterations.

Long form	stopCriterion=list(method="CLUSTER_CHANGE" | "WCSS_CHANGE")
Shortcut form	stopCriterion="CLUSTER_CHANGE" | "WCSS_CHANGE"

The stopCriterionStmt value can be one or more of the following:

method="CLUSTER_CHANGE" | "WCSS_CHANGE"

specifies the method for determining convergence.

Default	CLUSTER_CHANGE

value=double

specifies the threshold for the convergence method.

Default	0
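This section does not say exactly how the two convergence measures are computed. The Python sketch below assumes that CLUSTER_CHANGE is the fraction of observations whose cluster assignment changed between iterations and that WCSS_CHANGE is the relative change in the within-cluster sum of squares; both readings are assumptions. Under them, the default value of 0 stops only when no observation changes cluster or the WCSS does not change.

```python
def wcss(rows, labels, centers):
    """Within-cluster sum of squared Euclidean distances to the assigned centers."""
    return sum(sum((x - c) ** 2 for x, c in zip(r, centers[lab]))
               for r, lab in zip(rows, labels))


def cluster_change(old_labels, new_labels):
    """Fraction of observations whose cluster assignment changed."""
    changed = sum(1 for a, b in zip(old_labels, new_labels) if a != b)
    return changed / len(new_labels)


def wcss_change(old_wcss, new_wcss):
    """Relative change in WCSS between two iterations."""
    return abs(old_wcss - new_wcss) / old_wcss if old_wcss else 0.0


def converged(method, value, old_labels, new_labels, old_wcss, new_wcss):
    """Apply the selected (assumed) convergence test with threshold value."""
    if method == "CLUSTER_CHANGE":
        return cluster_change(old_labels, new_labels) <= value
    if method == "WCSS_CHANGE":
        return wcss_change(old_wcss, new_wcss) <= value
    raise ValueError(f"unknown method: {method}")


print(cluster_change([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.25
print(wcss_change(100.0, 90.0))                    # 0.1
```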

* table=list(castable)

specifies the input data table.

Long form	table=list(name="table-name")
Shortcut form	table="table-name"

The castable value can be one or more of the following:

caslib="string"

specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.

computedOnDemand=TRUE | FALSE

when set to True, creates the computed variables when the table is loaded instead of when the action begins.

Alias	compOnDemand
Default	FALSE

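computedVars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter.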

Alias	compVars

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

computedVarsProgram="string"

specifies an expression for each computed variable that you include in the computedVars parameter.

Alias	compPgm

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>)

specifies data source options.

Aliases	options, dataSource

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the input table.

singlePass=TRUE | FALSE

when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.

Default	FALSE

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to use in the action.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the input data.

whereTable=list(groupbytable)

specifies a filter table for subsetting the input data.

The groupbytable value can be one or more of the following:

casLib="string"

specifies the caslib for the filter table. By default, the active caslib is used.

dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters)

specifies data source options.

Aliases	options, dataSource

For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).

importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters)

specifies the settings for reading a table from a data source.

Alias	import

For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).

* name="table-name"

specifies the name of the filter table.

vars=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variable names to use from the filter table.

The casinvardesc value can be one or more of the following:

format="string"

specifies the format to apply to the variable.

formattedLength=integer

specifies the length of the format field plus the length of the format precision.

label="string"

specifies the descriptive label for the variable.

* name="variable-name"

specifies the name for the variable.

nfd=integer

specifies the length of the format precision.

nfl=integer

specifies the length of the format field.

where="where-expression"

specifies an expression for subsetting the data from the filter table.

weight="variable-name"

specifies the numeric variable to use to perform a weighted analysis of the data.

Last updated: March 05, 2026