Robust Multivariate Outlier Detection Action Set

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

mvOutlier Action

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

CASL Syntax

mvOutlier.mvOutlier <result=results> <status=rc> /

alphaLeverage=double,

alphaMarginalLeverage=double,

alphaMarginalOutlier=double,

alphaOutlier=double,

applyRowOrder=TRUE | FALSE,

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

contamination=double,

diagnosticOptions={

maxObs=integer,

showObsId=TRUE | FALSE

},

display={

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

},

eigenvec=TRUE | FALSE,

id={"variable-name-1" <, "variable-name-2", ...>},

initOnly=TRUE | FALSE,

inputs={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

loadings=TRUE | FALSE,

model={

depVars={{

name="variable-name"

}, {...}},

effects={{

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest={"string-1" <, "string-2", ...>},

vars={"string-1" <, "string-2", ...>}

}, {...}}

},

nPrinComp=integer,

nPrinCompMax=integer,

output={

casOut={

caslib="string"

compress=TRUE | FALSE

indexVars={"variable-name-1" <, "variable-name-2", ...>}

label="string"

lifetime=64-bit-integer

maxMemSize=64-bit-integer

memoryFormat="DVR" | "INHERIT" | "STANDARD"

name="table-name"

promote=TRUE | FALSE

replace=TRUE | FALSE

replication=integer

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

threadBlockSize=64-bit-integer

timeStamp="string"

where={"string-1" <, "string-2", ...>}

},

copyVars={"variable-name-1" <, "variable-name-2", ...>},

leverage="string",

nodeid="string",

obsid="string",

orthdist="string",

outlier="string",

score="string",

scoredist="string",

threadid="string"

},

outputTables={

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=TRUE | FALSE,

replace=TRUE | FALSE

},

prefix="string",

propVariance=double,

seed=integer,

table={

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

groupBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

groupByMode="NOSORT" | "REDISTRIBUTE",

importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

name="table-name",

orderBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

singlePass=TRUE | FALSE,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string"

dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

name="table-name"

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}}

where="where-expression"

}

}

;

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the settings for an input table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables	names	lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias	alphaLev
Default	0.025
Range	0–1

alphaMarginalLeverage=double

specifies the tail probability that determines the marginal robust score distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalLev
	alphaMargLev
Range	0–1

alphaMarginalOutlier=double

specifies the tail probability that determines the marginal orthogonal distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalOut
	alphaMargOut
Range	0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias	alphaOut
Default	0.025
Range	0–1

applyRowOrder=TRUE | FALSE

when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.

Alias	reproducibleRowOrder
Default	FALSE

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases	contam
	corrupted
Default	0.25
Range	0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases	diagOptions
	diagOpts

The diagOptList value can be one or more of the following:

maxObs=integer

specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.

Minimum value	0

showObsId=TRUE | FALSE

when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the id parameter.

Alias	obsId
Default	FALSE

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=TRUE | FALSE

when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.

Default	FALSE

id={"variable-name-1" <, "variable-name-2", ...>}

specifies one or more variables to include in output tables and plots, for identifying observations.

initOnly=TRUE | FALSE

when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.

Alias	initialOnly
Default	FALSE

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	input
	vars
	var

loadings=TRUE | FALSE

when set to True, creates the Loadings table, which is produced only if you specify this parameter.

Default	FALSE

model={modelStatement}

specifies the variables to be analyzed, which you list in the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
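
As a rough illustration of how a BAR interaction and the maxInteract parameter fit together, the following Python sketch expands a variable list into main effects and crossings up to a maximum order. It assumes that BAR follows the usual SAS bar-operator convention (all main effects plus every crossing). The helper name expand_bar, the variable names, and the order in which terms are listed are hypothetical and are not taken from the action itself.

```python
from itertools import combinations


def expand_bar(variables, max_interact=None):
    """Expand a BAR effect into tuples of variable names, lowest order first."""
    # Without maxInteract, crossings of every order are kept.
    limit = len(variables) if max_interact is None else min(max_interact, len(variables))
    terms = []
    for order in range(1, limit + 1):
        # All crossings of exactly `order` variables.
        terms.extend(combinations(variables, order))
    return terms


# Hypothetical variable names; maxInteract=2 drops the three-way crossing.
terms = expand_bar(["x1", "x2", "x3"], max_interact=2)
print(["*".join(term) for term in terms])
# prints ['x1', 'x2', 'x3', 'x1*x2', 'x1*x3', 'x2*x3']
```

The sketch only shows which terms survive a given maxInteract value; the server builds the actual effects.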

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases	nComp
	nPC
	n
Minimum value	1

nPrinCompMax=integer

specifies the largest feasible number of principal components that you would expect to retain for the final subspace given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to compute an observation subset size that must be fixed before the final number of components is determined.

Aliases	nCompMax
	nPCMax
	nMax
Default	10
Minimum value	1

output={outputOptions}

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

The outputOptions value can be one or more of the following:

* casOut={casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars={"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.

Alias	copyVar

leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

threadid="string"

adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If you set this parameter to an empty string, the name ThreadId is used for the output variable.
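
The empty-string rule shared by these output subparameters can be sketched in Python as a small lookup. The helper resolve_output_names is hypothetical and runs on the client only; the default names are the ones listed in the descriptions above, and the example table and variable names are made up.

```python
# Default output variable names, as listed in the subparameter descriptions.
# For score, the default is a prefix rather than a single variable name.
DEFAULT_NAMES = {
    "leverage": "Leverage",
    "nodeid": "NodeId",
    "obsid": "ObsId",
    "orthdist": "OrthDist",
    "outlier": "Outlier",
    "score": "Score",
    "scoredist": "ScoreDist",
    "threadid": "ThreadId",
}


def resolve_output_names(output):
    """Map each requested statistic to the name (or prefix) it would receive."""
    names = {}
    for key, default in DEFAULT_NAMES.items():
        if key in output:
            # An empty string requests the statistic under its default name.
            names[key] = output[key] or default
    return names


# Hypothetical request: default name for the outlier flag, custom name for the distance.
output = {"casOut": {"name": "scored"}, "outlier": "", "orthdist": "od"}
print(resolve_output_names(output))
```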

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default	"Prin"

propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases	proportionVariance
	propVar
Range	0–1
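
The constraints stated in these descriptions (exactly one of nPrinComp and propVariance, plus the documented ranges and minimum values) can be checked on the client before the action is submitted. The following Python sketch is one way to do that. The function check_params is hypothetical, it treats the range endpoints as inclusive (an assumption), and it does not resolve parameter aliases.

```python
# Documented ranges for the real-valued parameters.
RANGES = {
    "alphaLeverage": (0.0, 1.0),
    "alphaMarginalLeverage": (0.0, 1.0),
    "alphaMarginalOutlier": (0.0, 1.0),
    "alphaOutlier": (0.0, 1.0),
    "contamination": (0.0, 0.5),
    "propVariance": (0.0, 1.0),
}


def check_params(params):
    """Return a list of problems found in an mvOutlier parameter dictionary."""
    problems = []
    # nPrinComp and propVariance are mutually exclusive, and one is required.
    if ("nPrinComp" in params) == ("propVariance" in params):
        problems.append("specify exactly one of nPrinComp or propVariance")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    for name in ("nPrinComp", "nPrinCompMax"):
        if name in params and params[name] < 1:
            problems.append(f"{name} must be at least 1")
    return problems


print(check_params({"propVariance": 0.9, "contamination": 0.7}))
```

An empty list means that no problem was found by these checks; the server can still reject the call for other reasons.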

seed=integer

specifies the seed to use for random number generation.

Alias	randomSeed
Default	1
Range	1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Lua Syntax

results, info = s:mvOutlier_mvOutlier{

alphaLeverage=double,

alphaMarginalLeverage=double,

alphaMarginalOutlier=double,

alphaOutlier=double,

applyRowOrder=true | false,

attributes={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

contamination=double,

diagnosticOptions={

maxObs=integer,

showObsId=true | false

},

display={

caseSensitive=true | false,

exclude=true | false,

excludeAll=true | false,

keyIsPath=true | false,

names={"string-1" <, "string-2", ...>},

pathType="LABEL" | "NAME",

traceNames=true | false

},

eigenvec=true | false,

id={"variable-name-1" <, "variable-name-2", ...>},

initOnly=true | false,

inputs={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

loadings=true | false,

model={

depVars={{

name="variable-name"

}, {...}},

effects={{

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest={"string-1" <, "string-2", ...>},

vars={"string-1" <, "string-2", ...>}

}, {...}}

},

nPrinComp=integer,

nPrinCompMax=integer,

output={

casOut={

caslib="string"

compress=true | false

indexVars={"variable-name-1" <, "variable-name-2", ...>}

label="string"

lifetime=64-bit-integer

maxMemSize=64-bit-integer

memoryFormat="DVR" | "INHERIT" | "STANDARD"

name="table-name"

promote=true | false

replace=true | false

replication=integer

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

threadBlockSize=64-bit-integer

timeStamp="string"

where={"string-1" <, "string-2", ...>}

},

copyVars={"variable-name-1" <, "variable-name-2", ...>},

leverage="string",

nodeid="string",

obsid="string",

orthdist="string",

outlier="string",

score="string",

scoredist="string",

threadid="string"

},

outputTables={

groupByVarsRaw=true | false,

includeAll=true | false,

names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},

repeated=true | false,

replace=true | false

},

prefix="string",

propVariance=double,

seed=integer,

table={

caslib="string",

computedOnDemand=true | false,

computedVars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

computedVarsProgram="string",

dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},

groupBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

groupByMode="NOSORT" | "REDISTRIBUTE",

name="table-name",

orderBy={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

singlePass=true | false,

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}},

where="where-expression",

whereTable={

casLib="string"

name="table-name"

vars={{

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

}, {...}}

where="where-expression"

}

}

}

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the settings for an input table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables	names	lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias	alphaLev
Default	0.025
Range	0–1

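alphaMarginalLeverage=double

specifies the tail probability that determines the marginal robust score distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalLev
	alphaMargLev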
Range	0–1

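alphaMarginalOutlier=double

specifies the tail probability that determines the marginal orthogonal distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalOut
	alphaMargOut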
Range	0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias	alphaOut
Default	0.025
Range	0–1

applyRowOrder=true | false

when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.

Alias	reproducibleRowOrder
Default	false

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases	contam
	corrupted
Default	0.25
Range	0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases	diagOptions
	diagOpts

The diagOptList value can be one or more of the following:

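maxObs=integer

specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.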

Minimum value	0

showObsId=true | false

when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the id parameter.

Alias	obsId
Default	false

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=true | false

when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.

Default	false

id={"variable-name-1" <, "variable-name-2", ...>}

specifies one or more variables to include in output tables and plots, for identifying observations.

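initOnly=true | false

when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.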

Alias	initialOnly
Default	false

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	input
	vars
	var

loadings=true | false

when set to True, creates the Loadings table, which is produced only if you specify this parameter.

Default	false

model={modelStatement}

specifies the variables to be analyzed, which you list in the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

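effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.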

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

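nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.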

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases	nComp
	nPC
	n
Minimum value	1

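nPrinCompMax=integer

specifies the largest feasible number of principal components that you would expect to retain for the final subspace given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to compute an observation subset size that must be fixed before the final number of components is determined.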

Aliases	nCompMax
	nPCMax
	nMax
Default	10
Minimum value	1

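output={outputOptions}

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.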

The outputOptions value can be one or more of the following:

* casOut={casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

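copyVars={"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.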

Alias	copyVar

leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

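threadid="string"

adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If you set this parameter to an empty string, the name ThreadId is used for the output variable.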

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default	"Prin"

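propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases	proportionVariance
	propVar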
Range	0–1

seed=integer

specifies the seed to use for random number generation.

Alias	randomSeed
Default	1
Range	1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Python Syntax

results=s.mvOutlier.mvOutlier(

alphaLeverage=double,

alphaMarginalLeverage=double,

alphaMarginalOutlier=double,

alphaOutlier=double,

applyRowOrder=True | False,

attributes=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

contamination=double,

diagnosticOptions={

"maxObs":integer,

"showObsId":True | False

},

display={

"caseSensitive":True | False,

"exclude":True | False,

"excludeAll":True | False,

"keyIsPath":True | False,

"names":["string-1" <, "string-2", ...>],

"pathType":"LABEL" | "NAME",

"traceNames":True | False

},

eigenvec=True | False,

id=["variable-name-1" <, "variable-name-2", ...>],

initOnly=True | False,

inputs=[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

loadings=True | False,

model={

"depVars":[{

"name":"variable-name"

}<, {...}>],

"effects":[{

"interaction":"BAR" | "CROSS" | "NONE",

"maxInteract":integer,

"nest":["string-1" <, "string-2", ...>],

"vars":["string-1" <, "string-2", ...>]

}<, {...}>]

},

nPrinComp=integer,

nPrinCompMax=integer,

output={

"casOut":{

"caslib":"string"

"compress":True | False

"indexVars":["variable-name-1" <, "variable-name-2", ...>]

"label":"string"

"lifetime":64-bit-integer

"maxMemSize":64-bit-integer

"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

"name":"table-name"

"promote":True | False

"replace":True | False

"replication":integer

"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

"threadBlockSize":64-bit-integer

"timeStamp":"string"

"where":["string-1" <, "string-2", ...>]

},

"copyVars":["variable-name-1" <, "variable-name-2", ...>],

"leverage":"string",

"nodeid":"string",

"obsid":"string",

"orthdist":"string",

"outlier":"string",

"score":"string",

"scoredist":"string",

"threadid":"string"

},

outputTables={

"groupByVarsRaw":True | False,

"includeAll":True | False,

"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},

"repeated":True | False,

"replace":True | False

},

prefix="string",

propVariance=double,

seed=integer,

table={

"caslib":"string",

"computedOnDemand":True | False,

"computedVars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"computedVarsProgram":"string",

"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},

"groupBy":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"groupByMode":"NOSORT" | "REDISTRIBUTE",

"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},

"name":"table-name",

"orderBy":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"singlePass":True | False,

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>],

"where":"where-expression",

"whereTable":{

"casLib":"string"

"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}

"name":"table-name"

"vars":[{

"format":"string",

"formattedLength":integer,

"label":"string",

"name":"variable-name",

"nfd":integer,

"nfl":integer

}<, {...}>]

"where":"where-expression"

}

}

)

* indicates a required parameter
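
To make the template above more concrete, here is a minimal Python sketch that assembles the arguments for a small call as a plain dictionary. The helper build_mvoutlier_params and the table and variable names are hypothetical. The commented lines show how the dictionary might be passed through a SWAT connection, which is assumed to exist and is not created here.

```python
def build_mvoutlier_params(table_name, input_names, n_prin_comp, out_name):
    """Assemble keyword arguments for a minimal mvOutlier call (client side only)."""
    return {
        # table is the only top-level parameter marked as required.
        "table": {"name": table_name},
        # Each input is a casinvardesc dictionary; only "name" is set here.
        "inputs": [{"name": name} for name in input_names],
        # Specify nPrinComp or propVariance, not both.
        "nPrinComp": n_prin_comp,
        # casOut is required whenever output is specified.
        "output": {"casOut": {"name": out_name, "replace": True}},
    }


# Hypothetical table and variable names, for illustration only.
params = build_mvoutlier_params("measurements", ["x1", "x2", "x3"], 2, "scored")
print(params)

# With an open SWAT connection s (assumed, not created here), the call would be:
# s.loadactionset("mvOutlier")
# results = s.mvOutlier.mvOutlier(**params)
```

Building the dictionary separately keeps the parameter choices easy to inspect before anything is sent to the server.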

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the settings for an input table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables	names	lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias	alphaLev
Default	0.025
Range	0–1

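alphaMarginalLeverage=double

specifies the tail probability that determines the marginal robust score distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalLev
	alphaMargLev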
Range	0–1

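alphaMarginalOutlier=double

specifies the tail probability that determines the marginal orthogonal distance cutoff value, which controls which observations are shown in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.

Aliases	alphaMarginalOut
	alphaMargOut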
Range	0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias	alphaOut
Default	0.025
Range	0–1
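
The two cutoffs play separate roles: the orthogonal distance cutoff flags outliers, and the robust score distance cutoff flags leverage points. The Python sketch below shows that decision for a single observation. The function classify is hypothetical, the cutoff values are supplied by the caller because this page does not state how they are derived from the tail probabilities, and the strict comparison is an assumption.

```python
def classify(orth_dist, score_dist, orth_cutoff, score_cutoff):
    """Flag one observation from its two distances and the two cutoffs."""
    return {
        # Orthogonal distance beyond its cutoff marks an outlier.
        "outlier": orth_dist > orth_cutoff,
        # Robust score distance beyond its cutoff marks a leverage point.
        "leverage": score_dist > score_cutoff,
    }


# Made-up distances and cutoffs, for illustration only.
print(classify(orth_dist=3.0, score_dist=1.0, orth_cutoff=2.5, score_cutoff=2.0))
# prints {'outlier': True, 'leverage': False}
```

An observation can carry both flags, which matters for the ordering of the Diagnostics table.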

applyRowOrder=True | False

when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.

Alias	reproducibleRowOrder
Default	False

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases	contam
	corrupted
Default	0.25
Range	0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases	diagOptions
	diagOpts

The diagOptList value can be one or more of the following:

"maxObs":integer

Minimum value	0

"showObsId":True | False

when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the id parameter.

Alias	obsId
Default	False

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=True | False

when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.

Default	False

id=["variable-name-1" <, "variable-name-2", ...>]

specifies one or more variables to include in output tables and plots, for identifying observations.

initOnly=True | False

Alias	initialOnly
Default	False

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	input
	vars
	var

loadings=True | False

when set to True, creates the Loadings table, which is produced only if you specify this parameter.

Default	False

model={modelStatement}

specifies, in the effects subparameter, the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

"depVars":[{responsevar-1} <, {responsevar-2}, ...>]

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

"name":"variable-name"

names the response variable.

"effects":[{effect-1} <, {effect-2}, ...>]

The effect value can be one or more of the following:

"interaction":"BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

"maxInteract":integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified integer value.

"nest":["string-1" <, "string-2", ...>]

* "vars":["string-1" <, "string-2", ...>]

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
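The combined effect of the BAR interaction and maxInteract can be sketched as follows. The sketch assumes the usual bar-operator meaning, in which the listed variables expand into all main effects and all interactions among them, and maxInteract then drops terms above the given order. The function name is hypothetical, and the order in which the action itself lists the terms might differ.

```python
from itertools import combinations


def expand_bar(variables, max_interact=None):
    """Expand a BAR effect into main effects and interactions up to max_interact."""
    limit = len(variables) if max_interact is None else min(max_interact, len(variables))
    terms = []
    for order in range(1, limit + 1):
        # Every combination of `order` distinct variables forms one term.
        for combo in combinations(variables, order):
            terms.append("*".join(combo))
    return terms


full = expand_bar(["x1", "x2", "x3"])
capped = expand_bar(["x1", "x2", "x3"], max_interact=2)
print(full)
print(capped)
```

With three variables, the full expansion has seven terms. Setting maxInteract to 2 removes only the third-order term x1*x2*x3.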

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases	nComp
	nPC
	n
Minimum value	1
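The constraints stated for nPrinComp, propVariance, inputs, and model can be checked on the client before the action is called. The following helper is a hypothetical sketch, not part of any SAS client library. It builds a parameter dictionary from the documented parameter names and raises an error when the documented rules are violated.

```python
def build_mvoutlier_params(table, inputs=None, model=None,
                           n_prin_comp=None, prop_variance=None):
    """Assemble and validate a parameter dictionary (hypothetical helper)."""
    # Either inputs or model must name the variables to analyze.
    if inputs is None and model is None:
        raise ValueError("specify either the inputs or the model parameter")
    # nPrinComp and propVariance are mutually exclusive, and one is required.
    if (n_prin_comp is None) == (prop_variance is None):
        raise ValueError("specify exactly one of nPrinComp and propVariance")

    params = {"table": {"name": table}}
    if inputs is not None:
        params["inputs"] = list(inputs)
    if model is not None:
        params["model"] = model

    if n_prin_comp is not None:
        if n_prin_comp < 1:
            raise ValueError("nPrinComp has a minimum value of 1")
        params["nPrinComp"] = n_prin_comp
    else:
        if not 0.0 <= prop_variance <= 1.0:
            raise ValueError("propVariance must lie in the range 0 to 1")
        params["propVariance"] = prop_variance
    return params


# "mytable" and the variable names are placeholders.
params = build_mvoutlier_params(
    "mytable",
    inputs=[{"name": "x1"}, {"name": "x2"}, {"name": "x3"}],
    n_prin_comp=2,
)
print(params)
```

Checking these rules locally gives an immediate error message instead of a failed action call on the server.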

nPrinCompMax=integer

Aliases	nCompMax
	nPCMax
	nMax
Default	10
Minimum value	1

output={outputOptions}

The outputOptions value can be one or more of the following:

* "casOut":{casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

"copyVars":["variable-name-1" <, "variable-name-2", ...>]

Alias	copyVar

"leverage":"string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

"nodeid":"string"

specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

"obsid":"string"

specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.

"orthdist":"string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

"outlier":"string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

"score":"string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

"scoredist":"string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

"threadid":"string"
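The naming rule shared by the output statistics (an empty string selects the default name) can be sketched as a lookup. The default names below are the ones stated in the descriptions above. The threadid option is left out because this reference gives no default for it. The function name is hypothetical.

```python
# Default output variable names as stated in the parameter descriptions.
# For "score", the default is a prefix rather than a complete name.
DEFAULT_NAMES = {
    "leverage": "Leverage",
    "nodeid": "NodeId",
    "obsid": "ObsId",
    "orthdist": "OrthDist",
    "outlier": "Outlier",
    "score": "Score",
    "scoredist": "ScoreDist",
}


def resolve_output_names(output_options):
    """Map each requested statistic to the variable name (or prefix) it receives."""
    resolved = {}
    for key, value in output_options.items():
        if key in DEFAULT_NAMES:
            # An empty string requests the statistic under its default name.
            resolved[key] = value if value else DEFAULT_NAMES[key]
    return resolved


# "od" is a placeholder for a user-chosen variable name.
names = resolve_output_names({"orthdist": "od", "scoredist": "", "outlier": ""})
print(names)
```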

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default	"Prin"

propVariance=double

Aliases	proportionVariance
	propVar
Range	0–1
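This reference gives no description for propVariance. Judging by its name and its relationship to nPrinComp, it presumably selects the number of components from a target proportion of explained variance. The following sketch shows that idea under that assumption. The eigenvalues are made up, the function name is hypothetical, and treating nPrinCompMax as an upper bound on the result is also an assumption.

```python
def components_for_proportion(eigenvalues, prop_variance, n_prin_comp_max=10):
    """Smallest number of components whose cumulative variance share reaches the target."""
    if not 0.0 <= prop_variance <= 1.0:
        raise ValueError("prop_variance must lie in the range 0 to 1")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Stop when the target is reached or the assumed upper bound is hit.
        if running / total >= prop_variance or k == n_prin_comp_max:
            return k
    return len(ordered)


# Hypothetical eigenvalues; cumulative shares are 0.5, 0.75, 0.875, 0.9375, 1.0.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k = components_for_proportion(eigenvalues, prop_variance=0.8)
k_capped = components_for_proportion(eigenvalues, prop_variance=0.8, n_prin_comp_max=2)
print(k, k_capped)
```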

seed=integer

specifies the seed to use for random number generation.

Alias	randomSeed
Default	1
Range	1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

mvOutlier Action

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

R Syntax
Summary: Input and Output Tables
Parameter Descriptions

R Syntax

results <- cas.mvOutlier.mvOutlier(s,

alphaLeverage=double,

alphaMarginalLeverage=double,

alphaMarginalOutlier=double,

alphaOutlier=double,

applyRowOrder=TRUE | FALSE,

attributes=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

contamination=double,

diagnosticOptions=list(

maxObs=integer,

showObsId=TRUE | FALSE

),

display=list(

caseSensitive=TRUE | FALSE,

exclude=TRUE | FALSE,

excludeAll=TRUE | FALSE,

keyIsPath=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>),

pathType="LABEL" | "NAME",

traceNames=TRUE | FALSE

),

eigenvec=TRUE | FALSE,

id=list("variable-name-1" <, "variable-name-2", ...>),

initOnly=TRUE | FALSE,

inputs=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

loadings=TRUE | FALSE,

model=list(

depVars=list( list(

name="variable-name"

) <, list(...)>),

effects=list( list(

interaction="BAR" | "CROSS" | "NONE",

maxInteract=integer,

nest=list("string-1" <, "string-2", ...>),

vars=list("string-1" <, "string-2", ...>)

) <, list(...)>)

),

nPrinComp=integer,

nPrinCompMax=integer,

output=list(

casOut=list(

caslib="string"

compress=TRUE | FALSE

indexVars=list("variable-name-1" <, "variable-name-2", ...>)

label="string"

lifetime=64-bit-integer

maxMemSize=64-bit-integer

memoryFormat="DVR" | "INHERIT" | "STANDARD"

name="table-name"

promote=TRUE | FALSE

replace=TRUE | FALSE

replication=integer

tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

threadBlockSize=64-bit-integer

timeStamp="string"

where=list("string-1" <, "string-2", ...>)

),

copyVars=list("variable-name-1" <, "variable-name-2", ...>),

leverage="string",

nodeid="string",

obsid="string",

orthdist="string",

outlier="string",

score="string",

scoredist="string",

threadid="string"

),

outputTables=list(

groupByVarsRaw=TRUE | FALSE,

includeAll=TRUE | FALSE,

names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),

repeated=TRUE | FALSE,

replace=TRUE | FALSE

),

prefix="string",

propVariance=double,

seed=integer,

table=list(

caslib="string",

computedOnDemand=TRUE | FALSE,

computedVars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

computedVarsProgram="string",

dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),

groupBy=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

groupByMode="NOSORT" | "REDISTRIBUTE",

name="table-name",

orderBy=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

singlePass=TRUE | FALSE,

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>),

where="where-expression",

whereTable=list(

casLib="string"

name="table-name"

vars=list( list(

format="string",

formattedLength=integer,

label="string",

name="variable-name",

nfd=integer,

nfl=integer

) <, list(...)>)

where="where-expression"

)

)

)

* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables
Parameter	Subparameter	Description
* table	—	specifies the settings for an input table.

Parameters for Creating Output Tables
Parameter	Subparameter	Description
output	* casOut	creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables	names	lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias	alphaLev
Default	0.025
Range	0–1

alphaMarginalLeverage=double

Aliases	alphaMarginalLev
	alphaMargLev
Range	0–1

alphaMarginalOutlier=double

Aliases	alphaMarginalOut
	alphaMargOut
Range	0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias	alphaOut
Default	0.025
Range	0–1

applyRowOrder=TRUE | FALSE

when set to TRUE, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.

Alias	reproducibleRowOrder
Default	FALSE

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	attribute
	attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases	contam
	corrupted
Default	0.25
Range	0–0.5

diagnosticOptions=list(diagOptList)

specifies options for the Diagnostics table.

Aliases	diagOptions
	diagOpts

The diagOptList value can be one or more of the following:

maxObs=integer

Minimum value	0

showObsId=TRUE | FALSE

when set to TRUE, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to TRUE if you omit the id parameter.

Alias	obsId
Default	FALSE

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=TRUE | FALSE

when set to TRUE, creates the Eigenvectors table, which is produced only if you specify this parameter.

Default	FALSE

id=list("variable-name-1" <, "variable-name-2", ...>)

specifies one or more variables to include in output tables and plots, for identifying observations.

initOnly=TRUE | FALSE

Alias	initialOnly
Default	FALSE

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases	input
	vars
	var

loadings=TRUE | FALSE

when set to TRUE, creates the Loadings table, which is produced only if you specify this parameter.

Default	FALSE

model=list(modelStatement)

specifies, in the effects subparameter, the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars=list( list(responsevar-1) <, list(responsevar-2), ...>)

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases	depVar
	target

name="variable-name"

names the response variable.

effects=list( list(effect-1) <, list(effect-2), ...>)

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias	interact
Default	NONE

maxInteract=integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified integer value.

nest=list("string-1" <, "string-2", ...>)

* vars=list("string-1" <, "string-2", ...>)

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases	nComp
	nPC
	n
Minimum value	1

nPrinCompMax=integer

Aliases	nCompMax
	nPCMax
	nMax
Default	10
Minimum value	1

output=list(outputOptions)

The outputOptions value can be one or more of the following:

* casOut=list(casouttable)

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars=list("variable-name-1" <, "variable-name-2", ...>)

Alias	copyVar

leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

threadid="string"

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default	"Prin"

propVariance=double

Aliases	proportionVariance
	propVar
Range	0–1

seed=integer

specifies the seed to use for random number generation.

Alias	randomSeed
Default	1
Range	1–MACINT

* table=list(castable)

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Last updated: March 05, 2026