Robust Multivariate Outlier Detection Action Set

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

mvOutlier Action

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

CASL Syntax

mvOutlier.mvOutlier <result=results> <status=rc> /
alphaLeverage=double,
alphaOutlier=double,
applyRowOrder=TRUE | FALSE,
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
contamination=double,
diagnosticOptions={
maxObs=integer,
showObsId=TRUE | FALSE
},
display={
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
},
eigenvec=TRUE | FALSE,
id={"variable-name-1" <, "variable-name-2", ...>},
initOnly=TRUE | FALSE,
inputs={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
loadings=TRUE | FALSE,
model={
depVars={{
name="variable-name"
}, {...}},
effects={{
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest={"string-1" <, "string-2", ...>},
required parameter vars={"string-1" <, "string-2", ...>}
}, {...}}
},
nPrinComp=integer,
nPrinCompMax=integer,
output={
required parameter casOut={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars={"variable-name-1" <, "variable-name-2", ...>},
leverage="string",
nodeid="string",
obsid="string",
orthdist="string",
outlier="string",
score="string",
scoredist="string",
threadid="string"
},
outputTables={
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=TRUE | FALSE,
replace=TRUE | FALSE
},
prefix="string",
propVariance=double,
seed=integer,
required parameter table={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
}
;
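A "required parameter" label in the syntax above (shown as a leading * in the parameter descriptions below) marks a parameter that you must specify; for a subparameter, it is required whenever you specify the enclosing parameter.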

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | casOut (required) | creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables | names | lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias alphaLev
Default 0.025
Range 0–1
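The relationship between alphaLeverage and the robust score distance cutoff can be sketched as follows. This assumes the ROBPCA (robust PCA) convention, in which the cutoff is the square root of the 1 - alpha quantile of a chi-square distribution whose degrees of freedom equal the number of retained principal components; the action's exact rule may differ, and the helper names are hypothetical. Only the Python standard library is used, so the chi-square quantile is found by bisection on its CDF.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """Chi-square CDF with k degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(k/2, x/2)
    with its power series, which is adequate for moderate x and k.
    """
    if x <= 0.0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_quantile(p: float, k: int) -> float:
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def score_distance_cutoff(alpha_leverage: float, n_components: int) -> float:
    """Hypothetical helper: robust score distance cutoff, assuming the
    ROBPCA rule sqrt(chi2 quantile at 1 - alpha with k degrees of freedom)."""
    if not 0.0 < alpha_leverage < 1.0:
        raise ValueError("alpha_leverage must lie strictly between 0 and 1")
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    return math.sqrt(chi2_quantile(1.0 - alpha_leverage, n_components))


if __name__ == "__main__":
    print(score_distance_cutoff(0.025, 2))
```

With two components and the default alphaLeverage of 0.025, the sketch returns sqrt(-2 ln 0.025), about 2.716, because a chi-square variable with two degrees of freedom is exponential with mean 2. A smaller alphaLeverage raises the cutoff and so flags fewer leverage points.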

alphaMarginalLeverage=double

specifies the tail probability for a marginal robust score distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations whose robust score distances fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations whose robust score distances fall below the marginal cutoff.

Aliases alphaMarginalLev
alphaMargLev
Range 0–1

alphaMarginalOutlier=double

specifies the tail probability for a marginal orthogonal distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations whose orthogonal distances fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations whose orthogonal distances fall below the marginal cutoff.

Aliases alphaMarginalOut
alphaMargOut
Range 0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias alphaOut
Default 0.025
Range 0–1
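The orthogonal distance cutoff that alphaOutlier controls can be sketched as follows, assuming the ROBPCA convention: orthogonal distances raised to the power 2/3 are treated as approximately normal (the Wilson-Hilferty approximation), a robust center and scale of the transformed distances are estimated, and the cutoff is (center + scale * z)^(3/2), where z is the standard normal quantile at 1 - alpha. ROBPCA estimates the center and scale with a univariate MCD; this sketch plainly swaps in the median and scaled MAD instead. The function name is hypothetical and this is not the action's implementation.

```python
import statistics
from statistics import NormalDist


def orthogonal_distance_cutoff(orth_dists, alpha_outlier=0.025):
    """Hypothetical helper: ROBPCA-style cutoff for orthogonal distances.

    Assumes OD ** (2/3) is approximately normal (Wilson-Hilferty). ROBPCA
    estimates its location and scale with a univariate MCD; this sketch
    swaps in the median and the normal-consistent MAD for brevity.
    """
    if not 0.0 < alpha_outlier < 1.0:
        raise ValueError("alpha_outlier must lie strictly between 0 and 1")
    transformed = [d ** (2.0 / 3.0) for d in orth_dists]
    center = statistics.median(transformed)
    mad = statistics.median([abs(t - center) for t in transformed])
    scale = 1.4826 * mad  # makes the MAD consistent for normal data
    z = NormalDist().inv_cdf(1.0 - alpha_outlier)
    return max(center + scale * z, 0.0) ** 1.5  # back-transform to the OD scale


if __name__ == "__main__":
    dists = [0.5, 1.0, 1.2, 0.8, 3.0, 0.9]
    cutoff = orthogonal_distance_cutoff(dists)
    print([d for d in dists if d > cutoff])
```

In the toy list in the `__main__` block, only the distance 3.0 exceeds the cutoff. Raising alphaOutlier lowers z and therefore the cutoff, so more observations are flagged as outliers.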

applyRowOrder=TRUE | FALSE

when set to True, reads the data in a reproducible row order. This requires a preliminary call to the partition action in the table action set that specifies the groupBy and orderBy parameters.

Alias reproducibleRowOrder
Default FALSE

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases contam
corrupted
Default 0.25
Range 0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases diagOptions
diagOpts

The diagOptList value can be one or more of the following:

maxObs=integer

specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.

Minimum value 0

showObsId=TRUE | FALSE

when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the id parameter.

Alias obsId
Default FALSE
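The maxObs priority order can be sketched as a stable sort over flagged observations. The DiagRow structure and select_diagnostics function are hypothetical, and two details are assumptions rather than documented behavior: observations that are neither outliers nor leverage points are not listed, and observations in the same priority class keep their input order.

```python
from typing import List, NamedTuple


class DiagRow(NamedTuple):
    """One observation's flags (hypothetical structure, for illustration)."""
    obs_id: int
    outlier: bool
    leverage: bool


def select_diagnostics(rows: List[DiagRow], max_obs: int) -> List[DiagRow]:
    """Keep at most max_obs rows in the maxObs priority order:
    outlier and leverage point, then outlier only, then leverage point only.

    Assumptions: unflagged observations are dropped, and ties keep input order.
    """
    if max_obs < 0:
        raise ValueError("max_obs must be at least 0")

    def priority(row: DiagRow) -> int:
        if row.outlier and row.leverage:
            return 0
        if row.outlier:
            return 1
        return 2

    flagged = [r for r in rows if r.outlier or r.leverage]
    # sorted() is stable, so rows with equal priority keep their input order
    return sorted(flagged, key=priority)[:max_obs]
```

For example, with observations 1 (leverage only), 2 (outlier only), 3 (both), and 5 (outlier only), a maxObs value of 3 keeps observations 3, 2, and 5 and drops the leverage-only observation 1.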

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=TRUE | FALSE

when set to True, creates the Eigenvectors table, which is not produced by default.

Default FALSE

id={"variable-name-1" <, "variable-name-2", ...>}

specifies one or more variables that identify observations in output tables and plots.

initOnly=TRUE | FALSE

when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.

Alias initialOnly
Default FALSE

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

loadings=TRUE | FALSE

when set to True, creates the Loadings table, which is not produced by default.

Default FALSE

model={modelStatement}

specifies the variables to be analyzed through the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar
target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

maxInteract=integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified integer value.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
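How interaction and maxInteract turn the vars of one effect into model terms can be sketched as follows. The sketch assumes SAS bar-operator conventions (A|B|C expands to A, B, A*B, C, A*C, B*C, A*B*C), that CROSS yields the single product term, and that NONE yields each variable as a separate main effect; the nest subparameter is not modeled, and expand_effect is a hypothetical name.

```python
from typing import List, Optional, Sequence


def expand_effect(vars_: Sequence[str], interaction: str = "NONE",
                  max_interact: Optional[int] = None) -> List[str]:
    """Expand one effect specification into model terms (nest is ignored)."""
    if not vars_:
        raise ValueError("specify at least one variable")
    kind = interaction.upper()
    if kind == "NONE":
        # Assumption: each variable enters as its own main effect.
        return list(vars_)
    if kind == "CROSS":
        return ["*".join(vars_)]
    if kind == "BAR":
        terms: List[tuple] = []
        for v in vars_:
            # Add v alone, then v crossed with every term built so far,
            # which reproduces the SAS bar-operator ordering.
            terms += [(v,)] + [t + (v,) for t in terms]
        if max_interact is not None:
            # maxInteract drops terms whose order exceeds the limit.
            terms = [t for t in terms if len(t) <= max_interact]
        return ["*".join(t) for t in terms]
    raise ValueError(f"unknown interaction: {interaction!r}")
```

For example, vars A, B, and C with the BAR interaction and maxInteract=2 yield A, B, A*B, C, A*C, and B*C; the three-way term A*B*C is removed.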

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases nComp
nPC
n
Minimum value 1

nPrinCompMax=integer

specifies the largest number of principal components that you expect to retain for the final subspace, given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to compute the size of an observation subset, which must be known before the final number of components is determined.

Aliases nCompMax
nPCMax
nMax
Default 10
Minimum value 1
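To see why nPrinCompMax matters even though it does not cap the final number of components, the observation subset size can be sketched under the ROBPCA rule h = max(floor((1 - contamination) * n), floor((n + kmax + 1) / 2)), where n is the number of observations and kmax is nPrinCompMax. With the default contamination of 0.25, 1 - contamination equals ROBPCA's customary coverage of 0.75. This rule is an assumption about how the action uses these parameters, and subset_size is a hypothetical name.

```python
import math


def subset_size(n_obs: int, contamination: float = 0.25,
                n_prin_comp_max: int = 10) -> int:
    """Hypothetical helper: ROBPCA-style subset size h (assumed rule)."""
    if not 0.0 <= contamination <= 0.5:
        raise ValueError("contamination must be in [0, 0.5]")
    if n_prin_comp_max < 1:
        raise ValueError("n_prin_comp_max must be at least 1")
    coverage = 1.0 - contamination  # fraction of observations assumed clean
    return max(math.floor(coverage * n_obs), (n_obs + n_prin_comp_max + 1) // 2)
```

For example, with n = 100 and the defaults, h = max(75, 55) = 75. With n = 20, contamination 0.5, and nPrinCompMax 10, the second term dominates and h = 15, so a larger nPrinCompMax increases h when the data set is small.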

output={outputOptions}

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

The outputOptions value can be one or more of the following:

* casOut={casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars={"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.

Alias copyVar

leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

adds an output statistic for the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

adds the automatically assigned observation ID to the output table. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

threadid="string"

adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If set to an empty string, the name ThreadId is used for the output variable.
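The statistics behind the score, scoredist, orthdist, outlier, and leverage output variables follow the usual robust PCA definitions. The sketch below assumes that a robust center, k orthonormal eigenvectors, and their robust eigenvalues have already been estimated (the robust estimation itself is not shown) and computes the statistics for a single observation. The function names are hypothetical; the cutoffs passed to flag would come from alphaOutlier and alphaLeverage.

```python
import math
from typing import List, Sequence, Tuple


def pca_distances(x: Sequence[float], center: Sequence[float],
                  eigenvectors: Sequence[Sequence[float]],
                  eigenvalues: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scores, score distance, and orthogonal distance for one observation.

    eigenvectors holds k orthonormal p-vectors (one per retained component);
    eigenvalues holds the matching k robust variances.
    """
    centered = [xi - ci for xi, ci in zip(x, center)]
    # Score on each component: projection of the centered observation.
    scores = [sum(c * v for c, v in zip(centered, vec)) for vec in eigenvectors]
    # Score distance: Mahalanobis-type distance within the k-dimensional subspace.
    score_dist = math.sqrt(sum(t * t / lam for t, lam in zip(scores, eigenvalues)))
    # Orthogonal distance: Euclidean distance from x to its projection.
    recon = [sum(t * vec[j] for t, vec in zip(scores, eigenvectors))
             for j in range(len(centered))]
    orth_dist = math.sqrt(sum((c - r) ** 2 for c, r in zip(centered, recon)))
    return scores, score_dist, orth_dist


def flag(score_dist: float, orth_dist: float,
         sd_cutoff: float, od_cutoff: float) -> dict:
    """Indicator values keyed by the default output variable names."""
    return {"Outlier": orth_dist > od_cutoff, "Leverage": score_dist > sd_cutoff}
```

For the observation (3, 4, 2) with a zero center, the first two coordinate axes as eigenvectors, and eigenvalues 1 and 4, the scores are 3 and 4, the score distance is sqrt(9/1 + 16/4) = sqrt(13), and the orthogonal distance is 2, the part of the observation that lies outside the two-component subspace.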

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default "Prin"

propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases proportionVariance
propVar
Range 0–1
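The constraints on nPrinComp, propVariance, inputs, and model can be checked on the client before the action is submitted. The sketch below builds a keyword dictionary in the shape of the Python syntax for this action and submits it through an existing SWAT connection. The helper names, and any table or variable names you pass in, are hypothetical; only the action parameters come from this page. Because the page does not say whether supplying both inputs and model is rejected, the sketch only requires at least one of them.

```python
from typing import Any, Dict, Optional, Sequence


def build_mvoutlier_params(table: str,
                           inputs: Optional[Sequence[str]] = None,
                           model: Optional[Dict[str, Any]] = None,
                           n_prin_comp: Optional[int] = None,
                           prop_variance: Optional[float] = None,
                           n_prin_comp_max: Optional[int] = None,
                           out_table: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble mvOutlier parameters and check the
    documented constraints before submission."""
    if inputs is None and model is None:
        raise ValueError("specify either inputs or model")
    if (n_prin_comp is None) == (prop_variance is None):
        raise ValueError("specify exactly one of nPrinComp and propVariance")
    params: Dict[str, Any] = {"table": {"name": table}}
    if inputs is not None:
        params["inputs"] = [{"name": v} for v in inputs]
    if model is not None:
        params["model"] = model
    if n_prin_comp is not None:
        if n_prin_comp < 1:
            raise ValueError("nPrinComp must be at least 1")
        params["nPrinComp"] = n_prin_comp
    else:
        if not 0.0 <= prop_variance <= 1.0:
            raise ValueError("propVariance must be in [0, 1]")
        params["propVariance"] = prop_variance
    if n_prin_comp_max is not None:
        params["nPrinCompMax"] = n_prin_comp_max
    if out_table is not None:
        params["output"] = {"casOut": {"name": out_table, "replace": True}}
    return params


def run_mvoutlier(conn, params: Dict[str, Any]):
    """Submit through an existing swat.CAS connection (not executed here)."""
    conn.loadactionset(actionset="mvOutlier")
    return conn.mvOutlier.mvOutlier(**params)
```

For example, build_mvoutlier_params("mydata", inputs=["x1", "x2", "x3"], prop_variance=0.9, out_table="mvout") returns a dictionary that run_mvoutlier(conn, ...) passes to the action as keyword arguments, while specifying both n_prin_comp and prop_variance raises a ValueError before any request reaches the server.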

seed=integer

specifies the seed to use for random number generation.

Alias randomSeed
Default 1
Range 1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).


Lua Syntax

results, info = s:mvOutlier_mvOutlier{
alphaLeverage=double,
alphaOutlier=double,
applyRowOrder=true | false,
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
contamination=double,
diagnosticOptions={
maxObs=integer,
showObsId=true | false
},
display={
caseSensitive=true | false,
exclude=true | false,
excludeAll=true | false,
keyIsPath=true | false,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=true | false
},
eigenvec=true | false,
id={"variable-name-1" <, "variable-name-2", ...>},
initOnly=true | false,
inputs={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
loadings=true | false,
model={
depVars={{
name="variable-name"
}, {...}},
effects={{
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest={"string-1" <, "string-2", ...>},
required parameter vars={"string-1" <, "string-2", ...>}
}, {...}}
},
nPrinComp=integer,
nPrinCompMax=integer,
output={
required parameter casOut={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars={"variable-name-1" <, "variable-name-2", ...>},
leverage="string",
nodeid="string",
obsid="string",
orthdist="string",
outlier="string",
score="string",
scoredist="string",
threadid="string"
},
outputTables={
groupByVarsRaw=true | false,
includeAll=true | false,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=true | false,
replace=true | false
},
prefix="string",
propVariance=double,
seed=integer,
required parameter table={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
}
}
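A "required parameter" label in the syntax above (shown as a leading * in the parameter descriptions below) marks a parameter that you must specify; for a subparameter, it is required whenever you specify the enclosing parameter.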

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
table (required) | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
output | casOut (required) | creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
outputTables | names | lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias alphaLev
Default 0.025
Range 0–1

alphaMarginalLeverage=double

specifies the tail probability for a marginal robust score distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations whose robust score distances fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations whose robust score distances fall below the marginal cutoff.

Aliases alphaMarginalLev
alphaMargLev
Range 0–1

alphaMarginalOutlier=double

specifies the tail probability for a marginal orthogonal distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations whose orthogonal distances fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations whose orthogonal distances fall below the marginal cutoff.

Aliases alphaMarginalOut
alphaMargOut
Range 0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias alphaOut
Default 0.025
Range 0–1

applyRowOrder=true | false

when set to true, reads the data in a reproducible row order. This requires a preliminary call to the partition action in the table action set that specifies the groupBy and orderBy parameters.

Alias reproducibleRowOrder
Default false

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases contam
corrupted
Default 0.25
Range 0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases diagOptions
diagOpts

The diagOptList value can be one or more of the following:

maxObs=integer

specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.

Minimum value 0

showObsId=true | false

when set to true, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to true if you omit the id parameter.

Alias obsId
Default false

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=true | false

when set to true, creates the Eigenvectors table, which is not produced by default.

Default false

id={"variable-name-1" <, "variable-name-2", ...>}

specifies one or more variables that identify observations in output tables and plots.

initOnly=true | false

when set to true, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.

Alias initialOnly
Default false

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

loadings=true | false

when set to true, creates the Loadings table, which is not produced by default.

Default false

model={modelStatement}

specifies the variables to be analyzed through the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars={{responsevar-1} <, {responsevar-2}, ...>}

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar
target

name="variable-name"

names the response variable.

effects={{effect-1} <, {effect-2}, ...>}

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE

maxInteract=integer

when used with the BAR interaction, eliminates interaction effects whose order is higher than the specified integer value.

nest={"string-1" <, "string-2", ...>}

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars={"string-1" <, "string-2", ...>}

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.

Aliases nComp
nPC
n
Minimum value 1

nPrinCompMax=integer

specifies the largest number of principal components that you expect to retain for the final subspace, given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to compute the size of an observation subset, which must be known before the final number of components is determined.

Aliases nCompMax
nPCMax
nMax
Default 10
Minimum value 1

output={outputOptions}

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

The outputOptions value can be one or more of the following:

* casOut={casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars={"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.

Alias copyVar

leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

adds an output statistic for the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

adds the automatically assigned observation ID to the output table. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

threadid="string"

adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If set to an empty string, the name ThreadId is used for the output variable.

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default "Prin"

propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases proportionVariance
propVar
Range 0–1

seed=integer

specifies the seed to use for random number generation.

Alias randomSeed
Default 1
Range 1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).


Python Syntax

results=s.mvOutlier.mvOutlier(
alphaLeverage=double,
alphaOutlier=double,
applyRowOrder=True | False,
attributes=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
contamination=double,
diagnosticOptions={
"maxObs":integer,
"showObsId":True | False
},
display={
"caseSensitive":True | False,
"exclude":True | False,
"excludeAll":True | False,
"keyIsPath":True | False,
"names":["string-1" <, "string-2", ...>],
"pathType":"LABEL" | "NAME",
"traceNames":True | False
},
eigenvec=True | False,
id=["variable-name-1" <, "variable-name-2", ...>],
initOnly=True | False,
inputs=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
loadings=True | False,
model={
"depVars":[{
"name":"variable-name"
}<, {...}>],
"effects":[{
"interaction":"BAR" | "CROSS" | "NONE",
"maxInteract":integer,
"nest":["string-1" <, "string-2", ...>],
required parameter "vars":["string-1" <, "string-2", ...>]
}<, {...}>]
},
nPrinComp=integer,
nPrinCompMax=integer,
output={
required parameter "casOut":{
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
"copyVars":["variable-name-1" <, "variable-name-2", ...>],
"leverage":"string",
"nodeid":"string",
"obsid":"string",
"orthdist":"string",
"outlier":"string",
"score":"string",
"scoredist":"string",
"threadid":"string"
},
outputTables={
"groupByVarsRaw":True | False,
"includeAll":True | False,
"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},
"repeated":True | False,
"replace":True | False
},
prefix="string",
propVariance=double,
seed=integer,
required parameter table={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string",
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression"
}
}
)
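In the syntax above, "required parameter" marks a parameter that you must specify. In the parameter descriptions, required parameters are marked with an asterisk (*).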

Summary: Input and Output Tables

If a row lists a subparameter, specify the table name, caslib, and other table settings in that subparameter. Otherwise, specify them directly in the parameter.

Parameters for Reading Input Tables

Parameter: table (required)
Subparameter: none
Description: specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter: output
Subparameter: casOut (required)
Description: creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

Parameter: outputTables
Subparameter: names
Description: lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias alphaLev
Default 0.025
Range 0–1
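The page does not state how alphaLeverage is turned into a cutoff. A common rule in robust PCA (for example, the ROBPCA method) flags an observation as a leverage point when its robust score distance exceeds the square root of the chi-square quantile at probability 1 - alphaLeverage, with degrees of freedom equal to the number of retained components. The following minimal sketch assumes that rule. To stay within the Python standard library, it approximates the chi-square quantile with the Wilson-Hilferty transformation. The helper name score_distance_cutoff is hypothetical.

```python
from statistics import NormalDist


def score_distance_cutoff(n_components: int, alpha: float = 0.025) -> float:
    """Approximate sqrt(chi-square quantile at 1 - alpha, df = n_components).

    Hypothetical helper: assumes a ROBPCA-style chi-square cutoff and uses the
    Wilson-Hilferty approximation instead of an exact chi-square quantile.
    """
    if n_components < 1 or not 0.0 < alpha < 1.0:
        raise ValueError("need n_components >= 1 and 0 < alpha < 1")
    k = float(n_components)
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper standard normal quantile
    c = 2.0 / (9.0 * k)
    chi2_quantile = k * (1.0 - c + z * c ** 0.5) ** 3
    return chi2_quantile ** 0.5


# Observations whose robust score distance exceeds the cutoff are leverage points.
cutoff = score_distance_cutoff(2)              # about 2.71 for k = 2, alpha = 0.025
flags = [d > cutoff for d in [0.8, 1.9, 3.4]]  # [False, False, True]
```

A smaller alphaLeverage gives a larger cutoff, so fewer observations are flagged.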

alphaMarginalLeverage=double

specifies the tail probability for a marginal robust score distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaLeverage parameter value lowers the cutoff and adds observations whose distances fall between the marginal and standard cutoffs. A value less than the alphaLeverage parameter value raises the cutoff and removes leverage points whose distances fall below the marginal cutoff.

Aliases alphaMarginalLev
alphaMargLev
Range 0–1

alphaMarginalOutlier=double

specifies the tail probability for a marginal orthogonal distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaOutlier parameter value lowers the cutoff and adds observations whose distances fall between the marginal and standard cutoffs. A value less than the alphaOutlier parameter value raises the cutoff and removes outliers whose distances fall below the marginal cutoff.

Aliases alphaMarginalOut
alphaMargOut
Range 0–1
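Read together, the two marginal parameters appear to change only which observations are listed in the Diagnostics table. The outlier and leverage flags themselves still come from alphaOutlier and alphaLeverage. The sketch below assumes that reading: a distance is flagged against the standard cutoff and listed when it exceeds the marginal cutoff. Because a larger tail probability gives a smaller cutoff, a marginal alpha above the standard alpha lowers the listing threshold, and a marginal alpha below it raises the threshold. The helper name and the numeric cutoffs are illustrative only.

```python
def diagnostics_rows(distances, standard_cutoff, marginal_cutoff):
    """Return (index, flagged, listed) for each distance.

    Hypothetical helper: 'flagged' uses the standard cutoff (from alphaOutlier
    or alphaLeverage); 'listed' uses the marginal cutoff, which only controls
    which rows appear in the Diagnostics table.
    """
    rows = []
    for i, d in enumerate(distances):
        rows.append((i, d > standard_cutoff, d > marginal_cutoff))
    return rows


dists = [1.0, 2.5, 3.0, 4.2]
# Marginal alpha larger than the standard alpha -> lower marginal cutoff:
# observation 1 is listed although it is not flagged.
wider = diagnostics_rows(dists, standard_cutoff=2.8, marginal_cutoff=2.2)
# Marginal alpha smaller than the standard alpha -> higher marginal cutoff:
# observation 2 is flagged but no longer listed.
narrower = diagnostics_rows(dists, standard_cutoff=2.8, marginal_cutoff=3.5)
```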

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias alphaOut
Default 0.025
Range 0–1
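The page does not give the orthogonal distance cutoff formula either. In ROBPCA, orthogonal distances raised to the power 2/3 are treated as approximately normal, and the cutoff is (center + scale * z)^(3/2), where z is the standard normal quantile at 1 - alphaOutlier. The following sketch assumes that rule. As a simplification, it uses the median and the scaled median absolute deviation for center and scale, whereas ROBPCA itself uses a univariate MCD estimate. The helper name is hypothetical.

```python
from statistics import NormalDist, median


def orthogonal_distance_cutoff(orth_dists, alpha=0.025):
    """Hypothetical ROBPCA-style cutoff for orthogonal distances.

    Transforms distances to the 2/3 power, estimates location and scale
    robustly (median and MAD here, a simplification), and maps the upper
    normal quantile back to the original distance scale.
    """
    t = [d ** (2.0 / 3.0) for d in orth_dists]
    center = median(t)
    scale = 1.4826 * median(abs(v - center) for v in t)  # MAD, normal-consistent
    z = NormalDist().inv_cdf(1.0 - alpha)
    return (center + scale * z) ** 1.5


od = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05, 6.0]
cutoff = orthogonal_distance_cutoff(od)                  # about 1.5 for this data
outliers = [i for i, d in enumerate(od) if d > cutoff]   # [6]
```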

applyRowOrder=True | False

when set to True, reads the data in a reproducible row order. This requires that you first call the partition action (in the table action set) and specify the groupBy and orderBy parameters in that call.

Alias reproducibleRowOrder
Default False
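The preliminary partition step described for applyRowOrder can be sketched as two keyword-argument dictionaries for the Python interface: a partition request, then an mvOutlier request with applyRowOrder=True that reads the partitioned table. The sketch only builds the arguments. Passing them to s.table.partition(...) and s.mvOutlier.mvOutlier(...) on a connected SWAT session s is assumed, as is the casOut parameter of the partition action. The helper, table, and variable names are hypothetical.

```python
def row_order_requests(source, partitioned, group_by, order_by, variables, n_prin_comp=2):
    """Build kwargs for a partition call followed by a reproducible mvOutlier call.

    Hypothetical helper: it only builds dictionaries and does not contact a server.
    """
    partition_kwargs = {
        "table": {
            "name": source,
            "groupBy": [{"name": v} for v in group_by],
            "orderBy": [{"name": v} for v in order_by],
        },
        # Assumed: partition writes its result through a casOut parameter.
        "casOut": {"name": partitioned, "replace": True},
    }
    mvoutlier_kwargs = {
        "table": {"name": partitioned},  # analyze the partitioned copy
        "inputs": [{"name": v} for v in variables],
        "nPrinComp": n_prin_comp,
        "applyRowOrder": True,           # read rows in a reproducible order
    }
    return partition_kwargs, mvoutlier_kwargs


part, mvo = row_order_requests("sensors", "sensors_part", ["site"], ["ts"], ["x1", "x2", "x3"])
# With a connected SWAT session s (assumed):
#   s.table.partition(**part)
#   result = s.mvOutlier.mvOutlier(**mvo)
```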

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables that are used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases contam
corrupted
Default 0.25
Range 0–0.5

diagnosticOptions={diagOptList}

specifies options for the Diagnostics table.

Aliases diagOptions
diagOpts

The diagOptList value can be one or more of the following:

"maxObs":integer

specifies the maximum number of observations to include in the Diagnostics table. If more observations qualify than this value allows, priority goes first to observations that are both outliers and leverage points, then to observations that are only outliers, and then to observations that are only leverage points.

Minimum value 0
"showObsId":True | False

when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the id parameter.

Alias obsId
Default False
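The maxObs priority rule can be sketched as a stable sort over flagged observations: rows that are both outliers and leverage points come first, then outliers only, then leverage points only. The page does not say how rows are ordered within each group, so the sketch assumes that input order is kept. The row layout and helper name are hypothetical.

```python
def limit_diagnostics(rows, max_obs):
    """rows: dicts with 'obs_id', 'outlier' (bool) and 'leverage' (bool)."""

    def priority(row):
        if row["outlier"] and row["leverage"]:
            return 0  # both an outlier and a leverage point
        if row["outlier"]:
            return 1  # outlier only
        return 2      # leverage point only

    flagged = [r for r in rows if r["outlier"] or r["leverage"]]
    ranked = sorted(flagged, key=priority)  # sorted() is stable: ties keep input order
    return ranked[:max_obs]


rows = [
    {"obs_id": 1, "outlier": False, "leverage": True},
    {"obs_id": 2, "outlier": True, "leverage": False},
    {"obs_id": 3, "outlier": False, "leverage": False},
    {"obs_id": 4, "outlier": True, "leverage": True},
    {"obs_id": 5, "outlier": True, "leverage": False},
]
kept = [r["obs_id"] for r in limit_diagnostics(rows, max_obs=3)]  # [4, 2, 5]
```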

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=True | False

when set to True, creates the Eigenvectors table. This table is not produced unless you set this parameter to True.

Default False

id=["variable-name-1" <, "variable-name-2", ...>]

specifies one or more variables that identify observations in output tables and plots.

initOnly=True | False

when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.

Alias initialOnly
Default False

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

loadings=True | False

when set to True, creates the Loadings table. This table is not produced unless you set this parameter to True.

Default False

model={modelStatement}

specifies the variables to be analyzed through the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

"depVars":[{responsevar-1} <, {responsevar-2}, ...>]

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar
target
"name":"variable-name"

names the response variable.

"effects":[{effect-1} <, {effect-2}, ...>]

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

"interaction":"BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE
"maxInteract":integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

"nest":["string-1" <, "string-2", ...>]

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* "vars":["string-1" <, "string-2", ...>]

specifies the variables to use in defining a term of the effect. You must specify at least one variable.
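For comparison, the sketch below builds two keyword-argument sets for the Python interface. One names the analysis variables through inputs, and the other names them through a single effect term in model. The helper, table, and variable names are hypothetical, and the helper only builds dictionaries. It does not call the server or claim that the two forms produce identical results.

```python
def analysis_variables_kwargs(table, variables, use_model=False):
    """Build mvOutlier kwargs that name numeric analysis variables."""
    kwargs = {"table": {"name": table}, "nPrinComp": 2}
    if use_model:
        # One effect term; interaction defaults to NONE, so no crossing is requested.
        kwargs["model"] = {"effects": [{"vars": list(variables)}]}
    else:
        kwargs["inputs"] = [{"name": v} for v in variables]
    return kwargs


via_inputs = analysis_variables_kwargs("mydata", ["x1", "x2", "x3"])
via_model = analysis_variables_kwargs("mydata", ["x1", "x2", "x3"], use_model=True)
```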

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter, but not both.

Aliases nComp
nPC
n
Minimum value 1
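The either-or rules for nPrinComp and propVariance and for inputs and model, together with the documented ranges, can be checked on the client before a request is sent. The sketch below is a hypothetical check written against this page and is not part of SWAT. It recognizes only the primary parameter names and does not handle aliases such as nComp or propVar. The endpoints of each range are treated as inclusive, which is an assumption.

```python
RANGES = {
    "alphaLeverage": (0.0, 1.0),
    "alphaOutlier": (0.0, 1.0),
    "contamination": (0.0, 0.5),
    "propVariance": (0.0, 1.0),
}


def check_mvoutlier_kwargs(kwargs):
    """Return a list of problems found in an mvOutlier keyword-argument dict."""
    problems = []
    has_n = "nPrinComp" in kwargs
    has_prop = "propVariance" in kwargs
    if has_n == has_prop:  # both given, or neither given
        problems.append("specify exactly one of nPrinComp and propVariance")
    if "inputs" not in kwargs and "model" not in kwargs:
        problems.append("specify the inputs parameter or the model parameter")
    if has_n and kwargs["nPrinComp"] < 1:
        problems.append("nPrinComp must be at least 1")
    for name, (low, high) in RANGES.items():
        if name in kwargs and not low <= kwargs[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```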

nPrinCompMax=integer

specifies the largest feasible number of principal components that you would expect to retain for the final subspace, given the target proportion of variance to explain. This number does not limit the number of components that are actually used. Instead, it is used to compute an observation subset size, which must be determined before the final number of components is known.

Aliases nCompMax
nPCMax
nMax
Default 10
Minimum value 1
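The page does not give the subset-size formula. In the ROBPCA algorithm, the size of the subset that is treated as uncontaminated is h = max(floor(a * n), floor((n + kmax + 1) / 2)). In this formula, a is the assumed proportion of clean observations, and kmax bounds the number of components. The following sketch assumes that a = 1 - contamination and kmax = nPrinCompMax. This shows why the subset size must be fixed before the final number of components is chosen. The helper name is hypothetical.

```python
import math


def subset_size(n_obs, contamination=0.25, n_prin_comp_max=10):
    """Hypothetical ROBPCA-style size of the subset treated as uncontaminated."""
    a = 1.0 - contamination  # assumed proportion of clean observations
    return max(math.floor(a * n_obs), (n_obs + n_prin_comp_max + 1) // 2)


h_default = subset_size(1000)                   # 750 with the documented defaults
h_small = subset_size(20, contamination=0.5)    # 15: the (n + kmax + 1) / 2 term dominates
```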

output={outputOptions}

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

The outputOptions value can be one or more of the following:

* "casOut":{casouttable}

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

"copyVars":["variable-name-1" <, "variable-name-2", ...>]

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.

Alias copyVar
"leverage":"string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

"nodeid":"string"

adds an output statistic for the ID of the node that processes the observation and specifies its name. If you set this parameter to an empty string, the name NodeId is used for the output variable.

"obsid":"string"

adds the automatically assigned observation ID to the output table and specifies its name. If you set this parameter to an empty string, the name ObsId is used for the output variable.

"orthdist":"string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

"outlier":"string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

"score":"string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

"scoredist":"string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

"threadid":"string"

adds an output statistic for the ID of the thread that processes the observation and specifies its name. Each node has its own collection of threads. If you set this parameter to an empty string, the name ThreadId is used for the output variable.
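The default names that this page gives for empty-string values can be collected into a small lookup. The sketch below builds an output specification for the Python interface, together with a hypothetical helper that reports which statistic and copied columns to expect. It leaves out the ID variables and observation ID that the action adds automatically. Two points are assumptions: that every key listed here counts as a requested statistic, and that score columns are numbered Score1, Score2, and so on. The page says only that Score is used as a prefix.

```python
DEFAULT_NAMES = {
    "leverage": "Leverage",
    "nodeid": "NodeId",
    "obsid": "ObsId",
    "orthdist": "OrthDist",
    "outlier": "Outlier",
    "scoredist": "ScoreDist",
    "threadid": "ThreadId",
}


def expected_columns(output, n_components):
    """List statistic and copied column names implied by an output={...} spec."""
    cols = list(output.get("copyVars", []))
    requested = [k for k in (*DEFAULT_NAMES, "score") if k in output]
    if not requested:
        # Per this page: with no statistics requested, both distances are written.
        return cols + ["OrthDist", "ScoreDist"]
    for key, default in DEFAULT_NAMES.items():
        if key in output:
            cols.append(output[key] or default)  # empty string -> default name
    if "score" in output:
        prefix = output["score"] or "Score"
        cols.extend(f"{prefix}{i}" for i in range(1, n_components + 1))  # assumed numbering
    return cols


output = {
    "casOut": {"name": "mvout", "replace": True},
    "copyVars": ["region"],
    "outlier": "",
    "orthdist": "",
    "scoredist": "RobDist",
    "score": "",
}
```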

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default "Prin"

propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter, but not both. When you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases proportionVariance
propVar
Range 0–1
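A common way to turn a target proportion of variance into a number of components is to take the smallest k whose k largest eigenvalues explain at least that proportion. The page does not say whether the action uses exactly this rule or which robust eigenvalues it uses. Treat the sketch below as an illustration with hypothetical eigenvalues. The helper name is also hypothetical.

```python
def components_for_variance(eigenvalues, prop_variance):
    """Smallest k whose k largest eigenvalues explain at least prop_variance."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    running = 0.0
    for k, value in enumerate(values, start=1):
        running += value
        if running / total >= prop_variance:
            return k
    return len(values)


eigs = [4.0, 2.5, 1.5, 1.0, 1.0]           # hypothetical eigenvalues, total 10
k = components_for_variance(eigs, 0.75)    # 4.0 + 2.5 + 1.5 = 8.0 of 10 -> k = 3
```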

seed=integer

specifies the seed to use for random number generation.

Alias randomSeed
Default 1
Range 1–MACINT

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

mvOutlier Action

Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.

R Syntax

results <- cas.mvOutlier.mvOutlier(s,
alphaLeverage=double,
alphaMarginalLeverage=double,
alphaMarginalOutlier=double,
alphaOutlier=double,
applyRowOrder=TRUE | FALSE,
attributes=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
contamination=double,
diagnosticOptions=list(
maxObs=integer,
showObsId=TRUE | FALSE
),
display=list(
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>),
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
),
eigenvec=TRUE | FALSE,
id=list("variable-name-1" <, "variable-name-2", ...>),
initOnly=TRUE | FALSE,
inputs=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
loadings=TRUE | FALSE,
model=list(
depVars=list( list(
name="variable-name"
) <, list(...)>),
effects=list( list(
interaction="BAR" | "CROSS" | "NONE",
maxInteract=integer,
nest=list("string-1" <, "string-2", ...>),
required parameter vars=list("string-1" <, "string-2", ...>)
) <, list(...)>)
),
nPrinComp=integer,
nPrinCompMax=integer,
output=list(
required parameter casOut=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
copyVars=list("variable-name-1" <, "variable-name-2", ...>),
leverage="string",
nodeid="string",
obsid="string",
orthdist="string",
outlier="string",
score="string",
scoredist="string",
threadid="string"
),
outputTables=list(
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),
repeated=TRUE | FALSE,
replace=TRUE | FALSE
),
prefix="string",
propVariance=double,
seed=integer,
required parameter table=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
)
)
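In the syntax above, "required parameter" marks a parameter that you must specify. In the parameter descriptions, required parameters are marked with an asterisk (*).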

Summary: Input and Output Tables

If a row lists a subparameter, specify the table name, caslib, and other table settings in that subparameter. Otherwise, specify them directly in the parameter.

Parameters for Reading Input Tables

Parameter: table (required)
Subparameter: none
Description: specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter: output
Subparameter: casOut (required)
Description: creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

Parameter: outputTables
Subparameter: names
Description: lists the names of results tables to save as CAS tables on the server.

Parameter Descriptions

alphaLeverage=double

specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.

Alias alphaLev
Default 0.025
Range 0–1

alphaMarginalLeverage=double

specifies the tail probability for a marginal robust score distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaLeverage parameter value lowers the cutoff and adds observations whose distances fall between the marginal and standard cutoffs. A value less than the alphaLeverage parameter value raises the cutoff and removes leverage points whose distances fall below the marginal cutoff.

Aliases alphaMarginalLev
alphaMargLev
Range 0–1

alphaMarginalOutlier=double

specifies the tail probability for a marginal orthogonal distance cutoff, which controls which observations appear in the Diagnostics table. A value greater than the alphaOutlier parameter value lowers the cutoff and adds observations whose distances fall between the marginal and standard cutoffs. A value less than the alphaOutlier parameter value raises the cutoff and removes outliers whose distances fall below the marginal cutoff.

Aliases alphaMarginalOut
alphaMargOut
Range 0–1

alphaOutlier=double

specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.

Alias alphaOut
Default 0.025
Range 0–1

applyRowOrder=TRUE | FALSE

when set to TRUE, reads the data in a reproducible row order. This requires that you first call the partition action (in the table action set) and specify the groupBy and orderBy parameters in that call.

Alias reproducibleRowOrder
Default FALSE

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables that are used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases attribute
attr

contamination=double

specifies the assumed fraction of observations that are corrupted.

Aliases contam
corrupted
Default 0.25
Range 0–0.5

diagnosticOptions=list(diagOptList)

specifies options for the Diagnostics table.

Aliases diagOptions
diagOpts

The diagOptList value can be one or more of the following:

maxObs=integer

specifies the maximum number of observations to include in the Diagnostics table. If more observations qualify than this value allows, priority goes first to observations that are both outliers and leverage points, then to observations that are only outliers, and then to observations that are only leverage points.

Minimum value 0
showObsId=TRUE | FALSE

when set to TRUE, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to TRUE if you omit the id parameter.

Alias obsId
Default FALSE

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

eigenvec=TRUE | FALSE

when set to TRUE, creates the Eigenvectors table. This table is not produced unless you set this parameter to TRUE.

Default FALSE

id=list("variable-name-1" <, "variable-name-2", ...>)

specifies one or more variables that identify observations in output tables and plots.

initOnly=TRUE | FALSE

when set to TRUE, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.

Alias initialOnly
Default FALSE

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Aliases input
vars
var

loadings=TRUE | FALSE

when set to TRUE, creates the Loadings table. This table is not produced unless you set this parameter to TRUE.

Default FALSE

model=list(modelStatement)

specifies the variables to be analyzed through the effects subparameter. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.

The modelStatement value can be one or more of the following:

depVars=list( list(responsevar-1) <, list(responsevar-2), ...>)

specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.

Aliases depVar
target
name="variable-name"

names the response variable.

effects=list( list(effect-1) <, list(effect-2), ...>)

specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.

The effect value can be one or more of the following:

interaction="BAR" | "CROSS" | "NONE"

specifies the type of interaction for the variables.

Alias interact
Default NONE
maxInteract=integer

eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.

nest=list("string-1" <, "string-2", ...>)

specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.

* vars=list("string-1" <, "string-2", ...>)

specifies the variables to use in defining a term of the effect. You must specify at least one variable.

nPrinComp=integer

specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter, but not both.

Aliases nComp
nPC
n
Minimum value 1

nPrinCompMax=integer

specifies the largest feasible number of principal components that you would expect to retain for the final subspace, given the target proportion of variance to explain. This number does not limit the number of components that are actually used. Instead, it is used to compute an observation subset size, which must be determined before the final number of components is known.

Aliases nCompMax
nPCMax
nMax
Default 10
Minimum value 1

output=list(outputOptions)

creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.

The outputOptions value can be one or more of the following:

* casOut=list(casouttable)

specifies the settings for the output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars=list("variable-name-1" <, "variable-name-2", ...>)

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.

Alias copyVar
leverage="string"

specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.

nodeid="string"

adds an output statistic for the ID of the node that processes the observation and specifies its name. If you set this parameter to an empty string, the name NodeId is used for the output variable.

obsid="string"

adds the automatically assigned observation ID to the output table and specifies its name. If you set this parameter to an empty string, the name ObsId is used for the output variable.

orthdist="string"

specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.

outlier="string"

specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.

score="string"

specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.

scoredist="string"

specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.

threadid="string"

adds an output statistic for the ID of the thread that processes the observation and specifies its name. Each node has its own collection of threads. If you set this parameter to an empty string, the name ThreadId is used for the output variable.

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

prefix="string"

specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.

Default "Prin"

propVariance=double

specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter, but not both. When you specify the propVariance parameter, the nPrinCompMax parameter also applies.

Aliases proportionVariance
propVar
Range 0–1

seed=integer

specifies the seed to use for random number generation.

Alias randomSeed
Default 1
Range 1–MACINT

* table=list(castable)

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Last updated: March 05, 2026