Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the settings for an input table. |

| Parameter | Subparameter | Description |
|---|---|---|
| | casOut (required) | creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included. |
| outputTables | names | lists the names of results tables to save as CAS tables on the server. |
specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.
| Alias | alphaLev |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
specifies the tail probability that determines the robust score distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalLev, alphaMargLev |
|---|---|
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalOut, alphaMargOut |
|---|---|
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.
| Alias | alphaOut |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
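The relationship between the standard and marginal tail probabilities can be sketched as follows. This is an illustration only: the `cutoff` helper is hypothetical and uses an empirical rank of the supplied distances as a stand-in, whereas the action derives its cutoff values by a method that this page does not describe. The sample distances are made up.

```python
def cutoff(distances, tail_prob):
    """Stand-in cutoff: roughly a tail_prob fraction of distances exceed it.

    Assumption for illustration; not the cutoff formula used by the action.
    """
    ordered = sorted(distances)
    n = len(ordered)
    k = round(tail_prob * n)  # number of observations expected above the cutoff
    return ordered[max(0, n - 1 - k)]


def exceeding(distances, tail_prob):
    """Return the indices of observations whose distance exceeds the cutoff."""
    limit = cutoff(distances, tail_prob)
    return [i for i, d in enumerate(distances) if d > limit]


# Hypothetical orthogonal distances for ten observations.
distances = [0.5, 0.7, 0.9, 1.0, 1.1, 1.2, 1.4, 2.0, 3.5, 6.0]

flagged = exceeding(distances, 0.2)   # standard tail probability
wider = exceeding(distances, 0.4)     # marginal value greater than the standard one
narrower = exceeding(distances, 0.1)  # marginal value less than the standard one
```

A larger tail probability lowers the cutoff, so `wider` contains every index in `flagged` plus the observations that lie between the two cutoffs, while `narrower` drops the flagged observations that fall below the marginal cutoff.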
when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.
| Alias | reproducibleRowOrder |
|---|---|
| Default | FALSE |
changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute, attr |
|---|---|
specifies the assumed fraction of observations that are corrupted.
| Aliases | contam, corrupted |
|---|---|
| Default | 0.25 |
| Range | 0–0.5 |
specifies options for the Diagnostics table.
| Aliases | diagOptions, diagOpts |
|---|---|
The diagOptList value can be one or more of the following:
specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.
| Minimum value | 0 |
|---|---|
when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the ID parameter.
| Alias | obsId |
|---|---|
| Default | FALSE |
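The inclusion priority for a limited Diagnostics table can be sketched as a stable sort on three classes. The function name, the dictionary keys, and the sample rows are hypothetical. The sketch considers only flagged observations, and keeping the original row order within a class is an assumption, because this page does not say how ties are broken.

```python
def select_for_diagnostics(observations, max_obs):
    """Pick at most max_obs flagged observations, highest priority first."""

    def priority(obs):
        if obs["outlier"] and obs["leverage"]:
            return 0  # both an outlier and a leverage point
        if obs["outlier"]:
            return 1  # outlier only
        if obs["leverage"]:
            return 2  # leverage point only
        return 3      # neither

    flagged = [obs for obs in observations if priority(obs) < 3]
    # sorted() is stable, so observations in the same class keep their input order.
    return sorted(flagged, key=priority)[:max_obs]


rows = [
    {"id": 1, "outlier": False, "leverage": True},
    {"id": 2, "outlier": True, "leverage": False},
    {"id": 3, "outlier": True, "leverage": True},
    {"id": 4, "outlier": False, "leverage": False},
    {"id": 5, "outlier": True, "leverage": False},
]

top_three = [obs["id"] for obs in select_for_diagnostics(rows, 3)]
everything = [obs["id"] for obs in select_for_diagnostics(rows, 10)]
```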
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.
| Default | FALSE |
|---|---|
specifies one or more variables to include in output tables and plots, for identifying observations.
when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.
| Alias | initialOnly |
|---|---|
| Default | FALSE |
specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input, vars, var |
|---|---|
when set to True, creates the Loadings table, which is produced only if you specify this parameter.
| Default | FALSE |
|---|---|
in the effects subparameter, specifies the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.
The modelStatement value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar, target |
|---|---|
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
The effect value can be one or more of the following:
specifies the type of interaction for the variables.
| Alias | interact |
|---|---|
| Default | NONE |
eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.
specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
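The three interaction types can be illustrated with a small term expander. This is a sketch under assumptions: `expand_effect` is a hypothetical helper, not part of the action; it treats NONE as one main effect per variable, CROSS as a single crossed term, and BAR as every combination of the variables up to the order given by `max_interact`. The order in which the action itself lists the expanded terms may differ.

```python
from itertools import combinations


def expand_effect(variables, interaction="NONE", max_interact=None):
    """Expand one effect specification into a list of terms (tuples of names)."""
    if not variables:
        raise ValueError("at least one variable is required")
    if interaction == "NONE":
        # Each variable enters on its own.
        return [(v,) for v in variables]
    if interaction == "CROSS":
        # One term that crosses all listed variables.
        return [tuple(variables)]
    if interaction == "BAR":
        # All main effects and interactions, optionally capped at max_interact.
        limit = len(variables)
        if max_interact is not None:
            limit = min(limit, max_interact)
        return [
            combo
            for order in range(1, limit + 1)
            for combo in combinations(variables, order)
        ]
    raise ValueError(f"unknown interaction type: {interaction}")


bar_terms = expand_effect(["a", "b", "c"], "BAR", max_interact=2)
cross_terms = expand_effect(["a", "b", "c"], "CROSS")
main_terms = expand_effect(["a", "b", "c"])
```

With `max_interact=2`, the three-way term `("a", "b", "c")` is dropped from the BAR expansion, which leaves three main effects and three two-way interactions.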
specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.
| Aliases | nComp, nPC, n |
|---|---|
| Minimum value | 1 |
specifies the largest feasible number of principal components that you would expect to retain for the final subspace given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it determines the size of an observation subset that must be computed before the final number of components is determined.
| Aliases | nCompMax, nPCMax, nMax |
|---|---|
| Default | 10 |
| Minimum value | 1 |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
The outputOptions value can be one or more of the following:
specifies the settings for the output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.
| Alias | copyVar |
|---|---|
specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.
specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.
specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.
specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.
specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.
specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.
specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.
adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If set to an empty string, the name ThreadId is used for the output variable.
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.
| Default | "Prin" |
|---|---|
specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.
| Aliases | proportionVariance, propVar |
|---|---|
| Range | 0–1 |
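The rule that exactly one of nPrinComp and propVariance must be specified can be checked on the client before the action is called. The following is a minimal sketch: `component_settings` is a hypothetical helper that only builds a dictionary of parameter values, and treating the 0–1 range as inclusive is an assumption.

```python
def component_settings(n_prin_comp=None, prop_variance=None, n_prin_comp_max=10):
    """Return the component-selection parameters as a dict, or raise ValueError."""
    # Exactly one of the two parameters must be given.
    if (n_prin_comp is None) == (prop_variance is None):
        raise ValueError("specify exactly one of nPrinComp or propVariance")

    if n_prin_comp is not None:
        if n_prin_comp < 1:
            raise ValueError("nPrinComp must be at least 1")
        return {"nPrinComp": n_prin_comp}

    # Assumption: both endpoints of the documented 0-1 range are accepted.
    if not 0 <= prop_variance <= 1:
        raise ValueError("propVariance must be between 0 and 1")
    if n_prin_comp_max < 1:
        raise ValueError("nPrinCompMax must be at least 1")
    # nPrinCompMax applies only when propVariance is specified.
    return {"propVariance": prop_variance, "nPrinCompMax": n_prin_comp_max}


fixed = component_settings(n_prin_comp=2)
by_variance = component_settings(prop_variance=0.9)
```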
specifies the seed to use for random number generation.
| Alias | randomSeed |
|---|---|
| Default | 1 |
| Range | 1–MACINT |
specifies the settings for an input table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametertable |
— |
specifies the settings for an input table. |
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametercasOut |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included. |
|
|
names |
lists the names of results tables to save as CAS tables on the server. |
specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.
| Alias | alphaLev |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
specifies the tail probability that determines the robust score distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalLev |
|---|---|
| alphaMargLev | |
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalOut |
|---|---|
| alphaMargOut | |
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.
| Alias | alphaOut |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.
| Alias | reproducibleRowOrder |
|---|---|
| Default | false |
changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameter are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute |
|---|---|
| attr |
specifies the assumed fraction of observations that are corrupted.
| Aliases | contam |
|---|---|
| corrupted | |
| Default | 0.25 |
| Range | 0–0.5 |
specifies options for the Diagnostics table.
| Aliases | diagOptions |
|---|---|
| diagOpts |
The diagOptList value can be one or more of the following:
specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.
| Minimum value | 0 |
|---|
when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the ID parameter.
| Alias | obsId |
|---|---|
| Default | false |
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.
| Default | false |
|---|
specifies one or more variables to include in output tables and plots, for identifying observations.
when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.
| Alias | initialOnly |
|---|---|
| Default | false |
specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input |
|---|---|
| vars | |
| var |
when set to True, creates the Loadings table, which is produced only if you specify this parameter.
| Default | false |
|---|
in the effects subparameter, specifies the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.
The modelStatement value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar |
|---|---|
| target |
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
The effect value can be one or more of the following:
specifies the type of interaction for the variables.
| Alias | interact |
|---|---|
| Default | NONE |
eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.
specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.
| Aliases | nComp |
|---|---|
| nPC | |
| n | |
| Minimum value | 1 |
specifies the largest feasible number of principal components that you would expect to retain for the final subspace given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to calculate an observation subset size that must be calculated before the final number of components is determined.
| Aliases | nCompMax |
|---|---|
| nPCMax | |
| nMax | |
| Default | 10 |
| Minimum value | 1 |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
The outputOptions value can be one or more of the following:
specifies the settings for the output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.
| Alias | copyVar |
|---|
specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.
specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.
specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.
specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.
specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.
specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.
specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.
adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If set to an empty string, the name ThreadId is used for the output variable.
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.
| Default | "Prin" |
|---|
specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.
| Aliases | proportionVariance |
|---|---|
| propVar | |
| Range | 0–1 |
specifies the seed to use for random number generation.
| Alias | randomSeed |
|---|---|
| Default | 1 |
| Range | 1–MACINT |
specifies the settings for an input table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametertable |
— |
specifies the settings for an input table. |
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametercasOut |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included. |
|
|
names |
lists the names of results tables to save as CAS tables on the server. |
specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.
| Alias | alphaLev |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
specifies the tail probability that determines the robust score distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalLev |
|---|---|
| alphaMargLev | |
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalOut |
|---|---|
| alphaMargOut | |
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.
| Alias | alphaOut |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
when set to True, reads the data in a reproducible row order. You must use the groupBy and orderBy parameters in a preliminary call to the partition action in the table action set.
| Alias | reproducibleRowOrder |
|---|---|
| Default | False |
changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameter are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute |
|---|---|
| attr |
specifies the assumed fraction of observations that are corrupted.
| Aliases | contam |
|---|---|
| corrupted | |
| Default | 0.25 |
| Range | 0–0.5 |
specifies options for the Diagnostics table.
| Aliases | diagOptions |
|---|---|
| diagOpts |
The diagOptList value can be one or more of the following:
specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.
| Minimum value | 0 |
|---|
when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the ID parameter.
| Alias | obsId |
|---|---|
| Default | False |
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.
| Default | False |
|---|
specifies one or more variables to include in output tables and plots, for identifying observations.
when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.
| Alias | initialOnly |
|---|---|
| Default | False |
specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input |
|---|---|
| vars | |
| var |
when set to True, creates the Loadings table, which is produced only if you specify this parameter.
| Default | False |
|---|
in the effects subparameter, specifies the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.
The modelStatement value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar |
|---|---|
| target |
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
The effect value can be one or more of the following:
specifies the type of interaction for the variables.
| Alias | interact |
|---|---|
| Default | NONE |
eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.
specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.
| Aliases | nComp |
|---|---|
| nPC | |
| n | |
| Minimum value | 1 |
specifies the largest feasible number of principal components that you would expect to retain for the final subspace given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to calculate an observation subset size that must be calculated before the final number of components is determined.
| Aliases | nCompMax |
|---|---|
| nPCMax | |
| nMax | |
| Default | 10 |
| Minimum value | 1 |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
The outputOptions value can be one or more of the following:
specifies the settings for the output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.
| Alias | copyVar |
|---|
specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.
specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.
specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.
specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.
specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.
specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.
specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.
adds an output statistic for the ID of the thread that processes the observation. Each node has its own collection of threads. If set to an empty string, the name ThreadId is used for the output variable.
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.
| Default | "Prin" |
|---|
specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.
| Aliases | proportionVariance |
|---|---|
| propVar | |
| Range | 0–1 |
specifies the seed to use for random number generation.
| Alias | randomSeed |
|---|---|
| Default | 1 |
| Range | 1–MACINT |
specifies the settings for an input table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
Identifies outliers and leverage points in a robust principal component analysis for any numeric multivariate data set.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametertable |
— |
specifies the settings for an input table. |
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametercasOut |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included. |
|
|
names |
lists the names of results tables to save as CAS tables on the server. |
specifies the tail probability that determines the robust score distance cutoff value that is used to identify leverage points.
| Alias | alphaLev |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
specifies the tail probability that determines the robust score distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaLeverage parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaLeverage parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalLev |
|---|---|
| alphaMargLev | |
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that determines which observations to show in the Diagnostics table. A value greater than the alphaOutlier parameter value adds observations that fall between the standard and marginal cutoffs. A value less than the alphaOutlier parameter value removes observations below the marginal cutoff.
| Aliases | alphaMarginalOut, alphaMargOut |
|---|---|
| Range | 0–1 |
specifies the tail probability that determines the orthogonal distance cutoff value that is used to identify outliers.
| Alias | alphaOut |
|---|---|
| Default | 0.025 |
| Range | 0–1 |
when set to True, reads the data in a reproducible row order. To use this parameter, you must first call the partition action in the table action set with the groupBy and orderBy parameters.
| Alias | reproducibleRowOrder |
|---|---|
| Default | FALSE |
changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute, attr |
|---|---|
specifies the assumed fraction of observations that are corrupted.
| Aliases | contam, corrupted |
|---|---|
| Default | 0.25 |
| Range | 0–0.5 |
specifies options for the Diagnostics table.
| Aliases | diagOptions, diagOpts |
|---|---|
The diagOptList value can be one or more of the following:
specifies the maximum number of observations to include in the Diagnostics table. If the value is less than the number of observations, priority for inclusion goes to observations that are both outliers and leverage points, then observations that are outliers, then observations that are leverage points.
| Minimum value | 0 |
|---|---|
when set to True, includes the automatically assigned observation ID in the Diagnostics table. This parameter is automatically set to True if you omit the ID parameter.
| Alias | obsId |
|---|---|
| Default | FALSE |
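The inclusion priority described above for a limited Diagnostics table can be sketched as follows. This is an illustration of the ordering rule only, not the action's implementation: the function name, the dictionary keys, and the tie-breaking by input order are assumptions, and the sketch simply slices to the limit without giving the value 0 any special meaning.

```python
def select_diagnostics(rows, max_obs):
    """Pick up to max_obs flagged rows, highest-priority category first.

    Each row is a dict with an 'id' and boolean 'outlier' and 'leverage' flags.
    """
    def priority(row):
        if row["outlier"] and row["leverage"]:
            return 0  # both an outlier and a leverage point
        if row["outlier"]:
            return 1  # outlier only
        if row["leverage"]:
            return 2  # leverage point only
        return 3      # regular observation, never listed

    flagged = [row for row in rows if priority(row) < 3]
    flagged.sort(key=priority)  # stable sort keeps input order within a category
    return flagged[:max_obs]


rows = [
    {"id": 1, "outlier": False, "leverage": True},
    {"id": 2, "outlier": True, "leverage": False},
    {"id": 3, "outlier": True, "leverage": True},
    {"id": 4, "outlier": False, "leverage": False},
]
print([row["id"] for row in select_diagnostics(rows, max_obs=2)])
```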
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, creates the Eigenvectors table, which is produced only if you specify this parameter.
| Default | FALSE |
|---|---|
specifies one or more variables to include in output tables and plots, for identifying observations.
when set to True, stops the analysis just before the point where the final number of principal components is determined. This saves computation time if you want to obtain only the information relevant to determining how many principal components to retain for the final subspace.
| Alias | initialOnly |
|---|---|
| Default | FALSE |
specifies the variables to be analyzed. You must specify either the inputs parameter or the model parameter, and the variables must be numeric.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input, vars, var |
|---|---|
when set to True, creates the Loadings table, which is produced only if you specify this parameter.
| Default | FALSE |
|---|---|
in the effects subparameter, specifies the variables to be analyzed. You must specify either the model parameter or the inputs parameter, and the variables must be numeric.
The modelStatement value can be one or more of the following:
specifies one or more variables to use as response variables in the model. Not all models support more than one response variable.
| Aliases | depVar, target |
|---|---|
names the response variable.
specifies a list of effects that define the model. Each term in this list is made up of variables specified in the vars parameter and their interaction (which can be NONE, CROSS, or BAR). When the interaction is BAR, it can be limited by the maxInteract parameter.
The effect value can be one or more of the following:
specifies the type of interaction for the variables.
| Alias | interact |
|---|---|
| Default | NONE |
eliminates interaction effects whose order is higher than the specified integer value when used in conjunction with the BAR interaction.
specifies the variables to be nested within the term that is defined by the vars parameter. For terms with a BAR or CROSS interaction, the nest corresponds to the last variable in the vars parameter. For terms with no interaction, the nest is distributed across all variables that are listed in the vars parameter.
specifies the variables to use in defining a term of the effect. You must specify at least one variable.
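To make the three interaction types concrete, the sketch below expands a list of variables into model terms. It assumes the usual meaning of these keywords: NONE yields one main effect per variable, CROSS yields a single crossed term, and BAR yields all main effects and interactions up to the order allowed by maxInteract. The function name is hypothetical, and the order in which terms are listed is a choice made for this sketch rather than something this page specifies.

```python
from itertools import combinations


def expand_effect(variables, interact="NONE", max_interact=None):
    """Expand an effect into a list of terms, each term a tuple of variable names."""
    if interact == "NONE":
        return [(v,) for v in variables]
    if interact == "CROSS":
        return [tuple(variables)]
    if interact == "BAR":
        top = len(variables)
        if max_interact is not None:
            top = min(top, max_interact)  # drop interactions above this order
        return [
            term
            for order in range(1, top + 1)
            for term in combinations(variables, order)
        ]
    raise ValueError(f"unknown interaction type: {interact!r}")


for term in expand_effect(["A", "B", "C"], interact="BAR", max_interact=2):
    print("*".join(term))
```

With the limit set to 2, the three-way term A*B*C is omitted while all main effects and two-way interactions remain.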
specifies the number of principal components to retain for the final subspace. You must specify either this parameter or the propVariance parameter. You cannot specify both.
| Aliases | nComp, nPC, n |
|---|---|
| Minimum value | 1 |
specifies the largest number of principal components that you expect to retain for the final subspace, given the target proportion of variance to explain. This number does not limit the number of components that are actually used; rather, it is used to calculate the size of an observation subset, which must be determined before the final number of components is known.
| Aliases | nCompMax, nPCMax, nMax |
|---|---|
| Default | 10 |
| Minimum value | 1 |
creates an output table that contains observationwise statistics. If you do not specify any statistics, then the orthogonal and robust score distances are included. ID variables are automatically included. If no ID variables are specified, the automatically assigned observation ID is included.
The outputOptions value can be one or more of the following:
specifies the settings for the output table.
For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, which copies all variables. Any ID variables that you specify are automatically copied.
| Alias | copyVar |
|---|---|
specifies and names the leverage indicator. If you set this parameter to an empty string, the name Leverage is used for the output variable.
specifies the ID of the node that processes the observation. If you set this parameter to an empty string, the name NodeId is used for the output variable.
specifies the automatically assigned observation ID. If you set this parameter to an empty string, the name ObsId is used for the output variable.
specifies and names the orthogonal distance. If you set this parameter to an empty string, the name OrthDist is used for the output variable.
specifies and names the outlier indicator. If you set this parameter to an empty string, the name Outlier is used for the output variable.
specifies and names the principal component scores for each principal component. If you set this parameter to an empty string, the prefix Score is used to name the output variables.
specifies and names the robust score distance. If you set this parameter to an empty string, the name ScoreDist is used for the output variable.
specifies the ID of the thread that processes the observation. Each node has its own collection of threads. If you set this parameter to an empty string, the name ThreadId is used for the output variable.
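The empty-string rule shared by the output statistics above can be summarized in a short sketch. The default names are taken from the descriptions above; the dictionary keys and the function name are hypothetical labels chosen for this illustration and are not the action's subparameter names.

```python
# Default output names listed in the descriptions above.
# The keys are hypothetical labels, not the action's subparameter names.
DEFAULT_NAMES = {
    "leverage": "Leverage",
    "node_id": "NodeId",
    "obs_id": "ObsId",
    "orth_dist": "OrthDist",
    "outlier": "Outlier",
    "score_prefix": "Score",
    "score_dist": "ScoreDist",
    "thread_id": "ThreadId",
}


def resolve_output_name(statistic, requested_name):
    """Return the requested name, or the default when the request is empty."""
    return requested_name if requested_name else DEFAULT_NAMES[statistic]


print(resolve_output_name("orth_dist", ""))        # falls back to the default
print(resolve_output_name("orth_dist", "OD_rob"))  # keeps the custom name
```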
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components in the Eigenvectors and Loadings tables.
| Default | "Prin" |
|---|
specifies the target proportion of variance to be explained by the principal components. You must specify either this parameter or the nPrinComp parameter. You cannot specify both. If you specify the propVariance parameter, the nPrinCompMax parameter also applies.
| Aliases | proportionVariance, propVar |
|---|---|
| Range | 0–1 |
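The rule that exactly one of nPrinComp and propVariance must be specified can be checked on the client before the action is called. The following is a minimal sketch of such a check; the helper name is hypothetical, and treating the 0–1 range as inclusive is an assumption, because this page does not say whether the endpoints are allowed.

```python
def component_settings(n_prin_comp=None, prop_variance=None, n_prin_comp_max=10):
    """Build the component-selection parameters, enforcing the either/or rule."""
    if (n_prin_comp is None) == (prop_variance is None):
        raise ValueError("specify exactly one of nPrinComp or propVariance")

    if n_prin_comp is not None:
        if n_prin_comp < 1:
            raise ValueError("nPrinComp must be at least 1")
        return {"nPrinComp": n_prin_comp}

    if not 0 <= prop_variance <= 1:  # inclusive bounds are an assumption
        raise ValueError("propVariance must be between 0 and 1")
    if n_prin_comp_max < 1:
        raise ValueError("nPrinCompMax must be at least 1")
    # nPrinCompMax applies only when propVariance is specified.
    return {"propVariance": prop_variance, "nPrinCompMax": n_prin_comp_max}


print(component_settings(n_prin_comp=3))
print(component_settings(prop_variance=0.9))
```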
specifies the seed to use for random number generation.
| Alias | randomSeed |
|---|---|
| Default | 1 |
| Range | 1–MACINT |
specifies the settings for an input table.
For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).