Provides actions for robust principal component analysis (RPCA) and moving windows principal component analysis (MWPCA).
Performs robust principal component analysis.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required parameter) | none | specifies the settings for an input table. |

| Parameter | Subparameter | Description |
|---|---|---|
| rpcaCodegen value | casOut | produces SAS score code. This parameter is disabled if you specify the image parameter. |
| colStatistics | none | specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter. |
| outRpcaTabs value | lowRankMat, sparseMat, errMat | specifies a list of parameters for the output tables of the robust principal component analysis method. |
| outPcaTabs value | pcLoadings, pcScores | specifies a list of parameters for the output tables of the principal component analysis. |
| outSvdTabs value | svdDiag, svdLeft, svdRight | specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter. |
| outputTables | names | lists the names of results tables to save as CAS tables on the server. |
| saveState | none | specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter. |
when set to True, uses a subsequent score action for anomaly detection.
| Aliases | anomaly, AD |
|---|---|
| Default | FALSE |
specifies the method of anomaly detection. If this value is set to 0, the SIGVARS method for anomaly detection is used. If this value is set to 1, the R4S method for anomaly detection is used. If this value is set to 2, the ICA-SIGVARS method for anomaly detection is used. If this value is set to 3, the ICA-NORMS method for anomaly detection is used. For more information about these anomaly detection methods, see the Details section. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_ANOMALYDETECTION_METHOD as the value of the name subparameter, and specify the override value in the value subparameter.
| Alias | ADMethod |
|---|---|
| Default | 0 |
| Range | 0–3 |
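The mapping from method codes to method names, and the name/value pair used to override the method at scoring time, can be sketched in Python. This is only an illustration: the helper name `anomaly_method_override` is hypothetical, and how the resulting pair is handed to the score action depends on the client you use.

```python
# Method codes as listed in the parameter description above.
ANOMALY_METHODS = {0: "SIGVARS", 1: "R4S", 2: "ICA-SIGVARS", 3: "ICA-NORMS"}


def anomaly_method_override(method):
    """Build a name/value pair for the score action's options parameter.

    Hypothetical helper: it only validates the code and builds the pair.
    """
    if method not in ANOMALY_METHODS:
        raise ValueError(f"method must be an integer in 0-3, got {method!r}")
    return {"name": "RPCA_ANOMALYDETECTION_METHOD", "value": method}


override = anomaly_method_override(2)
print(ANOMALY_METHODS[override["value"]], override)
```

Validating the code on the client side catches an out-of-range value before the score action runs.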
changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute, attr |
|---|---|
when set to True, centers the numeric variables by the mean of each column.
| Alias | centering |
|---|---|
| Default | FALSE |
produces SAS score code. This parameter is disabled if you specify the image parameter.
The rpcaCodegen value can be one or more of the following:
specifies the settings for an output table.
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
when set to True, applies data compression to the table.
| Default | FALSE |
|---|---|
specifies the list of variables to create indexes for in the output data.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the maximum amount of memory, in bytes, that each thread should allocate for in-memory blocks before converting to a memory-mapped file. Files are written in the directories that are specified in the CAS_DISK_CACHE environment variable.
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
|---|---|
specifies the memory format for the output table.
| Default | INHERIT |
|---|---|
uses the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|---|
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the number of bytes to use for blocks in the output table. The blocks are read by threads. Gradually increase this value when you have a large table with millions or billions of rows and you are tuning for performance. Larger values can increase performance with indexed tables. However, if the value is too large, then you can cause thread starvation due to too few blocks for threads to work on.
| Alias | blockSize |
|---|---|
| Default | 1048576 |
| Minimum value | 0 |
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
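The suffix notation in the tip can be illustrated with a small Python parser. This is a sketch under a stated assumption: the suffixes are treated as binary multiples (K = 1024 bytes), which the text above does not confirm. The helper name `parse_size` is hypothetical.

```python
import re

# Assumption: binary multiples (K = 1024 bytes); the documentation does not say.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value):
    """Convert an integer or a string such as "8M" to a byte count."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+)\s*([BKMGT]?)\s*", value.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    # A bare number is interpreted as bytes.
    return int(number) * _UNITS[unit or "B"]


print(parse_size("8M"))
```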
adds a timestamp column to the table. Support for timeStamp is action-specific. Specify the value in the form that is appropriate for your session locale.
specifies one or more expressions for subsetting the output data. When multiple expressions are specified, the expressions are effectively combined using AND to form the final output filter. If an expression contains quoted values, use nested quotation marks.
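The effect of combining several subsetting expressions with AND can be sketched as follows. This is a client-side illustration only: the helper `combine_filters` and the column names `region` and `score` are hypothetical, and the server performs the actual combination.

```python
def combine_filters(expressions):
    """Join several subsetting expressions with AND (illustrative helper)."""
    cleaned = [e.strip() for e in expressions if e.strip()]
    # Parentheses keep each expression intact when it contains OR.
    return " AND ".join(f"({e})" for e in cleaned)


# Nested quotation marks: double quotes inside a single-quoted Python string.
filters = ['region = "EMEA"', "score > 0.5"]
print(combine_filters(filters))
```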
when set to True, adds comments to the DATA step code.
| Default | FALSE |
|---|---|
specifies the width to use for formatting derived numbers such as parameter estimates in the DATA step code.
| Alias | fmtWidth |
|---|---|
| Default | 20 |
| Range | 0–32 |
specifies the number of spaces to indent the DATA step code for each level.
| Default | 3 |
|---|---|
| Range | 0–10 |
specifies the label ID to use in array names and statement labels in the DATA step code. By default, a random positive integer is used.
specifies the line size for the generated code.
| Default | 120 |
|---|---|
| Range | 64–254 |
when set to True, bases the comparison of variables with formatted values on the full format width with padding. By default, leading and trailing blanks are removed from the formatted values.
| Default | FALSE |
|---|---|
when set to True, generates the code in a way that is appropriate for storing in a table.
| Alias | tableForm |
|---|---|
| Default | FALSE |
specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter.
For more information about specifying the colStatistics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the significance level of the eigenvalues that determine the rank of the low-rank matrix.
| Default | 1 |
|---|---|
| Range | (0–1] |
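One plausible reading of this tolerance is a cumulative share of the eigenvalue sum: the rank is the smallest number of leading eigenvalues whose sum reaches that share. The Python sketch below works under that assumption, which the description above does not spell out. The function name `rank_from_eigenvalues` is hypothetical.

```python
def rank_from_eigenvalues(eigenvalues, tol=1.0):
    """Smallest k whose k largest eigenvalues reach a share `tol` of the total.

    Assumption: the tolerance is a cumulative proportion of the eigenvalue sum.
    """
    if not 0.0 < tol <= 1.0:
        raise ValueError("tol must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Small relative slack guards against floating-point round-off.
        if running >= tol * total - 1e-12 * total:
            return k
    return len(ordered)


print(rank_from_eigenvalues([5.0, 3.0, 1.5, 0.5], tol=0.8))
```

With the default tolerance of 1, every nonzero eigenvalue counts toward the rank under this reading.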
specifies the decomposition method for the low-rank matrix. If the value of the maxiter parameter is 0, decomposition is applied to the original input data instead of to the low-rank matrix.
| Default | NONE |
|---|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, fixes mu in each iteration of the accelerated proximal gradient method. Otherwise, mu is dynamically updated in each iteration.
| Default | FALSE |
|---|---|
specifies a numeric variable that contains the frequency of occurrence of each observation.
specifies the maximum number of iterations of Infomax ICA when training.
| Default | 100 |
|---|---|
| Range | 1–500 |
specifies the ICA method for RPCA-ICA anomaly detection.
| Default | FOBI |
|---|---|
specifies the variables to use as record identifiers.
specifies the name of the column that contains image binaries, encoded as JPG, PNG, TIF, or WIDE. You cannot specify this parameter with the inputs parameter.
| Alias | imageVar |
|---|---|
specifies the numeric variables to be analyzed. If you omit this parameter, all numeric variables that are not specified in other parameters are analyzed. You cannot specify this parameter with the image parameter.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input, vars, var |
|---|---|
specifies the value of the coefficient in the objective function (lambda), which is multiplied by the L1 norm of the sparse matrix in the objective function. The default value is computed as 1 divided by the square root of the number of observations or the number of variables in the input table, whichever is greater.
| Range | (0–10000000000] |
|---|---|
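The default value of lambda described above follows directly from the dimensions of the input table. A minimal Python sketch (the function name `default_lambda` is hypothetical):

```python
import math


def default_lambda(n_obs, n_vars):
    """Return 1 / sqrt(max(number of observations, number of variables))."""
    return 1.0 / math.sqrt(max(n_obs, n_vars))


# A table with 10,000 observations and 50 variables.
print(default_lambda(10000, 50))
```

Because the larger dimension is used, adding rows to a tall table lowers the default penalty on the sparse matrix.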
specifies the weight of lambda.
| Default | 1 |
|---|---|
| Range | (0–10000000000] |
specifies the maximum number of iterations for robust principal component analysis algorithms.
| Default | 1000 |
|---|---|
| Minimum value | 0 |
specifies an initial value of mu in the objective function for the accelerated proximal gradient method.
| Default | 0.001 |
|---|---|
| Range | 0–10000000000 |
specifies the maximum number of threads to use on each computation node.
| Default | 16 |
|---|---|
| Range | 0–1024 |
specifies the minimum number of significant variables in an observation for it to be considered an anomaly by the SIGVARS and ICA-SIGVARS methods. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_NUMSIGVARS as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1 |
specifies a list of parameters for the output tables of the robust principal component analysis method.
The outRpcaTabs value can be one or more of the following:
specifies the name of the output table for the error matrix.
For more information about specifying the errMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outError |
|---|---|
specifies the name of the output table for the low-rank matrix.
For more information about specifying the lowRankMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outLowRank |
|---|---|
specifies the name of the output table for the sparse matrix.
For more information about specifying the sparseMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outSparse |
|---|---|
specifies a list of parameters for the output tables of the principal component analysis.
The outPcaTabs value can be one or more of the following:
specifies the name of the output table for the principal component loadings.
For more information about specifying the pcLoadings parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the principal component scores.
For more information about specifying the pcScores parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|---|
specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter.
The outSvdTabs value can be one or more of the following:
specifies the name of the output table for the diagonal vector of the rectangular diagonal matrix.
For more information about specifying the svdDiag parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the left-singular vectors.
For more information about specifying the svdLeft parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the right-singular vectors.
For more information about specifying the svdRight parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components.
| Default | "Prin" |
|---|---|
specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter.
For more information about specifying the saveState parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
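The three projection types and the image restriction can be captured in a small validation helper. This Python sketch assumes that RPCA_PROJECTION_TYPE is passed as a name/value pair, by analogy with the other score-time overrides. The helper name `projection_option` and the flag `trained_on_images` are hypothetical.

```python
PROJECTION_TYPES = {
    0: "project onto the principal component space",
    1: "project onto the low-rank subspace",
    2: "project onto the low-rank subspace and store the sparse part",
}


def projection_option(projection_type, trained_on_images=False):
    """Build the RPCA_PROJECTION_TYPE pair (assumed name/value form)."""
    if projection_type not in PROJECTION_TYPES:
        raise ValueError("projection type must be 0, 1, or 2")
    if projection_type == 0 and trained_on_images:
        # The value 0 is not available for tables generated with the image parameter.
        raise ValueError("projection type 0 is not available with image input")
    return {"name": "RPCA_PROJECTION_TYPE", "value": projection_type}


print(projection_option(2, trained_on_images=True))
```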
when set to True, scales the numeric variables by the standard deviation of each column.
| Alias | scaling |
|---|---|
| Default | FALSE |
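Centering by the column mean and scaling by the column standard deviation can be sketched in plain Python. The sketch assumes the sample standard deviation (n - 1 divisor), which the descriptions do not state, and leaves constant columns unscaled. The function name `standardize_columns` is hypothetical.

```python
import statistics


def standardize_columns(rows, center=True, scale=True):
    """Center each column by its mean and scale it by its standard deviation.

    Assumption: the sample standard deviation (n - 1 divisor) is used.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    result = []
    for row in rows:
        new_row = []
        for x, m, s in zip(row, means, stds):
            if center:
                x = x - m
            if scale and s > 0:  # constant columns are left unscaled
                x = x / s
            new_row.append(x)
        result.append(new_row)
    return result


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standardize_columns(data))
```

After both steps, the two columns in the example become identical, which shows why scaling matters when variables are measured in different units.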
specifies the threshold on the standardized sparse value in the SIGVARS method for anomaly detection or a coefficient that is applied to the threshold in the R4S method. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_SIGMACOEF as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1E-10 |
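The SIGVARS decision described by this threshold and by the minimum number of significant variables can be sketched as follows. Two details are assumptions: the comparison uses absolute values, and columns with a zero standard deviation are never counted. The function name `sigvars_is_anomaly` is hypothetical.

```python
def sigvars_is_anomaly(sparse_row, column_stds, sigma_coef=1.0, num_sig_vars=1):
    """Flag an observation when enough standardized sparse values exceed the threshold.

    Assumptions: absolute values are compared, and columns with a zero
    standard deviation are never counted as significant.
    """
    significant = 0
    for value, std in zip(sparse_row, column_stds):
        if std > 0 and abs(value / std) > sigma_coef:
            significant += 1
    return significant >= num_sig_vars


stds = [0.5, 2.0, 1.0]
sparse = [0.1, 5.0, 0.0]
print(sigvars_is_anomaly(sparse, stds))                  # one significant variable is enough
print(sigvars_is_anomaly(sparse, stds, num_sig_vars=2))  # two are required here
```

Raising either the threshold or the required count makes the detector more conservative.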
specifies the maximum rank to consider in the singular value decomposition solver. The default value is the smaller of the number of observations and the number of variables in the input table.
| Minimum value | 1 |
|---|---|
specifies a list of parameters to use when the value of the svdMethod parameter is RANDOM.
The randomizedSvd value can be one or more of the following:
specifies the power parameter of the randomized singular value decomposition.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the seed value.
| Default | 0 |
|---|---|
| Minimum value | 1 |
specifies the settings for an input table.
| Long form | table={name="table-name"} |
|---|---|
| Shortcut form | table="table-name" |
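The relationship between the two forms can be sketched in Python: the shortcut form is equivalent to a long form that sets only the name. The helper `normalize_table`, the table name `sensors`, and the caslib name `public` are hypothetical.

```python
def normalize_table(table):
    """Expand the shortcut form (a table name) to the long form (a dict)."""
    if isinstance(table, str):
        return {"name": table}
    if isinstance(table, dict) and "name" in table:
        return dict(table)
    raise TypeError("table must be a table name or a dict with a 'name' key")


print(normalize_table("sensors"))
print(normalize_table({"name": "sensors", "caslib": "public"}))
```

The long form is needed only when you set additional settings such as the caslib or a subsetting expression.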
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | FALSE |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | FALSE |
|---|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
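The row matching performed with a filter table can be pictured as keeping only the input rows whose values agree with some filter row on the chosen variables. The Python sketch below illustrates that idea on lists of dictionaries. The function `apply_where_table` and the column names are hypothetical, and the server's actual matching rules may differ in details such as missing values.

```python
def apply_where_table(rows, filter_rows, var_names=None):
    """Keep input rows that match some filter row on the chosen variables."""
    if not rows or not filter_rows:
        return []
    if var_names is None:
        # Default described above: all variable names common to both tables.
        var_names = sorted(set(rows[0]) & set(filter_rows[0]))
    keys = {tuple(f[v] for v in var_names) for f in filter_rows}
    return [r for r in rows if tuple(r[v] for v in var_names) in keys]


rows = [
    {"id": 1, "site": "A", "x": 0.3},
    {"id": 2, "site": "B", "x": 0.9},
    {"id": 3, "site": "A", "x": 0.1},
]
filter_rows = [{"site": "A"}]
print([r["id"] for r in apply_where_table(rows, filter_rows)])
```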
specifies the convergence criterion for the robust principal component analysis algorithms.
| Alias | stopcriterion |
|---|---|
| Default | 1E-07 |
| Minimum value | 1E-10 |
when set to True, uses the standard deviation of the columns of the sparse matrix to standardize the sparse part of the scoring observation in the anomaly detection methods SIGVARS and R4S. When set to False, the action uses the standard deviation of the columns of the original input data for that purpose. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_USEMATRIX as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | FALSE |
|---|---|
Performs robust principal component analysis.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
required parametertable |
— |
specifies the settings for an input table. |
|
Parameter |
Subparameter |
Description |
|---|---|---|
|
casOut |
produces SAS score code. This parameter is disabled if you specify the image parameter. |
|
|
— |
specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter. |
|
|
lowRankMat, sparseMat, errMat |
specifies a list of parameters for the output tables of the robust principal component analysis method. |
|
|
pcLoadings, pcScores |
specifies a list of parameters for the output tables of the principal component analysis. |
|
|
svdDiag, svdLeft, svdRight |
specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter. |
|
|
names |
lists the names of results tables to save as CAS tables on the server. |
|
|
— |
specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter. |
when set to True, uses a subsequent score action for anomaly detection.
| Aliases | anomaly |
|---|---|
| AD | |
| Default | false |
specifies the method of anomaly detection. If this value is set to 0, the SIGVARS method for anomaly detection is used. If this value is set to 1, the R4S method for anomaly detection is used. If this value is set to 2, the ICA-SIGVARS method for anomaly detection is used. If this value is set to 3, the ICA-NORMS method for anomaly detection is used. For more information about these anomaly detection methods, see the Details section. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_ANOMALYDETECTION_METHOD as the value of the name subparameter, and specify the override value in the value subparameter.
| Alias | ADMethod |
|---|---|
| Default | 0 |
| Range | 0–3 |
changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameter are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute |
|---|---|
| attr |
when set to True, centers the numeric variables by the mean of each column.
| Alias | centering |
|---|---|
| Default | false |
produces SAS score code. This parameter is disabled if you specify the image parameter.
The rpcaCodegen value can be one or more of the following:
specifies the settings for an output table.
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
when set to True, applies data compression to the table.
| Default | false |
|---|
specifies the list of variables to create indexes for in the output data.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the maximum amount of memory, in bytes, that each thread should allocate for in-memory blocks before converting to a memory-mapped file. Files are written in the directories that are specified in the CAS_DISK_CACHE environment variable.
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
|---|
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | false |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | false |
|---|
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the number of bytes to use for blocks in the output table. The blocks are read by threads. Gradually increase this value when you have a large table with millions or billions of rows and you are tuning for performance. Larger values can increase performance with indexed tables. However, if the value is too large, then you can cause thread starvation due to too few blocks for threads to work on.
| Alias | blockSize |
|---|---|
| Default | 1048576 |
| Minimum value | 0 |
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
specifies to add a timestamp column to the table. Support for timeStamp is action-specific. Specify the value in the form that is appropriate for your session locale.
specifies one or more expressions for subsetting the output data. When multiple expressions are specified, the expressions are effectively combined using AND to form the final output filter. If an expression contains quoted values, use nested quotation marks.
when set to True, adds comments to the DATA step code.
| Default | false |
|---|
specifies the width to use for formatting derived numbers such as parameter estimates in the DATA step code.
| Alias | fmtWidth |
|---|---|
| Default | 20 |
| Range | 0–32 |
specifies the number of spaces to indent the DATA step code for each level.
| Default | 3 |
|---|---|
| Range | 0–10 |
specifies the label ID to use in array names and statement labels in the DATA step code. By default, a random positive integer is used.
specifies the line size for the generated code.
| Default | 120 |
|---|---|
| Range | 64–254 |
when set to True, bases the comparison of variables with formatted values on the full format width with padding. By default, leading and trailing blanks are removed from the formatted values.
| Default | false |
|---|
when set to True, generates the code in a way that is appropriate for storing in a table.
| Alias | tableForm |
|---|---|
| Default | false |
specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter.
For more information about specifying the colStatistics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the significance level of the eigenvalues that determine the rank of the low-rank matrix.
| Default | 1 |
|---|---|
| Range | (0–1] |
specifies the decomposition method for the low-rank matrix. If the value of the maxiter parameter is 0, decomposition is applied to the original input data instead of to the low-rank matrix.
| Default | NONE |
|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, fixes mu in each iteration of the accelerated proximal gradient method. Otherwise, mu is dynamically updated in each iteration.
| Default | false |
|---|
specifies a numeric variable that contains the frequency of occurrence of each observation.
specifies the maximum number of iterations of Infomax ICA when training.
| Default | 100 |
|---|---|
| Range | 1–500 |
specifies the ICA method for RPCA-ICA anomaly detection.
| Default | FOBI |
|---|
specifies the variables to use as record identifiers.
specifies the name of the column that contains image binaries, encoded as JPG, PNG, TIF, or WIDE. You cannot specify this parameter with the inputs parameter.
| Alias | imageVar |
|---|
specifies the numeric variables to be analyzed. If you omit this parameter, all numeric variables that are not specified in other parameters are analyzed. You cannot specify this parameter with the image parameter.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input |
|---|---|
| vars | |
| var |
specifies the value of the coefficient in the objective function (lambda), which is multiplied by the L1 norm of the sparse matrix in the objective function. The default value is computed as 1 divided by the square root of the number of observations or the number of variables in the input table, whichever is greater.
| Range | (0–10000000000] |
|---|
specifies the weight of lambda.
| Default | 1 |
|---|---|
| Range | (0–10000000000] |
specifies the maximum number of iterations for robust principal component analysis algorithms.
| Default | 1000 |
|---|---|
| Minimum value | 0 |
specifies an initial value of mu in the objective function for the accelerated proximal gradient method.
| Default | 0.001 |
|---|---|
| Range | 0–10000000000 |
specifies the maximum number of threads to use on each computation node.
| Default | 16 |
|---|---|
| Range | 0–1024 |
specifies the minimum number of significant variables in an observation for it to be considered as an anomaly by the SIGVARS and ICA-SIGVARS method. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_NUMSIGVARS as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1 |
specifies a list of parameters for the output tables of the robust principal component analysis method.
The outRpcaTabs value can be one or more of the following:
specifies the name of the output table for the error matrix.
For more information about specifying the errMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outError |
|---|
specifies the name of the output table for the low-rank matrix.
For more information about specifying the lowRankMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outLowRank |
|---|
specifies the name of the output table for the sparse matrix.
For more information about specifying the sparseMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outSparse |
|---|
specifies a list of parameters for the output tables of the principal component analysis.
The outPcaTabs value can be one or more of the following:
specifies the name of the output table for the principal component loadings.
For more information about specifying the pcLoadings parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the principal component scores.
For more information about specifying the pcScores parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|---|
specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter.
The outSvdTabs value can be one or more of the following:
specifies the name of the output table for the diagonal vector of the rectangular diagonal matrix.
For more information about specifying the svdDiag parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the left-singular vectors.
For more information about specifying the svdLeft parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the right-singular vectors.
For more information about specifying the svdRight parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components.
| Default | "Prin" |
|---|---|
specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter.
For more information about specifying the saveState parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
when set to True, scales the numeric variables by the standard deviation of each column.
| Alias | scaling |
|---|---|
| Default | false |
specifies the threshold on the standardized sparse value in the SIGVARS method for anomaly detection, or a coefficient that is applied to the threshold in the R4S method. You can override this parameter in the options parameter of the score action: specify RPCA_SIGMACOEF as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1E-10 |
specifies the maximum value of rank to be considered in the singular value decomposition solver. The default value is the smaller of the number of observations and the number of variables in the input table.
| Minimum value | 1 |
|---|---|
specifies a list of parameters to use when the value of the svdMethod parameter is RANDOM.
The randomizedSvd value can be one or more of the following:
specifies the parameter power.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the seed value.
| Default | 0 |
|---|---|
| Minimum value | 1 |
specifies the settings for an input table.
| Long form | table={name="table-name"} |
|---|---|
| Shortcut form | table="table-name" |
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | false |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | false |
|---|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
specifies the convergence criterion for the robust principal component analysis algorithms.
| Alias | stopcriterion |
|---|---|
| Default | 1E-07 |
| Minimum value | 1E-10 |
when set to True, uses the standard deviation of the columns of the sparse matrix to standardize the sparse part of the scoring observation in the anomaly detection methods SIGVARS and R4S. When set to False, the action uses the standard deviation of the columns of the original input data for that purpose. You can override this parameter in the options parameter of the score action: specify RPCA_USEMATRIX as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | false |
|---|---|
Performs robust principal component analysis.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the settings for an input table. |

| Parameter | Subparameter | Description |
|---|---|---|
| | casOut | produces SAS score code. This parameter is disabled if you specify the image parameter. |
| | — | specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter. |
| | lowRankMat, sparseMat, errMat | specifies a list of parameters for the output tables of the robust principal component analysis method. |
| | pcLoadings, pcScores | specifies a list of parameters for the output tables of the principal component analysis. |
| | svdDiag, svdLeft, svdRight | specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter. |
| | names | lists the names of results tables to save as CAS tables on the server. |
| | — | specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter. |
when set to True, uses a subsequent score action for anomaly detection.
| Aliases | anomaly, AD |
|---|---|
| Default | False |
specifies the method of anomaly detection: 0 selects the SIGVARS method, 1 selects the R4S method, 2 selects the ICA-SIGVARS method, and 3 selects the ICA-NORMS method. For more information about these anomaly detection methods, see the Details section. You can override this parameter in the options parameter of the score action: specify RPCA_ANOMALYDETECTION_METHOD as the value of the name subparameter, and specify the override value in the value subparameter.
| Alias | ADMethod |
|---|---|
| Default | 0 |
| Range | 0–3 |
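The override mechanism can be sketched in Python as a list of name/value pairs for the options parameter of the score action. This is a minimal sketch, not the action's API: the helper name `score_overrides` and its keyword arguments are hypothetical, and whether the value subparameter expects a number or a string is not stated here. Only the option names, the 0–3 method range, and the minimum values come from this reference.

```python
# Hypothetical helper: builds name/value pairs for the options parameter
# of the score action. Option names and limits are taken from this page.
AD_METHODS = {0: "SIGVARS", 1: "R4S", 2: "ICA-SIGVARS", 3: "ICA-NORMS"}


def score_overrides(ad_method=None, num_sig_vars=None, sigma_coef=None):
    """Return a list of {"name": ..., "value": ...} override entries."""
    options = []
    if ad_method is not None:
        if ad_method not in AD_METHODS:
            raise ValueError("anomaly detection method must be 0, 1, 2, or 3")
        options.append({"name": "RPCA_ANOMALYDETECTION_METHOD", "value": ad_method})
    if num_sig_vars is not None:
        if num_sig_vars < 1:
            raise ValueError("number of significant variables must be at least 1")
        options.append({"name": "RPCA_NUMSIGVARS", "value": num_sig_vars})
    if sigma_coef is not None:
        if sigma_coef < 1e-10:
            raise ValueError("sigma coefficient must be at least 1E-10")
        options.append({"name": "RPCA_SIGMACOEF", "value": sigma_coef})
    return options


# Example: switch to ICA-SIGVARS and require three significant variables.
overrides = score_overrides(ad_method=2, num_sig_vars=3)
```

The resulting list would then be supplied as the options parameter when you call the score action from your client.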
changes the attributes of variables used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute, attr |
|---|---|
when set to True, centers the numeric variables by the mean of each column.
| Alias | centering |
|---|---|
| Default | False |
produces SAS score code. This parameter is disabled if you specify the image parameter.
The rpcaCodegen value can be one or more of the following:
specifies the settings for an output table.
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
when set to True, applies data compression to the table.
| Default | False |
|---|---|
specifies the list of variables to create indexes for in the output data.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the maximum amount of memory, in bytes, that each thread should allocate for in-memory blocks before converting to a memory-mapped file. Files are written in the directories that are specified in the CAS_DISK_CACHE environment variable.
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
|---|---|
specifies the memory format for the output table.
| Default | INHERIT |
|---|---|
uses the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|---|
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the number of bytes to use for blocks in the output table. The blocks are read by threads. Gradually increase this value when you have a large table with millions or billions of rows and you are tuning for performance. Larger values can increase performance with indexed tables. However, if the value is too large, then you can cause thread starvation due to too few blocks for threads to work on.
| Alias | blockSize |
|---|---|
| Default | 1048576 |
| Minimum value | 0 |
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
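The suffix notation in the tip can be illustrated with a small converter. This is a sketch under one assumption that the text does not state outright: the suffixes are binary multiples (K = 1024 bytes), which is consistent with the default of 1048576 bytes. The function name `size_to_bytes` is hypothetical.

```python
# Assumption: suffixes are binary multiples (K = 1024 bytes, M = 1024**2, ...).
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(value):
    """Convert an integer or a string such as "8M" to a byte count."""
    if isinstance(value, int):
        return value
    text = value.strip().upper()
    if text and text[-1] in _UNITS:
        # Split the numeric part from the one-letter unit suffix.
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


eight_megabytes = size_to_bytes("8M")
```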
specifies to add a timestamp column to the table. Support for timeStamp is action-specific. Specify the value in the form that is appropriate for your session locale.
specifies one or more expressions for subsetting the output data. When multiple expressions are specified, the expressions are effectively combined using AND to form the final output filter. If an expression contains quoted values, use nested quotation marks.
when set to True, adds comments to the DATA step code.
| Default | False |
|---|---|
specifies the width to use for formatting derived numbers such as parameter estimates in the DATA step code.
| Alias | fmtWidth |
|---|---|
| Default | 20 |
| Range | 0–32 |
specifies the number of spaces to indent the DATA step code for each level.
| Default | 3 |
|---|---|
| Range | 0–10 |
specifies the label ID to use in array names and statement labels in the DATA step code. By default, a random positive integer is used.
specifies the line size for the generated code.
| Default | 120 |
|---|---|
| Range | 64–254 |
when set to True, bases the comparison of variables with formatted values on the full format width with padding. By default, leading and trailing blanks are removed from the formatted values.
| Default | False |
|---|---|
when set to True, generates the code in a way that is appropriate for storing in a table.
| Alias | tableForm |
|---|---|
| Default | False |
specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter.
For more information about specifying the colStatistics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the significance level of the eigenvalues that determine the rank of the low-rank matrix.
| Default | 1 |
|---|---|
| Range | (0–1] |
specifies the decomposition method for the low-rank matrix. If the value of the maxiter parameter is 0, decomposition is applied to the original input data instead of to the low-rank matrix.
| Default | NONE |
|---|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, fixes mu in each iteration of the accelerated proximal gradient method. Otherwise, mu is dynamically updated in each iteration.
| Default | False |
|---|---|
specifies a numeric variable that contains the frequency of occurrence of each observation.
specifies the maximum number of iterations of Infomax ICA when training.
| Default | 100 |
|---|---|
| Range | 1–500 |
specifies the ICA method for RPCA-ICA anomaly detection.
| Default | FOBI |
|---|---|
specifies the variables to use as record identifiers.
specifies the name of the column that contains image binaries, encoded as JPG, PNG, TIF, or WIDE. You cannot specify this parameter with the inputs parameter.
| Alias | imageVar |
|---|---|
specifies the numeric variables to be analyzed. If you omit this parameter, all numeric variables that are not specified in other parameters are analyzed. You cannot specify this parameter with the image parameter.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input, vars, var |
|---|---|
specifies the value of the coefficient in the objective function (lambda), which is multiplied by the L1 norm of the sparse matrix in the objective function. The default value is computed as 1 divided by the square root of the number of observations or the number of variables in the input table, whichever is greater.
| Range | (0–10000000000] |
|---|---|
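The default described above can be written as a one-line formula. The following is a minimal sketch of that arithmetic only; the function name `default_lambda` is hypothetical and is not part of the action.

```python
import math


def default_lambda(n_obs, n_vars):
    """Return 1 / sqrt(max(n_obs, n_vars)), the default described above."""
    if n_obs < 1 or n_vars < 1:
        raise ValueError("the input table needs at least one row and one variable")
    return 1.0 / math.sqrt(max(n_obs, n_vars))


# A table with 10,000 observations and 50 variables gives 1 / sqrt(10000).
lam = default_lambda(10000, 50)
```

Because the larger dimension is used, adding observations to a tall table lowers the default, while adding variables changes it only after the variable count exceeds the observation count.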
specifies the weight of lambda.
| Default | 1 |
|---|---|
| Range | (0–10000000000] |
specifies the maximum number of iterations for robust principal component analysis algorithms.
| Default | 1000 |
|---|---|
| Minimum value | 0 |
specifies an initial value of mu in the objective function for the accelerated proximal gradient method.
| Default | 0.001 |
|---|---|
| Range | 0–10000000000 |
specifies the maximum number of threads to use on each computation node.
| Default | 16 |
|---|---|
| Range | 0–1024 |
specifies the minimum number of significant variables that an observation must have to be considered an anomaly by the SIGVARS and ICA-SIGVARS methods. You can override this parameter in the options parameter of the score action: specify RPCA_NUMSIGVARS as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1 |
specifies a list of parameters for the output tables of the robust principal component analysis method.
The outRpcaTabs value can be one or more of the following:
specifies the name of the output table for the error matrix.
For more information about specifying the errMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outError |
|---|---|
specifies the name of the output table for the low-rank matrix.
For more information about specifying the lowRankMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outLowRank |
|---|---|
specifies the name of the output table for the sparse matrix.
For more information about specifying the sparseMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outSparse |
|---|---|
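In Python, the three subparameters above can be collected in a dictionary. This is a sketch of the value only: the helper name `rpca_output_tables` is hypothetical, and the keyword under which the dictionary is passed to the action is not shown in this section, so it is left out. The keys lowRankMat, sparseMat, and errMat and the name and caslib settings are the ones named on this page.

```python
def rpca_output_tables(low_rank=None, sparse=None, error=None, caslib=None):
    """Return a dict keyed by lowRankMat, sparseMat, and errMat.

    Only the tables that are requested (not None) are included.
    """
    requested = {"lowRankMat": low_rank, "sparseMat": sparse, "errMat": error}
    tabs = {}
    for key, name in requested.items():
        if name is None:
            continue
        spec = {"name": name}
        if caslib is not None:
            spec["caslib"] = caslib
        tabs[key] = spec
    return tabs


# Request the low-rank and sparse matrices; table names are illustrative.
tabs = rpca_output_tables(low_rank="lowrank_out", sparse="sparse_out")
```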
specifies a list of parameters for the output tables of the principal component analysis.
The outPcaTabs value can be one or more of the following:
specifies the name of the output table for the principal component loadings.
For more information about specifying the pcLoadings parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the principal component scores.
For more information about specifying the pcScores parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|---|
specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter.
The outSvdTabs value can be one or more of the following:
specifies the name of the output table for the diagonal vector of the rectangular diagonal matrix.
For more information about specifying the svdDiag parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the left-singular vectors.
For more information about specifying the svdLeft parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the right-singular vectors.
For more information about specifying the svdRight parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components.
| Default | "Prin" |
|---|---|
specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter.
For more information about specifying the saveState parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
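The three projection types and the image restriction can be captured in a small validator. This is a sketch that assumes RPCA_PROJECTION_TYPE is supplied in the same name/value form as the other score-action overrides on this page; the helper name `projection_option` and the `from_image` flag are hypothetical.

```python
PROJECTION_TYPES = {
    0: "principal component space",
    1: "low-rank subspace",
    2: "low-rank subspace, sparse part stored in the scoring results",
}


def projection_option(projection_type, from_image=False):
    """Return a name/value entry for RPCA_PROJECTION_TYPE after validation."""
    if projection_type not in PROJECTION_TYPES:
        raise ValueError("projection type must be 0, 1, or 2")
    if projection_type == 0 and from_image:
        # The value 0 is not available for tables generated with the image parameter.
        raise ValueError("projection type 0 is not available with image input")
    return {"name": "RPCA_PROJECTION_TYPE", "value": projection_type}


option = projection_option(1, from_image=True)
```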
when set to True, scales the numeric variables by the standard deviation of each column.
| Alias | scaling |
|---|---|
| Default | False |
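Centering by the column mean and scaling by the column standard deviation can be illustrated in plain Python. This sketch makes two assumptions that this page does not confirm: the standard deviation is the sample standard deviation, and centering is applied before scaling. The function name `center_and_scale` is hypothetical.

```python
from statistics import mean, stdev


def center_and_scale(rows, center=True, scale=True):
    """Center and/or scale each column of a list of numeric rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]  # assumption: sample standard deviation
    result = []
    for row in rows:
        new_row = []
        for x, m, s in zip(row, means, stds):
            if center:
                x = x - m
            if scale and s > 0:  # leave constant columns unscaled
                x = x / s
            new_row.append(x)
        result.append(new_row)
    return result


standardized = center_and_scale([[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]])
```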
specifies the threshold on the standardized sparse value in the SIGVARS method for anomaly detection, or a coefficient that is applied to the threshold in the R4S method. You can override this parameter in the options parameter of the score action: specify RPCA_SIGMACOEF as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1E-10 |
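One plausible reading of how the threshold and the minimum count of significant variables interact in the SIGVARS method is sketched below. This is an illustration only and the Details section is authoritative: the use of the absolute value, the strict comparison, and the function name `is_anomaly_sigvars` are all assumptions.

```python
def is_anomaly_sigvars(sparse_row, col_stds, sigma_coef=1.0, num_sig_vars=1):
    """Flag an observation when enough standardized sparse values exceed the threshold.

    sparse_row: sparse part of one scoring observation
    col_stds:   per-column standard deviations used for standardization
    """
    significant = 0
    for value, std in zip(sparse_row, col_stds):
        # Assumption: a variable is significant when |value / std| > sigma_coef.
        if std > 0 and abs(value / std) > sigma_coef:
            significant += 1
    return significant >= num_sig_vars


flagged = is_anomaly_sigvars([0.0, 5.0, 0.1], [1.0, 2.0, 1.0])
```

With the defaults of 1 for both settings, a single variable whose standardized sparse value exceeds 1 is enough to flag the observation under this reading; raising either value makes the rule stricter.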
specifies the maximum value of rank to be considered in the singular value decomposition solver. The default value is the smaller of the number of observations and the number of variables in the input table.
| Minimum value | 1 |
|---|---|
specifies a list of parameters to use when the value of the svdMethod parameter is RANDOM.
The randomizedSvd value can be one or more of the following:
specifies the parameter power.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the seed value.
| Default | 0 |
|---|---|
| Minimum value | 1 |
specifies the settings for an input table.
| Long form | table={"name":"table-name"} |
|---|---|
| Shortcut form | table="table-name" |
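The two forms are interchangeable when only the table name is needed. The sketch below normalizes either form to the long form in Python; the helper name `as_table_param` and the table and caslib names are hypothetical.

```python
def as_table_param(table):
    """Return the long form {"name": ...} for either accepted form."""
    if isinstance(table, str):
        # Shortcut form: a bare table name.
        return {"name": table}
    if "name" not in table:
        raise ValueError("the long form requires a name entry")
    return dict(table)


short = as_table_param("mytable")
long_form = as_table_param({"name": "mytable", "caslib": "mycaslib"})
```

The long form is needed as soon as you set anything besides the name, such as a caslib other than the active one.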
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | False |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import_ |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | False |
|---|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import_ |
|---|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
specifies the convergence criterion for the robust principal component analysis algorithms.
| Alias | stopcriterion |
|---|---|
| Default | 1E-07 |
| Minimum value | 1E-10 |
when set to True, uses the standard deviation of the columns of the sparse matrix to standardize the sparse part of the scoring observation in the anomaly detection methods SIGVARS and R4S. When set to False, the action uses the standard deviation of the columns of the original input data for that purpose. You can override this parameter in the options parameter of the score action: specify RPCA_USEMATRIX as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | False |
|---|---|
Performs robust principal component analysis.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| table (required) | — | specifies the settings for an input table. |

| Parameter | Subparameter | Description |
|---|---|---|
| | casOut | produces SAS score code. This parameter is disabled if you specify the image parameter. |
| | — | specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter. |
| | lowRankMat, sparseMat, errMat | specifies a list of parameters for the output tables of the robust principal component analysis method. |
| | pcLoadings, pcScores | specifies a list of parameters for the output tables of the principal component analysis. |
| | svdDiag, svdLeft, svdRight | specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter. |
| | names | lists the names of results tables to save as CAS tables on the server. |
| | — | specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter. |
when set to True, uses a subsequent score action for anomaly detection.
| Aliases | anomaly, AD |
|---|---|
| Default | FALSE |
specifies the method of anomaly detection: 0 selects the SIGVARS method, 1 selects the R4S method, 2 selects the ICA-SIGVARS method, and 3 selects the ICA-NORMS method. For more information about these anomaly detection methods, see the Details section. You can override this parameter in the options parameter of the score action: specify RPCA_ANOMALYDETECTION_METHOD as the value of the name subparameter, and specify the override value in the value subparameter.
| Alias | ADMethod |
|---|---|
| Default | 0 |
| Range | 0–3 |
changes the attributes of variables used in this action. Currently, attributes that are specified in the inputs and nominals parameters are ignored.
For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | attribute, attr |
|---|---|
when set to True, centers the numeric variables by the mean of each column.
| Alias | centering |
|---|---|
| Default | FALSE |
produces SAS score code. This parameter is disabled if you specify the image parameter.
The rpcaCodegen value can be one or more of the following:
specifies the settings for an output table.
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
when set to True, applies data compression to the table.
| Default | FALSE |
|---|
specifies the list of variables to create indexes for in the output data.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the maximum amount of memory, in bytes, that each thread should allocate for in-memory blocks before converting to a memory-mapped file. Files are written in the directories that are specified in the CAS_DISK_CACHE environment variable.
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
|---|
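As a rough illustration of the suffix notation in the tip, the following sketch converts a quoted size such as "8M" to a byte count. The function name `parse_size` is hypothetical, and the use of binary multiples (1 K = 1024 bytes) is an assumption; the text does not state which multiple the server applies.

```python
# Assumption: binary multiples (K = 1024 bytes). The source does not confirm this.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value):
    """Convert an integer or a string such as "8M" to a number of bytes."""
    if isinstance(value, int):
        return value
    text = value.strip().upper()
    if text and text[-1] in UNITS:
        # Split the numeric part from the one-letter unit suffix.
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


print(parse_size("8M"))
```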
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|
specifies the number of copies of the table to make for fault tolerance. Larger values result in slower performance and use more memory, but provide high availability for data in the event of a node failure. Data redundancy applies to distributed servers only.
| Default | 1 |
|---|---|
| Minimum value | 0 |
specifies the number of bytes to use for blocks in the output table. The blocks are read by threads. Gradually increase this value when you have a large table with millions or billions of rows and you are tuning for performance. Larger values can increase performance with indexed tables. However, if the value is too large, then you can cause thread starvation due to too few blocks for threads to work on.
| Alias | blockSize |
|---|---|
| Default | 1048576 |
| Minimum value | 0 |
| TIP | You can enclose the value in quotation marks and specify B, K, M, G, or T as a suffix to indicate the units. For example, "8M" specifies eight megabytes. |
specifies to add a timestamp column to the table. Support for timeStamp is action-specific. Specify the value in the form that is appropriate for your session locale.
specifies one or more expressions for subsetting the output data. When multiple expressions are specified, the expressions are effectively combined using AND to form the final output filter. If an expression contains quoted values, use nested quotation marks.
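The effect of supplying several expressions can be pictured as joining them with AND. The sketch below only illustrates that semantics; the helper name `combine_filters` is hypothetical, and the server does not necessarily build a combined string this way.

```python
def combine_filters(expressions):
    """Join non-empty filter expressions with AND, parenthesizing each one."""
    parts = [f"({expr.strip()})" for expr in expressions if expr.strip()]
    return " AND ".join(parts)


# The second expression contains a quoted value, so its quotation marks
# are nested inside the Python string.
print(combine_filters(["x > 1", "name = 'a'"]))
```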
when set to True, adds comments to the DATA step code.
| Default | FALSE |
|---|
specifies the width to use for formatting derived numbers such as parameter estimates in the DATA step code.
| Alias | fmtWidth |
|---|---|
| Default | 20 |
| Range | 0–32 |
specifies the number of spaces to indent the DATA step code for each level.
| Default | 3 |
|---|---|
| Range | 0–10 |
specifies the label ID to use in array names and statement labels in the DATA step code. By default, a random positive integer is used.
specifies the line size for the generated code.
| Default | 120 |
|---|---|
| Range | 64–254 |
when set to True, bases the comparison of variables with formatted values on the full format width with padding. By default, leading and trailing blanks are removed from the formatted values.
| Default | FALSE |
|---|
when set to True, generates the code in a way that is appropriate for storing in a table.
| Alias | tableForm |
|---|---|
| Default | FALSE |
specifies the name of the output table to contain simple statistics for the variables of the input data set. This parameter is disabled if you specify the image parameter.
For more information about specifying the colStatistics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the significance level of the eigenvalues that determine the rank of the low-rank matrix.
| Default | 1 |
|---|---|
| Range | (0–1] |
specifies the decomposition method for the low-rank matrix. If the value of the maxiter parameter is 0, decomposition is applied to the original input data instead of to the low-rank matrix.
| Default | NONE |
|---|
specifies a list of results tables to send to the client for display.
For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).
when set to True, fixes mu in each iteration of the accelerated proximal gradient method. Otherwise, mu is dynamically updated in each iteration.
| Default | FALSE |
|---|
specifies a numeric variable that contains the frequency of occurrence of each observation.
specifies the maximum number of iterations of Infomax ICA when training.
| Default | 100 |
|---|---|
| Range | 1–500 |
specifies the ICA method for RPCA-ICA anomaly detection.
| Default | FOBI |
|---|
specifies the variables to use as record identifiers.
specifies the name of the column that contains image binaries, encoded as JPG, PNG, TIF, or WIDE. You cannot specify this parameter with the inputs parameter.
| Alias | imageVar |
|---|
specifies the numeric variables to be analyzed. If you omit this parameter, all numeric variables that are not specified in other parameters are analyzed. You cannot specify this parameter with the image parameter.
For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).
| Aliases | input, vars, var |
|---|---|
specifies the value of the coefficient in the objective function (lambda), which is multiplied by the L1 norm of the sparse matrix in the objective function. The default value is computed as 1 divided by the square root of the number of observations or the number of variables in the input table, whichever is greater.
| Range | (0–10000000000] |
|---|
specifies the weight of lambda.
| Default | 1 |
|---|---|
| Range | (0–10000000000] |
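The default value of lambda described above can be computed directly from the table dimensions. In the sketch below, the function names are hypothetical, and treating the lambda weight as a simple multiplier of lambda is an assumption; the text says only that it is the weight of lambda.

```python
import math


def default_lambda(n_obs, n_vars):
    """Return 1 / sqrt(max(n_obs, n_vars)), the default described in the text."""
    return 1.0 / math.sqrt(max(n_obs, n_vars))


def effective_lambda(n_obs, n_vars, lam=None, weight=1.0):
    """Return the weighted lambda.

    Assumption: the weight multiplies lambda. The source does not spell out
    how the two parameters are combined.
    """
    base = default_lambda(n_obs, n_vars) if lam is None else lam
    return weight * base


# A table with 10,000 observations and 50 variables.
print(default_lambda(10000, 50))
```

For a table with more observations than variables, the default therefore shrinks as the number of observations grows, which lowers the penalty on the sparse matrix.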
specifies the maximum number of iterations for robust principal component analysis algorithms.
| Default | 1000 |
|---|---|
| Minimum value | 0 |
specifies an initial value of mu in the objective function for the accelerated proximal gradient method.
| Default | 0.001 |
|---|---|
| Range | 0–10000000000 |
specifies the maximum number of threads to use on each computation node.
| Default | 16 |
|---|---|
| Range | 0–1024 |
specifies the minimum number of significant variables that an observation must have to be considered an anomaly by the SIGVARS and ICA-SIGVARS methods. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_NUMSIGVARS as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1 |
specifies a list of parameters for the output tables of the robust principal component analysis method.
The outRpcaTabs value can be one or more of the following:
specifies the name of the output table for the error matrix.
For more information about specifying the errMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outError |
|---|
specifies the name of the output table for the low-rank matrix.
For more information about specifying the lowRankMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outLowRank |
|---|
specifies the name of the output table for the sparse matrix.
For more information about specifying the sparseMat parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | outSparse |
|---|
specifies a list of parameters for the output tables of the principal component analysis.
The outPcaTabs value can be one or more of the following:
specifies the name of the output table for the principal component loadings.
For more information about specifying the pcLoadings parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the principal component scores.
For more information about specifying the pcScores parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
lists the names of results tables to save as CAS tables on the server.
For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).
| Alias | displayOut |
|---|
specifies a list of parameters for the output tables of the singular value decomposition. This parameter is disabled if you specify the image parameter.
The outSvdTabs value can be one or more of the following:
specifies the name of the output table for the diagonal vector of the rectangular diagonal matrix.
For more information about specifying the svdDiag parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the left-singular vectors.
For more information about specifying the svdLeft parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output table for the right-singular vectors.
For more information about specifying the svdRight parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies a prefix for naming the principal components.
| Default | "Prin" |
|---|
specifies the output data table in which to save the scoring results to be used in the score action of the aStore action set. You can specify the RPCA_PROJECTION_TYPE subparameter in the options parameter in the score action: the value 0 projects the scoring observations onto the principal component space; the value 1 projects the scoring observations onto the low-rank subspace; the value 2 projects the scoring observations onto the low-rank subspace, but the sparse part of the scoring data is stored in the scoring results table. The value 0 is not available if you generate the table by using the image parameter.
For more information about specifying the saveState parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
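The three projection types can be summarized in a small lookup together with the restriction on image-based tables. This is a sketch under assumptions: the helper name `projection_option` is hypothetical, and the name/value dict layout is assumed to match the form that the other score-action overrides in this section use.

```python
# Meaning of each RPCA_PROJECTION_TYPE value, as described in the text.
PROJECTION_TYPES = {
    0: "principal component space",
    1: "low-rank subspace",
    2: "low-rank subspace, with the sparse part stored in the scoring results",
}


def projection_option(projection_type, from_image=False):
    """Build a name/value entry for the options parameter of the score action.

    Assumption: the entry uses the same name/value layout as the other
    overrides. from_image marks a table generated with the image parameter.
    """
    if projection_type not in PROJECTION_TYPES:
        raise ValueError(f"projection type must be 0, 1, or 2, got {projection_type!r}")
    if from_image and projection_type == 0:
        raise ValueError("value 0 is not available for tables generated from images")
    return {"name": "RPCA_PROJECTION_TYPE", "value": projection_type}


print(projection_option(2), "->", PROJECTION_TYPES[2])
```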
when set to True, scales the numeric variables by the standard deviation of each column.
| Alias | scaling |
|---|---|
| Default | FALSE |
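The combined effect of centering and scaling on a small numeric table can be sketched as follows. The function name `center_and_scale` is hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption; the text does not say which divisor the action uses.

```python
from statistics import mean, stdev


def center_and_scale(rows, center=True, scale=True):
    """Center each column by its mean and scale it by its standard deviation.

    Assumption: sample standard deviation (statistics.stdev), which needs at
    least two rows. Columns with zero deviation are left unscaled.
    """
    columns = list(zip(*rows))
    means = [mean(col) if center else 0.0 for col in columns]
    stds = [stdev(col) if scale else 1.0 for col in columns]
    stds = [s if s else 1.0 for s in stds]  # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


data = [[1.0, 10.0], [3.0, 30.0]]
print(center_and_scale(data, scale=False))
print(center_and_scale(data))
```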
specifies the threshold on the standardized sparse value in the SIGVARS method for anomaly detection or a coefficient that is applied to the threshold in the R4S method. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_SIGMACOEF as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | 1 |
|---|---|
| Minimum value | 1E-10 |
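One possible reading of how this threshold interacts with the minimum number of significant variables in the SIGVARS method is sketched below. This is not the documented algorithm, which is in the Details section; the function name, the use of absolute values, and the strict comparison are all assumptions made for illustration.

```python
def sigvars_flags(sparse_rows, column_stds, sigma_coef=1.0, num_sig_vars=1):
    """Flag observations whose sparse part has enough significant variables.

    Assumptions: a variable is significant when the absolute sparse value
    divided by the column standard deviation exceeds sigma_coef, and an
    observation is an anomaly when it has at least num_sig_vars such variables.
    """
    flags = []
    for row in sparse_rows:
        significant = sum(
            1
            for value, std in zip(row, column_stds)
            if std > 0 and abs(value) / std > sigma_coef
        )
        flags.append(significant >= num_sig_vars)
    return flags


sparse = [[0.0, 0.1], [5.0, 0.0], [3.0, 4.0]]
print(sigvars_flags(sparse, [1.0, 1.0], sigma_coef=2.0, num_sig_vars=1))
```

Raising either value makes the rule stricter: a larger threshold requires larger standardized sparse values, and a larger minimum count requires more variables to exceed it.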
specifies the maximum value of rank to be considered in the singular value decomposition solver. The default value is the smaller of the number of observations and the number of variables in the input table.
| Minimum value | 1 |
|---|
specifies a list of parameters to use when the value of the svdMethod parameter is RANDOM.
The randomizedSvd value can be one or more of the following:
specifies the parameter power.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the seed value.
| Default | 0 |
|---|---|
| Minimum value | 1 |
specifies the settings for an input table.
| Long form | table=list(name="table-name") |
|---|---|
| Shortcut form | table="table-name" |
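The relationship between the two forms can be sketched as a normalization step that expands the shortcut into the long form. The helper name `normalize_table` is hypothetical, and representing the long form as a Python dict is an assumption made for illustration.

```python
def normalize_table(table):
    """Expand the shortcut form (a table name) into the long form (a dict)."""
    if isinstance(table, str):
        return {"name": table}
    if isinstance(table, dict) and "name" in table:
        return dict(table)  # copy so that the caller's dict is not modified
    raise ValueError("table must be a name or a dict that contains 'name'")


print(normalize_table("table-name"))
print(normalize_table({"name": "table-name", "caslib": "mylib"}))
```

The long form is needed only when you set additional subparameters, such as the caslib or a filter expression.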
The castable value can be one or more of the following:
specifies the caslib for the input table that you want to use with the action. By default, the active caslib is used. Specify a value only if you need to access a table from a different caslib.
when set to True, creates the computed variables when the table is loaded instead of when the action begins.
| Alias | compOnDemand |
|---|---|
| Default | FALSE |
specifies the names of the computed variables to create. Specify an expression for each variable in the computedVarsProgram parameter. If you do not specify this parameter, then all variables from computedVarsProgram are automatically included.
| Alias | compVars |
|---|
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for each computed variable that you include in the computedVars parameter.
| Alias | compPgm |
|---|
specifies data source options.
| Aliases | options, dataSource |
|---|---|
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the input table.
specifies the variables to use for ordering observations within partitions. This parameter applies to partitioned tables, or it can be combined with variables that are specified in the groupBy parameter when the value of the groupByMode parameter is set to REDISTRIBUTE.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
when set to True, does not create a transient table on the server. Setting this parameter to True can be efficient, but the data might not have stable ordering upon repeated runs.
| Default | FALSE |
|---|
specifies the variables to use in the action.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the input data.
specifies an input table that contains rows to use as a WHERE filter. If the vars parameter is not specified, then all the variable names that are common to the input table and the filtering table are used to find matching rows. If the where parameter for the input table and this parameter are specified, then this filtering table is applied first.
The groupbytable value can be one or more of the following:
specifies the caslib for the filter table. By default, the active caslib is used.
specifies data source options.
| Aliases | options, dataSource |
|---|---|
For more information about specifying the dataSourceOptions parameter, see the common dataSourceOptions parameter (Appendix A: Common Parameters).
specifies the settings for reading a table from a data source.
| Alias | import |
|---|
For more information about specifying the importOptions parameter, see the common importOptions parameter (Appendix A: Common Parameters).
specifies the name of the filter table.
specifies the variable names to use from the filter table.
The casinvardesc value can be one or more of the following:
specifies the format to apply to the variable.
specifies the length of the format field plus the length of the format precision.
specifies the descriptive label for the variable.
specifies the name for the variable.
specifies the length of the format precision.
specifies the length of the format field.
specifies an expression for subsetting the data from the filter table.
specifies the convergence criterion for the robust principal component analysis algorithms.
| Alias | stopcriterion |
|---|---|
| Default | 1E-07 |
| Minimum value | 1E-10 |
when set to True, uses the standard deviation of the columns of the sparse matrix to standardize the sparse part of the scoring observation in the anomaly detection methods SIGVARS and R4S. When set to False, the action uses the standard deviation of the columns of the original input data for that purpose. You can override this parameter by specifying the following values in the respective subparameters of the options parameter in the score action: specify RPCA_USEMATRIX as the value of the name subparameter, and specify the override value in the value subparameter.
| Default | FALSE |
|---|