Text Mining Action Set

Provides actions for mining textual data

tmSvd Action

Computes the SVD factorization and generates topics. Some parameters require a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

CASL Syntax

textMining.tmSvd <result=results> <status=rc> /
config={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
count="variable-name",
docId="variable-name",
docPro={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
exactDocPro=TRUE | FALSE,
exactWeight=TRUE | FALSE,
k=integer,
legacyNames=TRUE | FALSE,
maxK=integer,
norm="ALL" | "DOC" | "NONE" | "WORD",
nThreads=integer,
numLabels=integer,
* parent={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
resolution="HIGH" | "LOW" | "MED",
rotate="PROMAX" | "VARIMAX",
rowPivot=double,
s={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
scoreConfig={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
termId="variable-name",
terms={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
termTopics={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
tolerance=double,
topicDecision=TRUE | FALSE,
topics={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
u={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
v={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
wordPro={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
}
;
* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter: Description

config: specifies the name of the input CAS table that contains parsing configuration information.

* parent: specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one containing the document ID, a second containing the term ID, and a third containing the value of the cell for that term and document.

terms: specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

Parameters for Creating Output Tables

Parameter: Description

docPro: specifies the name of the table to contain the SVD projections of the documents.

s: specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry, and the variable S contains the singular values.

scoreConfig: specifies the output CAS table to contain the scoring configuration.

termTopics: specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

topics: specifies the output CAS table to contain the topics that are discovered.

u: specifies the U matrix, which contains the left singular vectors. U has one row per term and k+1 columns.

v: specifies the transpose of the matrix containing the right singular vectors. V has one row per document and k+1 columns.

wordPro: specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table has n rows and k+1 columns.

Parameter Descriptions

config={castable}

specifies the name of the input CAS table that contains parsing configuration information.

For more information about specifying the config parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias parseConfig

count="variable-name"

specifies the variable that contains the (possibly weighted) term count. The values in this variable must be numeric, and the variable cannot contain missing values.

Default "_COUNT_"

docId="variable-name"

specifies the variable that contains the document ID. This variable can be either numeric or character. The variable cannot contain missing values.

Default "_DOCUMENT_"

docPro={casouttable}

specifies the name of the table to contain the SVD projections of the documents.

For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

docStdMultiple=double

specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10
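The cutoff that docStdMultiple describes is plain arithmetic: the mean plus a multiple of the standard deviation. The following Lua sketch assumes the statistics are taken over one topic's document scores and uses the population standard deviation; both choices are assumptions, and mean_sd, topic_cutoff, and memberships are hypothetical helpers, not part of the action. The membership flags only loosely mirror the decisions that topicDecision adds to the output tables. termStdMultiple presumably applies the same arithmetic to term scores.

```lua
-- Hypothetical sketch: cutoff = mean + multiple * standard deviation of one
-- topic's document scores. The population standard deviation is an assumption.
local function mean_sd(scores)
  local n, sum = #scores, 0
  for i = 1, n do sum = sum + scores[i] end
  local mean = sum / n
  local ss = 0
  for i = 1, n do ss = ss + (scores[i] - mean) ^ 2 end
  return mean, math.sqrt(ss / n)
end

-- multiple plays the role of docStdMultiple (default 1, range 0 to 10).
local function topic_cutoff(scores, multiple)
  multiple = multiple or 1
  local mean, sd = mean_sd(scores)
  return mean + multiple * sd
end

-- Flag the documents whose score reaches the cutoff (a membership decision).
local function memberships(scores, cutoff)
  local flags = {}
  for i, s in ipairs(scores) do flags[i] = s >= cutoff end
  return flags
end

-- Toy scores for five documents on a single topic.
local scores = {0.1, 0.2, 0.3, 0.4, 1.0}
local cutoff = topic_cutoff(scores, 1)
local flags = memberships(scores, cutoff)
print(string.format("cutoff = %.4f", cutoff))
```

With a multiple of 0 the cutoff falls to the mean, so more documents pass; larger multiples make membership stricter.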

exactDocPro=TRUE | FALSE

specifies whether to output the exact document projection values. This parameter requires a SAS Visual Text Analytics license.

Default TRUE

exactWeight=TRUE | FALSE

Alias exactWeights
Default FALSE

k=integer

specifies the number of dimensions to extract, which is also the number of derived topics. If the input data is too small to support the requested number of dimensions, this value is adjusted so that the calculation can complete.

Alias numTopics
Range 1–1000

legacyNames=TRUE | FALSE

specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default FALSE

maxK=integer

specifies the maximum number of dimensions to extract. You can use maxK together with the resolution parameter to dynamically select the recommended number of dimensions. To use a specific number of dimensions, either set maxK to that number and set resolution to HIGH, or use the k parameter.

Default 10
Range 1–1000

norm="ALL" | "DOC" | "NONE" | "WORD"

specifies whether to normalize the document projections, the term projections, both, or neither. Normalization changes the representation so that comparisons depend on Euclidean distances between vectors rather than on the angles between them.

Default ALL
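One way to see why normalization links angles and distances: once two vectors are scaled to unit length, their squared Euclidean distance equals 2 - 2 cos(theta), where theta is the angle between them, so distance-based comparisons depend only on direction. This minimal Lua sketch assumes that normalization means scaling each projection vector to unit Euclidean length (the exact scaling the action uses is not stated here); normalize, dot, and dist2 are hypothetical helpers.

```lua
-- Scale a vector to unit Euclidean length; a zero vector is returned as is.
local function normalize(v)
  local ss = 0
  for _, x in ipairs(v) do ss = ss + x * x end
  local len = math.sqrt(ss)
  local out = {}
  for i, x in ipairs(v) do
    if len > 0 then out[i] = x / len else out[i] = x end
  end
  return out
end

local function dot(a, b)
  local s = 0
  for i = 1, #a do s = s + a[i] * b[i] end
  return s
end

-- Squared Euclidean distance between two vectors of equal length.
local function dist2(a, b)
  local s = 0
  for i = 1, #a do s = s + (a[i] - b[i]) ^ 2 end
  return s
end

-- Two toy projection vectors; {6, 8} points the same way as {3, 4}.
local a = normalize({3, 4})
local b = normalize({4, 3})
local a2 = normalize({6, 8})

-- For unit vectors, dist2 = 2 - 2 * cos(angle), so distance tracks the angle only.
print(dist2(a, b), 2 - 2 * dot(a, b))
```

Note that {3, 4} and {6, 8} become identical after normalization: vector length (for example, document length) no longer affects the comparison.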

nThreads=integer

specifies the number of threads to use per node. If this parameter is not set, or if 0 is specified, all available threads are used.

Default 8
Range 0–64

numLabels=integer

specifies the number of terms to use in the descriptive label for each topic.

Default 5
Range 1–500
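As an illustration of a descriptive label built from numLabels terms, the Lua sketch below lists a topic's highest-weighted terms in descending order, separated by commas. The ranking rule, the separator, and the topic_label function are assumptions made for illustration; they are not the action's documented label format.

```lua
-- Hypothetical sketch: label a topic with its numLabels highest-weighted terms.
-- Ranking by signed weight and joining with commas are both assumptions.
local function topic_label(term_weights, num_labels)
  num_labels = num_labels or 5 -- numLabels defaults to 5
  local terms = {}
  for term, w in pairs(term_weights) do
    terms[#terms + 1] = {term = term, w = w}
  end
  table.sort(terms, function(x, y)
    if x.w ~= y.w then return x.w > y.w end
    return x.term < y.term -- break ties alphabetically for a stable result
  end)
  local parts = {}
  for i = 1, math.min(num_labels, #terms) do parts[i] = terms[i].term end
  return table.concat(parts, ",")
end

-- Toy term weights for one topic.
local weights = {budget = 0.9, tax = 0.7, vote = 0.2, school = 0.8}
print(topic_label(weights, 3))
```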

* parent={castable}

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one containing the document ID, a second containing the term ID, and a third containing the value of the cell for that term and document.

For more information about specifying the parent parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
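To make "transaction form" concrete: each row of the parent table holds one nonzero cell of the term-by-document matrix. The Lua sketch below uses field names that match the defaults of docId, termId, and count, and expands the rows into a dense matrix with one row per term and one column per document. The to_dense helper, the integer document IDs, and the toy values are assumptions for illustration; the action itself also accepts character document IDs.

```lua
-- Toy parent table in transaction form: one row per nonzero cell.
-- The field names mirror the defaults of docId, termId, and count.
local parent = {
  {_DOCUMENT_ = 1, _TERMNUM_ = 1, _COUNT_ = 2},
  {_DOCUMENT_ = 1, _TERMNUM_ = 3, _COUNT_ = 1},
  {_DOCUMENT_ = 2, _TERMNUM_ = 2, _COUNT_ = 4},
  {_DOCUMENT_ = 3, _TERMNUM_ = 1, _COUNT_ = 1},
}

-- Expand to a dense term-by-document matrix (rows = terms, columns = documents).
-- Assumes document IDs are the integers 1..n_docs and each pair occurs once.
local function to_dense(rows, n_terms, n_docs)
  local m = {}
  for t = 1, n_terms do
    m[t] = {}
    for d = 1, n_docs do m[t][d] = 0 end
  end
  for _, r in ipairs(rows) do
    m[r._TERMNUM_][r._DOCUMENT_] = r._COUNT_
  end
  return m
end

local A = to_dense(parent, 3, 3)
for t = 1, 3 do print(table.concat(A[t], " ")) end
```

The transaction form stores only the four nonzero cells, whereas the dense form needs all nine; for real collections, where most terms are absent from most documents, that difference is large.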

resolution="HIGH" | "LOW" | "MED"

specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.

Default HIGH

rotate="PROMAX" | "VARIMAX"

specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics, and a PROMAX rotation produces correlated topics.

Default VARIMAX
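For intuition about VARIMAX, the two-topic case has a closed-form solution. For each row (x, y) of a p x 2 loading matrix, let u = x^2 - y^2 and v = 2xy, and let A = sum(u), B = sum(v), C = sum(u^2 - v^2), and D = 2 sum(uv). Rotating by the angle phi with tan(4 phi) = (D - 2AB/p) / (C - (A^2 - B^2)/p), with the quadrant chosen by atan2, maximizes the raw varimax criterion. This Lua sketch applies that formula to toy loadings. It illustrates the raw (not Kaiser-normalized) criterion under those assumptions and is not the action's implementation; it does not cover PROMAX, and the function names are hypothetical.

```lua
-- Raw varimax criterion for a p x 2 loading matrix: for each column, the
-- variance of the squared loadings, summed over the two columns.
local function varimax_criterion(L)
  local p, total = #L, 0
  for k = 1, 2 do
    local s2, s4 = 0, 0
    for j = 1, p do
      local sq = L[j][k] ^ 2
      s2 = s2 + sq
      s4 = s4 + sq * sq
    end
    total = total + s4 / p - (s2 / p) ^ 2
  end
  return total
end

-- Orthogonal (rigid) rotation of both columns by angle phi.
local function rotate(L, phi)
  local c, s = math.cos(phi), math.sin(phi)
  local R = {}
  for j, row in ipairs(L) do
    local x, y = row[1], row[2]
    R[j] = {x * c + y * s, -x * s + y * c}
  end
  return R
end

local atan2 = math.atan2 or math.atan -- Lua 5.1/5.2 vs 5.3+

-- Closed-form angle that maximizes the raw varimax criterion for two columns.
local function varimax2(L)
  local p = #L
  local A, B, C, D = 0, 0, 0, 0
  for j = 1, p do
    local x, y = L[j][1], L[j][2]
    local u, v = x * x - y * y, 2 * x * y
    A, B = A + u, B + v
    C, D = C + u * u - v * v, D + 2 * u * v
  end
  local phi = atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
  return rotate(L, phi), phi
end

-- Toy unrotated loadings: 4 terms on 2 topics.
local L = {{0.7, 0.5}, {0.6, 0.6}, {0.5, -0.4}, {0.4, -0.5}}
local R, phi = varimax2(L)
print(varimax_criterion(L), varimax_criterion(R))
```

Because the rotation is orthogonal, each row's sum of squared loadings is unchanged; only how that mass is split between the two topics changes, which is what makes the rotated topics easier to interpret.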

rowPivot=double

specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.

Default -1
Range -1 to 1

s={casouttable}

specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.

For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
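The compressed form stores only the diagonal of S, one row per singular value. The small Lua sketch below rebuilds the full k x k diagonal matrix from the _ID_ and S variables; the singular values are illustrative, and the expand_diagonal helper is hypothetical.

```lua
-- Toy S table in compressed form (illustrative values, not real output):
-- _ID_ = i marks the diagonal entry S[i][i], and S holds the singular value.
local s_table = {
  {_ID_ = 1, S = 5.2},
  {_ID_ = 2, S = 3.1},
  {_ID_ = 3, S = 0.8},
}

-- Rebuild the full k x k diagonal matrix from the k compressed rows.
local function expand_diagonal(rows)
  local k = #rows
  local S = {}
  for i = 1, k do
    S[i] = {}
    for j = 1, k do S[i][j] = 0 end
  end
  for _, r in ipairs(rows) do S[r._ID_][r._ID_] = r.S end
  return S
end

local S = expand_diagonal(s_table)
for i = 1, #S do print(table.concat(S[i], " ")) end
```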

scoreConfig={casouttable}

specifies the output CAS table to contain the scoring configuration.

For more information about specifying the scoreConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

termId="variable-name"

specifies the variable that contains the term ID. The values of this variable must be integers greater than or equal to 1. The variable cannot contain missing values.

Default "_TERMNUM_"
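The requirements stated for count, docId, and termId can be expressed as a row-level check. This hypothetical Lua sketch models a missing value as nil (or NaN for numbers), which is an assumption about representation; the optional names argument stands in for non-default docId, termId, and count variable names.

```lua
-- Check one parent-table row against the rules stated for count, docId, and
-- termId. Missing values are modeled as nil (or NaN for numbers).
local function check_row(row, names)
  names = names or {}
  local doc = row[names.docId or "_DOCUMENT_"]
  local term = row[names.termId or "_TERMNUM_"]
  local count = row[names.count or "_COUNT_"]
  if type(count) ~= "number" or count ~= count then
    return false, "count must be a nonmissing number"
  end
  if not (type(doc) == "string" or (type(doc) == "number" and doc == doc)) then
    return false, "docId must be a nonmissing number or string"
  end
  if type(term) ~= "number" or term ~= math.floor(term) or term < 1 then
    return false, "termId must be an integer >= 1"
  end
  return true
end

print(check_row({_DOCUMENT_ = "d1", _TERMNUM_ = 3, _COUNT_ = 2}))
print(check_row({_DOCUMENT_ = 1, _TERMNUM_ = 0, _COUNT_ = 1}))
```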

terms={castable}

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

For more information about specifying the terms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

termStdMultiple=double

specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10

termTopics={casouttable}

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

tolerance=double

specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.

Default 1E-06
Range 0–1
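This section does not say which iterative algorithm the action uses, so the following is a stand-in rather than the action's method: power iteration on A^T A, which finds the largest singular value and shows how a stopping threshold works. Here the threshold is applied to the relative change of the estimate between iterations, which is an assumption, and a value of 0 falls back to the documented default of 1E-06. top_singular is a hypothetical helper.

```lua
-- Power iteration on A^T A: estimates the largest singular value of a dense
-- matrix A and its right singular vector. It stops when the relative change
-- in the estimate is at most tol; tol = 0 falls back to 1e-6.
local function top_singular(A, tol, max_iter)
  if not tol or tol <= 0 then tol = 1e-6 end
  max_iter = max_iter or 1000
  local m, n = #A, #A[1]
  local v = {}
  for j = 1, n do v[j] = 1 / math.sqrt(n) end
  local sigma = 0
  for iter = 1, max_iter do
    local u = {}
    for i = 1, m do -- u = A v
      local s = 0
      for j = 1, n do s = s + A[i][j] * v[j] end
      u[i] = s
    end
    local w, norm = {}, 0
    for j = 1, n do -- w = A^T u = (A^T A) v
      local s = 0
      for i = 1, m do s = s + A[i][j] * u[i] end
      w[j] = s
      norm = norm + s * s
    end
    norm = math.sqrt(norm)
    if norm == 0 then return 0, v, iter end
    for j = 1, n do v[j] = w[j] / norm end
    local new_sigma = math.sqrt(norm) -- ||(A^T A) v|| approaches sigma^2
    if math.abs(new_sigma - sigma) <= tol * new_sigma then
      return new_sigma, v, iter
    end
    sigma = new_sigma
  end
  return sigma, v, max_iter
end

local A = {{3, 0}, {0, 1}}
local sigma, v, iters = top_singular(A, 1e-10)
print(sigma, iters)
```

A smaller threshold buys a more accurate estimate at the cost of more iterations; how quickly the estimate settles depends on the gap between the largest and the next singular value.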

topicDecision=TRUE | FALSE

specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default FALSE

topics={casouttable}

specifies the output CAS table to contain the topics that are discovered.

For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

u={casouttable}

specifies the U matrix, which contains the left singular vectors. U has one row per term and k+1 columns.

For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

v={casouttable}

specifies the transpose of the matrix containing the right singular vectors. V has one row per document and k+1 columns.

For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

wordPro={casouttable}

specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table will have n rows and k+1 columns.

For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Lua Syntax

results, info = s:textMining_tmSvd{
config={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
count="variable-name",
docId="variable-name",
docPro={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
exactDocPro=true | false,
exactWeight=true | false,
k=integer,
legacyNames=true | false,
maxK=integer,
norm="ALL" | "DOC" | "NONE" | "WORD",
nThreads=integer,
numLabels=integer,
* parent={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
resolution="HIGH" | "LOW" | "MED",
rotate="PROMAX" | "VARIMAX",
rowPivot=double,
s={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
scoreConfig={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
termId="variable-name",
terms={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string",
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
* name="table-name",
vars={{
format="string",
formattedLength=integer,
label="string",
* name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression"
}
},
termTopics={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
tolerance=double,
topicDecision=true | false,
topics={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
u={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
v={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
wordPro={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
}
}
* indicates a required parameter

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

config

specifies the name of the input CAS table that contains parsing configuration information.

* parent

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, a second that contains the term ID, and a third that contains the value of the cell for that term and document.

terms

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

docPro

specifies the name of the table to contain the SVD projections of the documents.

s

specifies the S matrix, a diagonal matrix that is output in compressed form with two variables and k rows. The variable _ID_ indicates the row and column of each entry, and the variable S contains the singular values.

scoreConfig

specifies the output CAS table to contain the scoring configuration.

termTopics

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

topics

specifies the output CAS table to contain the topics that are discovered.

u

specifies the U matrix, which contains the left singular vectors. The matrix U has one row per term and k+1 columns.

v

specifies the transpose of the matrix that contains the right singular vectors. The matrix V has one row per document and k+1 columns.

wordPro

specifies the table to contain the projections of the terms. If k SVD dimensions are found and the input table contains n terms, this table has n rows and k+1 columns.

Parameter Descriptions

config={castable}

specifies the name of the input CAS table that contains parsing configuration information.

For more information about specifying the config parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias parseConfig

count="variable-name"

specifies the variable that contains the term count, which can be weighted. The values of this variable must be numeric, and the variable cannot contain missing values.

Default "_COUNT_"

docId="variable-name"

specifies the variable that contains the document ID. This variable can be either numeric or character, and it cannot contain missing values.

Default "_DOCUMENT_"

docPro={casouttable}

specifies the name of the table to contain the SVD projections of the documents.

For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

docStdMultiple=double

specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10

exactDocPro=true | false

specifies whether to output the exact document projection values. This parameter requires a SAS Visual Text Analytics license.

Default true

exactWeight=true | false

Alias exactWeights
Default false

k=integer

specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.

Alias numTopics
Range 1–1000

legacyNames=true | false

specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default false

maxK=integer

specifies the maximum number of dimensions to extract. Use maxK together with the resolution parameter to have the action dynamically select the recommended number of dimensions. To extract a specific number of dimensions, either specify maxK and set resolution to "HIGH", or specify the k parameter.

Default 10
Range 1–1000

norm="ALL" | "DOC" | "NONE" | "WORD"

specifies whether to normalize the document projections (DOC), the term projections (WORD), both (ALL), or neither (NONE). Normalization changes the representation so that it depends on Euclidean distances between vectors rather than on the angles between them.

Default ALL

nThreads=integer

specifies the number of threads to use per node. If you do not set this parameter, or if you specify 0, all available threads are used.

Default 8
Range 0–64

numLabels=integer

specifies the number of terms to use in the descriptive label for each topic.

Default 5
Range 1–500

* parent={castable}

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, a second that contains the term ID, and a third that contains the value of the cell for that term and document.

For more information about specifying the parent parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

resolution="HIGH" | "LOW" | "MED"

specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.

Default HIGH

rotate="PROMAX" | "VARIMAX"

specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics, and a PROMAX rotation produces correlated topics.

Default VARIMAX

rowPivot=double

specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.

Default -1
Range -1–1

s={casouttable}

specifies the S matrix, a diagonal matrix that is output in compressed form with two variables and k rows. The variable _ID_ indicates the row and column of each entry, and the variable S contains the singular values.

For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

scoreConfig={casouttable}

specifies the output CAS table to contain the scoring configuration.

For more information about specifying the scoreConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

termId="variable-name"

specifies the variable that contains the term ID. The values of this variable must be integers greater than or equal to 1, and the variable cannot contain missing values.

Default "_TERMNUM_"

terms={castable}

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

For more information about specifying the terms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

termStdMultiple=double

specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10

termTopics={casouttable}

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

tolerance=double

specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.

Default 1E-06
Range 0–1

topicDecision=true | false

specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default false

topics={casouttable}

specifies the output CAS table to contain the topics that are discovered.

For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

u={casouttable}

specifies the U matrix, which contains the left singular vectors. The matrix U has one row per term and k+1 columns.

For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

v={casouttable}

specifies the transpose of the matrix that contains the right singular vectors. The matrix V has one row per document and k+1 columns.

For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

wordPro={casouttable}

specifies the table to contain the projections of the terms. If k SVD dimensions are found and the input table contains n terms, this table has n rows and k+1 columns.

For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Python Syntax

results=s.textMining.tmSvd(
config={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string",
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression"
}
},
count="variable-name",
docId="variable-name",
docPro={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
exactDocPro=True | False,
exactWeight=True | False,
k=integer,
legacyNames=True | False,
maxK=integer,
norm="ALL" | "DOC" | "NONE" | "WORD",
nThreads=integer,
numLabels=integer,
required parameter parent={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string",
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression"
}
},
resolution="HIGH" | "LOW" | "MED",
rotate="PROMAX" | "VARIMAX",
rowPivot=double,
s={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
scoreConfig={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
termId="variable-name",
terms={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string",
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters},
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression"
}
},
termTopics={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
tolerance=double,
topicDecision=True | False,
topics={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
u={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
v={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
wordPro={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
}
)
Entries marked "required parameter" in the syntax (and * in the parameter descriptions) are required.

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

config

specifies the name of the input CAS table that contains parsing configuration information.

* parent

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, a second that contains the term ID, and a third that contains the value of the cell for that term and document.

terms

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

docPro

specifies the name of the table to contain the SVD projections of the documents.

s

specifies the S matrix, a diagonal matrix that is output in compressed form with two variables and k rows. The variable _ID_ indicates the row and column of each entry, and the variable S contains the singular values.

scoreConfig

specifies the output CAS table to contain the scoring configuration.

termTopics

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

topics

specifies the output CAS table to contain the topics that are discovered.

u

specifies the U matrix, which contains the left singular vectors. The matrix U has one row per term and k+1 columns.

v

specifies the transpose of the matrix that contains the right singular vectors. The matrix V has one row per document and k+1 columns.

wordPro

specifies the table to contain the projections of the terms. If k SVD dimensions are found and the input table contains n terms, this table has n rows and k+1 columns.

Parameter Descriptions

config={castable}

specifies the name of the input CAS table that contains parsing configuration information.

For more information about specifying the config parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias parseConfig

count="variable-name"

specifies the variable that contains the term count, which can be weighted. The values of this variable must be numeric, and the variable cannot contain missing values.

Default "_COUNT_"

docId="variable-name"

specifies the variable that contains the document ID. This variable can be either numeric or character, and it cannot contain missing values.

Default "_DOCUMENT_"

docPro={casouttable}

specifies the name of the table to contain the SVD projections of the documents.

For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

docStdMultiple=double

specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10
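As a rough illustration of a standard-deviation cutoff, the sketch below computes the mean of one topic's document scores plus docStdMultiple standard deviations, then keeps the documents whose scores exceed it. It assumes the scores are the documents' projections on a single topic and uses the population standard deviation; neither choice is specified in this section, and the helper names `std_cutoff` and `members_above_cutoff` are hypothetical. termStdMultiple plausibly applies the same kind of rule to term scores.

```python
import statistics


def std_cutoff(scores, multiple=1.0):
    """Return mean(scores) + multiple * population standard deviation (hypothetical helper)."""
    if not 0 <= multiple <= 10:
        # Mirrors the documented range of docStdMultiple and termStdMultiple.
        raise ValueError("multiple must be in the range 0-10")
    return statistics.mean(scores) + multiple * statistics.pstdev(scores)


def members_above_cutoff(scores, multiple=1.0):
    """Return the positions of the scores that exceed the cutoff."""
    cutoff = std_cutoff(scores, multiple)
    return [i for i, score in enumerate(scores) if score > cutoff]


# Hypothetical projections of six documents on one topic.
topic_scores = [0.10, 0.05, 0.80, 0.12, 0.07, 0.66]
print(round(std_cutoff(topic_scores), 4))
print(members_above_cutoff(topic_scores))
```

Raising the multiple makes the cutoff stricter, so fewer documents are assigned to the topic.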

exactDocPro=True | False

specifies whether to output the exact document projection values. This parameter requires a SAS Visual Text Analytics license.

Default True

exactWeight=True | False

Alias exactWeights
Default False

k=integer

specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.

Alias numTopics
Range 1–1000

legacyNames=True | False

specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default False

maxK=integer

specifies the maximum number of dimensions to extract. Use maxK together with the resolution parameter to have the action dynamically select the recommended number of dimensions. To extract a specific number of dimensions, either specify maxK and set resolution to "HIGH", or specify the k parameter.

Default 10
Range 1–1000
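The two ways of controlling the number of dimensions can be sketched as a helper that assembles keyword arguments for a SWAT `tmSvd` call. Only the parameter names (parent, k, maxK, resolution) and their ranges come from this page; the helper `build_tmsvd_args` and the table name `doc_term` are hypothetical, and the helper does not contact a CAS server.

```python
def build_tmsvd_args(parent_table, k=None, max_k=10, resolution="HIGH", caslib=None):
    """Assemble keyword arguments for textMining.tmSvd (hypothetical helper).

    If k is given, request exactly k dimensions. Otherwise let the action pick
    the recommended number of dimensions, up to max_k, at the given resolution.
    """
    parent = {"name": parent_table}
    if caslib is not None:
        parent["caslib"] = caslib
    args = {"parent": parent}
    if k is not None:
        if not 1 <= k <= 1000:
            raise ValueError("k must be in the range 1-1000")
        args["k"] = k
    else:
        if not 1 <= max_k <= 1000:
            raise ValueError("maxK must be in the range 1-1000")
        if resolution not in ("HIGH", "MED", "LOW"):
            raise ValueError("resolution must be HIGH, MED, or LOW")
        args["maxK"] = max_k
        args["resolution"] = resolution
    return args


fixed = build_tmsvd_args("doc_term", k=20)
dynamic = build_tmsvd_args("doc_term", max_k=25, resolution="LOW")
print(fixed)
print(dynamic)
```

With a SWAT connection `conn` that has the textMining action set loaded, the dictionary would be passed as `conn.textMining.tmSvd(**fixed)`.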

norm="ALL" | "DOC" | "NONE" | "WORD"

specifies whether to normalize the document projections (DOC), the term projections (WORD), both (ALL), or neither (NONE). Normalization changes the representation so that it depends on Euclidean distances between vectors rather than on the angles between them.

Default ALL
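Why normalization links angles to distances can be checked numerically: once two vectors are scaled to unit length, their squared Euclidean distance equals 2 - 2 cos θ, so ranking by distance gives the same order as ranking by angle. The sketch assumes normalization to unit Euclidean length, which is the standard way to obtain this property; it is not the action's internal code, and the vectors are made up.

```python
import math


def unit(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    return list(v) if length == 0 else [x / length for x in v]


def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sq_distance(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical 3-dimensional document projections.
doc_a = [3.0, 4.0, 0.0]
doc_b = [1.0, 1.0, 1.0]

# After normalization, squared distance equals 2 - 2 * cosine similarity.
d2 = sq_distance(unit(doc_a), unit(doc_b))
print(d2, 2 - 2 * cosine(doc_a, doc_b))
```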

nThreads=integer

specifies the number of threads to use per node. If you do not set this parameter, or if you specify 0, all available threads are used.

Default 8
Range 0–64

numLabels=integer

specifies the number of terms to use in the descriptive label for each topic.

Default 5
Range 1–500
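A descriptive topic label built from numLabels terms might look like the sketch below. It assumes the label is formed from the topic's highest-weighted terms, in decreasing order of weight, joined with commas; that selection and formatting rule is an assumption for illustration rather than the action's documented behavior, and the helper `topic_label` and the term weights are hypothetical.

```python
def topic_label(term_weights, num_labels=5):
    """Join the num_labels highest-weighted terms into a label (hypothetical rule)."""
    if not 1 <= num_labels <= 500:
        # Mirrors the documented range of numLabels.
        raise ValueError("numLabels must be in the range 1-500")
    ranked = sorted(term_weights.items(), key=lambda item: item[1], reverse=True)
    return ", ".join(term for term, _ in ranked[:num_labels])


# Hypothetical term weights for one topic.
weights = {"engine": 0.61, "brake": 0.44, "tire": 0.39, "radio": 0.05, "seat": -0.12, "oil": 0.30}
print(topic_label(weights, num_labels=3))
```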

* parent={castable}

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, a second that contains the term ID, and a third that contains the value of the cell for that term and document.

For more information about specifying the parent parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
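Transaction form can be illustrated by converting a small dense document-by-term count matrix into one row per nonzero cell, using the default variable names of the docId, termId, and count parameters (_DOCUMENT_, _TERMNUM_, _COUNT_). The matrix, the helper `to_transactions`, and the choice to drop zero cells are assumptions for illustration; in practice this table is usually produced by an upstream parsing step rather than built by hand.

```python
def to_transactions(counts, doc_ids):
    """Convert a dense document-by-term matrix to transaction-form rows.

    counts[i][j] is the (possibly weighted) count of term j+1 in document doc_ids[i].
    Term IDs start at 1, as the termId parameter requires.
    """
    if len(counts) != len(doc_ids):
        raise ValueError("need one document ID per matrix row")
    rows = []
    for doc_id, row in zip(doc_ids, counts):
        for term_num, value in enumerate(row, start=1):
            if value is None:
                raise ValueError("missing values are not allowed")
            if value != 0:  # keep only nonzero cells (sparse form)
                rows.append({"_DOCUMENT_": doc_id, "_TERMNUM_": term_num, "_COUNT_": value})
    return rows


docs = ["d1", "d2"]
matrix = [
    [2, 0, 1],
    [0, 3, 0],
]
for r in to_transactions(matrix, docs):
    print(r)
```

Rows in this shape could then be loaded into CAS, for example by building a pandas DataFrame and passing it to the SWAT connection's `upload_frame` method, before being referenced through parent.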

resolution="HIGH" | "LOW" | "MED"

specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.

Default HIGH

rotate="PROMAX" | "VARIMAX"

specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics, and a PROMAX rotation produces correlated topics.

Default VARIMAX

rowPivot=double

specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.

Default -1
Range -1–1

s={casouttable}

specifies the S matrix, a diagonal matrix that is output in compressed form with two variables and k rows. The variable _ID_ indicates the row and column of each entry, and the variable S contains the singular values.

For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
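The compressed storage of the diagonal S matrix can be sketched as follows: only the k diagonal entries are kept, each as a row whose _ID_ is the shared row and column index and whose S is the singular value. Numbering _ID_ from 1 is an assumption, and the helpers `compress_diagonal` and `expand_diagonal` are hypothetical.

```python
def compress_diagonal(singular_values):
    """Store a diagonal matrix as (_ID_, S) rows, one per diagonal entry."""
    return [{"_ID_": i, "S": s} for i, s in enumerate(singular_values, start=1)]


def expand_diagonal(rows):
    """Rebuild the full k-by-k diagonal matrix from compressed rows."""
    k = len(rows)
    full = [[0.0] * k for _ in range(k)]
    for row in rows:
        i = row["_ID_"] - 1
        full[i][i] = row["S"]
    return full


s_rows = compress_diagonal([12.5, 7.25, 3.0])
print(s_rows)
print(expand_diagonal(s_rows))
```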

scoreConfig={casouttable}

specifies the output CAS table to contain the scoring configuration.

For more information about specifying the scoreConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

termId="variable-name"

specifies the variable that contains the term ID. The values of this variable must be integers greater than or equal to 1, and the variable cannot contain missing values.

Default "_TERMNUM_"

terms={castable}

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

For more information about specifying the terms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

termStdMultiple=double

specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10

termTopics={casouttable}

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

tolerance=double

specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.

Default 1E-06
Range 0–1
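To show what a stopping threshold does in an iterative factorization, the sketch below estimates the largest singular value by power iteration on A^T A and stops once successive estimates change by less than tolerance. Power iteration is swapped in here purely for illustration; this page does not say which iterative algorithm the action uses, and the helper `top_singular_value` is hypothetical.

```python
import math


def top_singular_value(a, tolerance=1e-6, max_iter=1000):
    """Estimate the largest singular value of matrix a (list of rows) by power iteration."""
    if tolerance == 0:
        tolerance = 1e-6  # mirrors the documented rule: 0 means "use the default"
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0] * n_cols
    sigma = 0.0
    for _ in range(max_iter):
        # One multiplication by A^T A: w = A v, then x = A^T w.
        w = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        x = [sum(a[i][j] * w[i] for i in range(n_rows)) for j in range(n_cols)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            return 0.0  # a is zero in the direction of v
        v = [t / norm_x for t in x]
        # The singular-value estimate is ||A v|| for the current unit vector v.
        av = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        new_sigma = math.sqrt(sum(t * t for t in av))
        if abs(new_sigma - sigma) < tolerance:
            return new_sigma
        sigma = new_sigma
    return sigma


print(top_singular_value([[3.0, 0.0], [0.0, 1.0]]))
```

A smaller tolerance yields a more precise estimate at the cost of more iterations.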

topicDecision=True | False

specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default False

topics={casouttable}

specifies the output CAS table to contain the topics that are discovered.

For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

u={casouttable}

specifies the U matrix, which contains the left singular vectors. The matrix U has one row per term and k+1 columns.

For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

v={casouttable}

specifies the transpose of the matrix that contains the right singular vectors. The matrix V has one row per document and k+1 columns.

For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

wordPro={casouttable}

specifies the table to contain the projections of the terms. If k SVD dimensions are found and the input table contains n terms, this table has n rows and k+1 columns.

For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

R Syntax

results <- cas.textMining.tmSvd(s,
config=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
count="variable-name",
docId="variable-name",
docPro=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
docStdMultiple=double,
exactDocPro=TRUE | FALSE,
exactWeight=TRUE | FALSE,
k=integer,
legacyNames=TRUE | FALSE,
maxK=integer,
norm="ALL" | "DOC" | "NONE" | "WORD",
nThreads=integer,
numLabels=integer,
required parameter parent=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
resolution="HIGH" | "LOW" | "MED",
rotate="PROMAX" | "VARIMAX",
rowPivot=double,
s=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
scoreConfig=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
termId="variable-name",
terms=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
termStdMultiple=double,
termTopics=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
tolerance=double,
topicDecision=TRUE | FALSE,
topics=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
u=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
v=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
wordPro=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
)
)
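In the syntax above, "required parameter" marks a required parameter; in the tables and parameter descriptions below, required parameters are marked with *.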

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter

Subparameter

Description

 config

specifies the name of the input CAS table that contains parsing configuration information.

* parent

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, one that contains the term ID, and one that contains the value of the cell for that term and document.

 terms

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

Parameters for Creating Output Tables

Parameter

Subparameter

Description

 docPro

specifies the name of the table to contain the SVD projections of the documents.

 s

specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.

 scoreConfig

specifies the output CAS table to contain the scoring configuration.

 termTopics

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

 topics

specifies the output CAS table to contain the topics that are discovered.

 u

specifies the U matrix, which contains the left singular vectors. U has one row per term and k+1 columns.

 v

specifies the transpose of the matrix that contains the right singular vectors. V has one row per document and k+1 columns.

 wordPro

specifies the table to contain the projections of the terms. If the SVD finds k dimensions and the input table contains n terms, this table has n rows and k+1 columns.

Parameter Descriptions

config=list(castable)

specifies the name of the input CAS table that contains parsing configuration information.

For more information about specifying the config parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias parseConfig

count="variable-name"

specifies the variable that contains the term count, which can be weighted. The values of this variable must be numeric, and missing values are not allowed.

Default "_COUNT_"

docId="variable-name"

specifies the variable that contains the document ID. This variable can be numeric or character. Missing values are not allowed.

Default "_DOCUMENT_"

docPro=list(casouttable)

specifies the name of the table to contain the SVD projections of the documents.

For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

docStdMultiple=double

specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10
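The cutoff described above amounts to mean + docStdMultiple * standard deviation. The following Python sketch shows only that arithmetic on a hypothetical list of document scores for one topic; it does not call tmSvd. Which scores tmSvd compares against the cutoff, whether it uses the population or sample standard deviation, and whether membership is "at or above" the cutoff are all assumptions here. The same arithmetic would apply to the term cutoff set by termStdMultiple.

```python
import statistics


def std_cutoff(scores, multiple=1.0):
    """Return mean + multiple * standard deviation of the scores.

    Assumption: the population standard deviation is used; tmSvd's exact
    choice of scores and estimator is not documented here.
    """
    if not 0 <= multiple <= 10:
        raise ValueError("multiple must be in the documented range 0-10")
    return statistics.fmean(scores) + multiple * statistics.pstdev(scores)


# Hypothetical document scores for a single topic
scores = [0.10, 0.20, 0.30, 0.40, 0.50]
cutoff = std_cutoff(scores, multiple=1.0)
# Assumption: documents scoring at or above the cutoff count as topic members
members = [i for i, s in enumerate(scores) if s >= cutoff]
print(round(cutoff, 4), members)
```

Raising the multiple moves the cutoff further above the mean, so fewer documents pass it.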

exactDocPro=TRUE | FALSE

specifies whether to output the exact document projection values. This parameter requires a SAS Visual Text Analytics license.

Default TRUE

exactWeight=TRUE | FALSE

Alias exactWeights
Default FALSE

k=integer

specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.

Alias numTopics
Range 1–1000

legacyNames=TRUE | FALSE

specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default FALSE

maxK=integer

specifies the maximum number of dimensions to be extracted. Use maxK together with the resolution parameter to select the recommended number of dimensions dynamically. To extract a specific number of dimensions, either specify maxK and set resolution to "HIGH", or use the k parameter.

Default 10
Range 1–1000

norm="ALL" | "DOC" | "NONE" | "WORD"

specifies whether to normalize the document projections ("DOC"), the term projections ("WORD"), both ("ALL"), or neither ("NONE"). Normalization converts a representation in which similarity depends on the angles between vectors into one in which it depends on the Euclidean distances between vectors.

Default ALL
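Why normalization turns angles into distances: once two vectors are scaled to unit length, their squared Euclidean distance equals 2 - 2cos(theta), so ranking by distance and ranking by angle agree. The Python sketch below checks this identity on two hypothetical 2-D projections. It assumes that normalization means scaling each projection vector to unit Euclidean length, which the description above implies but does not state.

```python
import math


def normalize(vec):
    """Scale a projection vector to unit Euclidean length (zero vectors are left as-is)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else list(vec)


def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 2-D projections of two documents
a, b = [3.0, 4.0], [1.0, 0.0]
na, nb = normalize(a), normalize(b)
dist_sq = sum((x - y) ** 2 for x, y in zip(na, nb))
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(angle between a and b)
print(dist_sq, 2 - 2 * cosine(a, b))
```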

nThreads=integer

specifies the number of threads to use per node. If this parameter is not set or is set to 0, all available threads are used.

Default 8
Range 0–64

numLabels=integer

specifies the number of terms to use in the descriptive label for each topic.

Default 5
Range 1–500
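As an illustration of numLabels, the Python sketch below builds a topic label from the highest-weighted terms of a topic. The input dictionary, the tie-breaking rule, and the comma-separated label format are hypothetical; tmSvd's actual label format and term ranking are not described in this section.

```python
def topic_label(term_weights, num_labels=5):
    """Join the num_labels highest-weighted terms into a label.

    term_weights: hypothetical dict mapping term -> weight on one topic.
    Ties are broken alphabetically; the comma-separated format is an assumption.
    """
    if not 1 <= num_labels <= 500:
        raise ValueError("num_labels must be in the documented range 1-500")
    ranked = sorted(term_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ",".join(term for term, _ in ranked[:num_labels])


weights = {"bank": 0.61, "loan": 0.55, "rate": 0.40, "river": 0.05, "account": 0.52}
print(topic_label(weights, num_labels=3))
```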

* parent=list(castable)

specifies the input CAS table that contains the term-by-document matrix in transaction form. The table must have at least three variables: one that contains the document ID, one that contains the term ID, and one that contains the value of the cell for that term and document.

For more information about specifying the parent parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
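To make "transaction form" concrete: each row of the parent table stores one nonzero cell (document, term, value) instead of a full matrix row. The Python sketch below expands such rows into a dense term-by-document matrix and applies the constraints documented for the docId, termId, and count variables. Rows are represented as Python dicts keyed by the documented default variable names; summing repeated (document, term) pairs is an assumption. This illustrates the data layout only and does not call tmSvd.

```python
def to_dense(rows, doc_key="_DOCUMENT_", term_key="_TERMNUM_", count_key="_COUNT_"):
    """Build a term-by-document matrix (one row per term) from transaction rows."""
    # Validate first, following the documented constraints on each variable.
    for r in rows:
        if r[doc_key] is None or r[count_key] is None:
            raise ValueError("missing document IDs or counts are not allowed")
        t = r[term_key]
        if not isinstance(t, int) or t < 1:
            raise ValueError("term IDs must be integers >= 1")
    # Documents become columns in first-seen order.
    docs = list(dict.fromkeys(r[doc_key] for r in rows))
    col = {d: j for j, d in enumerate(docs)}
    n_terms = max(r[term_key] for r in rows)
    matrix = [[0.0] * len(docs) for _ in range(n_terms)]
    for r in rows:
        # Assumption: repeated (document, term) pairs are summed.
        matrix[r[term_key] - 1][col[r[doc_key]]] += float(r[count_key])
    return docs, matrix


# Hypothetical transaction rows: document 1 uses terms 1 and 3, document 2 uses term 2
rows = [
    {"_DOCUMENT_": 1, "_TERMNUM_": 1, "_COUNT_": 2},
    {"_DOCUMENT_": 1, "_TERMNUM_": 3, "_COUNT_": 1},
    {"_DOCUMENT_": 2, "_TERMNUM_": 2, "_COUNT_": 4},
]
docs, matrix = to_dense(rows)
print(docs, matrix)
```

The transaction layout stores only the nonzero cells, which matters because most term-document pairs in a real collection are zero.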

resolution="HIGH" | "LOW" | "MED"

specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.

Default HIGH

rotate="PROMAX" | "VARIMAX"

specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics and a PROMAX rotation produces correlated topics.

Default VARIMAX
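VARIMAX looks for an orthogonal rotation that spreads each topic's term loadings apart, so that a few loadings are large and the rest are near zero. The Python sketch below implements the raw varimax criterion with Kaiser's closed-form angle for pairwise planar rotations, applied to a hypothetical terms-by-topics loading matrix. It is not tmSvd's implementation: Kaiser row normalization, tmSvd's convergence rule, and PROMAX are all omitted.

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over columns of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / p
        total += sum((s - mean) ** 2 for s in sq) / p
    return total


def rotate_pair(loadings, a, b):
    """Rotate columns a and b in place by the angle that maximizes the criterion."""
    p = len(loadings)
    u = [r[a] ** 2 - r[b] ** 2 for r in loadings]
    v = [2 * r[a] * r[b] for r in loadings]
    A, B = sum(u), sum(v)
    C = sum(x * x - y * y for x, y in zip(u, v))
    D = 2 * sum(x * y for x, y in zip(u, v))
    # Kaiser's closed-form angle for a planar varimax rotation
    phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
    c, s = math.cos(phi), math.sin(phi)
    for r in loadings:
        x, y = r[a], r[b]
        r[a], r[b] = x * c + y * s, -x * s + y * c


def varimax(loadings, sweeps=20):
    """Return a varimax-rotated copy of a terms-by-topics loading matrix."""
    rotated = [list(r) for r in loadings]
    k = len(rotated[0])
    for _ in range(sweeps):
        for a in range(k):
            for b in range(a + 1, k):
                rotate_pair(rotated, a, b)
    return rotated


# Hypothetical loadings of four terms on two topics
loadings = [[0.8, 0.3], [0.7, 0.4], [0.3, 0.8], [0.2, 0.7]]
rotated = varimax(loadings)
print(varimax_criterion(loadings), varimax_criterion(rotated))
```

Each pairwise step maximizes the criterion exactly for its two columns and leaves the others unchanged, so the criterion never decreases. Because the rotation is orthogonal, the rotated topics stay uncorrelated, which matches the VARIMAX behavior described above.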

rowPivot=double

specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.

Default -1
Range -1–1

s=list(casouttable)

specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.

For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
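Because S is diagonal, the output table stores only its diagonal: one row per singular value. The Python sketch below expands rows of that compressed form back into a full k-by-k diagonal matrix. It assumes, as a hypothetical reading of the description above, that _ID_ holds 1-based diagonal positions. The singular values shown are made-up data.

```python
def expand_s(s_rows):
    """Expand compressed S rows (_ID_, S) into a k x k diagonal matrix.

    Assumption: _ID_ is the 1-based row and column of each diagonal entry.
    """
    k = len(s_rows)
    diag = [[0.0] * k for _ in range(k)]
    for row in s_rows:
        i = int(row["_ID_"]) - 1
        diag[i][i] = float(row["S"])
    return diag


# Hypothetical S table with k = 3 singular values
s_table = [{"_ID_": 1, "S": 5.2}, {"_ID_": 2, "S": 3.1}, {"_ID_": 3, "S": 0.9}]
S = expand_s(s_table)
print(S)
```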

scoreConfig=list(casouttable)

specifies the output CAS table to contain the scoring configuration.

For more information about specifying the scoreConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

termId="variable-name"

specifies the variable that contains the term ID. The values of this variable must be integers greater than or equal to 1. Missing values are not allowed.

Default "_TERMNUM_"

terms=list(castable)

specifies the name of the input table that contains information about the terms in the document collection. The table is used to determine which terms to use in the topic calculation.

For more information about specifying the terms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

termStdMultiple=double

specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.

Default 1
Range 0–10

termTopics=list(casouttable)

specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.

For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

tolerance=double

specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.

Default 1E-06
Range 0–1
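To show what a stopping threshold does in an iterative factorization, the Python sketch below estimates the largest singular value by power iteration on A^T A. It stops when the relative change in the estimate falls below the tolerance, and, following the description above, a tolerance of 0 falls back to the default 1E-06. The section does not say which iterative algorithm tmSvd uses or whether its threshold is relative or absolute; power iteration and a relative test are stand-ins chosen for illustration.

```python
import math


def top_singular_value(a, tolerance=1e-6, max_iter=1000):
    """Estimate the largest singular value of matrix a (list of rows) by power iteration."""
    if tolerance == 0:
        tolerance = 1e-6  # mirrors the documented rule: 0 means use the default
    rows, cols = len(a), len(a[0])
    x = [1.0 / math.sqrt(cols)] * cols  # unit-length starting vector
    sigma = 0.0
    for _ in range(max_iter):
        ax = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]
        new_sigma = math.sqrt(sum(v * v for v in ax))  # ||a x|| with ||x|| = 1
        if new_sigma == 0.0:
            return 0.0
        # Stop once the relative change in the estimate is below the tolerance.
        if abs(new_sigma - sigma) <= tolerance * new_sigma:
            return new_sigma
        sigma = new_sigma
        # Next iterate: normalized a^T a x
        aty = [sum(a[i][j] * ax[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(v * v for v in aty))
        x = [v / norm for v in aty]
    return sigma


print(top_singular_value([[3.0, 0.0], [0.0, 1.0]]))
```

A smaller tolerance costs more iterations but gives a more precise estimate. A larger one stops earlier with a rougher estimate.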

topicDecision=TRUE | FALSE

specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.

Default FALSE

topics=list(casouttable)

specifies the output CAS table to contain the topics that are discovered.

For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

u=list(casouttable)

specifies the U matrix, which contains the left singular vectors. U has one row per term and k+1 columns.

For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

v=list(casouttable)

specifies the transpose of the matrix that contains the right singular vectors. V has one row per document and k+1 columns.

For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

wordPro=list(casouttable)

specifies the table to contain the projections of the terms. If the SVD finds k dimensions and the input table contains n terms, this table has n rows and k+1 columns.

For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Last updated: November 23, 2025