Provides actions for mining textual data.
Combines the tpParse action, the tpAccumulate action, and SVD functionality into one action. Some parameters require a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.
| Parameter | Subparameter | Description |
|---|---|---|
| documents (required) | — | names the input CAS table of documents to be parsed. You must include a text variable specified with textVar and a document ID variable specified with docIdVar. |
| liti | — | specifies the input CAS table that contains the LITI binary, which contains the predefined or custom concept definitions. The tmMine action can reference a concepts model that is compiled in the compileConcept action. For more information, see the example Referencing a Concepts Model in the tmMine Action. This parameter requires a SAS Visual Text Analytics license. |
| multiterm | — | specifies the name of the CAS table that contains a list of multi-word terms and their part-of-speech types. Each multi-word term is parsed as a single token. |
| startList | — | specifies the input CAS table that contains the terms that are to be kept for the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional. |
| stopList | — | specifies the input CAS table that contains the terms to exclude from the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional. |
| synonyms | — | specifies the input CAS table that contains user-defined synonyms to be used in the analysis. If specified, the table must have the following variables (all varchar): Term, Parent. Termrole and parentrole variables are optional. |

| Parameter | Subparameter | Description |
|---|---|---|
| child | — | specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix with raw counts. |
| docPro | — | specifies the name of the table to contain the SVD projections of the documents. |
| offset | — | specifies the name of the output CAS table to contain the position information about the occurrences of child terms in the document collection. The maximum output length of a tokenized term in this table is 256 bytes, so tokens that consist of an extremely long sequence of letters, numbers, and symbols are truncated to at most that length. |
| parent | — | specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix. |
| parseConfig | — | specifies the name of the config CAS table to contain parsing configuration information. |
| s | — | specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values. |
| saveState | — | specifies the name of the table for saving the analytic score model. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license. |
| termTopics | — | specifies the name of the output CAS table to contain the term-by-topic sparse matrix information. |
| terms | — | specifies the output CAS table to contain the summary information about the terms in the document collection. The maximum output length of a tokenized term is 256 bytes, so tokens that consist of an extremely long sequence of letters, numbers, and symbols are truncated to at most that length. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license. |
| topics | — | specifies the output CAS table to contain the topics that are discovered. |
| u | — | specifies the U matrix, which contains the left singular vectors. The matrix U is number of terms by k+1. |
| v | — | specifies the transpose of the matrix containing the right singular vectors. The matrix V is number of documents by k+1. |
| wordPro | — | specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table will have n rows and k+1 columns. |

specifies how the elements in the term-by-document matrix (the parent output table) are weighted.
| Alias | cellWgt |
|---|---|
| Default | LOG |
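
To make the idea of cell weighting concrete, here is a minimal sketch of a logarithmic weighting applied to raw counts. It assumes the commonly used definition log2(count + 1); this page does not state the formula that the action applies, so the function name, the formula, and the sample counts are all illustrative assumptions.

```python
import math


def log_cell_weight(count: int) -> float:
    """Dampen a raw term count. ASSUMPTION: weight = log2(count + 1)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return math.log2(count + 1)


# Made-up raw counts for one term across four documents.
raw_counts = [0, 1, 3, 7]
weighted = [log_cell_weight(c) for c in raw_counts]
print(weighted)
```

The point of a log-style weight is that a term appearing seven times in a document counts for more than a term appearing once, but not seven times as much.
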
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix with raw counts.
For more information about specifying the child parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether the part-of-speech tags used for tokenization and accumulation are detailed (complex, such as A.nom.f.p) or general (simple, such as A). This parameter requires a SAS Visual Text Analytics license.
| Default | FALSE |
|---|---|
specifies a list of variables from the documents table that are to be retained on the output docPro table. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Aliases | keepVars, keepVar |
|---|---|
specifies the priority of the default LITI file that contains predefined concepts when both predefined and custom concepts are used. The default setting is 1, which means that the predefined concepts have the lowest priority compared to the custom concepts. However, certain predefined concepts within the LITI file may still have a higher priority. For more information, see the SAS Visual Text Analytics User's Guide. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–32 |
specifies the character or numeric variable on the documents table that contains the ID of each document.
| Default | "DOC_ID" |
|---|---|
specifies the name of the table to contain the SVD projections of the documents.
For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
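
A cutoff expressed as "standard deviations above the mean" can be sketched as follows. The page does not say which values the cutoff is computed over or whether the population or sample standard deviation is used; this sketch assumes a list of made-up document weights and the population form, and the `cutoff` helper is hypothetical.

```python
from statistics import mean, pstdev


def cutoff(values, num_std=1.0):
    """Return mean + num_std * standard deviation.

    ASSUMPTION: population standard deviation (pstdev).
    """
    return mean(values) + num_std * pstdev(values)


# Made-up document weights for a single topic.
weights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
threshold = cutoff(weights, num_std=1)

# Documents at or above the threshold would count as belonging to the topic.
above = [w for w in weights if w >= threshold]
print(threshold, above)
```

Raising the parameter value moves the threshold further above the mean, so fewer documents pass it.
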
names the input CAS table of documents to be parsed. You must include a text variable specified with textVar and a document ID variable specified with docIdVar.
For more information about specifying the documents parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to extract entities in parsing. If set to None, no entities are output. If set to STD, the standard entities are output.
| Default | NONE |
|---|---|
specifies if the exact document projection values should be output. This parameter requires a SAS Visual Text Analytics license.
| Default | TRUE |
|---|---|
specifies whether the exact entries in the u table are used in the topic computation. Otherwise, the values are rounded to three decimal places.
| Alias | exactWeights |
|---|---|
| Default | FALSE |
indicates whether empty-document indicators are included in the parent table. This parameter requires a SAS Visual Text Analytics license.
| Default | FALSE |
|---|---|
specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.
| Alias | numTopics |
|---|---|
| Range | 1–1000 |
specifies the language used in the text variable of the input document table.
| Default | ENGLISH |
|---|---|
specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | FALSE |
|---|---|
specifies the input CAS table that contains the LITI binary, which contains the predefined or custom concept definitions. The tmMine action can reference a concepts model that is compiled in the compileConcept action. For more information on how to do this, see the example, Referencing a Concepts Model in the tmMine Action. This parameter requires a SAS Visual Text Analytics license.
For more information about specifying the liti parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the maximum number of dimensions to be extracted. The maxK option can be used in conjunction with the resolution option to dynamically select the recommended number of dimensions. To use a specific number of dimensions, either use maxK and set the resolution to high, or use the k parameter.
| Default | 10 |
|---|---|
| Range | 1–1000 |
specifies the name of the CAS table that contains a list of multi-word terms and their part-of-speech types. Each multi-word term is parsed as a single token.
For more information about specifying the multiterm parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to normalize the document projections, term projections, or both. The normalization converts the representation from depending on angles between vectors to depending on Euclidean distances between vectors.
| Default | ALL |
|---|---|
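
The link between angles and Euclidean distances can be checked numerically. The sketch below assumes that normalization means scaling each projection vector to unit length, which is one plausible reading and is not confirmed by this page. For unit vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity, so ranking by distance and ranking by angle agree. All function names and vectors here are illustrative.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two made-up projection vectors.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

lhs = squared_distance(a, b)   # squared Euclidean distance
rhs = 2 - 2 * dot(a, b)        # 2 - 2 * cosine similarity for unit vectors
print(lhs, rhs)
```
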
when set to True, extracts noun groups during parsing and adds the noun groups as additional rows in the offset table. This is also reflected in the terms and parent tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | TRUE |
|---|---|
specifies the number of threads to be used per node. The value must be an integer. When the value is 0, the number of threads equals the number of CPUs.
| Default | 8 |
|---|---|
| Minimum value | 0 |
specifies the number of terms to use in the descriptive label for each topic.
| Default | 5 |
|---|---|
| Range | 1–500 |
specifies the name of the output CAS table to contain the position information about the occurrences of child terms in the document collection. The maximum output length of a tokenized term in this table is 256 bytes, so tokens that consist of an extremely long sequence of letters, numbers, and symbols are truncated to at most that length.
For more information about specifying the offset parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
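
One reason a truncated term can end up shorter than 256 bytes, rather than exactly 256, is multi-byte characters. The sketch below illustrates this under the assumption that terms are UTF-8 encoded and that truncation never splits a character; neither detail is stated on this page, and `truncate_utf8` is a hypothetical helper.

```python
MAX_TERM_BYTES = 256  # limit quoted in the parameter description


def truncate_utf8(token: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Cut a token to at most max_bytes of UTF-8 without splitting a character."""
    encoded = token.encode("utf-8")
    if len(encoded) <= max_bytes:
        return token
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_token = "a" * 300   # 1 byte per character
euro_token = "€" * 100    # 3 bytes per character, 300 bytes in total

short_ascii = truncate_utf8(ascii_token)
short_euro = truncate_utf8(euro_token)
print(len(short_ascii.encode("utf-8")), len(short_euro.encode("utf-8")))
```

With three-byte characters, 256 is not a multiple of the character width, so the result stops at the last whole character below the limit.
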
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix.
For more information about specifying the parent parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the config CAS table to contain parsing configuration information.
For more information about specifying the parseConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | scoreConfig |
|---|---|
specifies the minimum number of documents in which a term must appear to be kept. The value must be an integer.
| Default | 10 |
|---|---|
| Range | 1–32767 |
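
The effect of a minimum document count can be sketched in plain Python. This is an illustration of the filtering idea only, not the action's implementation; the helper name and the tiny corpus are made up.

```python
from collections import Counter


def terms_meeting_min_docs(docs, min_docs):
    """Return the terms that occur in at least min_docs distinct documents."""
    doc_freq = Counter()
    for terms in docs:
        # set() so that repeated use inside one document counts once.
        doc_freq.update(set(terms))
    return {term for term, n in doc_freq.items() if n >= min_docs}


# Made-up tokenized documents.
docs = [
    ["cat", "dog", "cat"],
    ["dog", "bird"],
    ["dog", "cat"],
]
kept = terms_meeting_min_docs(docs, min_docs=2)
print(sorted(kept))
```

Note that the count is over documents, not occurrences: "cat" appears three times in total but in only two documents.
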
specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.
| Default | HIGH |
|---|---|
specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics and a PROMAX rotation produces correlated topics.
| Default | VARIMAX |
|---|---|
specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.
| Default | -1 |
|---|---|
| Range | -1–1 |
specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.
For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
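
The compressed form described above stores only the diagonal. As a rough sketch, rows with an _ID_ and an S value can be expanded back into a full k-by-k diagonal matrix like this. The sketch assumes that _ID_ is 1-based and uses made-up singular values; `expand_diagonal` is a hypothetical helper.

```python
def expand_diagonal(rows):
    """Build a dense k-by-k diagonal matrix from compressed (_ID_, S) rows."""
    k = len(rows)
    matrix = [[0.0] * k for _ in range(k)]
    for row in rows:
        i = row["_ID_"] - 1  # ASSUMPTION: _ID_ starts at 1
        matrix[i][i] = row["S"]
    return matrix


# Made-up contents of an S table with k = 3.
s_rows = [
    {"_ID_": 1, "S": 5.2},
    {"_ID_": 2, "S": 3.1},
    {"_ID_": 3, "S": 0.9},
]
dense = expand_diagonal(s_rows)
for line in dense:
    print(line)
```
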
specifies the name of the table for saving the analytic score model. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Long form | saveState={name="table-name"} |
|---|---|
| Shortcut form | saveState="table-name" |
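
The two forms above are equivalent ways of naming the table. A small sketch of that equivalence, written as a client-side helper that turns the shortcut form into the long form, is shown below. The helper `to_long_form` is hypothetical and is not part of any SAS client library.

```python
def to_long_form(value):
    """Convert a shortcut table name into the long-form dictionary."""
    if isinstance(value, str):
        # Shortcut form: just the table name.
        return {"name": value}
    if isinstance(value, dict):
        # Already long form: copy so the caller's dictionary is not modified.
        return dict(value)
    raise TypeError("expected a table name or a dictionary of subparameters")


shortcut = to_long_form("table-name")
long_form = to_long_form({"name": "table-name"})
print(shortcut == long_form)
```

The long form is the one to use when additional subparameters, such as the caslib, need to be set alongside the name.
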
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|---|
specifies a list of attribute types to be kept or ignored.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies a list of entity types to be kept or ignored. If this parameter is specified, entities must be set to STD.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies a list of part-of-speech tags to be kept or ignored.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
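
The KEEP and IGNORE behavior shared by the three tag-selection parameters above can be sketched as a simple filter. This is an illustration of the semantics as described, not the action's code; the function name, the (term, tag) pairs, and the tag labels are made up.

```python
def apply_seltag(terms, tags, op="KEEP"):
    """Filter (term, tag) pairs by a list of selected tags.

    KEEP:   terms without a selected tag are ignored.
    IGNORE: terms with a selected tag are ignored.
    """
    selected = set(tags)
    if op == "KEEP":
        return [(term, tag) for term, tag in terms if tag in selected]
    if op == "IGNORE":
        return [(term, tag) for term, tag in terms if tag not in selected]
    raise ValueError("op must be 'KEEP' or 'IGNORE'")


# Made-up parsed terms with illustrative tag labels.
terms = [("run", "V"), ("dog", "N"), ("quick", "A")]

kept = apply_seltag(terms, ["N"], op="KEEP")
remaining = apply_seltag(terms, ["N"], op="IGNORE")
print(kept, remaining)
```
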
specifies whether to include terms that have a keep status of N in the TERMS output table.
| Default | FALSE |
|---|---|
specifies the input CAS table that contains the terms that are to be kept for the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the startList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether stemming is to occur in parsing. When set to True, terms are evaluated to see if they belong to a common parent form and the information is added to the offset table.
| Default | TRUE |
|---|---|
specifies the input CAS table that contains the terms to exclude from the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the stopList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the input CAS table that contains user-defined synonyms to be used in the analysis. If specified, the table must have the following variables (all varchar): Term, Parent. Termrole and parentrole variables are optional.
For more information about specifying the synonyms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
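
As a rough picture of what a synonyms table does, the sketch below maps each Term to its Parent before counting. It ignores the optional role variables and any case handling, because this page does not describe how the action matches terms; the helper and the sample rows are hypothetical.

```python
def apply_synonyms(tokens, synonym_rows):
    """Replace each token that has a synonym entry with its parent term."""
    parent_of = {row["Term"]: row["Parent"] for row in synonym_rows}
    return [parent_of.get(token, token) for token in tokens]


# Made-up rows with the two required variables, Term and Parent.
synonym_rows = [
    {"Term": "automobile", "Parent": "car"},
    {"Term": "auto", "Parent": "car"},
]
mapped = apply_synonyms(["auto", "parked", "automobile"], synonym_rows)
print(mapped)
```
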
specifies whether part-of-speech tagging is used in parsing.
| Default | TRUE |
|---|---|
specifies the numeric or character variable that contains a category level on the documents table. This parameter is optional unless you plan to use Mutual Information as the term weight in accumulation.
specifies the output CAS table to contain the summary information about the terms in the document collection. The maximum output length of a tokenized term is 256 bytes, so tokens that consist of an extremely long sequence of letters, numbers, and symbols are truncated to at most that length. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
For more information about specifying the terms parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.
For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how terms are weighted. Valid values are Entropy, None and MI (Mutual Information). MI requires a target variable in the offset table, which is generated by the tpParse action.
| Alias | termWgt |
|---|---|
| Default | ENTROPY |
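
For intuition about entropy weighting, the sketch below uses a widely cited definition: weight = 1 + sum over documents of p * log2(p) / log2(n), where p is the share of the term's occurrences that fall in a document and n is the number of documents. This page does not give the formula that the action uses, so treat the definition, the function name, and the counts as assumptions.

```python
import math


def entropy_weight(counts):
    """Entropy-style term weight from per-document counts of one term.

    ASSUMPTION: weight = 1 + sum(p * log2(p)) / log2(n).
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        raise ValueError("need at least two documents and one occurrence")
    entropy_sum = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy_sum += p * math.log2(p)
    return 1.0 + entropy_sum / math.log2(n)


concentrated = entropy_weight([4, 0, 0, 0])  # all occurrences in one document
spread_out = entropy_weight([1, 1, 1, 1])    # evenly spread over all documents
print(concentrated, spread_out)
```

Under this definition, a term concentrated in few documents receives a weight near 1, and a term spread evenly across the collection receives a weight near 0.
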
specifies the character variable in the documents table that contains the text to be processed.
| Default | "text" |
|---|---|
specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.
| Default | 1E-05 |
|---|---|
| Range | 0–1 |
specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | FALSE |
|---|---|
specifies the output CAS table to contain the topics that are discovered.
For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the U matrix, which contains the left singular vectors. The matrix U is number of terms by k+1.
For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the transpose of the matrix containing the right singular vectors. The matrix V is number of documents by k+1.
For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table will have n rows and k+1 columns.
For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
For more information about specifying the stopList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the input CAS table that contains user-defined synonyms to be used in the analysis. If specified, the table must have the following variables (all varchar): Term, Parent. Termrole and parentrole variables are optional.
For more information about specifying the synonyms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether part-of-speech tagging is used in parsing.
| Default | true |
|---|
specifies the numeric or character variable that contains a category level on the documents table. This parameter is optional unless you plan to use Mutual Information as the term weight in accumulation.
specifies the output CAS table to contain the summary information about the terms in the document collection. The maximum output length of a tokenized term is 256 bytes, so a token that consists of an extremely long sequence of letters, numbers, and symbols is truncated to at most that length. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
For more information about specifying the terms parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.
For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how terms are weighted. Valid values are Entropy, None and MI (Mutual Information). MI requires a target variable in the offset table, which is generated by the tpParse action.
| Alias | termWgt |
|---|---|
| Default | ENTROPY |
specifies the character variable in the documents table that contains the text to be processed.
| Default | "text" |
|---|
specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.
| Default | 1E-05 |
|---|---|
| Range | 0–1 |
specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | false |
|---|
specifies the output CAS table to contain the topics that are discovered.
For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the U matrix, which contains the left singular vectors. The matrix U is number of terms by k+1.
For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the transpose of the matrix containing the right singular vectors. The matrix V is number of documents by k+1.
For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table will have n rows and k+1 columns.
For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how the elements in the term-by-document matrix (the parent output table) are weighted.
| Alias | cellWgt |
|---|---|
| Default | LOG |
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix with raw counts.
For more information about specifying the child parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether the part-of-speech tags used for tokenization and accumulation are detailed (complex, such as A.nom.f.p) or general (simple, such as A). This parameter requires a SAS Visual Text Analytics license.
| Default | False |
|---|
specifies a list of variables from the documents table that are to be retained on the output docPro table. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Aliases | keepVars, keepVar |
|---|---|
specifies the priority of the default LITI file that contains predefined concepts when both predefined and custom concepts are used. The default setting is 1, which means that the predefined concepts have the lowest priority compared to the custom concepts. However, certain predefined concepts within the LITI file may still have a higher priority. For more information, see the SAS Visual Text Analytics User's Guide. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–32 |
specifies the character or numeric variable on the documents table that contains the ID of each document.
| Default | "DOC_ID" |
|---|
specifies the name of the table to contain the SVD projections of the documents.
For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
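A cutoff of this kind can be read as the mean plus a multiple of the standard deviation. The following is a minimal sketch in plain Python, not the action's implementation: the function name `cutoff`, the sample weights, the use of the population standard deviation, and the `>=` membership test are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def cutoff(weights, n_std=1.0):
    """Return mean + n_std * standard deviation of the weights.

    Assumption: the population standard deviation is used. The reference
    text does not say which estimator the action applies.
    """
    return mean(weights) + n_std * pstdev(weights)


# Hypothetical absolute topic weights for five documents.
doc_weights = [0.1, 0.2, 0.3, 0.4, 1.0]

threshold = cutoff(doc_weights, n_std=1.0)

# Assumption: a document belongs to the topic when its weight reaches the cutoff.
members = [w for w in doc_weights if w >= threshold]
```

With a larger multiplier the threshold rises and fewer documents qualify; with a multiplier of 0 the threshold is simply the mean.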
names the input CAS table of documents to be parsed. You must include a text variable specified with textVar and a document ID variable specified with docIdVar.
For more information about specifying the documents parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to extract entities in parsing. If set to None, no entities are output. If set to STD, the standard entities are output.
| Default | NONE |
|---|
specifies if the exact document projection values should be output. This parameter requires a SAS Visual Text Analytics license.
| Default | True |
|---|
specifies whether the exact entries in the u table are used in the topic computation. Otherwise, the values are rounded to three decimal places.
| Alias | exactWeights |
|---|---|
| Default | False |
specifies whether empty-document indicators are included in the parent table. This parameter requires a SAS Visual Text Analytics license.
| Default | False |
|---|
specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.
| Alias | numTopics |
|---|---|
| Range | 1–1000 |
specifies the language used in the text variable of the input document table.
| Default | ENGLISH |
|---|
specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | False |
|---|
specifies the input CAS table that contains the LITI binary, which contains the predefined or custom concept definitions. The tmMine action can reference a concepts model that is compiled in the compileConcept action. For more information on how to do this, see the example, Referencing a Concepts Model in the tmMine Action. This parameter requires a SAS Visual Text Analytics license.
For more information about specifying the liti parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the maximum number of dimensions to be extracted. The maxK option can be used with the resolution option to dynamically select the recommended number of dimensions. To use a specific number of dimensions, either use maxK and set the resolution to high, or use the k parameter.
| Default | 10 |
|---|---|
| Range | 1–1000 |
specifies the name of the CAS table that contains a list of multi-word terms and their part-of-speech types. Each multi-word term is parsed as a single token.
For more information about specifying the multiterm parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to normalize the document projections, term projections, or both. The normalization converts the representation from depending on angles between vectors to depending on Euclidean distances between vectors.
| Default | ALL |
|---|
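The link between angles and Euclidean distances can be illustrated with a short Python sketch. It assumes that the normalization scales each projection vector to unit length, which the description above does not state explicitly; the helper names and the two sample vectors are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    length = math.sqrt(sum(x * x for x in vec))
    return list(vec) if length == 0 else [x / length for x in vec]


def cosine(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical two-dimensional projections of two documents.
a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

# For unit vectors: ||ua - ub||^2 == 2 - 2 * cos(angle between a and b).
lhs = squared_distance(ua, ub)
rhs = 2 - 2 * cosine(a, b)
```

Under that assumption, ranking normalized vectors by Euclidean distance gives the same order as ranking the original vectors by angle.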
when set to True, extracts noun groups during parsing and adds the noun groups as additional rows in the offset table. This is also reflected in the terms and parent tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | True |
|---|
specifies the number of threads to be used per node. The value must be an integer. When the value is 0, the number of threads equals the number of CPUs.
| Default | 8 |
|---|---|
| Minimum value | 0 |
specifies the number of terms to use in the descriptive label for each topic.
| Default | 5 |
|---|---|
| Range | 1–500 |
specifies the name of the output CAS table to contain the position information about the occurrences of child terms in the document collection. The maximum output length of a tokenized term in this table is 256 bytes, so a token that consists of an extremely long sequence of letters, numbers, and symbols is truncated to at most that length.
For more information about specifying the offset parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix.
For more information about specifying the parent parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the config CAS table to contain parsing configuration information.
For more information about specifying the parseConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | scoreConfig |
|---|
specifies the minimum number of documents a term should be in to be kept. The value must be an integer.
| Default | 10 |
|---|---|
| Range | 1–32767 |
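The effect of a minimum document count can be sketched in plain Python. This is an illustration only, not the action's implementation; the function name `filter_by_document_frequency` and the toy corpus are hypothetical.

```python
from collections import Counter


def filter_by_document_frequency(docs, min_docs):
    """Return the terms that occur in at least min_docs distinct documents."""
    doc_freq = Counter()
    for terms in docs.values():
        # A set counts each term once per document, however often it repeats.
        doc_freq.update(set(terms))
    return {term for term, n in doc_freq.items() if n >= min_docs}


# Hypothetical parsed documents keyed by document ID.
docs = {
    1: ["cat", "dog", "cat"],
    2: ["cat", "bird"],
    3: ["dog", "cat"],
}

kept = filter_by_document_frequency(docs, min_docs=2)
```

Here "bird" is dropped because it appears in only one document, even though the threshold counts documents rather than total occurrences.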
specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.
| Default | HIGH |
|---|
specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics and a PROMAX rotation produces correlated topics.
| Default | VARIMAX |
|---|
specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.
| Default | -1 |
|---|---|
| Range | -1–1 |
specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.
For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
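The compressed form of a diagonal matrix can be pictured as one row per diagonal entry. The sketch below is plain Python with hypothetical helper names; it assumes that _ID_ is 1-based, which the description above does not confirm.

```python
def compress_diagonal(singular_values):
    """Represent a k-by-k diagonal matrix as k rows with the variables _ID_ and S."""
    # Assumption: _ID_ starts at 1 and gives both the row and the column index.
    return [{"_ID_": i, "S": s} for i, s in enumerate(singular_values, start=1)]


def expand_diagonal(rows):
    """Rebuild the dense k-by-k matrix from the compressed rows."""
    k = len(rows)
    dense = [[0.0] * k for _ in range(k)]
    for row in rows:
        i = row["_ID_"] - 1
        dense[i][i] = row["S"]
    return dense


# Hypothetical singular values for k = 3.
rows = compress_diagonal([5.0, 2.0, 0.5])
dense = expand_diagonal(rows)
```

Storing k rows instead of k squared cells is what makes the compressed form practical when k is large.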
specifies the name of the table for saving the analytic score model. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Long form | saveState={"name":"table-name"} |
|---|---|
| Shortcut form | saveState="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | False |
|---|
when set to True, overwrites an existing table that has the same name.
| Default | False |
|---|
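The relationship between the shortcut form and the long form of an output table specification can be sketched as a small normalizing helper. This is an illustration in plain Python, not part of any SAS client library; the function name `as_casouttable` and the table and caslib names are hypothetical.

```python
def as_casouttable(value, **extra):
    """Expand the shortcut form ("table-name") into the long form ({"name": ...}).

    Extra keyword arguments are merged in as additional subparameters.
    """
    spec = {"name": value} if isinstance(value, str) else dict(value)
    spec.update(extra)
    if not spec.get("name"):
        raise ValueError("an output table specification needs a name")
    return spec


# Shortcut form: only the table name is given.
short = as_casouttable("score_model")

# Long form: the name plus another subparameter.
long_form = as_casouttable({"name": "score_model"}, caslib="casuser")
```

The shortcut form is enough when only the table name matters; the long form is needed as soon as any other subparameter is set.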
specifies a list of attribute types to be kept or ignored.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|
specifies a list of tags. Unsupported tags trigger a warning message.
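The KEEP and IGNORE behavior of a seltag value can be sketched as a simple filter. The example is plain Python and purely illustrative: the function name `select_terms`, the `mode` and `tags` argument names, and the sample terms are hypothetical and do not reflect the actual subparameter names.

```python
def select_terms(terms, tags, mode="KEEP"):
    """Filter terms by tag.

    KEEP:   terms without a selected tag are ignored.
    IGNORE: terms with a selected tag are ignored.
    """
    mode = mode.upper()
    if mode not in ("KEEP", "IGNORE"):
        raise ValueError("mode must be KEEP or IGNORE")
    selected = set(tags)
    if mode == "KEEP":
        return [t for t in terms if t["tag"] in selected]
    return [t for t in terms if t["tag"] not in selected]


# Hypothetical parsed terms with part-of-speech style tags.
terms = [
    {"term": "run", "tag": "V"},
    {"term": "house", "tag": "N"},
    {"term": "quick", "tag": "A"},
]

kept = select_terms(terms, ["N", "V"], mode="KEEP")
remaining = select_terms(terms, ["N", "V"], mode="IGNORE")
```

The two modes are complements of each other for the same tag list, so every term lands in exactly one of the two results.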
specifies a list of entity types to be kept or ignored. If this parameter is specified, entities must be set to STD.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies a list of part-of-speech tags to be kept or ignored.
The seltag value can be one or more of the following:
specifies what to do with terms with selected tags. KEEP: terms without selected tags will be ignored. IGNORE: terms with selected tags will be ignored.
| Default | KEEP |
|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies whether to include terms that have a keep status of N in the TERMS output table.
| Default | False |
|---|
specifies the input CAS table that contains the terms that are to be kept for the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the startList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether stemming is to occur in parsing. When set to True, terms are evaluated to see if they belong to a common parent form and the information is added to the offset table.
| Default | True |
|---|
specifies the input CAS table that contains the terms to exclude from the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the stopList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the input CAS table that contains user-defined synonyms to be used in the analysis. If specified, the table must have the following variables (all varchar): Term, Parent. Termrole and parentrole variables are optional.
For more information about specifying the synonyms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether part-of-speech tagging is used in parsing.
| Default | True |
|---|
specifies the numeric or character variable that contains a category level on the documents table. This parameter is optional unless you plan to use Mutual Information as the term weight in accumulation.
specifies the output CAS table to contain the summary information about the terms in the document collection. The maximum output length of a tokenized term is 256 bytes, so a token that consists of an extremely long sequence of letters, numbers, and symbols is truncated to at most that length. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
For more information about specifying the terms parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.
For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how terms are weighted. Valid values are Entropy, None and MI (Mutual Information). MI requires a target variable in the offset table, which is generated by the tpParse action.
| Alias | termWgt |
|---|---|
| Default | ENTROPY |
specifies the character variable in the documents table that contains the text to be processed.
| Default | "text" |
|---|
specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.
| Default | 1E-05 |
|---|---|
| Range | 0–1 |
specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | False |
|---|
specifies the output CAS table to contain the topics that are discovered.
For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the U matrix, which contains the left singular vectors. The matrix U is number of terms by k+1.
For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the transpose of the matrix containing the right singular vectors. The matrix V is number of documents by k+1.
For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table will have n rows and k+1 columns.
For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how the elements in the term-by-document matrix (the parent output table) are weighted.
| Alias | cellWgt |
|---|---|
| Default | LOG |
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix with raw counts.
For more information about specifying the child parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether the part-of-speech tags used for tokenization and accumulation are detailed (complex, such as A.nom.f.p) or general (simple, such as A). This parameter requires a SAS Visual Text Analytics license.
| Default | FALSE |
|---|
specifies a list of variables from the documents table that are to be retained on the output docPro table. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Aliases | keepVars, keepVar |
|---|---|
specifies the priority of the default LITI file that contains predefined concepts when both predefined and custom concepts are used. The default setting is 1, which means that the predefined concepts have the lowest priority compared to the custom concepts. However, certain predefined concepts within the LITI file may still have a higher priority. For more information, see the SAS Visual Text Analytics User's Guide. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–32 |
specifies the character or numeric variable on the documents table that contains the ID of each document.
| Default | "DOC_ID" |
|---|
specifies the name of the table to contain the SVD projections of the documents.
For more information about specifying the docPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the document cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
names the input CAS table of documents to be parsed. You must include a text variable specified with textVar and a document ID variable specified with docIdVar.
For more information about specifying the documents parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to extract entities in parsing. If set to None, no entities are output. If set to STD, the standard entities are output.
| Default | NONE |
|---|
specifies if the exact document projection values should be output. This parameter requires a SAS Visual Text Analytics license.
| Default | TRUE |
|---|
specifies whether the exact entries in the u table are used in the topic computation. Otherwise, the values are rounded to three decimal places.
| Alias | exactWeights |
|---|---|
| Default | FALSE |
specifies whether empty-document indicators are included in the parent table. This parameter requires a SAS Visual Text Analytics license.
| Default | FALSE |
|---|
specifies the number of dimensions to be extracted (also the number of derived topics). If the input data is too small for the requested number of dimensions, this value is adjusted to complete the calculation.
| Alias | numTopics |
|---|---|
| Range | 1–1000 |
specifies the language used in the text variable of the input document table.
| Default | ENGLISH |
|---|---|
specifies whether to use the legacy variable names on tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | FALSE |
|---|---|
specifies the input CAS table that contains the LITI binary, which contains the predefined or custom concept definitions. The tmMine action can reference a concepts model that is compiled in the compileConcept action. For more information on how to do this, see the example, Referencing a Concepts Model in the tmMine Action. This parameter requires a SAS Visual Text Analytics license.
For more information about specifying the liti parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the maximum number of dimensions to be extracted. The maxK option can be used together with the resolution option to dynamically select the recommended number of dimensions. To use a specific number of dimensions, either specify maxK and set the resolution to HIGH, or use the k parameter.
| Default | 10 |
|---|---|
| Range | 1–1000 |
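The two ways of requesting a fixed number of dimensions described above can be sketched as plain parameter dictionaries. The helper name `fixed_dimensions` is hypothetical; only the parameter names `k`, `maxK`, and `resolution`, the value `HIGH`, and the range 1 to 1000 come from this section.

```python
def fixed_dimensions(n, via_k=True):
    """Build the parameters that request exactly n dimensions.

    via_k=True  -> use the k parameter directly
    via_k=False -> use maxK together with resolution HIGH
    """
    if not 1 <= n <= 1000:
        raise ValueError("dimension count must be in the documented range 1 to 1000")
    if via_k:
        return {"k": n}
    return {"maxK": n, "resolution": "HIGH"}

with_k = fixed_dimensions(25)
with_max_k = fixed_dimensions(25, via_k=False)
```

Either dictionary could then be merged into the rest of the action's parameters.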
specifies the name of the CAS table that contains a list of multi-word terms and their part-of-speech types. Each multi-word term is parsed as a single token.
For more information about specifying the multiterm parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether to normalize the document projections, term projections, or both. The normalization converts the representation from depending on angles between vectors to depending on Euclidean distances between vectors.
| Default | ALL |
|---|---|
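The link between angles and Euclidean distances can be checked with a small worked example. It assumes that normalization means scaling each projection vector to unit length, which this section does not state explicitly. For unit vectors, the squared Euclidean distance equals 2 minus twice the cosine of the angle between them, so the two measures rank pairs of vectors identically. The vectors and function names are made up for illustration.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two hypothetical projection vectors.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cosine = dot(a, b)                # cosine of the angle between a and b
dist_sq = squared_distance(a, b)  # squared Euclidean distance after normalization
```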
when set to True, extracts noun groups during parsing and adds the noun groups as additional rows in the offset table. This is also reflected in the terms and parent tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | TRUE |
|---|---|
specifies the number of threads to be used per node. The value must be an integer. When the value is 0, the number of threads equals the number of CPUs.
| Default | 8 |
|---|---|
| Minimum value | 0 |
specifies the number of terms to use in the descriptive label for each topic.
| Default | 5 |
|---|---|
| Range | 1–500 |
specifies the name of the output CAS table to contain the position information about the occurrences of child terms in the document collection. The maximum output length of a tokenized term in this table is 256 bytes. Tokens that consist of an extremely long sequence of letters, numbers, and symbols are therefore truncated to at most 256 bytes.
For more information about specifying the offset parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output CAS table to contain a compressed representation of the sparse term-by-document matrix.
For more information about specifying the parent parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the name of the output CAS table to contain the parsing configuration information.
For more information about specifying the parseConfig parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
| Alias | scoreConfig |
|---|---|
specifies the minimum number of documents in which a term must appear to be kept. The value must be an integer.
| Default | 10 |
|---|---|
| Range | 1–32767 |
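A minimal sketch of a document-frequency filter of this kind follows. It counts the number of documents that contain each term (not the total number of occurrences) and keeps terms that reach the threshold. The function names and the toy documents are hypothetical, and the action's own implementation may differ in detail.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # set() so repeats within one document count once
    return df

def keep_terms(docs, min_docs):
    """Return the terms that appear in at least min_docs documents."""
    df = document_frequency(docs)
    return {term for term, n in df.items() if n >= min_docs}

# Three toy documents, already tokenized.
docs = [
    ["cas", "table", "cas"],
    ["cas", "topic"],
    ["table", "topic", "svd"],
]
kept = keep_terms(docs, min_docs=2)
```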
specifies the desired resolution level for the recommended number of dimensions to be extracted by the SVD.
| Default | HIGH |
|---|---|
specifies the type of rotation used to maximize the explanatory power of each topic. A VARIMAX rotation produces uncorrelated topics and a PROMAX rotation produces correlated topics.
| Default | VARIMAX |
|---|---|
specifies the row-pivot weight for document normalization of the parent table before the SVD. A negative value turns off the row-pivot process. When topics are requested, a value of 1 is used for this parameter by default. This parameter requires a SAS Visual Text Analytics license.
| Default | -1 |
|---|---|
| Range | -1–1 |
specifies the S matrix, which is a diagonal matrix that is output in compressed form, with two variables and k rows. The variable _ID_ indicates the row and column of the entry and the variable S contains the singular values.
For more information about specifying the s parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
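To make the compressed form concrete, the sketch below expands rows that hold `_ID_` and `S` into a dense diagonal matrix. The variable names `_ID_` and `S` come from the description above. The singular values, the function name, and the assumption that `_ID_` starts at 1 are illustrative only.

```python
def expand_diagonal(rows):
    """Expand compressed rows {"_ID_": i, "S": value} into a dense k-by-k matrix.

    Assumes _ID_ is 1-based; entry (i, i) receives the singular value.
    """
    k = len(rows)
    dense = [[0.0] * k for _ in range(k)]
    for row in rows:
        i = row["_ID_"] - 1
        dense[i][i] = row["S"]
    return dense

# Hypothetical singular values for k = 3.
s_rows = [
    {"_ID_": 1, "S": 5.2},
    {"_ID_": 2, "S": 3.1},
    {"_ID_": 3, "S": 0.7},
]
s_dense = expand_diagonal(s_rows)
```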
specifies the name of the table for saving the analytic score model. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Long form | saveState=list(name="table-name") |
|---|---|
| Shortcut form | saveState="table-name" |
The casouttable value can be one or more of the following:
specifies the name of the caslib for the output table.
specifies the descriptive label to associate with the table.
specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.
| Default | 0 |
|---|---|
| Minimum value | 0 |
specifies the memory format for the output table.
| Default | INHERIT |
|---|---|
use the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.
specifies the name for the output table.
when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.
| Default | FALSE |
|---|---|
when set to True, overwrites an existing table that has the same name.
| Default | FALSE |
|---|---|
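The relationship between the shortcut form and the long form can be sketched as a small client-side helper that expands a bare table name and fills in the defaults listed above. The helper name `as_casouttable` is hypothetical. The subparameter names `lifetime`, `memoryFormat`, `promote`, and `replace` are assumed here, because this section lists their descriptions and defaults without their names; only `name` appears explicitly in the long form.

```python
def as_casouttable(value):
    """Expand the shortcut form (a table name) into the long form (a dict).

    Defaults mirror the values listed in this section; the key names other
    than "name" are assumptions.
    """
    defaults = {
        "lifetime": 0,
        "memoryFormat": "INHERIT",
        "promote": False,
        "replace": False,
    }
    if isinstance(value, str):
        value = {"name": value}
    return {**defaults, **value}

short = as_casouttable("scoreModel")
long_form = as_casouttable({"name": "scoreModel", "replace": True})
```

The server accepts both forms directly, so a helper like this is only useful for inspecting or logging the effective settings.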
specifies a list of attribute types to be kept or ignored.
The seltag value can be one or more of the following:
specifies how to handle terms that have the selected tags. When set to KEEP, terms without the selected tags are ignored. When set to IGNORE, terms with the selected tags are ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies a list of entity types to be kept or ignored. If this parameter is specified, entities must be set to STD.
The seltag value can be one or more of the following:
specifies how to handle terms that have the selected tags. When set to KEEP, terms without the selected tags are ignored. When set to IGNORE, terms with the selected tags are ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
specifies a list of part-of-speech tags to be kept or ignored.
The seltag value can be one or more of the following:
specifies how to handle terms that have the selected tags. When set to KEEP, terms without the selected tags are ignored. When set to IGNORE, terms with the selected tags are ignored.
| Default | KEEP |
|---|---|
specifies a list of tags. Unsupported tags trigger a warning message.
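The KEEP and IGNORE behavior shared by the three tag-selection parameters can be sketched as a simple filter. The function name, the `mode` argument, the `role` field, and the tag values `N`, `V`, and `A` are hypothetical stand-ins; the tags that the action actually supports are not listed in this section.

```python
def filter_by_tags(terms, tags, mode="KEEP"):
    """Filter term records by tag.

    KEEP   -> keep only terms whose tag is in the selected list
    IGNORE -> drop terms whose tag is in the selected list
    """
    selected = set(tags)
    mode = mode.upper()
    if mode == "KEEP":
        return [t for t in terms if t["role"] in selected]
    if mode == "IGNORE":
        return [t for t in terms if t["role"] not in selected]
    raise ValueError(f"unsupported mode: {mode!r}")

# Hypothetical parsed terms with made-up part-of-speech tags.
terms = [
    {"term": "table", "role": "N"},
    {"term": "parse", "role": "V"},
    {"term": "sparse", "role": "A"},
]
kept = filter_by_tags(terms, ["N", "V"], mode="KEEP")
dropped = filter_by_tags(terms, ["N", "V"], mode="IGNORE")
```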
specifies whether to include terms that have a keep status of N in the TERMS output table.
| Default | FALSE |
|---|---|
specifies the input CAS table that contains the terms that are to be kept for the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the startList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether stemming is to occur in parsing. When set to True, terms are evaluated to see if they belong to a common parent form and the information is added to the offset table.
| Default | TRUE |
|---|---|
specifies the input CAS table that contains the terms to exclude from the analysis. If specified, the table must have the Term (varchar) variable. A Role (varchar) variable is optional.
For more information about specifying the stopList parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies the input CAS table that contains user-defined synonyms to be used in the analysis. If specified, the table must have the following variables (all varchar): Term, Parent. Termrole and parentrole variables are optional.
For more information about specifying the synonyms parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).
specifies whether part-of-speech tagging is used in parsing.
| Default | TRUE |
|---|---|
specifies the numeric or character variable that contains a category level on the documents table. This parameter is optional unless you plan to use Mutual Information as the term weight in accumulation.
specifies the output CAS table to contain the summary information about the terms in the document collection. The maximum output length of a tokenized term is 256 bytes. Tokens that consist of an extremely long sequence of letters, numbers, and symbols are therefore truncated to at most 256 bytes. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
For more information about specifying the terms parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how many standard deviations above the mean to set the term cutoff. This parameter requires a SAS Visual Text Analytics license.
| Default | 1 |
|---|---|
| Range | 0–10 |
specifies the name of the output CAS table to contain the term-by-topic sparse matrix information.
For more information about specifying the termTopics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies how terms are weighted. Valid values are ENTROPY, NONE, and MI (mutual information). MI requires a target variable in the offset table, which is generated by the tpParse action.
| Alias | termWgt |
|---|---|
| Default | ENTROPY |
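This section does not give the formula behind the ENTROPY weight. As a hedged illustration, the sketch below uses a common entropy term-weighting formulation from the text mining literature: the weight is 1 plus the sum over documents of p * log2(p), divided by log2(n), where p is the share of the term's occurrences that fall in a given document and n is the number of documents. Whether the action uses exactly this form is an assumption. Under this form, a term spread evenly across all documents gets a weight of 0, and a term concentrated in one document gets a weight of 1.

```python
import math

def entropy_weight(counts):
    """Entropy weight of one term from its per-document counts (illustrative).

    counts[j] is the number of times the term occurs in document j.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        raise ValueError("need at least two documents and one occurrence")
    acc = 0.0
    for f in counts:
        if f > 0:  # zero counts contribute nothing to the sum
            p = f / total
            acc += p * math.log2(p)
    return 1.0 + acc / math.log2(n)

even = entropy_weight([1, 1, 1, 1])    # term spread evenly over four documents
peaked = entropy_weight([5, 0, 0, 0])  # term confined to a single document
```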
specifies the character variable in the documents table that contains the text to be processed.
| Default | "text" |
|---|---|
specifies the stopping threshold for the iterative factorization algorithm. If 0 is specified, the default value is used.
| Default | 1E-05 |
|---|---|
| Range | 0–1 |
specifies whether to include topic membership decisions and document cutoffs in the output tables. This parameter requires a SAS Visual Text Analytics license or a SAS Visual Data Mining and Machine Learning license.
| Default | FALSE |
|---|---|
specifies the output CAS table to contain the topics that are discovered.
For more information about specifying the topics parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the U matrix, which contains the left singular vectors. The U matrix has one row for each term and k+1 columns.
For more information about specifying the u parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the transpose of the matrix that contains the right singular vectors. The V matrix has one row for each document and k+1 columns.
For more information about specifying the v parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).
specifies the table to contain the projections of the terms. If k dimensions of the SVD are found and the input data set contains n terms, this table has n rows and k+1 columns.
For more information about specifying the wordPro parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).