The HPTMINE Procedure
The OUTPOS= Data Set
The data set that you specify in the OUTPOS= option in the PARSE statement contains position information about the occurrences of child terms in the document collection. Table 5.12 shows the fields in this data set.
Table 5.12: Fields in the OUTPOS= Data Set
Field | Description |
---|---|
TERM | A lowercase version of the term |
ROLE | The term’s part of speech (this variable is empty if the NOTAGGING option is specified in the PARSE statement) |
PARENT | A lowercase version of the parent term |
_START_ | The starting position of the term’s occurrence (the first position is 0) |
_END_ | The ending position of the term’s occurrence |
SENTENCE | The sentence where the occurrence appears |
PARAGRAPH | The paragraph where the occurrence appears (paragraph identification is not implemented in the current release, so the value is always 0) |
DOCUMENT | The ID of the document where the occurrence appears |
If you exclude terms by specifying the IGNORE option in the SELECT statement, then those terms are also excluded from the OUTPOS= data set. No synonym lists, start lists, or stop lists are applied when the OUTPOS= data set is generated.
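To illustrate how the fields in Table 5.12 fit together, the following minimal Python sketch works on a few rows shaped like the OUTPOS= data set. It assumes that the data set has already been exported from SAS and loaded as a list of dictionaries. The sample rows, the ROLE labels, and the helper names `parent_counts` and `occurrences_in_order` are hypothetical and are not part of the HPTMINE procedure. The sketch uses only the documented field names. It counts how often each parent term occurs in each document and lists one document's occurrences in reading order.

```python
from collections import Counter

# Hypothetical rows shaped like the OUTPOS= data set. All values are invented
# for illustration; they do not come from a real HPTMINE run, and they make no
# claim about whether _END_ is inclusive or how sentences are numbered.
rows = [
    {"TERM": "cars", "ROLE": "Noun", "PARENT": "car", "_START_": 4,
     "_END_": 7, "SENTENCE": 0, "PARAGRAPH": 0, "DOCUMENT": 1},
    {"TERM": "car", "ROLE": "Noun", "PARENT": "car", "_START_": 18,
     "_END_": 20, "SENTENCE": 0, "PARAGRAPH": 0, "DOCUMENT": 1},
    {"TERM": "passed", "ROLE": "Verb", "PARENT": "pass", "_START_": 9,
     "_END_": 14, "SENTENCE": 0, "PARAGRAPH": 0, "DOCUMENT": 1},
    {"TERM": "cars", "ROLE": "Noun", "PARENT": "car", "_START_": 0,
     "_END_": 3, "SENTENCE": 0, "PARAGRAPH": 0, "DOCUMENT": 2},
    {"TERM": "pass", "ROLE": "Verb", "PARENT": "pass", "_START_": 5,
     "_END_": 8, "SENTENCE": 0, "PARAGRAPH": 0, "DOCUMENT": 2},
]


def parent_counts(rows):
    """Count occurrences for each (DOCUMENT, PARENT) pair."""
    return Counter((row["DOCUMENT"], row["PARENT"]) for row in rows)


def occurrences_in_order(rows, document):
    """Return the occurrences in one document, sorted by starting position."""
    in_document = (row for row in rows if row["DOCUMENT"] == document)
    return sorted(in_document, key=lambda row: row["_START_"])


counts = parent_counts(rows)
ordered_terms = [row["TERM"] for row in occurrences_in_order(rows, 1)]
```

Because each row carries both TERM and PARENT, the child forms "cars" and "car" are counted together under the parent "car", while `occurrences_in_order` keeps the original child forms in the order in which they appear.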
Copyright © SAS Institute Inc. All rights reserved.