The HPTMINE Procedure

The OUTPOS= Data Set

The data set that is specified in the OUTPOS= option in the PARSE statement contains position information for each occurrence of a child term in the document collection. Table 5.12 shows the fields in this data set.

Table 5.12: Fields in the OUTPOS= Data Set

Field        Description

TERM         A lowercase version of the term
ROLE         The term’s part of speech (this variable is empty if the NOTAGGING option is specified in the PARSE statement)
PARENT       A lowercase version of the parent term
_START_      The starting position of the term’s occurrence (the first position is 0)
_END_        The ending position of the term’s occurrence
SENTENCE     The sentence where the occurrence appears
PARAGRAPH    The paragraph where the occurrence appears (this has not been implemented in the current release, and the value is always set to 0)
DOCUMENT     The ID of the document where the occurrence appears


If you exclude terms by specifying the IGNORE option in the SELECT statement, then those terms are excluded from the OUTPOS= data set. No synonym lists, start lists, or stop lists are used when the OUTPOS= data set is generated.
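
To illustrate how the fields in Table 5.12 fit together, the following minimal Python sketch groups occurrences by parent term. It assumes that the OUTPOS= data set has already been exported from SAS and read into memory. The PosRow type, the occurrences_by_parent function, and every row value are hypothetical and invented for illustration. In particular, the ROLE labels, the numeric document IDs, the sentence numbers, and the _END_ values are assumptions; this section does not state whether _END_ is inclusive.

```python
from collections import defaultdict
from typing import NamedTuple


class PosRow(NamedTuple):
    """One OUTPOS= observation; the comments give the field names from Table 5.12."""
    term: str       # TERM
    role: str       # ROLE (empty when NOTAGGING is specified)
    parent: str     # PARENT
    start: int      # _START_ (the first position is 0)
    end: int        # _END_
    sentence: int   # SENTENCE
    paragraph: int  # PARAGRAPH (always 0 in the current release)
    document: int   # DOCUMENT (assumed numeric here)


# Hypothetical rows, invented purely for illustration.
rows = [
    PosRow("ran", "Verb", "run", 4, 6, 1, 0, 1),
    PosRow("runs", "Verb", "run", 15, 18, 2, 0, 1),
    PosRow("dog", "Noun", "dog", 0, 2, 1, 0, 2),
    PosRow("running", "Verb", "run", 8, 14, 1, 0, 2),
]


def occurrences_by_parent(rows):
    """Map each parent term to its sorted (document, start, end) occurrences."""
    index = defaultdict(list)
    for r in rows:
        index[r.parent].append((r.document, r.start, r.end))
    return {parent: sorted(hits) for parent, hits in index.items()}


index = occurrences_by_parent(rows)
for parent, hits in sorted(index.items()):
    print(parent, hits)
```

Because synonym lists are not applied when the OUTPOS= data set is generated, a grouping like this reflects only the child-to-parent relationship recorded in the PARENT field.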