Bayesian Net Classifier Action Set

Uses Bayesian network models to classify the target variable.

bnet Action

Bayesian Net Classifier Action.

CASL Syntax

bayesianNetClassifier.bnet <result=results> <status=rc> /
alpha={double-1 <, double-2, ...>},
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
bestModel=TRUE | FALSE,
code={
casOut={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
onDemand=TRUE | FALSE,
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
comment=TRUE | FALSE,
fmtWdth=integer,
indentSize=integer,
intoCutPt=double,
iProb=TRUE | FALSE,
labelId=integer,
lineSize=integer,
noTrim=TRUE | FALSE,
pCatAll=TRUE | FALSE,
tabForm=TRUE | FALSE
},
codeGroup="string",
display={
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
},
freq="string",
id={"variable-name-1" <, "variable-name-2", ...>},
inNetwork={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
},
inputs={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
maxParents=integer,
miAlpha=double,
nominals={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
numBin=integer,
outNetwork={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
output={
required parameter casOut={
caslib="string",
compress=TRUE | FALSE,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},
role="string"
},
outputTables={
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=TRUE | FALSE,
replace=TRUE | FALSE
},
parenting={"BESTONE", "BESTSET"},
partByFrac={
seed=integer,
test=double,
validate=double
},
partByVar={
required parameter name="variable-name",
test="string",
train="string",
validate="string"
},
preScreening={"ONE", "ZERO"},
printtarget=TRUE | FALSE,
resident=TRUE | FALSE,
saveState={
caslib="string",
label="string",
lifetime=64-bit-integer,
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
}
structures={"GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"},
required parameter table={
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=TRUE | FALSE,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
},
target="string",
varSelect={"ONE", "THREE", "TWO", "ZERO"}
;
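In the syntax above, the prefix "required parameter" marks a required parameter. In the parameter descriptions, an asterisk (*) marks a required parameter.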
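As a rough illustration of how the syntax above nests, the following Python sketch builds a plain dictionary that mirrors the parameter names. The table, caslib, and variable names are hypothetical placeholders, and the helper missing_required is an assumption made for illustration. Nothing here is sent to a server; how the dictionary reaches a client library is deliberately left out.

```python
# Hypothetical parameter dictionary mirroring the bnet syntax above.
# Table, caslib, and variable names are placeholders, not real data.
bnet_params = {
    "table": {"name": "my_train", "caslib": "casuser"},  # required parameter
    "target": "outcome",
    "inputs": [{"name": "age"}, {"name": "income"}, {"name": "region"}],
    "nominals": [{"name": "region"}],
    "structures": ["TAN", "PC"],
    "maxParents": 3,
    "bestModel": True,
    "saveState": {"name": "bnet_model", "replace": True},
}


def missing_required(params):
    """List required keys that are absent (only those marked required in the syntax)."""
    missing = []
    if "table" not in params:
        missing.append("table")
    elif "name" not in params["table"]:
        missing.append("table.name")
    # Each variable description in these lists requires a name.
    for group in ("inputs", "nominals", "attributes"):
        for i, var in enumerate(params.get(group, [])):
            if "name" not in var:
                missing.append(f"{group}[{i}].name")
    return missing
```

An empty list from missing_required(bnet_params) means only that the required keys in this sketch are present; it does not validate value types or ranges.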

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
inNetwork | | specifies the name of the input table that specifies the links to be included in and excluded from the network.
table (required) | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
code | casOut |
outNetwork | | specifies the name of the output table for the network structure and the probability distributions.
output | casOut (required) | creates an output table to contain the predicted target values of the input table.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
saveState | | specifies the table in which to save the model for future scoring.

Parameter Descriptions

alpha={double-1 <, double-2, ...>}

specifies the significance level for independence tests that use the chi-square or G-square statistic. To choose the best model among several, you can specify up to five numbers, separated by commas. If you specify multiple numbers but do not set the bestModel parameter to True, the action uses the first number and ignores the rest.

Default 0.05
Requirement The specified values must be unique.
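The rules for alpha (default 0.05, at most five values, unique values, first value only unless bestModel is True) can be sketched in Python as follows. The function name effective_alphas is hypothetical, and the checks only restate the description above; they are not the action's own validation code.

```python
def effective_alphas(alpha=None, best_model=False):
    """Return the significance levels that would be considered.

    Sketch of the documented rules only; not the action's implementation.
    """
    values = [0.05] if alpha is None else list(alpha)  # documented default
    if not 1 <= len(values) <= 5:
        raise ValueError("alpha accepts between one and five values")
    if len(set(values)) != len(values):
        raise ValueError("alpha values must be unique")
    # Without bestModel, only the first value is used.
    return values if best_model else values[:1]
```

For example, effective_alphas([0.01, 0.05, 0.1]) keeps only 0.01, while passing best_model=True keeps all three candidates.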

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified in the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias attribute

bestModel=TRUE | FALSE

when set to True, selects the best model.

Default FALSE

code={aircodegen}

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

codeGroup="string"

Code Group

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

freq="string"

specifies the frequency variable.

id={"variable-name-1" <, "variable-name-2", ...>}

specifies the variables to copy to the generated table.

indepTest="ALL" | "CHIGSQUARE" | "CHISQUARE" | "GSQUARE" | "MI"

specifies the method for independence tests.

Default CHIGSQUARE
ALL

uses the chi-square statistic, the G-square statistic, and the normalized mutual information for independence tests. A variable is independent of the target if the p-values of both the chi-square and the G-square statistics are greater than the value of the alpha parameter and the normalized mutual information is less than the value of the miAlpha parameter.

CHIGSQUARE

uses both the chi-square and G-square statistics for independence tests. A variable is independent of the target if the p-values of both the chi-square and G-square statistics are greater than the value of the alpha parameter.

CHISQUARE

uses the chi-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

GSQUARE

uses the G-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

MI

uses the normalized mutual information for independence tests. A variable is independent of the target if the normalized mutual information is less than the value of the miAlpha parameter.
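The five decision rules above can be compared side by side in a short Python sketch. It assumes that the chi-square p-value, the G-square p-value, and the normalized mutual information have already been computed elsewhere; the function name is_independent and its argument names are hypothetical.

```python
def is_independent(method, chi_p, g_p, nmi, alpha=0.05, mi_alpha=0.05):
    """Apply the documented independence rule for one input variable.

    chi_p, g_p: p-values of the chi-square and G-square statistics (precomputed).
    nmi: normalized mutual information with the target (precomputed).
    """
    chi_ok = chi_p > alpha      # chi-square test does not reject independence
    g_ok = g_p > alpha          # G-square test does not reject independence
    mi_ok = nmi < mi_alpha      # mutual information below the miAlpha threshold
    rules = {
        "ALL": chi_ok and g_ok and mi_ok,
        "CHIGSQUARE": chi_ok and g_ok,
        "CHISQUARE": chi_ok,
        "GSQUARE": g_ok,
        "MI": mi_ok,
    }
    return rules[method]
```

With chi_p=0.2 and g_p=0.01, CHISQUARE treats the variable as independent of the target, but CHIGSQUARE does not, because both p-values must exceed alpha.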

inNetwork={castable}

specifies the name of the input table that specifies the links to be included in and excluded from the network.

For more information about specifying the inNetwork parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias inNet

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias input

maxParents=integer

specifies the maximum number of parents for each node in the network. If you specify the value True for the bestModel parameter, the action tries all values from 1 to the value of this parameter to find the best setting; otherwise, the specified value is used as the maximum number of parents.

Default 5
Range 1–16
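A small Python sketch of how maxParents interacts with bestModel, under the description above. The function name max_parents_candidates is hypothetical.

```python
def max_parents_candidates(max_parents=5, best_model=False):
    """Return the maxParents values that would be tried (sketch of the documented rule)."""
    if not 1 <= max_parents <= 16:
        raise ValueError("maxParents must be in the range 1-16")
    # With bestModel, every value from 1 up to max_parents is a candidate.
    return list(range(1, max_parents + 1)) if best_model else [max_parents]
```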

miAlpha=double

specifies the significance level for independence tests that use mutual information.

Default 0.05
Range 0–1

missingInt="IGNORE" | "IMPUTE"

specifies how to handle missing values for interval variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the interval variables.

IMPUTE

replaces the missing values in any interval variable by the mean of the variable.
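The two missingInt policies can be illustrated with a minimal Python sketch. It assumes rows are dictionaries, that None marks a missing value, and that the imputed mean is taken over the nonmissing values of each variable; the function name is hypothetical and this is not the action's implementation.

```python
from statistics import mean


def handle_missing_interval(rows, interval_vars, policy="IGNORE"):
    """Apply an IGNORE or IMPUTE policy to interval variables (illustrative only)."""
    if policy == "IGNORE":
        # Drop any observation with a missing value in any interval variable.
        return [r for r in rows if all(r[v] is not None for v in interval_vars)]
    if policy == "IMPUTE":
        # Assumption: the mean is computed over nonmissing values only.
        means = {v: mean(r[v] for r in rows if r[v] is not None) for v in interval_vars}
        return [
            {**r, **{v: means[v] if r[v] is None else r[v] for v in interval_vars}}
            for r in rows
        ]
    raise ValueError(f"unknown policy: {policy}")
```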

missingNom="IGNORE" | "IMPUTE" | "LEVEL"

specifies how to handle missing values for nominal variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the nominal variables.

IMPUTE

replaces the missing values in any nominal variable by the mode of the variable.

LEVEL

treats the missing values in any nominal variable as a separate level of the variable.
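A comparable sketch for the three missingNom policies, applied to a single column for brevity. None marks a missing value, the placeholder level name "_MISSING_" is an assumption, and IGNORE here drops values from one column, whereas the action drops whole observations.

```python
from collections import Counter


def handle_missing_nominal(values, policy="IGNORE", missing_level="_MISSING_"):
    """Apply IGNORE, IMPUTE, or LEVEL to one nominal column (illustrative only)."""
    present = [v for v in values if v is not None]
    if policy == "IGNORE":
        return present
    if policy == "IMPUTE":
        # Replace missing values with the most frequent nonmissing level.
        mode = Counter(present).most_common(1)[0][0]
        return [mode if v is None else v for v in values]
    if policy == "LEVEL":
        # Keep the observation and treat "missing" as its own level.
        return [missing_level if v is None else v for v in values]
    raise ValueError(f"unknown policy: {policy}")
```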

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

numBin=integer

specifies the number of bins to use for interval variables.

Default 5
Range 2–1024

outNetwork={casouttable}

specifies the name of the output table for the network structure and the probability distributions.

For more information about specifying the outNetwork parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Alias outNet

output={BnetOutputStatement}

creates an output table to contain the predicted target values of the input table.

The BnetOutputStatement value can be one or more of the following:

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.

role="string"

renames the generated column _ROLE_ in the output data table to the specified role name.

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

parenting={"BESTONE", "BESTSET"}

specifies the structure learning methods. If you want the action to choose between the two methods, you can specify both BESTONE and BESTSET and also specify the value True for the bestModel parameter. If you specify both methods but you do not specify the value True for the bestModel parameter, the action uses the first specified method and ignores the other.

Default BESTSET
BESTONE

uses a greedy approach to determine the parents of each node; that is, for each node, the best candidate is added as a parent of the node in each iteration.

BESTSET

determines the best set of variables among possible candidate sets as the parents of each node; that is, instead of adding one variable in an iteration, the action tests multiple sets of variables together and chooses the best set as the parents of the node.
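The difference between the two methods can be shown with a toy Python sketch. The score function, the stopping rule for the greedy search (stop when no candidate improves the score), and the function names are all assumptions for illustration; the source does not describe how the action scores candidate parents.

```python
from itertools import combinations


def best_one(candidates, score, max_parents):
    """Greedy search: add the single best candidate per iteration."""
    parents = []
    current = score(tuple(parents))
    while len(parents) < max_parents:
        remaining = [c for c in candidates if c not in parents]
        if not remaining:
            break
        best = max(remaining, key=lambda c: score(tuple(sorted(parents + [c]))))
        best_score = score(tuple(sorted(parents + [best])))
        if best_score <= current:  # assumed stopping rule: no improvement
            break
        parents.append(best)
        current = best_score
    return sorted(parents)


def best_set(candidates, score, max_parents):
    """Exhaustive search: score whole candidate sets and keep the best one."""
    subsets = [
        s for k in range(max_parents + 1) for s in combinations(sorted(candidates), k)
    ]
    return list(max(subsets, key=score))


# Toy scores chosen so that the two methods disagree.
toy = {(): 0.0, ("a",): 1.0, ("b",): 0.5, ("c",): 0.5,
       ("a", "b"): 1.2, ("a", "c"): 1.1, ("b", "c"): 2.0}
toy_score = lambda s: toy.get(s, 0.0)
```

With these toy scores, the greedy search commits to "a" first and ends with ["a", "b"], while the set search finds ["b", "c"], which scores higher but is not reachable one variable at a time.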

partByFrac={partByFracStatement}

The partByFracStatement value can be one or more of the following:

seed=integer

specifies the seed to use in the random number generator that is used for partitioning the data.

Default 0
test=double

randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Range 0–1
validate=double

randomly assigns the specified proportion of observations in the input table to the validation role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Alias valid
Range 0–1
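The following Python sketch illustrates the constraints on partByFrac and one possible way to assign roles at random. It uses Python's own random number generator, so it will not reproduce the action's partitions for a given seed, and it treats the seed as an ordinary value. The function name and the role labels are hypothetical.

```python
import random


def partition_by_fraction(n_rows, test=0.0, validate=0.0, seed=0):
    """Assign TEST, VALIDATE, or TRAIN to each row (illustrative only)."""
    if not (0 <= test <= 1 and 0 <= validate <= 1):
        raise ValueError("test and validate must each be in the range 0-1")
    if test + validate >= 1:
        raise ValueError("test + validate must be less than 1")
    rng = random.Random(seed)  # Python's generator, not the action's
    roles = []
    for _ in range(n_rows):
        u = rng.random()
        if u < test:
            roles.append("TEST")
        elif u < test + validate:
            roles.append("VALIDATE")
        else:
            roles.append("TRAIN")  # everything left over is training data
    return roles
```

Because each row is assigned independently, the realized proportions in this sketch only approximate the requested fractions.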

partByVar={partByVarStatement}

Long form partByVar={name="variable-name"}
Shortcut form partByVar="variable-name"

The partByVarStatement value can be one or more of the following:

* name="variable-name"

names the variable in the input table whose values are used to assign roles to each observation.

test="string"

specifies the formatted value of the variable that is used to assign observations to the testing role.

train="string"

specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.

validate="string"

specifies the formatted value of the variable that is used to assign observations to the validation role.

Alias valid
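The role assignment described for partByVar can be sketched in Python as follows. The function name and the role labels are hypothetical. What happens to an observation that matches none of the values when train is specified is not described above, so the sketch returns None in that case.

```python
def role_from_value(value, test=None, train=None, validate=None):
    """Map one formatted partition value to a role (illustrative only)."""
    if test is not None and value == test:
        return "TEST"
    if validate is not None and value == validate:
        return "VALIDATE"
    # Without train, everything not claimed by test or validate is training data.
    if train is None or value == train:
        return "TRAIN"
    return None  # behavior not described in the text above
```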

preScreening={"ONE", "ZERO"}

specifies the initial screening for the input variables. If you want the action to choose the best model with or without prescreening, you can specify {"ZERO","ONE"} or {"ONE","ZERO"} for the parameter and also specify the value True for the bestModel parameter. If you specify both ONE and ZERO but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the other.

Default ONE
Requirement The specified values must be unique.
ONE

uses only the input variables that are dependent on the target.

ZERO

uses all the input variables.

printtarget=TRUE | FALSE

when set to True, generates names for the predicted target variable and the predicted probability variables.

Default FALSE

resident=TRUE | FALSE

Default TRUE

saveState={casouttable}

specifies the table in which to save the model for future scoring.

Long form saveState={name="table-name"}
Shortcut form saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT
DVR

uses the duplicate value reduction memory format. This memory format can reduce the memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE
replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default FALSE
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

structures={"GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"}

specifies the network structure types. Together with the maxParents parameter, this parameter determines which network structure the action learns from the training data. To have the action choose the best structure among several, specify multiple values in any combination, separated by commas, and set the bestModel parameter to True. If you specify multiple structures but do not set the bestModel parameter to True, the action uses the first value that you specify and ignores the rest.

Alias structure
Default PC
Requirement The specified values must be unique.
GENERAL

learns a general Bayesian network. A general Bayesian network removes the requirement of a direct connection between the target variable and the input variables that are selected to be in the network.

Alias GN
MB

learns the Markov blanket of the target variable. The Markov blanket includes the parents, the children, and other parents of the children. After learning the Markov blanket, the action further determines the parents of the target, the links from the parents to the children, and the links among the children. When you specify the value "MB" for the structures parameter, the action learns the Markov blanket regardless of the values of the preScreening and the varSelect parameters.

NAIVE

learns a naive Bayesian network structure (that is, the target has a direct link to each input variable). If you specify the value 1 for maxParents, the structure being trained is a naive Bayesian network. If you specify a value greater than 1 for maxParents, the structure is a Bayesian network-augmented naive Bayesian network.

PC

learns the parent-child Bayesian network structure. PC structure differs from NAIVE structure in that some input variables could be learned as the parents of the target variable. In addition, links from the parents to the children and among the children are also possible in PC.

TAN

learns the tree-augmented naive Bayesian network structure. The TAN structure includes a direct link from the target to each input variable plus a tree structure among the input variables.
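The selection rules for the structures parameter can be sketched in Python. The function names are hypothetical, and treating "GN" and "GENERAL" together as a duplicate is an assumption that follows from GN being an alias; the source does not say how the action handles that case.

```python
ALIASES = {"GN": "GENERAL"}
VALID = {"GENERAL", "MB", "NAIVE", "PC", "TAN"}


def resolve_structures(structures=None, best_model=False):
    """Return the structure types that would be considered (illustrative only)."""
    names = ["PC"] if not structures else [ALIASES.get(s, s) for s in structures]
    unknown = [n for n in names if n not in VALID]
    if unknown:
        raise ValueError(f"unknown structure types: {unknown}")
    if len(set(names)) != len(names):
        raise ValueError("structure values must be unique")
    # Without bestModel, only the first value is used.
    return names if best_model else names[:1]


def naive_variant(max_parents):
    """Describe what NAIVE yields for a given maxParents value."""
    if max_parents == 1:
        return "naive Bayesian network"
    return "Bayesian network-augmented naive Bayesian network"
```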

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

target="string"

specifies the target variable to use for analysis.

varSelect={"ONE", "THREE", "TWO", "ZERO"}

specifies how to select input variables beyond prescreening. If you specify the value "ONE", "TWO", or "THREE", the action automatically tests each input variable for unconditional independence of the target regardless of the value of the preScreening parameter. If no variables are left at a particular variable selection level, the action rolls back to the previous level. For example, if you specify "THREE" and there are no variables in the Markov blanket of the target, the action uses the variables from the previous level "TWO". If you want to choose the best model among different levels of variable selection, you can specify any combination of values for this parameter and also specify the value True for the bestModel parameter. If you specify multiple values for the varSelect parameter but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the remaining values.

Default ONE
Requirement The specified values must be unique.
ONE

tests each input variable for conditional independence of the target variable given any other input variable. This type of selection rejects all variables that become conditionally independent of the target variable given any other input variable.

THREE

determines the Markov blanket of the target variable and uses only the variables in the Markov blanket.

TWO

tests each input variable further for conditional independence of the target variable given any subset of other input variables. This type of selection rejects all variables that are conditionally independent of the target given any subset of other input variables.

ZERO

uses all input variables that remain after the initial screening is performed as specified in the preScreening parameter.
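The rollback behavior described for varSelect can be sketched in Python. The sketch assumes that the variables surviving each level have already been determined and that rollback repeats until a nonempty level is found; the source states only that the action rolls back to the previous level. The function name is hypothetical.

```python
LEVELS = ["ZERO", "ONE", "TWO", "THREE"]  # increasing strictness


def select_with_rollback(level, survivors_by_level):
    """Return the level actually used and its variables (illustrative only).

    survivors_by_level maps each level name to the variables left at that level.
    """
    idx = LEVELS.index(level)
    # Step back while the requested level leaves no variables.
    while idx > 0 and not survivors_by_level.get(LEVELS[idx]):
        idx -= 1
    used = LEVELS[idx]
    return used, survivors_by_level.get(used, [])
```

For example, if the Markov blanket (level THREE) is empty but level TWO retains one variable, the sketch falls back to TWO, which matches the example given in the description.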

bnet Action

Bayesian Net Classifier Action.

Lua Syntax

results, info = s:bayesianNetClassifier_bnet{
alpha={double-1 <, double-2, ...>},
attributes={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
bestModel=true | false,
code={
casOut={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
onDemand=true | false,
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
comment=true | false,
fmtWdth=integer,
indentSize=integer,
intoCutPt=double,
iProb=true | false,
labelId=integer,
lineSize=integer,
noTrim=true | false,
pCatAll=true | false,
tabForm=true | false
},
codeGroup="string",
display={
caseSensitive=true | false,
exclude=true | false,
excludeAll=true | false,
keyIsPath=true | false,
names={"string-1" <, "string-2", ...>},
pathType="LABEL" | "NAME",
traceNames=true | false
},
freq="string",
id={"variable-name-1" <, "variable-name-2", ...>},
inNetwork={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
},
inputs={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
maxParents=integer,
miAlpha=double,
nominals={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
numBin=integer,
outNetwork={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
output={
required parameter casOut={
caslib="string",
compress=true | false,
indexVars={"variable-name-1" <, "variable-name-2", ...>},
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=true | false,
replace=true | false,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where={"string-1" <, "string-2", ...>}
},
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>},
role="string"
},
outputTables={
groupByVarsRaw=true | false,
includeAll=true | false,
names={"string-1" <, "string-2", ...>} | {key-1={casouttable-1} <, key-2={casouttable-2}, ...>},
repeated=true | false,
replace=true | false
},
parenting={"BESTONE", "BESTSET"},
partByFrac={
seed=integer,
test=double,
validate=double
},
partByVar={
required parameter name="variable-name",
test="string",
train="string",
validate="string"
},
preScreening={"ONE", "ZERO"},
printtarget=true | false,
resident=true | false,
saveState={
caslib="string",
label="string",
lifetime=64-bit-integer,
name="table-name",
promote=true | false,
replace=true | false,
},
structures={"GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"},
required parameter table={
caslib="string",
computedOnDemand=true | false,
computedVars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
computedVarsProgram="string",
dataSourceOptions={key-1=any-list-or-data-type-1 <, key-2=any-list-or-data-type-2, ...>},
groupBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter name="table-name",
orderBy={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
singlePass=true | false,
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}},
where="where-expression",
whereTable={
casLib="string"
dataSourceOptions={adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
importOptions={fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter name="table-name"
vars={{
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
}, {...}}
where="where-expression"
}
},
target="string",
varSelect={"ONE", "THREE", "TWO", "ZERO"}
}
The prefix "required parameter" indicates a required parameter.

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
inNetwork | | specifies the name of the input table that specifies the links to be included in and excluded from the network.
required parameter table | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
code | casOut |
outNetwork | | specifies the name of the output table for the network structure and the probability distributions.
output | required parameter casOut | creates an output table to contain the predicted target values of the input table.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
saveState | | specifies the table in which to save the model for future scoring.

Parameter Descriptions

alpha={double-1 <, double-2, ...>}

specifies the significance level for independence tests that use the chi-square or G-square statistic. To choose the best model among several significance levels, you can specify up to five numbers in a comma-separated list. If you specify multiple numbers but you do not specify the value True for the bestModel parameter, the action uses the first number and ignores the remaining numbers.

Default 0.05
Requirement The specified values must be unique.

attributes={{casinvardesc-1} <, {casinvardesc-2}, ...>}

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias attribute

bestModel=true | false

when set to True, selects the best model.

Default false

code={aircodegen}

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

codeGroup="string"

Code Group

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

freq="string"

specifies the frequency variable.

id={"variable-name-1" <, "variable-name-2", ...>}

specifies the variables to copy to the generated table.

indepTest="ALL" | "CHIGSQUARE" | "CHISQUARE" | "GSQUARE" | "MI"

specifies the method for independence tests.

Default CHIGSQUARE
ALL

uses the chi-square statistic, the G-square statistic, and the normalized mutual information for independence tests. A variable is independent of the target if the p-values of both the chi-square and the G-square statistics are greater than the value of the alpha parameter and the normalized mutual information is less than the value of the miAlpha parameter.

CHIGSQUARE

uses both the chi-square and G-square statistics for independence tests. A variable is independent of the target if the p-values of both the chi-square and G-square statistics are greater than the value of the alpha parameter.

CHISQUARE

uses the chi-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

GSQUARE

uses the G-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

MI

uses the normalized mutual information for independence tests. A variable is independent of the target if the normalized mutual information is less than the value of the miAlpha parameter.

inNetwork={castable}

specifies the name of the input table that specifies the links to be included in and excluded from the network.

For more information about specifying the inNetwork parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias inNet

inputs={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias input

maxParents=integer

specifies the maximum number of parents for each node in the network. If you specify the value True for the bestModel parameter, the action tries all values from 1 to the value of this parameter to find the best setting; otherwise, the specified value is used as the maximum number of parents.

Default 5
Range 1–16

miAlpha=double

specifies the significance level for independence tests that use mutual information.

Default 0.05
Range 0–1

missingInt="IGNORE" | "IMPUTE"

specifies how to handle missing values for interval variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the interval variables.

IMPUTE

replaces the missing values in any interval variable by the mean of the variable.

missingNom="IGNORE" | "IMPUTE" | "LEVEL"

specifies how to handle missing values for nominal variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the nominal variables.

IMPUTE

replaces the missing values in any nominal variable by the mode of the variable.

LEVEL

treats the missing values in any nominal variable as a separate level of the variable.

nominals={{casinvardesc-1} <, {casinvardesc-2}, ...>}

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

numBin=integer

specifies the number of bins to use for interval variables.

Default 5
Range 2–1024

outNetwork={casouttable}

specifies the name of the output table for the network structure and the probability distributions.

For more information about specifying the outNetwork parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Alias outNet

output={BnetOutputStatement}

creates an output table to contain the predicted target values of the input table.

The BnetOutputStatement value can be one or more of the following:

* casOut={casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | {"variable-name-1" <, "variable-name-2", ...>}

specifies a list of one or more variables to be copied from the input table to the output table. You can alternatively specify the value ALL, ALL_MODEL, or ALL_NUMERIC, which respectively copies all variables, all variables used in the modeling, or all numeric variables from the input table to the output table.

role="string"

renames the generated column _ROLE_ in the output data table to the specified role name.

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

parenting={"BESTONE", "BESTSET"}

specifies the structure learning methods. If you want the action to choose between the two methods, you can specify both BESTONE and BESTSET and also specify the value True for the bestModel parameter. If you specify both methods but you do not specify the value True for the bestModel parameter, the action uses the first specified method and ignores the other.

Default BESTSET
BESTONE

uses a greedy approach to determine the parents of each node; that is, for each node, the best candidate is added as a parent of the node in each iteration.

BESTSET

determines the best set of variables among possible candidate sets as the parents of each node; that is, instead of adding one variable in an iteration, the action tests multiple sets of variables together and chooses the best set as the parents of the node.

partByFrac={partByFracStatement}

The partByFracStatement value can be one or more of the following:

seed=integer

specifies the seed to use in the random number generator that is used for partitioning the data.

Default 0
test=double

randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Range 0–1
validate=double

randomly assigns the specified proportion of observations in the input table to the validation role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Alias valid
Range 0–1
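The partByFrac constraints above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `assign_roles` is hypothetical, Python's `random.Random` stands in for the action's own random number generator, and assigning every remaining observation to the training role is an assumption that the text above does not state for partByFrac. Because each observation is drawn independently, the resulting proportions are approximate.

```python
import random


def assign_roles(n_obs, test=0.0, validate=0.0, seed=0):
    """Randomly assign a role to each of n_obs observations (illustrative only)."""
    for name, frac in (("test", test), ("validate", validate)):
        if not 0 <= frac <= 1:
            raise ValueError(f"{name} must be in the range 0 to 1")
    # Documented rule: the two fractions together must be less than 1.
    if test + validate >= 1:
        raise ValueError("test + validate must be less than 1")

    rng = random.Random(seed)  # stand-in for the action's generator
    roles = []
    for _ in range(n_obs):
        u = rng.random()
        if u < test:
            roles.append("TEST")
        elif u < test + validate:
            roles.append("VALIDATE")
        else:
            roles.append("TRAIN")  # assumption: the remainder is used for training
    return roles


roles = assign_roles(1000, test=0.2, validate=0.1, seed=42)
print({role: roles.count(role) for role in ("TRAIN", "VALIDATE", "TEST")})
```

Reusing the same seed reproduces the same assignment in this sketch, which is the usual reason to set a seed explicitly.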

partByVar={partByVarStatement}

Long form partByVar={name="variable-name"}
Shortcut form partByVar="variable-name"

The partByVarStatement value can be one or more of the following:

* name="variable-name"

names the variable in the input table whose values are used to assign roles to each observation.

test="string"

specifies the formatted value of the variable that is used to assign observations to the testing role.

train="string"

specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.

validate="string"

specifies the formatted value of the variable that is used to assign observations to the validation role.

Alias valid
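The role-assignment rules for partByVar can be illustrated with a small Python sketch. The function name `role_for` is hypothetical, and the sketch compares plain strings in place of formatted values. Returning `None` for a value that matches nothing when train is specified is an assumption; the text above does not say what happens in that case.

```python
def role_for(value, test=None, validate=None, train=None):
    """Map one formatted value of the partition variable to a role."""
    if test is not None and value == test:
        return "TEST"
    if validate is not None and value == validate:
        return "VALIDATE"
    # Documented rule: without a train value, everything else is training data.
    if train is None or value == train:
        return "TRAIN"
    return None  # assumption: unmatched values receive no role


# Hypothetical partition values "T" (test) and "V" (validation).
for value in ["T", "V", "X"]:
    print(value, role_for(value, test="T", validate="V"))
```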

preScreening={"ONE", "ZERO"}

specifies the initial screening for the input variables. If you want the action to choose the best model with or without prescreening, you can specify {"ZERO","ONE"} or {"ONE","ZERO"} for the parameter and also specify the value True for the bestModel parameter. If you specify both ONE and ZERO but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the other.

Default ONE
Requirement The specified values must be unique.
ONE

uses only the input variables that are dependent on the target.

ZERO

uses all the input variables.

printtarget=true | false

when set to True, generates names for the predicted target variable and the predicted probability variables.

Default false

resident=true | false

Default true

saveState={casouttable}

specifies the table in which to save the model for future scoring.

Long form saveState={name="table-name"}
Shortcut form saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT
DVR

uses the duplicate value reduction memory format. This memory format can reduce memory consumption and file size when the input data contain duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=true | false

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default false
replace=true | false

when set to True, overwrites an existing table that has the same name.

Default false
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to apply when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.
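The long form and the shortcut form of saveState describe the same setting. As a minimal sketch in Python (the helper name `normalize_save_state` and the table name are hypothetical), the shortcut string can be expanded into the long form before further subparameters such as replace are added:

```python
def normalize_save_state(save_state):
    """Expand the shortcut form (a table name) into the long dictionary form."""
    if isinstance(save_state, str):
        return {"name": save_state}
    return dict(save_state)


# Hypothetical table name; "replace" is one of the documented subparameters.
settings = normalize_save_state("bnet_model")
settings["replace"] = True
print(settings)
```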

structures={"GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"}

specifies the network structure types. Together with the maxParents parameter, this parameter determines which network structure the action learns from the training data. If you want the action to choose the best structure among several structures, you can specify multiple values in any combination in a comma-separated list and also specify the value True for the bestModel parameter. If you specify multiple structures but you do not specify the value True for the bestModel parameter, the first value that you specify is used and the rest are ignored.

Alias structure
Default PC
Requirement The specified values must be unique.
GENERAL

learns a general Bayesian network. A general Bayesian network removes the requirement of a direct connection between the target variable and the input variables that are selected to be in the network.

Alias GN
MB

learns the Markov blanket of the target variable. The Markov blanket includes the parents, the children, and other parents of the children. After learning the Markov blanket, the action further determines the parents of the target, the links from the parents to the children, and the links among the children. When you specify the value "MB" for the structures parameter, the action learns the Markov blanket regardless of the values of the preScreening and the varSelect parameters.

NAIVE

learns a naive Bayesian network structure (that is, the target has a direct link to each input variable). If you specify the value 1 for maxParents, the structure being trained is a naive Bayesian network. If you specify a value greater than 1 for maxParents, the structure is a Bayesian network-augmented naive Bayesian network.

PC

learns the parent-child Bayesian network structure. PC structure differs from NAIVE structure in that some input variables could be learned as the parents of the target variable. In addition, links from the parents to the children and among the children are also possible in PC.

TAN

learns the tree-augmented naive Bayesian network structure. The TAN structure includes a direct link from the target to each input variable plus a tree structure among the input variables.
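The selection rules for the structures parameter can be summarized in a short Python sketch. The helper name `resolve_structures` is hypothetical. Treating "GN" and "GENERAL" in the same list as duplicates is an assumption, since the text above states only that the specified values must be unique.

```python
ALIASES = {"GN": "GENERAL"}
VALID = {"GENERAL", "MB", "NAIVE", "PC", "TAN"}


def resolve_structures(structures=("PC",), best_model=False):
    """Return the structure types that take effect, per the documented rules."""
    resolved = [ALIASES.get(s.upper(), s.upper()) for s in structures]
    if not resolved:
        resolved = ["PC"]  # documented default
    unknown = [s for s in resolved if s not in VALID]
    if unknown:
        raise ValueError(f"unknown structure types: {unknown}")
    # Assumption: an alias and its full name count as the same value.
    if len(set(resolved)) != len(resolved):
        raise ValueError("structure types must be unique")
    # Without bestModel=True, only the first value is used.
    return resolved if best_model else resolved[:1]


print(resolve_structures(["GN", "TAN"]))
print(resolve_structures(["GN", "TAN"], best_model=True))
```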

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

target="string"

specifies the target variable to use for analysis.

varSelect={"ONE", "THREE", "TWO", "ZERO"}

specifies how to select input variables beyond prescreening. If you specify the value "ONE", "TWO", or "THREE", the action automatically tests each input variable for unconditional independence of the target regardless of the value of the preScreening parameter. If no variables are left at a particular variable selection level, the action rolls back to the previous level. For example, if you specify "THREE" and there are no variables in the Markov blanket of the target, the action uses the variables from the previous level "TWO". If you want to choose the best model among different levels of variable selection, you can specify any combination of values for this parameter and also specify the value True for the bestModel parameter. If you specify multiple values for the varSelect parameter but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the remaining values.

Default ONE
Requirement The specified values must be unique.
ONE

tests each input variable for conditional independence of the target variable given any other input variable. This type of selection rejects all variables that become conditionally independent of the target variable given any other input variable.

THREE

determines the Markov blanket of the target variable and uses only the variables in the Markov blanket.

TWO

tests each input variable further for conditional independence of the target variable given any subset of other input variables. This type of selection rejects all variables that are conditionally independent of the target given any subset of other input variables.

ZERO

uses all input variables that remain after the initial screening is performed as specified in the preScreening parameter.
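The rollback behavior described for varSelect can be sketched in Python as follows. The helper `select_with_rollback` and the `survivors` mapping are hypothetical: the mapping stands in for the variables that remain at each selection level, which the action computes internally. Rolling back repeatedly when several consecutive levels are empty is an assumption; the text above describes a single step back.

```python
LEVELS = ["ZERO", "ONE", "TWO", "THREE"]  # increasing strictness


def select_with_rollback(level, survivors):
    """Return the level actually used and the variables that remain at it."""
    idx = LEVELS.index(level.upper())
    # Step back while the current level leaves no variables.
    while idx > 0 and not survivors.get(LEVELS[idx]):
        idx -= 1
    return LEVELS[idx], list(survivors.get(LEVELS[idx], []))


# Hypothetical outcome of each selection level for three input variables.
survivors = {
    "ZERO": ["age", "income", "region"],
    "ONE": ["age", "income"],
    "TWO": ["age"],
    "THREE": [],  # empty Markov blanket
}
print(select_with_rollback("THREE", survivors))
```

In this example the request for "THREE" falls back to "TWO", mirroring the example given in the parameter description.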

bnet Action

Bayesian Net Classifier Action.

Python Syntax

results=s.bayesianNetClassifier.bnet(
alpha=[double-1 <, double-2, ...>],
attributes=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
bestModel=True | False,
code={
"casOut":{
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"onDemand":True | False,
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
"comment":True | False,
"fmtWdth":integer,
"indentSize":integer,
"intoCutPt":double,
"iProb":True | False,
"labelId":integer,
"lineSize":integer,
"noTrim":True | False,
"pCatAll":True | False,
"tabForm":True | False
},
codeGroup="string",
display={
"caseSensitive":True | False,
"exclude":True | False,
"excludeAll":True | False,
"keyIsPath":True | False,
"names":["string-1" <, "string-2", ...>],
"pathType":"LABEL" | "NAME",
"traceNames":True | False
},
freq="string",
id=["variable-name-1" <, "variable-name-2", ...>],
inNetwork={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string"
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter "name":"table-name"
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>]
"where":"where-expression"
}
},
inputs=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
maxParents=integer,
miAlpha=double,
nominals=[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
numBin=integer,
outNetwork={
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
output={
required parameter "casOut":{
"caslib":"string",
"compress":True | False,
"indexVars":["variable-name-1" <, "variable-name-2", ...>],
"label":"string",
"lifetime":64-bit-integer,
"maxMemSize":64-bit-integer,
"memoryFormat":"DVR" | "INHERIT" | "STANDARD",
"name":"table-name",
"promote":True | False,
"replace":True | False,
"replication":integer,
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE",
"threadBlockSize":64-bit-integer,
"timeStamp":"string",
"where":["string-1" <, "string-2", ...>]
},
"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>],
"role":"string"
},
outputTables={
"groupByVarsRaw":True | False,
"includeAll":True | False,
"names":["string-1" <, "string-2", ...>] | {"key-1":{casouttable-1} <, "key-2":{casouttable-2}, ...>},
"repeated":True | False,
"replace":True | False
},
parenting=["BESTONE", "BESTSET"],
partByFrac={
"seed":integer,
"test":double,
"validate":double
},
partByVar={
required parameter "name":"variable-name",
"test":"string",
"train":"string",
"validate":"string"
},
preScreening=["ONE", "ZERO"],
printtarget=True | False,
resident=True | False,
saveState={
"caslib":"string",
"label":"string",
"lifetime":64-bit-integer,
"name":"table-name",
"promote":True | False,
"replace":True | False
},
structures=["GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"],
required parameter table={
"caslib":"string",
"computedOnDemand":True | False,
"computedVars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"computedVarsProgram":"string",
"dataSourceOptions":{"key-1":{any-list-or-data-type-1} <, "key-2":{any-list-or-data-type-2}, ...>},
"groupBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"groupByMode":"NOSORT" | "REDISTRIBUTE",
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters},
required parameter "name":"table-name",
"orderBy":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"singlePass":True | False,
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>],
"where":"where-expression",
"whereTable":{
"casLib":"string"
"dataSourceOptions":{adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters}
"importOptions":{"fileType":"ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters}
required parameter "name":"table-name"
"vars":[{
"format":"string",
"formattedLength":integer,
"label":"string",
required parameter "name":"variable-name",
"nfd":integer,
"nfl":integer
}<, {...}>]
"where":"where-expression"
}
},
target="string",
varSelect=["ONE", "THREE", "TWO", "ZERO"]
)
The prefix "required parameter" indicates a required parameter.
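To make the syntax template more concrete, the following sketch assembles a small set of parameters as a Python dictionary and checks that the entries marked as required are present. All table and variable names are hypothetical, as is the helper `missing_required`. The final call is left as a comment because it needs a live CAS session, named `s` as in the syntax above.

```python
def missing_required(params):
    """List the required entries that are absent from a bnet parameter dict."""
    missing = []
    table = params.get("table")
    if table is None:
        missing.append("table")
    elif "name" not in table:
        missing.append("table.name")
    # Each variable description in these lists requires a "name" entry.
    for key in ("attributes", "inputs", "nominals"):
        for i, var in enumerate(params.get(key, [])):
            if "name" not in var:
                missing.append(f"{key}[{i}].name")
    return missing


params = {
    "table": {"name": "train_data"},           # hypothetical input table
    "target": "purchased",                      # hypothetical target variable
    "inputs": [{"name": "age"}, {"name": "income"}, {"name": "region"}],
    "nominals": [{"name": "region"}],
    "structures": ["TAN", "PC"],
    "maxParents": 2,
    "alpha": [0.01, 0.05],
    "bestModel": True,
    "saveState": {"name": "bnet_model", "replace": True},
}

print(missing_required(params))
# With a connected CAS session s, the call would then be:
# results = s.bayesianNetClassifier.bnet(**params)
```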

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
inNetwork | | specifies the name of the input table that specifies the links to be included in and excluded from the network.
required parameter table | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
code | casOut |
outNetwork | | specifies the name of the output table for the network structure and the probability distributions.
output | required parameter casOut | creates an output table to contain the predicted target values of the input table.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
saveState | | specifies the table in which to save the model for future scoring.

Parameter Descriptions

alpha=[double-1 <, double-2, ...>]

specifies the significance level for independence tests that use the chi-square or G-square statistic. To choose the best model among several significance levels, you can specify up to five numbers in a comma-separated list. If you specify multiple numbers but you do not specify the value True for the bestModel parameter, the action uses the first number and ignores the remaining numbers.

Default 0.05
Requirement The specified values must be unique.
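As a rough illustration of these rules, the following Python sketch validates an alpha list and returns the values that would take effect. The helper name `effective_alphas` is hypothetical and not part of the action; it only mirrors the documented constraints (at most five unique values, and only the first value used unless bestModel is True).

```python
def effective_alphas(alpha, best_model=False):
    """Return the alpha values that take effect, per the documented rules."""
    values = list(alpha)
    if not 1 <= len(values) <= 5:
        raise ValueError("specify between one and five alpha values")
    if len(set(values)) != len(values):
        raise ValueError("alpha values must be unique")
    # Without bestModel=True, only the first value is used.
    return values if best_model else values[:1]


print(effective_alphas([0.01, 0.05, 0.1]))
print(effective_alphas([0.01, 0.05, 0.1], best_model=True))
```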

attributes=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias attribute

bestModel=True | False

when set to True, selects the best model.

Default False

code={aircodegen}

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

codeGroup="string"

Code Group

display={displayTables}

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

freq="string"

specifies the frequency variable.

id=["variable-name-1" <, "variable-name-2", ...>]

specifies the variables to copy to the generated table.

indepTest="ALL" | "CHIGSQUARE" | "CHISQUARE" | "GSQUARE" | "MI"

specifies the method for independence tests.

Default CHIGSQUARE
ALL

uses the chi-square statistic, the G-square statistic, and the normalized mutual information for independence tests. A variable is independent of the target if the p-values of both the chi-square and the G-square statistics are greater than the value of the alpha parameter and the normalized mutual information is less than the value of the miAlpha parameter.

CHIGSQUARE

uses both the chi-square and G-square statistics for independence tests. A variable is independent of the target if the p-values of both the chi-square and G-square statistics are greater than the value of the alpha parameter.

CHISQUARE

uses the chi-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

GSQUARE

uses the G-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

MI

uses the normalized mutual information for independence tests. A variable is independent of the target if the normalized mutual information is less than the value of the miAlpha parameter.
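The decision rules for the five indepTest values can be written out as a small Python function. This is a sketch of the rules as described above, not the action's implementation: the function name `is_independent` is hypothetical, and the p-values and normalized mutual information are assumed to have been computed elsewhere.

```python
def is_independent(method, chi_p, g_p, nmi, alpha=0.05, mi_alpha=0.05):
    """Apply the documented independence rule for the given test method.

    chi_p and g_p are the p-values of the chi-square and G-square statistics;
    nmi is the normalized mutual information.
    """
    chi_ok = chi_p > alpha   # chi-square test does not reject independence
    g_ok = g_p > alpha       # G-square test does not reject independence
    mi_ok = nmi < mi_alpha   # mutual information is below the threshold
    rules = {
        "CHISQUARE": chi_ok,
        "GSQUARE": g_ok,
        "CHIGSQUARE": chi_ok and g_ok,
        "MI": mi_ok,
        "ALL": chi_ok and g_ok and mi_ok,
    }
    key = method.upper()
    if key not in rules:
        raise ValueError(f"unknown indepTest value: {method}")
    return rules[key]


# Made-up statistics: the chi-square test passes but the G-square test does not.
print(is_independent("CHISQUARE", chi_p=0.20, g_p=0.03, nmi=0.10))
print(is_independent("CHIGSQUARE", chi_p=0.20, g_p=0.03, nmi=0.10))
```

The combined methods are stricter about declaring independence: every statistic involved must agree before a variable is treated as independent of the target.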

inNetwork={castable}

specifies the name of the input table that specifies the links to be included in and excluded from the network.

For more information about specifying the inNetwork parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias inNet

inputs=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias input

maxParents=integer

specifies the maximum number of parents for each node in the network. If you specify the value True for the bestModel parameter, the action tries all values from 1 to the value of this parameter to find the best setting; otherwise, the specified value is used as the maximum number of parents.

Default 5
Range 1–16
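A brief sketch of how the bestModel setting changes the meaning of maxParents, using a hypothetical helper `max_parents_candidates`:

```python
def max_parents_candidates(max_parents=5, best_model=False):
    """Return the maxParents values that are tried, per the documented rule."""
    if not 1 <= max_parents <= 16:
        raise ValueError("maxParents must be in the range 1 to 16")
    # With bestModel=True every value from 1 up to maxParents is tried.
    return list(range(1, max_parents + 1)) if best_model else [max_parents]


print(max_parents_candidates(3))
print(max_parents_candidates(3, best_model=True))
```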

miAlpha=double

specifies the significance level for independence tests that use mutual information.

Default 0.05
Range 0–1

missingInt="IGNORE" | "IMPUTE"

specifies how to handle missing values for interval variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the interval variables.

IMPUTE

replaces the missing values in any interval variable by the mean of the variable.
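The two missingInt policies can be illustrated on a small in-memory data set. This is a sketch only: the function `handle_missing_interval` is hypothetical, rows are plain dictionaries, and `None` stands in for a missing value. It fails if a variable has no non-missing values at all.

```python
from statistics import mean


def handle_missing_interval(rows, interval_vars, policy="IGNORE"):
    """Apply the IGNORE or IMPUTE policy to the given interval variables."""
    policy = policy.upper()
    if policy == "IGNORE":
        # Drop every observation with a missing value in any interval variable.
        return [dict(r) for r in rows
                if all(r[v] is not None for v in interval_vars)]
    if policy == "IMPUTE":
        # Replace each missing value by the mean of the non-missing values.
        means = {v: mean(r[v] for r in rows if r[v] is not None)
                 for v in interval_vars}
        return [{k: (means[k] if k in means and val is None else val)
                 for k, val in r.items()} for r in rows]
    raise ValueError(f"unknown missingInt value: {policy}")


rows = [{"income": 1.0}, {"income": None}, {"income": 3.0}]
print(handle_missing_interval(rows, ["income"], "IGNORE"))
print(handle_missing_interval(rows, ["income"], "IMPUTE"))
```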

missingNom="IGNORE" | "IMPUTE" | "LEVEL"

specifies how to handle missing values for nominal variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the nominal variables.

IMPUTE

replaces the missing values in any nominal variable by the mode of the variable.

LEVEL

treats the missing values in any nominal variable as a separate level of the variable.
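The three missingNom policies differ mainly in what replaces a missing value. The sketch below shows this on a toy data set. The function name and the label `__MISSING__` for the extra level are hypothetical, `None` stands in for a missing value, and ties for the mode are broken by first occurrence, which is a property of this sketch rather than documented behavior.

```python
from collections import Counter

MISSING_LEVEL = "__MISSING__"  # hypothetical label for the separate level


def handle_missing_nominal(rows, nominal_vars, policy="IGNORE"):
    """Apply the IGNORE, IMPUTE, or LEVEL policy to the nominal variables."""
    policy = policy.upper()
    if policy == "IGNORE":
        return [dict(r) for r in rows
                if all(r[v] is not None for v in nominal_vars)]
    if policy == "IMPUTE":
        # Most frequent non-missing value (the mode) of each variable.
        fill = {v: Counter(r[v] for r in rows
                           if r[v] is not None).most_common(1)[0][0]
                for v in nominal_vars}
    elif policy == "LEVEL":
        fill = {v: MISSING_LEVEL for v in nominal_vars}
    else:
        raise ValueError(f"unknown missingNom value: {policy}")
    return [{k: (fill[k] if k in fill and val is None else val)
             for k, val in r.items()} for r in rows]


colors = [{"color": "red"}, {"color": None}, {"color": "red"}, {"color": "blue"}]
print(handle_missing_nominal(colors, ["color"], "IMPUTE"))
print(handle_missing_nominal(colors, ["color"], "LEVEL"))
```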

nominals=[{casinvardesc-1} <, {casinvardesc-2}, ...>]

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

numBin=integer

specifies the number of bins to use for interval variables.

Default 5
Range 2–1024
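To give a sense of what binning an interval variable means, the sketch below assigns values to numBin bins. The description above does not say which binning method the action uses; equal-width binning is assumed here purely for illustration, and the function name `equal_width_bin` is hypothetical.

```python
def equal_width_bin(values, num_bin=5):
    """Assign each value to one of num_bin equal-width bins (0-based index)."""
    if not 2 <= num_bin <= 1024:
        raise ValueError("numBin must be in the range 2 to 1024")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into one bin
    width = (hi - lo) / num_bin
    # The maximum value is placed in the last bin rather than a new one.
    return [min(int((v - lo) / width), num_bin - 1) for v in values]


print(equal_width_bin([0, 2.5, 5, 7.5, 10], num_bin=5))
```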

outNetwork={casouttable}

specifies the name of the output table for the network structure and the probability distributions.

For more information about specifying the outNetwork parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Alias outNet

output={BnetOutputStatement}

creates an output table to contain the predicted target values of the input table.

The BnetOutputStatement value can be one or more of the following:

* "casOut":{casouttable}

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

"copyVars":"ALL" | "ALL_MODEL" | "ALL_NUMERIC" | ["variable-name-1" <, "variable-name-2", ...>]

specifies a list of one or more variables to be copied from the input table to the output table. Alternatively, you can specify ALL, ALL_MODEL, or ALL_NUMERIC to copy all variables, all variables used in the modeling, or all numeric variables, respectively, from the input table to the output table.

"role":"string"

renames the generated column _ROLE_ in the output data table to the specified role name.
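
As an illustration, the output parameter can be assembled as a Python dictionary before it is passed to the action. The helper `make_output`, the table name `bnet_scored`, and the variable and role names are hypothetical; the strict, case-sensitive keyword check is an assumption of this sketch rather than documented server behavior.

```python
COPY_VARS_KEYWORDS = {"ALL", "ALL_MODEL", "ALL_NUMERIC"}


def make_output(cas_out, copy_vars=None, role=None):
    """Build a dictionary shaped like the output parameter (casOut is required)."""
    if not cas_out:
        raise ValueError("casOut is a required subparameter of output")
    params = {"casOut": dict(cas_out)}
    if copy_vars is not None:
        if isinstance(copy_vars, str):
            # A single string must be one of the documented keywords.
            if copy_vars not in COPY_VARS_KEYWORDS:
                raise ValueError("copyVars string must be ALL, ALL_MODEL, or ALL_NUMERIC")
            params["copyVars"] = copy_vars
        else:
            # Otherwise treat the value as a list of variable names.
            params["copyVars"] = list(copy_vars)
    if role is not None:
        params["role"] = role
    return params


# Hypothetical table, variable, and role names.
output = make_output({"name": "bnet_scored", "replace": True},
                     copy_vars=["customer_id"], role="partition_role")
print(output)
```

A dictionary like this would be supplied as the value of output in the action call; the call itself is not shown because it requires a running CAS session.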

outputTables={outputTables}

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

parenting=["BESTONE", "BESTSET"]

specifies the structure learning methods. If you want the action to choose between the two methods, you can specify both BESTONE and BESTSET and also specify the value True for the bestModel parameter. If you specify both methods but you do not specify the value True for the bestModel parameter, the action uses the first specified method and ignores the other.

Default BESTSET
BESTONE

uses a greedy approach to determine the parents of each node; that is, for each node, the best candidate is added as a parent of the node in each iteration.

BESTSET

determines the best set of variables among possible candidate sets as the parents of each node; that is, instead of adding one variable in an iteration, the action tests multiple sets of variables together and chooses the best set as the parents of the node.

partByFrac={partByFracStatement}

The partByFracStatement value can be one or more of the following:

"seed":integer

specifies the seed to use in the random number generator that is used for partitioning the data.

Default 0
"test":double

randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Range 0–1
"validate":double

randomly assigns the specified proportion of observations in the input table to the validation role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Alias valid
Range 0–1
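
The constraints on the fractions can be checked on the client before the action is called. The following is a minimal sketch, assuming a hypothetical helper named `make_part_by_frac`; it encodes only the documented range and sum rules, and it assumes that the observations not assigned to testing or validation are left for training.

```python
def make_part_by_frac(test=0.0, validate=0.0, seed=0):
    """Build a dictionary shaped like the partByFrac parameter."""
    for name, frac in (("test", test), ("validate", validate)):
        if not 0 <= frac <= 1:
            raise ValueError(f"{name} must be in the documented range 0-1")
    if test + validate >= 1:
        # Documented requirement: the two fractions must sum to less than 1.
        raise ValueError("test + validate must be less than 1")
    return {"seed": seed, "test": test, "validate": validate}


part = make_part_by_frac(test=0.2, validate=0.2, seed=12345)
# Assumption: the remaining share of observations is used for training.
train_share = 1 - part["test"] - part["validate"]
print(part, round(train_share, 6))
```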

partByVar={partByVarStatement}

Long form partByVar={"name":"variable-name"}
Shortcut form partByVar="variable-name"

The partByVarStatement value can be one or more of the following:

* "name":"variable-name"

names the variable in the input table whose values are used to assign roles to each observation.

"test":"string"

specifies the formatted value of the variable that is used to assign observations to the testing role.

"train":"string"

specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.

"validate":"string"

specifies the formatted value of the variable that is used to assign observations to the validation role.

Alias valid
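
The role assignment described for partByVar can be sketched as follows. The function `assign_role` and the returned labels are hypothetical. How the action treats a value that matches none of the specified roles when train is specified is not stated here, so the sketch returns `None` in that case.

```python
def assign_role(formatted_value, test=None, validate=None, train=None):
    """Map one formatted value of the partition variable to a role label."""
    if test is not None and formatted_value == test:
        return "TEST"
    if validate is not None and formatted_value == validate:
        return "VALIDATE"
    if train is None or formatted_value == train:
        # Documented default: without train, everything not claimed by
        # test or validate is assigned to training.
        return "TRAIN"
    return None  # assumption: no role is assigned; actual behavior is not documented here


values = ["0", "1", "2", "1"]
print([assign_role(v, test="2", validate="1") for v in values])
# ['TRAIN', 'VALIDATE', 'TEST', 'VALIDATE']
```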

preScreening=["ONE", "ZERO"]

specifies the initial screening for the input variables. If you want the action to choose the best model with or without prescreening, you can specify ["ZERO","ONE"] or ["ONE","ZERO"] for the parameter and also specify the value True for the bestModel parameter. If you specify both ONE and ZERO but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the other.

Default ONE
Requirement The specified values must be unique.
ONE

uses only the input variables that are dependent on the target.

ZERO

uses all the input variables.

printtarget=True | False

when set to True, generates names for the predicted target variable and the predicted probability variables.

Default False

resident=True | False

Default True

saveState={casouttable}

specifies the table in which to save the model for future scoring.

Long form saveState={"name":"table-name"}
Shortcut form saveState="table-name"

The casouttable value can be one or more of the following:

"caslib":"string"

specifies the name of the caslib for the output table.

"label":"string"

specifies the descriptive label to associate with the table.

"lifetime":64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
"memoryFormat":"DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT
DVR

uses the duplicate value reduction memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

"name":"table-name"

specifies the name for the output table.

"promote":True | False

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default False
"replace":True | False

when set to True, overwrites an existing table that has the same name.

Default False
"tableRedistUpPolicy":"DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

structures=["GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"]

specifies the network structure types. Together with the maxParents parameter, this parameter determines which network structure the action learns from the training data. If you want the action to choose the best structure among several structures, you can specify multiple values in any combination and also specify the value True for the bestModel parameter. If you specify multiple structures but you do not specify the value True for the bestModel parameter, the first value that you specify is used and the rest are ignored.

Alias structure
Default PC
Requirement The specified values must be unique.
GENERAL

learns a general Bayesian network. A general Bayesian network removes the requirement of a direct connection between the target variable and the input variables that are selected to be in the network.

Alias GN
MB

learns the Markov blanket of the target variable. The Markov blanket includes the parents, the children, and other parents of the children. After learning the Markov blanket, the action further determines the parents of the target, the links from the parents to the children, and the links among the children. When you specify the value "MB" for the structures parameter, the action learns the Markov blanket regardless of the values of the preScreening and the varSelect parameters.

NAIVE

learns a naive Bayesian network structure (that is, the target has a direct link to each input variable). If you specify the value 1 for maxParents, the structure being trained is a naive Bayesian network. If you specify a value greater than 1 for maxParents, the structure is a Bayesian network-augmented naive Bayesian network.

PC

learns the parent-child Bayesian network structure. PC structure differs from NAIVE structure in that some input variables could be learned as the parents of the target variable. In addition, links from the parents to the children and among the children are also possible in PC.

TAN

learns the tree-augmented naive Bayesian network structure. The TAN structure includes a direct link from the target to each input variable plus a tree structure among the input variables.
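
The first-value rule that is described for the structures parameter (and in the same terms for parenting and preScreening) can be sketched in plain Python. The helper `values_in_effect` is hypothetical and only restates the documented rule and the uniqueness requirement.

```python
def values_in_effect(specified, best_model=False, default=("PC",)):
    """Return the list-valued settings that the documented rule puts into effect."""
    values = list(specified) if specified else list(default)
    if len(set(values)) != len(values):
        raise ValueError("The specified values must be unique.")
    # With bestModel=True all values are candidates; otherwise only the first is used.
    return values if best_model else values[:1]


print(values_in_effect(["TAN", "PC", "NAIVE"], best_model=True))   # ['TAN', 'PC', 'NAIVE']
print(values_in_effect(["TAN", "PC", "NAIVE"], best_model=False))  # ['TAN']
```

Because only the first value counts when bestModel is not True, the order of the list matters in that case.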

* table={castable}

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

target="string"

specifies the target variable to use for analysis.

varSelect=["ONE", "THREE", "TWO", "ZERO"]

specifies how to select input variables beyond prescreening. If you specify the value "ONE", "TWO", or "THREE", the action automatically tests each input variable for unconditional independence of the target regardless of the value of the preScreening parameter. If no variables are left at a particular variable selection level, the action rolls back to the previous level. For example, if you specify "THREE" and there are no variables in the Markov blanket of the target, the action uses the variables from the previous level "TWO". If you want to choose the best model among different levels of variable selection, you can specify any combination of values for this parameter and also specify the value True for the bestModel parameter. If you specify multiple values for the varSelect parameter but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the remaining values.

Default ONE
Requirement The specified values must be unique.
ONE

tests each input variable for conditional independence of the target variable given any other input variable. This type of selection rejects all variables that become conditionally independent of the target variable given any other input variable.

THREE

determines the Markov blanket of the target variable and uses only the variables in the Markov blanket.

TWO

tests each input variable further for conditional independence of the target variable given any subset of other input variables. This type of selection rejects all variables that are conditionally independent of the target given any subset of other input variables.

ZERO

uses all input variables that remain after the initial screening is performed as specified in the preScreening parameter.
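
The rollback behavior described for varSelect can be sketched as follows. The function `select_with_rollback` and the `survivors` mapping (selection level to the variables that remain at that level) are hypothetical. The text documents a rollback to the previous level; repeating the rollback across several empty levels is an assumption of this sketch.

```python
LEVELS = ["ZERO", "ONE", "TWO", "THREE"]  # increasing strictness of selection


def select_with_rollback(requested, survivors):
    """Return the level actually used and the variables that remain at it."""
    idx = LEVELS.index(requested)
    # Roll back while the current level leaves no variables
    # (assumption: rollback repeats until a non-empty level or ZERO is reached).
    while idx > 0 and not survivors[LEVELS[idx]]:
        idx -= 1
    level = LEVELS[idx]
    return level, survivors[level]


survivors = {
    "ZERO": ["age", "income", "region"],
    "ONE": ["age", "income"],
    "TWO": ["age"],
    "THREE": [],  # no variables in the Markov blanket of the target
}
print(select_with_rollback("THREE", survivors))  # ('TWO', ['age'])
```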

bnet Action

Bayesian Net Classifier Action.

R Syntax

results <- cas.bayesianNetClassifier.bnet(s,
alpha=list(double-1 <, double-2, ...>),
attributes=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
bestModel=TRUE | FALSE,
code=list(
casOut=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
onDemand=TRUE | FALSE,
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
comment=TRUE | FALSE,
fmtWdth=integer,
indentSize=integer,
intoCutPt=double,
iProb=TRUE | FALSE,
labelId=integer,
lineSize=integer,
noTrim=TRUE | FALSE,
pCatAll=TRUE | FALSE,
tabForm=TRUE | FALSE
),
codeGroup="string",
display=list(
caseSensitive=TRUE | FALSE,
exclude=TRUE | FALSE,
excludeAll=TRUE | FALSE,
keyIsPath=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>),
pathType="LABEL" | "NAME",
traceNames=TRUE | FALSE
),
freq="string",
id=list("variable-name-1" <, "variable-name-2", ...>),
inNetwork=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
inputs=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
maxParents=integer,
miAlpha=double,
nominals=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
numBin=integer,
outNetwork=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
output=list(
required parameter casOut=list(
caslib="string",
compress=TRUE | FALSE,
indexVars=list("variable-name-1" <, "variable-name-2", ...>),
label="string",
lifetime=64-bit-integer,
maxMemSize=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
replication=integer,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE",
threadBlockSize=64-bit-integer,
timeStamp="string",
where=list("string-1" <, "string-2", ...>)
),
copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>),
role="string"
),
outputTables=list(
groupByVarsRaw=TRUE | FALSE,
includeAll=TRUE | FALSE,
names=list("string-1" <, "string-2", ...>) | list(key-1=list(casouttable-1) <, key-2=list(casouttable-2), ...>),
repeated=TRUE | FALSE,
replace=TRUE | FALSE
),
parenting=list("BESTONE", "BESTSET"),
partByFrac=list(
seed=integer,
test=double,
validate=double
),
partByVar=list(
required parameter name="variable-name",
test="string",
train="string",
validate="string"
),
preScreening=list("ONE", "ZERO"),
printtarget=TRUE | FALSE,
resident=TRUE | FALSE,
saveState=list(
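caslib="string",
label="string",
lifetime=64-bit-integer,
memoryFormat="DVR" | "INHERIT" | "STANDARD",
name="table-name",
promote=TRUE | FALSE,
replace=TRUE | FALSE,
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"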
),
structures=list("GENERAL", "GN", "MB", "NAIVE", "PC", "TAN"),
required parameter table=list(
caslib="string",
computedOnDemand=TRUE | FALSE,
computedVars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
computedVarsProgram="string",
dataSourceOptions=list(key-1=list(any-list-or-data-type-1) <, key-2=list(any-list-or-data-type-2), ...>),
groupBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
groupByMode="NOSORT" | "REDISTRIBUTE",
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
orderBy=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
singlePass=TRUE | FALSE,
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression",
whereTable=list(
casLib="string",
dataSourceOptions=list(adls_noreq-parameters | bigquery-parameters | cas_noreq-parameters | clouddex-parameters | db2-parameters | dnfs-parameters | esp-parameters | fedsvr-parameters | gcs_noreq-parameters | hadoop-parameters | hana-parameters | impala-parameters | informix-parameters | jdbc-parameters | mongodb-parameters | mysql-parameters | odbc-parameters | oracle-parameters | path-parameters | postgres-parameters | redshift-parameters | s3-parameters | sapiq-parameters | sforce-parameters | singlestore_standard-parameters | snowflake-parameters | spark-parameters | spde-parameters | sqlserver-parameters | ss_noreq-parameters | teradata-parameters | vertica-parameters | yellowbrick-parameters),
importOptions=list(fileType="ANY" | "AUDIO" | "AUTO" | "BASESAS" | "CSV" | "DELIMITED" | "DOCUMENT" | "DTA" | "ESP" | "EXCEL" | "FMT" | "HDAT" | "IMAGE" | "JMP" | "LASR" | "PARQUET" | "SOUND" | "SPSS" | "VIDEO" | "XLS", fileType-specific-parameters),
required parameter name="table-name",
vars=list( list(
format="string",
formattedLength=integer,
label="string",
required parameter name="variable-name",
nfd=integer,
nfl=integer
) <, list(...)>),
where="where-expression"
)
),
target="string",
varSelect=list("ONE", "THREE", "TWO", "ZERO")
)
The prefix "required parameter" (shown as "*" in the parameter descriptions) indicates a required parameter.

Summary: Input and Output Tables

If a row includes a subparameter, you can specify the name, caslib, and so on in the subparameter. Otherwise, you can specify the name, caslib, and so on in the parameter.

Parameters for Reading Input Tables

Parameter | Subparameter | Description
inNetwork | | specifies the name of the input table that specifies the links to be included in and excluded from the network.
required parameter table | | specifies the settings for an input table.

Parameters for Creating Output Tables

Parameter | Subparameter | Description
code | casOut |
outNetwork | | specifies the name of the output table for the network structure and the probability distributions.
output | required parameter casOut | creates an output table to contain the predicted target values of the input table.
outputTables | names | lists the names of results tables to save as CAS tables on the server.
saveState | | specifies the table in which to save the model for future scoring.

Parameter Descriptions

alpha=list(double-1 <, double-2, ...>)

specifies the significance level for independence tests that use chi-square or G-square statistics. If you want to choose the best model among several, you can specify up to five numbers and also specify the value True for the bestModel parameter. If you specify multiple numbers but you do not specify the value True for the bestModel parameter, the action uses the first number and ignores the remaining numbers.

Default 0.05
Requirement The specified values must be unique.

attributes=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

changes the attributes of variables used in this action. Currently, attributes specified on the inputs and nominals parameters are ignored.

For more information about specifying the attributes parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias attribute

bestModel=TRUE | FALSE

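when set to True, selects the best model among the candidate settings that you specify for parameters such as alpha, maxParents, parenting, preScreening, structures, and varSelect.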

Default FALSE

code=list(aircodegen)

For more information about specifying the code parameter, see the common aircodegen parameter (Appendix A: Common Parameters).

codeGroup="string"

Code Group

display=list(displayTables)

specifies a list of results tables to send to the client for display.

For more information about specifying the display parameter, see the common displayTables parameter (Appendix A: Common Parameters).

freq="string"

specifies the frequency variable.

id=list("variable-name-1" <, "variable-name-2", ...>)

specifies the variables to copy to the generated table.

indepTest="ALL" | "CHIGSQUARE" | "CHISQUARE" | "GSQUARE" | "MI"

specifies the method for independence tests.

Default CHIGSQUARE
ALL

uses the chi-square statistic, the G-square statistic, and the normalized mutual information for independence tests. A variable is independent of the target if the p-values of both the chi-square and the G-square statistics are greater than the value of the alpha parameter and the normalized mutual information is less than the value of the miAlpha parameter.

CHIGSQUARE

uses both the chi-square and G-square statistics for independence tests. A variable is independent of the target if the p-values of both the chi-square and G-square statistics are greater than the value of the alpha parameter.

CHISQUARE

uses the chi-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

GSQUARE

uses the G-square statistic for independence tests. A variable is independent of the target if the p-value of the statistic is greater than the value of the alpha parameter.

MI

uses the normalized mutual information for independence tests. A variable is independent of the target if the normalized mutual information is less than the value of the miAlpha parameter.

inNetwork=list(castable)

specifies the name of the input table that specifies the links to be included in and excluded from the network.

For more information about specifying the inNetwork parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

Alias inNet

inputs=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies variables to use for analysis.

For more information about specifying the inputs parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias input

maxParents=integer

specifies the maximum number of parents for each node in the network. If you specify the value True for the bestModel parameter, the action tries all values from 1 to the value of this parameter to find the best setting; otherwise, the specified value is used as the maximum number of parents.

Default 5
Range 1–16

miAlpha=double

specifies the significance level for independence tests that use mutual information.

Default 0.05
Range 0–1

missingInt="IGNORE" | "IMPUTE"

specifies how to handle missing values for interval variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the interval variables.

IMPUTE

replaces the missing values in any interval variable by the mean of the variable.

missingNom="IGNORE" | "IMPUTE" | "LEVEL"

specifies how to handle missing values for nominal variables.

Default IGNORE
IGNORE

ignores the observations that have missing values in any of the nominal variables.

IMPUTE

replaces the missing values in any nominal variable by the mode of the variable.

LEVEL

treats the missing values in any nominal variable as a separate level of the variable.

nominals=list( list(casinvardesc-1) <, list(casinvardesc-2), ...>)

specifies nominal variables to use for analysis.

For more information about specifying the nominals parameter, see the common casinvardesc parameter (Appendix A: Common Parameters).

Alias nominal

numBin=integer

specifies the number of bins to use for interval variables.

Default 5
Range 2–1024

outNetwork=list(casouttable)

specifies the name of the output table for the network structure and the probability distributions.

For more information about specifying the outNetwork parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

Alias outNet

output=list(BnetOutputStatement)

creates an output table to contain the predicted target values of the input table.

The BnetOutputStatement value can be one or more of the following:

* casOut=list(casouttable)

specifies the settings for an output table.

For more information about specifying the casOut parameter, see the common casouttable (Form 1) parameter (Appendix A: Common Parameters).

copyVars="ALL" | "ALL_MODEL" | "ALL_NUMERIC" | list("variable-name-1" <, "variable-name-2", ...>)

specifies a list of one or more variables to be copied from the input table to the output table. Alternatively, you can specify ALL, ALL_MODEL, or ALL_NUMERIC to copy all variables, all variables used in the modeling, or all numeric variables, respectively, from the input table to the output table.

role="string"

renames the generated column _ROLE_ in the output data table to the specified role name.

outputTables=list(outputTables)

lists the names of results tables to save as CAS tables on the server.

For more information about specifying the outputTables parameter, see the common outputTables parameter (Appendix A: Common Parameters).

Alias displayOut

parenting=list("BESTONE", "BESTSET")

specifies the structure learning methods. If you want the action to choose between the two methods, you can specify both BESTONE and BESTSET and also specify the value True for the bestModel parameter. If you specify both methods but you do not specify the value True for the bestModel parameter, the action uses the first specified method and ignores the other.

Default BESTSET
BESTONE

uses a greedy approach to determine the parents of each node; that is, for each node, the best candidate is added as a parent of the node in each iteration.

BESTSET

determines the best set of variables among possible candidate sets as the parents of each node; that is, instead of adding one variable in an iteration, the action tests multiple sets of variables together and chooses the best set as the parents of the node.

partByFrac=list(partByFracStatement)

The partByFracStatement value can be one or more of the following:

seed=integer

specifies the seed to use in the random number generator that is used for partitioning the data.

Default 0
test=double

randomly assigns the specified proportion of observations in the input table to the testing role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Range 0–1
validate=double

randomly assigns the specified proportion of observations in the input table to the validation role. The sum of the fractions that are specified in the test and validate parameters must be less than 1.

Alias valid
Range 0–1

partByVar=list(partByVarStatement)

Long form partByVar=list(name="variable-name")
Shortcut form partByVar="variable-name"

The partByVarStatement value can be one or more of the following:

* name="variable-name"

names the variable in the input table whose values are used to assign roles to each observation.

test="string"

specifies the formatted value of the variable that is used to assign observations to the testing role.

train="string"

specifies the formatted value of the variable that is used to assign observations to the training role. If you do not specify the train parameter, then all observations whose roles are not determined by the test and validate parameters are assigned to training.

validate="string"

specifies the formatted value of the variable that is used to assign observations to the validation role.

Alias valid

preScreening=list("ONE", "ZERO")

specifies the initial screening for the input variables. If you want the action to choose the best model with or without prescreening, you can specify list("ZERO","ONE") or list("ONE","ZERO") for the parameter and also specify the value True for the bestModel parameter. If you specify both ONE and ZERO but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the other.

Default ONE
Requirement The specified values must be unique.
ONE

uses only the input variables that are dependent on the target.

ZERO

uses all the input variables.

printtarget=TRUE | FALSE

when set to True, generates names for the predicted target variable and the predicted probability variables.

Default FALSE

resident=TRUE | FALSE

Default TRUE

saveState=list(casouttable)

specifies the table in which to save the model for future scoring.

Long form saveState=list(name="table-name")
Shortcut form saveState="table-name"

The casouttable value can be one or more of the following:

caslib="string"

specifies the name of the caslib for the output table.

label="string"

specifies the descriptive label to associate with the table.

lifetime=64-bit-integer

specifies the number of seconds to keep the table in memory after it is last accessed. The table is dropped if it is not accessed for the specified number of seconds.

Default 0
Minimum value 0
memoryFormat="DVR" | "INHERIT" | "STANDARD"

specifies the memory format for the output table.

Default INHERIT
DVR

uses the duplicate value reduction memory format. This memory format can reduce memory consumption and file size when the input data contains duplicate values.

INHERIT

uses the default memory format that is set for the server. By default, the server uses the standard memory format. If an administrator sets the CAS_DEFAULT_MEMORY_FORMAT environment variable to DVR, then the DVR memory format is set as the default for the server.

STANDARD

uses the standard memory format.

name="table-name"

specifies the name for the output table.

promote=TRUE | FALSE

when set to True, adds the output table with a global scope. This enables other sessions to access the table, subject to access controls. The target caslib must also have a global scope.

Default FALSE
replace=TRUE | FALSE

when set to True, overwrites an existing table that has the same name.

Default FALSE
tableRedistUpPolicy="DEFER" | "NOREDIST" | "REBALANCE"

specifies the table redistribution policy to use when the number of worker pods increases on a running CAS server.

DEFER

defers the selection of the redistribution policy to a higher-level entity.

NOREDIST

does not redistribute table data when the number of worker pods changes on a running CAS server.

REBALANCE

rebalances table data when the number of worker pods changes on a running CAS server.

structures=list("GENERAL", "GN", "MB", "NAIVE", "PC", "TAN")

specifies the network structure types. Together with the maxParents parameter, this parameter determines which network structure the action learns from the training data. If you want the action to choose the best structure among several structures, you can specify multiple values in any combination and also specify the value True for the bestModel parameter. If you specify multiple structures but you do not specify the value True for the bestModel parameter, the first value that you specify is used and the rest are ignored.

Alias structure
Default PC
Requirement The specified values must be unique.
GENERAL

learns a general Bayesian network. A general Bayesian network removes the requirement of a direct connection between the target variable and the input variables that are selected to be in the network.

Alias GN
MB

learns the Markov blanket of the target variable. The Markov blanket consists of the parents of the target, the children of the target, and the other parents of those children. After learning the Markov blanket, the action further determines the parents of the target, the links from the parents to the children, and the links among the children. When you specify the value "MB" for the structures parameter, the action learns the Markov blanket regardless of the values of the preScreening and varSelect parameters.

NAIVE

learns a naive Bayesian network structure (that is, the target has a direct link to each input variable). If you specify the value 1 for maxParents, the structure being trained is a naive Bayesian network. If you specify a value greater than 1 for maxParents, the structure is a Bayesian network-augmented naive Bayesian network.

PC

learns the parent-child Bayesian network structure. The PC structure differs from the NAIVE structure in that some input variables can be learned as parents of the target variable. In addition, links from the parents to the children and links among the children are also possible in the PC structure.

TAN

learns the tree-augmented naive Bayesian network structure. The TAN structure includes a direct link from the target to each input variable plus a tree structure among the input variables.
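
The interaction between structures, bestModel, and maxParents described above can be summarized with a small model. The following Python sketch is illustrative only and is not the action's implementation: the function names `structures_to_learn` and `naive_variant` are hypothetical, and the sketch covers only the rules stated in this section (the default, the uniqueness requirement, the first-value rule, and the effect of maxParents on NAIVE).

```python
# Illustrative model of the documented rules; not part of the action set.
ALLOWED_STRUCTURES = ("GENERAL", "GN", "MB", "NAIVE", "PC", "TAN")


def structures_to_learn(structures=None, best_model=False):
    """Return the structure types that are actually considered."""
    if not structures:
        return ["PC"]  # documented default
    values = list(structures)
    unknown = [v for v in values if v not in ALLOWED_STRUCTURES]
    if unknown:
        raise ValueError(f"unknown structure(s): {unknown}")
    if len(set(values)) != len(values):
        raise ValueError("the specified values must be unique")
    # Without bestModel=True, only the first value is used.
    return values if best_model else values[:1]


def naive_variant(max_parents):
    """Describe what NAIVE yields for a given maxParents value."""
    if max_parents == 1:
        return "naive Bayesian network"
    if max_parents > 1:
        return "Bayesian network-augmented naive Bayesian network"
    # The text above describes only the values 1 and greater than 1.
    raise ValueError("maxParents below 1 is not described here")
```

With this model, `structures_to_learn(["TAN", "PC", "MB"])` keeps only TAN, while passing `best_model=True` keeps all three candidates for comparison.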

* table=list(castable)

specifies the settings for an input table.

For more information about specifying the table parameter, see the common castable (Form 1) parameter (Appendix A: Common Parameters).

target="string"

specifies the target variable to use for analysis.

varSelect=list("ONE", "THREE", "TWO", "ZERO")

specifies how to select input variables beyond prescreening. If you specify the value "ONE", "TWO", or "THREE", the action automatically tests each input variable for unconditional independence of the target regardless of the value of the preScreening parameter. If no variables are left at a particular variable selection level, the action rolls back to the previous level. For example, if you specify "THREE" and there are no variables in the Markov blanket of the target, the action uses the variables from the previous level "TWO". If you want to choose the best model among different levels of variable selection, you can specify any combination of values for this parameter and also specify the value True for the bestModel parameter. If you specify multiple values for the varSelect parameter but you do not specify the value True for the bestModel parameter, the action uses the first specified value and ignores the remaining values.

Default ONE
Requirement The specified values must be unique.
ONE

tests each input variable for conditional independence of the target variable given any other input variable. This type of selection rejects all variables that become conditionally independent of the target variable given any other input variable.

THREE

determines the Markov blanket of the target variable and uses only the variables in the Markov blanket.

TWO

tests each input variable further for conditional independence of the target variable given any subset of other input variables. This type of selection rejects all variables that are conditionally independent of the target given any subset of other input variables.

ZERO

uses all input variables that remain after the initial screening is performed as specified in the preScreening parameter.
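
The rollback behavior of varSelect can be illustrated with a short model. The following Python sketch is a simplified illustration, not the action's implementation: the function name `effective_level` and the `survivors` mapping (selection level to the input variables that remain at that level) are hypothetical. It also assumes that the rollback repeats until a non-empty level is found, which the description above does not state explicitly.

```python
# Illustrative model of the documented rollback rule; not part of the action set.
# Levels are ordered so that the level before "THREE" is "TWO", as in the text.
LEVELS = ("ZERO", "ONE", "TWO", "THREE")


def effective_level(requested, survivors):
    """Return (level, variables) that are used for the requested level.

    survivors maps each level name to the list of input variables that
    remain after selection at that level (hypothetical input).
    """
    idx = LEVELS.index(requested)  # raises ValueError for unknown levels
    # Roll back while the current level leaves no variables.
    # Assumption: the rollback can repeat across more than one level.
    while idx > 0 and not survivors.get(LEVELS[idx]):
        idx -= 1
    level = LEVELS[idx]
    return level, list(survivors.get(level, []))
```

For example, if the Markov blanket is empty (`survivors["THREE"] == []`) but level TWO retains variables, a request for "THREE" resolves to level "TWO" and its variables.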

Last updated: November 23, 2025