The function cpt() operates on sets of factors. Specifically, it computes the conditional probability distribution of one of the factors given the other factors and stores the result in a multidimensional array.
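To make "conditional probability distribution stored in a multidimensional array" concrete, here is a base-R sketch. It is not cpt()'s own implementation. It cross-tabulates three factors with the dependent variable in the last dimension and normalizes within each combination of the other two, so every slice over the last dimension sums to 1. The data frame d and its columns a, b and y are hypothetical.

```r
# Base-R sketch of Pr(y | a, b) as a 3-D array (illustration only, not cpt())
set.seed(42)
d <- data.frame(
  a = factor(sample(c("lo", "hi"), 200, replace = TRUE)),
  b = factor(sample(c("no", "yes"), 200, replace = TRUE)),
  y = factor(sample(c("dry", "wet"), 200, replace = TRUE))
)

# Joint counts; the dependent variable y is placed in the last dimension
counts <- table(d[, c("a", "b", "y")])

# Normalize within each (a, b) combination: margins 1 and 2 are held fixed
cond <- prop.table(counts, margin = c(1, 2))

cond["hi", "yes", ]        # Pr(y | a = "hi", b = "yes")
apply(cond, c(1, 2), sum)  # 2 x 2 matrix of ones
```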
inputCPT() is a utility function that simplifies populating small conditional probability distributions, i.e., those in which the response variable has only a few levels, there are relatively few independent variables, and the independent variables also have only a few levels.
cpt(x, data, wt, ...)

## S3 method for class 'formula'
cpt(formula, data, wt, ...)

## S3 method for class 'list'
cpt(x, data, wt, ...)

inputCPT(x, factorLevels, reduce = TRUE, ...)

## S3 method for class 'formula'
inputCPT(formula, factorLevels, reduce = TRUE, ...)

## S3 method for class 'list'
inputCPT(x, factorLevels, reduce = TRUE, ...)
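x: a list containing the names of the variables used to compute the conditional probability table. See Details.
data: a data frame containing all the factors named in x or formula.
wt: (optional) a numeric vector of observation weights.
...: additional arguments to be passed to other methods.
formula: a formula specifying the relationship between the dependent and independent variables.
factorLevels: (optional) a named list whose element names are the names of the factors specified in x or formula, and whose elements are character vectors giving the levels of the respective factors. See Examples.
reduce: set to TRUE to have inputCPT() compute the probability of the first level of the dependent variable as the complement of the probabilities entered for the other levels; set to FALSE to enter any non-negative values, which are then normalized. See Details.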
If a formula object is passed (the formula method), it must have the structure response ~ var1 + var2 + ..., with the dependent variable on the left-hand side. Alternatively, pass a named list containing two elements, y and x. Element y is a character string giving the name of the factor variable in data to be used as the dependent variable, and element x is a character vector giving the name(s) of the factor variable(s) to be used as independent (or conditioning) variables.
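The two ways of specifying variables carry the same information. The sketch below shows the correspondence by turning a two-sided formula into the equivalent list(y = ..., x = ...) form. formula_to_list() is a hypothetical helper written for illustration; it is not part of the package and may differ from how the formula method handles its input internally.

```r
# Hypothetical helper (not a package function): convert a formula of the
# form response ~ var1 + var2 into the equivalent list(y = ..., x = ...)
formula_to_list <- function(f) {
  # Only two-sided formulas (response ~ predictors) have length 3
  stopifnot(inherits(f, "formula"), length(f) == 3)
  list(
    y = all.vars(f[[2]]),  # left-hand side: the dependent variable
    x = all.vars(f[[3]])   # right-hand side: the conditioning variables
  )
}

spec <- formula_to_list(di3 ~ di1 + di2)
spec  # y = "di3", x = c("di1", "di2")
```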
In inputCPT(), when reduce is set to FALSE, any non-negative numbers (e.g., cell counts) are accepted as input, and conditional probabilities are then calculated by normalization. When reduce is set to TRUE, however, (a) only probabilities in [0, 1] are accepted, and (b) the probabilities entered for each combination of independent-variable values must not sum to more than 1; otherwise the calculated probability for the first level of the dependent variable would be negative.
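The arithmetic behind the two input modes can be sketched for a single combination of the independent variables. Both helpers below, complete_first_level() and normalize_values(), are hypothetical and only mirror the rules stated above; they are not inputCPT()'s code. The level names reuse the wetGrass levels from the Examples.

```r
# reduce = TRUE style: probabilities for levels 2..k are supplied and the
# first level receives the complement (hypothetical helper)
complete_first_level <- function(p_rest, first_level = "level1") {
  if (any(p_rest < 0 | p_rest > 1)) stop("probabilities must lie in [0, 1]")
  if (sum(p_rest) > 1) stop("probabilities sum to more than 1")
  out <- c(1 - sum(p_rest), p_rest)
  names(out)[1] <- first_level
  out
}

# reduce = FALSE style: any non-negative values (e.g. counts) are normalized
# (hypothetical helper)
normalize_values <- function(v) {
  if (any(v < 0)) stop("values must be non-negative")
  if (sum(v) == 0) stop("at least one value must be positive")
  v / sum(v)
}

# One combination of the independent variables
complete_first_level(c(moist = 0.3, VeryWet = 0.1), first_level = "dry")
normalize_values(c(dry = 12, moist = 6, VeryWet = 2))
```

Both calls yield the distribution dry = 0.6, moist = 0.3, VeryWet = 0.1; a call such as complete_first_level(c(0.7, 0.5)) fails because the first level would get a negative probability.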
When a weight vector is passed to wt, cpt() works analogously to inputCPT(reduce = FALSE): it accepts any non-negative numeric vector and computes the conditional probability array by normalizing the sums of weights within each combination of the independent variables.
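"Normalizing sums of weights" can be illustrated in base R with xtabs(), which sums a numeric left-hand side within each cell. This is a sketch of the idea, not cpt()'s implementation; the data frame d, its columns a and y, and the weights wt are hypothetical.

```r
# Base-R sketch: weighted Pr(y | a) by normalizing sums of weights
set.seed(1)
d <- data.frame(
  a  = factor(sample(c("lo", "hi"), 100, replace = TRUE)),
  y  = factor(sample(c("no", "yes"), 100, replace = TRUE)),
  wt = runif(100, min = 0, max = 2)  # hypothetical non-negative weights
)

# Sum of weights in each (a, y) cell; y is last, so margin 1 indexes a
w_sums <- xtabs(wt ~ a + y, data = d)

# Divide each row by its total weight to get Pr(y | a)
w_cpt <- prop.table(w_sums, margin = 1)
w_cpt
```

With all weights equal to 1, this reduces to the unweighted conditional proportions.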
Jarrod Dalton and Benjamin Nutter
# a very imbalanced dice example
library(HydeNet)  # package providing cpt() and inputCPT()
set.seed(123)     # make the simulated rolls reproducible
n <- 50000
data <- data.frame(
  di1 = as.factor(1:6 %*% rmultinom(n, 1, prob = c(.4, .3, .15, .10, .03, .02))),
  di2 = as.factor(1:6 %*% rmultinom(n, 1, prob = rev(c(.4, .3, .15, .10, .03, .02)))),
  di3 = as.factor(1:6 %*% rmultinom(n, 1, prob = c(.15, .10, .02, .3, .4, .03)))
)
cpt1 <- cpt(di3 ~ di1 + di2, data)
cpt1[di1 = 1, di2 = 4, ]   # Pr(di3 | di1 = 1, di2 = 4)
cpt1["1", "4", ]           # same slice, indexed by level labels
cpt1[1, 4, ]               # same slice, indexed by position
apply(cpt1, c(1, 2), sum)  # card(di1) x card(di2) matrix of ones
l <- list(y = "di3", x = c("di1", "di2"))
all(cpt(l, data) == cpt1)
## Not run:
inputCPT(wetGrass ~ rain + morning)
inputCPT(wetGrass ~ rain + morning,
         factorLevels = list(wetGrass = c("dry", "moist", "VeryWet"),
                             rain = c("nope", "yep"),
                             morning = c("NO", "YES")),
         reduce = FALSE)
## End(Not run)