cpt: Compute a conditional probability table for a factor given...

View source: R/cpt.R

Description

The function cpt operates on sets of factors. Specifically, it computes the conditional probability distribution of one of the factors given other factors, and stores the result in a multidimensional array.

inputCPT() is a utility function aimed at facilitating the process of populating small conditional probability distributions, i.e., those for which the response variable doesn't have too many levels, there are relatively few independent variables, and the independent variables also don't have too many levels.

Usage

cpt(x, data, wt, ...)

## S3 method for class 'formula'
cpt(formula, data, wt, ...)

## S3 method for class 'list'
cpt(x, data, wt, ...)

inputCPT(x, factorLevels, reduce = TRUE, ...)

## S3 method for class 'formula'
inputCPT(formula, factorLevels, reduce = TRUE, ...)

## S3 method for class 'list'
inputCPT(x, factorLevels, reduce = TRUE, ...)

Arguments

x

a list containing the names of the variables used to compute the conditional probability table. See details.

data

a data frame containing all the factors named in formula or x.

wt

(optional) a numeric vector of observation weights.

...

Additional arguments to be passed to other methods.

formula

a formula specifying the relationship between the dependent and independent variables.

factorLevels

(optional) a named list with the following structure: the names of the list elements are the names of the factors specified in x or formula, and each list element is a character vector containing the levels of the respective factor. See examples.

reduce

set to TRUE if inputCPT() is to compute probabilities for the first level of the dependent variable as the complement of the inputted probabilities corresponding to the other levels of the dependent variable. For example, reduce = TRUE with a binary dependent variable y (say, with levels 'no' and 'yes') will ask for the probabilities of 'yes' at each combination of the independent variables, and compute the probability of 'no' as their respective complements. See details.

Details

If a formula object is supplied, it must have the following structure: response ~ var1 + var2 + etc. The other option is to pass a named list containing two elements, y and x. Element y is a character string containing the name of the factor variable in data to be used as the dependent variable, and element x is a character vector containing the name(s) of the factor variable(s) to be used as independent (or conditioning) variables.
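The two specifications carry the same information. As a rough illustration in base R (formula_to_spec is a hypothetical helper written for this sketch, not a HydeNet function), a two-sided formula can be mapped to the equivalent named list:

```r
# Hypothetical helper (not part of HydeNet): convert a formula of the form
# response ~ var1 + var2 into the equivalent list(y = ..., x = ...) form.
formula_to_spec <- function(formula) {
  # Require a two-sided formula (operator, left-hand side, right-hand side).
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  list(
    y = all.vars(formula[[2]]),  # left-hand side: the dependent variable
    x = all.vars(formula[[3]])   # right-hand side: conditioning variables
  )
}

spec <- formula_to_spec(di3 ~ di1 + di2)
str(spec)
```

Under this mapping, di3 ~ di1 + di2 corresponds to list(y = "di3", x = c("di1", "di2")), which is the list form used in the examples.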

In inputCPT(), when the parameter reduce is set to FALSE, any non-negative number (e.g., cell counts) is accepted as input. Conditional probabilities are then calculated via a normalization procedure. However, when reduce is set to TRUE, a) only probabilities in [0,1] are accepted and b) all inputted probabilities for each specific combination of independent variable values must not sum to a value greater than 1 (or the calculated probability for the first level of the dependent variable would be negative).
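The difference between the two modes can be sketched in base R for a single combination of independent variable values. This is only an illustration of the arithmetic described above, not HydeNet's internal code; complete_probs, normalize_counts, and the level names are assumptions made for the sketch.

```r
# Illustrative base R only; this is not HydeNet's internal code.
# Levels of a hypothetical dependent variable. The first level is the one
# whose probability is derived as a complement when reduce = TRUE.
y_levels <- c("dry", "moist", "VeryWet")

# reduce = TRUE style: the user supplies probabilities for every level
# except the first; the first is computed as 1 minus their sum.
complete_probs <- function(p_other, levels) {
  stopifnot(length(p_other) == length(levels) - 1,
            all(p_other >= 0 & p_other <= 1),
            sum(p_other) <= 1)  # otherwise the complement would be negative
  setNames(c(1 - sum(p_other), p_other), levels)
}

# reduce = FALSE style: any non-negative numbers (e.g., cell counts) are
# accepted and normalized so that they sum to 1.
normalize_counts <- function(counts, levels) {
  stopifnot(length(counts) == length(levels),
            all(counts >= 0),
            sum(counts) > 0)
  setNames(counts / sum(counts), levels)
}

p_reduced    <- complete_probs(c(0.3, 0.2), y_levels)
p_normalized <- normalize_counts(c(50, 30, 20), y_levels)
```

Both calls describe the same distribution: the first fills in Pr(dry) as the complement of the two supplied probabilities, while the second rescales raw counts.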

The cpt() function with a weight vector passed to parameter wt works analogously to inputCPT(reduce = FALSE), i.e., it accepts any non-negative vector, and computes the conditional probability array by normalizing sums of weights.
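Normalizing sums of weights can be sketched with base R's xtabs() and prop.table(). The toy data frame and variable names below are assumptions for illustration, and the code is not taken from HydeNet; it only mirrors the calculation described above, with the conditioning variable in the first dimension and the dependent variable in the last.

```r
# Illustrative base R only; this is not HydeNet's internal code.
# Toy data: y is the dependent factor, x the conditioning factor,
# and wt a vector of non-negative observation weights.
d <- data.frame(
  x  = factor(c("a", "a", "a", "b", "b")),
  y  = factor(c("no", "yes", "yes", "no", "yes")),
  wt = c(1, 2, 1, 3, 1)
)

# Sum the weights within each (x, y) cell.
w_sums <- xtabs(wt ~ x + y, data = d)

# Normalize within each level of x so that every row sums to 1,
# giving Pr(y | x) based on the weights.
cpt_w <- prop.table(w_sums, margin = 1)
cpt_w
```

With all weights equal to 1, the same two steps reduce to ordinary relative frequencies within each level of x.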

Author(s)

Jarrod Dalton and Benjamin Nutter

Examples

# a very imbalanced dice example

n <- 50000
data <- data.frame(
  di1 = as.factor(1:6 %*% rmultinom(n,1,prob=c(.4,.3,.15,.10,.03,.02))),
  di2 = as.factor(1:6 %*% rmultinom(n,1,prob=rev(c(.4,.3,.15,.10,.03,.02)))),
  di3 = as.factor(1:6 %*% rmultinom(n,1,prob=c(.15,.10,.02,.3,.4,.03)))
)

cpt1 <- cpt(di3 ~ di1 + di2, data)
cpt1[di1 = 1, di2 = 4, ]  # Pr(di3 | di1 = 1, di2 = 4)
cpt1["1","4",]
cpt1[1,4,]

plyr::aaply(cpt1, c(1,2), sum) # card(di1)*card(di2) matrix of ones

l <- list(y = "di3", x = c("di1","di2"))
all(cpt(l, data) == cpt1)

## Not run: 
inputCPT(wetGrass ~ rain + morning) 

inputCPT(wetGrass ~ rain + morning,
         factorLevels = list(wetGrass = c("dry","moist","VeryWet"),
                              rain     = c("nope","yep"),
                              morning  = c("NO","YES")),
         reduce = FALSE)

## End(Not run)

HydeNet documentation built on July 8, 2020, 5:15 p.m.