woeHist: Creates weights of evidence from a history matrix.
In ralmond/CPTtools: Tools for Creating Conditional Probability Tables

woeHist

R Documentation

Description

Takes a matrix providing the probability distribution for the target variable at several time points and returns a weight of evidence for all time points except the first.

Usage

woeHist(hist, pos=1L, neg=NULL)

Arguments

`hist`	A matrix whose rows represent time points (after tests) and whose columns represent the states of the target variable, so that each row is a probability distribution.
`pos`	An expression for selecting the columns of `hist` which correspond to the hypothesis.
`neg`	An expression for selecting the columns corresponding to the complement of the hypothesis. (The default value is `-pos` if `pos` is numeric, `!pos` if `pos` is logical, and `setdiff(colnames(hist),pos)` if `pos` is a character vector.)
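
The default rule for `neg` can be sketched in base R. This is only an illustration: `defaultNeg` is a hypothetical helper, not a CPTtools function, and the sketch assumes (following the Details section) that the selectors index the columns of `hist`. The probabilities are made up.

```r
# Hypothetical helper (not part of CPTtools): derive the complement
# selector from `pos`, following the defaults described for `neg`.
defaultNeg <- function(hist, pos) {
  if (is.character(pos)) {
    setdiff(colnames(hist), pos)  # every other named state
  } else if (is.logical(pos)) {
    !pos                          # flip the logical mask
  } else {
    -pos                          # negative index drops the pos columns
  }
}

# Toy history matrix: 3 time points by 3 states (made-up numbers).
hist <- matrix(c(0.3, 0.4, 0.3,
                 0.5, 0.3, 0.2,
                 0.7, 0.2, 0.1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("t0", "t1", "t2"),
                               c("High", "Medium", "Low")))

negByName <- defaultNeg(hist, "High")
negByMask <- defaultNeg(hist, c(TRUE, FALSE, FALSE))
negByIndex <- defaultNeg(hist, 1L)
```

All three selectors pick out the same two columns, `Medium` and `Low`, when used to index `hist`.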

Details

Good (1971) defines the Weight Of Evidence (WOE) as:

100 \log_{10} \frac{\Pr(E|H)}{\Pr(E|\overline H)} = 100 \left [\log_{10} \frac{\Pr(H|E)}{\Pr(\overline H|E)} - \log_{10} \frac{\Pr(H)}{\Pr(\overline H)} \right ]

where \overline H indicates the negation of the hypothesis. Good recommends taking the log base 10 and multiplying by 100, and calls the resulting units centibans. The second form, which expresses the weight of evidence as a difference in log odds, leads naturally to the idea of an incremental weight of evidence for each new observation.
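
As a quick numerical check of the identity above, the following base R sketch computes the weight of evidence both ways for a single piece of evidence. The probabilities are made up for illustration, and the variable names are not part of CPTtools.

```r
# Made-up probabilities for a binary hypothesis H and evidence E.
prior    <- 0.3   # Pr(H)
likeH    <- 0.8   # Pr(E | H)
likeNotH <- 0.2   # Pr(E | not H)

# Bayes' rule gives the posterior Pr(H | E).
post <- prior * likeH / (prior * likeH + (1 - prior) * likeNotH)

# First form: 100 * log10 of the likelihood ratio.
woeLR <- 100 * log10(likeH / likeNotH)

# Second form: 100 * (posterior log odds - prior log odds).
logOdds <- function(p) log10(p / (1 - p))
woeOdds <- 100 * (logOdds(post) - logOdds(prior))

# The two forms agree up to floating-point error.
sameWoe <- isTRUE(all.equal(woeLR, woeOdds))
```

With these numbers the likelihood ratio is 4, so both forms give 100 log10(4), roughly 60 centibans in favor of the hypothesis.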

Following Madigan, Mosurski and Almond (1997), all that is needed to calculate the WOE is the marginal distribution for the hypothesis variable at each time point. They also note that the definition is somewhat problematic if the hypothesis variable is not binary. In that case, they recommend partitioning the states into a positive and a negative set. The pos and neg arguments describe that partition. They can be any expressions suitable for selecting columns from the hist matrix.
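
The partitioning step can be sketched in base R as follows. This is a minimal illustration of the calculation described above, not the package source; the history matrix is made up, and the choice of `High` and `Medium` as the positive set is only an example.

```r
# Toy history matrix: rows are time points, columns are states
# of a three-state hypothesis variable (made-up numbers).
hist <- matrix(c(0.3, 0.4, 0.3,
                 0.5, 0.3, 0.2,
                 0.7, 0.2, 0.1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("t0", "t1", "t2"),
                               c("High", "Medium", "Low")))

# Partition the states: hypothesis = {High, Medium}, complement = {Low}.
pos <- c("High", "Medium")
neg <- "Low"

# Collapse each side of the partition to a single probability per row.
pPos <- rowSums(hist[, pos, drop = FALSE])
pNeg <- rowSums(hist[, neg, drop = FALSE])

# Log odds at each time point, then incremental WOE in centibans.
logOdds <- log10(pPos / pNeg)
woe <- 100 * diff(logOdds)
```

Using `drop = FALSE` keeps the selection a matrix even when one side of the partition is a single column, so `rowSums` works for both sets.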

Value

A vector of weights of evidence of length one less than the number of rows of hist (i.e., the result of applying diff() to the vector of log odds.)
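
Because the result is a `diff()` of log odds, the incremental weights telescope: their sum equals the total weight of evidence between the first and last time points. A small base R sketch with made-up probabilities (not package code) shows this.

```r
# Made-up probabilities of the hypothesis at four time points.
pH <- c(0.30, 0.50, 0.45, 0.80)

# Log odds (base 10) at each time point.
logOdds <- log10(pH / (1 - pH))

# Incremental weights of evidence in centibans: one per step.
woe <- 100 * diff(logOdds)

# Total weight of evidence from the first to the last time point.
total <- 100 * (logOdds[length(pH)] - logOdds[1])

# The increments sum to the total, up to floating-point error.
telescopes <- isTRUE(all.equal(sum(woe), total))
```

The second increment is negative because the probability drops from 0.50 to 0.45, that is, the evidence at that step counts against the hypothesis.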

Author(s)

Russell Almond

References

Good, I. (1971) The probabilistic explication of information, evidence, surprise, causality, explanation and utility. In Proceedings of a Symposium on the Foundations of Statistical Inference. Holt, Rinehart and Winston, 108-141.

Madigan, D., Mosurski, K. and Almond, R. (1997) Graphical explanation in belief networks. Journal of Computational and Graphical Statistics, 6, 160-181.

Examples

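  ## Locate the example data shipped with the package.
  testFiles <- system.file("testFiles", package="CPTtools")
  ## Read the history of marginal distributions from the CSV file.
  allcorrect <- readHistory(read.csv(file.path(testFiles,
                                               "CorrectSequence.csv"),
                                     as.is=TRUE),
                            probcol="Margin.sequences.")
  ## Hypothesis "High" versus its complement, selected by column name.
  woeHist(allcorrect, "High", c("Medium","Low"))
  ## Hypothesis is the first two columns versus the third, selected by index.
  woeHist(allcorrect, 1:2, 3)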

ralmond/CPTtools documentation built on Dec. 27, 2024, 7:15 a.m.