expTable: Builds expected contingency tables.
In ralmond/CPTtools: Tools for Creating Conditional Probability Tables

expTable

R Documentation

Description

The table function in base R builds contingency tables by cross-classifying factors. If a factor is not known but is instead estimated using a Bayesian classifier, then instead of a single column holding the factor value there will be several columns giving the probability that the individual is in each state. The table built from such data is an expected contingency table.

Usage

expTable(data, pvecVars, facVars, 
         pvecregex = "<var>\\.<state>")
pvecTable(data, pvarName, 
         regx = sub("<var>",pvarName,"<var>\\.<state>"))
catTable(data, fvarName, 
         cc = contrasts(as.factor(dplyr::pull(data, fvarName)),
              contrasts = FALSE))

Arguments

`data`	This is a data frame where columns corresponding to probability vectors have a regular naming pattern.
`pvecVars`, `pvarName`	The names (or name for `pvecTable`) of the estimated variable(s).
`facVars`, `fvarName`	The names (or name for `catTable`) of the factor variable(s).
`pvecregex`, `regx`	A regular expression template for finding the column names. The string “<var>” is replaced with the variable name (`pvarName`) and the string “<state>” is replaced with a regular expression that extracts the state name.
`cc`	This is a contrasts matrix (see `contrasts`) which shows how to convert the factor variable to columns.

Details

For an individual, $i$, let $Y_i$ be a fully observed variable which takes on states $\{y_1, \ldots, y_m\}$. Let $S_i$ be a latent variable with states $\{s_1, \ldots, s_k\}$, and let $P(S_i)$ be an estimate of $S_i$, so it is a probability vector over the states.

The expected matrix is formed as follows:

1. Initialize a $k$ by $m$ matrix with 0 (rows correspond to states of $S$ and columns to states of $Y$).
2. For each individual, add $P(S_i)$ to the column corresponding to $Y_i$.
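
In symbols, the two steps can be written as a single sum. The cell label $N_{jl}$ and the indicator notation are introduced here for illustration only and do not appear in the package:

$$
E[N_{jl}] = \sum_{i} P(S_i = s_j)\,\mathbf{1}\{Y_i = y_l\}, \qquad j = 1, \ldots, k, \quad l = 1, \ldots, m,
$$

where $P(S_i = s_j)$ is the $j$th component of the vector $P(S_i)$ and $\mathbf{1}\{Y_i = y_l\}$ is 1 when individual $i$ is observed in state $y_l$ and 0 otherwise.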

The result is the expected value of the contingency table. The general case can be handled with an Einstein sum (see einsum), with a rule “za,zb,zc,... -> abc...”.
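
The two-way case described above can be sketched in base R without the package. This is only an illustration under assumed toy data (the matrix `P`, the factor `Y`, and their state names are made up here); it is not the package's actual implementation.

```r
# Toy data (hypothetical): 4 individuals, latent S with states H/M/L,
# observed factor Y with states "right"/"wrong".
P <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.3, 0.6,
              0.5, 0.4, 0.1,
              0.2, 0.2, 0.6),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("H", "M", "L")))
Y <- factor(c("right", "wrong", "right", "wrong"),
            levels = c("right", "wrong"))

# Step 1: initialize a k-by-m matrix of zeros.
expected <- matrix(0, nrow = ncol(P), ncol = nlevels(Y),
                   dimnames = list(S = colnames(P), Y = levels(Y)))

# Step 2: add each individual's probability vector to the column for Y_i.
for (i in seq_len(nrow(P))) {
  j <- as.integer(Y[i])
  expected[, j] <- expected[, j] + P[i, ]
}

# Equivalent matrix form: t(P) %*% Z, where Z is the 0/1 indicator matrix of Y.
Z <- outer(as.integer(Y), seq_len(nlevels(Y)), "==") * 1
expected2 <- crossprod(P, Z)
```

Because each row of `P` sums to one, the column sums of `expected` equal the observed counts of `Y`, which is a quick sanity check on any expected table built this way.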

The assumption is that the estimates of the latent variable are saved in columns with a regular naming convention. For example, the ACED data set uses the names P.cr..H, P.cr..M and P.cr..L for the high, medium and low probabilities of the common ratio variable. The regular expression which will capture the names of the states is “P\.cr\.\.(\w+)”, where “\w+” matches one or more word constituent characters. The parentheses around the last part are used to extract the state names.

Internally, the function substitutes the value of the variable name (`pvarName`) for “<var>”, and “(\w+)” is substituted for “<state>”. If this regular expression doesn't work for grabbing the state names, a better expression can be substituted, but it should be the first sub-expression marked with parentheses. Note also that the period has a special meaning in regular expressions, so it needs to be escaped with a backslash, and that the backslash itself needs to be doubled in R strings.
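
The template substitution and state-name extraction can be mimicked with base R string functions. The following is a rough sketch, not the package's code: the column names are hypothetical, and both the use of `fixed = TRUE` and the `^`/`$` anchoring are choices made for this illustration.

```r
# Hypothetical column names following the ACED-style convention.
cols <- c("SubjID", "P.cr..H", "P.cr..M", "P.cr..L", "P.sgp..H")

template <- "P\\.<var>\\.\\.<state>"   # "\\." in R source is the regex "\."
varName  <- "cr"

# Fill in the template: <var> -> variable name, <state> -> capture group.
# fixed = TRUE treats both pattern and replacement literally.
pattern <- sub("<var>", varName, template, fixed = TRUE)
pattern <- sub("<state>", "(\\w+)", pattern, fixed = TRUE)
pattern <- paste0("^", pattern, "$")   # anchoring is an assumption of this sketch

# Select the matching columns, then pull out the first parenthesized group.
pcols  <- grep(pattern, cols, value = TRUE)
states <- sub(pattern, "\\1", pcols)
```

Here `pcols` holds the three `P.cr..` columns and `states` holds the state labels taken from the first capture group, which is why a custom expression must keep the state name in its first pair of parentheses.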

Value

The functions pvecTable and catTable return a matrix with rows corresponding to the original data, and columns to the states of the variable.
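
For a factor, a matrix of this shape can be built from the default `cc` value shown in Usage. The sketch below is one plausible reading of how a contrasts matrix with `contrasts = FALSE` maps a factor to indicator columns; the toy factor `f` is hypothetical and the package internals may differ.

```r
f  <- factor(c("b", "a", "c", "a"))       # toy factor with levels a, b, c
cc <- contrasts(f, contrasts = FALSE)     # identity matrix labeled by level

# Look up each observation's row of cc: one row per individual,
# one 0/1 column per state.
Z <- cc[as.integer(f), , drop = FALSE]
rownames(Z) <- NULL
```

Each row of `Z` contains a single 1 marking the observed state, so its column sums reproduce the counts that `table(f)` would give.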

The function expTable produces an array whose dimensions correspond to the states of the probability and factor variables.

Author(s)

Russell Almond

References

Observable Analysis paper (in preparation).

Examples


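library(CPTtools)

data(ACED)
## Join the score estimates to the item responses by subject.
ACED.joined <- dplyr::inner_join(ACED.scores,ACED.items,by="SubjID")
## Probability columns for "cr" (P.cr..H, P.cr..M, P.cr..L), one row per subject.
head(pvecTable(ACED.joined,"cr",regx="P\\.cr\\.\\.<state>"))
## Columns for the states of the observed factor, one row per subject.
head(catTable(ACED.joined,"tCommonRatio1a"))
## Expected contingency table of cr by tCommonRatio1a.
expTable(ACED.joined,"cr","tCommonRatio1a",
         pvecregex="P\\.<var>\\.\\.<state>")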

ralmond/CPTtools documentation built on Dec. 27, 2024, 7:15 a.m.