expTable | R Documentation |
The table function in base R builds contingency tables by cross-classifying factors. If a factor is not known, but is instead estimated using a Bayesian classifier, then instead of a single column holding the factor value there will be several columns giving the probability that the individual is in each state. The table built is then an expected contingency table.
expTable(data, pvecVars, facVars,
pvecregex = "<var>\\.<state>")
pvecTable(data, pvarName,
regx = sub("<var>",pvarName,"<var>\\.<state>"))
catTable(data, fvarName,
cc = contrasts(as.factor(dplyr::pull(data, fvarName)),
contrasts = FALSE))
data |
This is a data frame where columns corresponding to probability vectors have a regular naming pattern. |
pvecVars , pvarName |
The names (or name for |
facVars , fvarName |
The names (or name for |
pvecregex , regx |
A regular expression template for finding the column names. The string “<var>” is replaced with the variable name ( |
cc |
This is a contrasts matrix (see |
For an individual $i$, let $Y_i$ be a fully observed variable which takes on states $\{y_1, \ldots, y_m\}$. Let $S_i$ be a latent variable with states $\{s_1, \ldots, s_k\}$, and let $P(S_i)$ be an estimate of $S_i$, so it is a probability vector over the states.
The expected matrix is formed as follows:
Initialize a $k$ by $m$ (rows correspond to states of $S$ and columns to states of $Y$) matrix with 0.
For each individual $i$, add $P(S_i)$ to the column corresponding to $Y_i$.
The result is the expected value of the contingency table.
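The steps above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the package's implementation; the names P, Y and expected, and all of the probabilities, are assumptions chosen for the example.

```r
# Toy data (made up): three individuals, latent states H/M/L,
# and one fully observed two-state variable Y.
P <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.3, 0.6,
              0.5, 0.4, 0.1),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("H", "M", "L")))
Y <- factor(c("right", "wrong", "right"), levels = c("wrong", "right"))

# Step 1: a k-by-m matrix of zeros
# (rows: states of S, columns: states of Y).
expected <- matrix(0, nrow = ncol(P), ncol = nlevels(Y),
                   dimnames = list(S = colnames(P), Y = levels(Y)))

# Step 2: add each individual's probability vector to the column for Y_i.
for (i in seq_len(nrow(P))) {
  col <- as.character(Y[i])
  expected[, col] <- expected[, col] + P[i, ]
}

expected
```

Because each probability vector sums to one, the entries of the resulting table sum to the number of individuals, just as the counts in an ordinary contingency table would.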
The general case can be handled with an Einstein sum (see einsum), with a rule “za,zb,zc,... -> abc...”.
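To make the summation rule concrete, the sketch below spells out the two-way and three-way cases in base R rather than calling einsum. The inputs A, y and g are invented for illustration, and the loop is only one way to write the sum over the shared index z; it is not necessarily how expTable computes it.

```r
# Toy inputs (made up): each row z is one individual.
A <- matrix(c(0.7, 0.3,
              0.2, 0.8,
              0.5, 0.5), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("a1", "a2")))
y <- factor(c("right", "wrong", "right"))
g <- factor(c("x", "x", "y"))

# A known factor is a degenerate probability vector: a 0/1 indicator row.
B <- diag(nlevels(y))[as.integer(y), , drop = FALSE]
C <- diag(nlevels(g))[as.integer(g), , drop = FALSE]

# Rule "za,zb->ab": sum over z of A[z, a] * B[z, b].
two_way <- crossprod(A, B)

# Rule "za,zb,zc->abc": accumulate one outer product per individual.
three_way <- array(0, dim = c(ncol(A), ncol(B), ncol(C)))
for (z in seq_len(nrow(A))) {
  three_way <- three_way + outer(outer(A[z, ], B[z, ]), C[z, ])
}
```

Summing the three-way array over its last dimension recovers the two-way table, which is a quick consistency check on the rule.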
The assumption is that the estimates of the latent variable are saved in columns with a regular naming convention. For example, the ACED data set uses the names P.cr..H, P.cr..M and P.cr..L for the high, medium and low probabilities of the common ratio variable. The regular expression which captures the names of the states is “P\.cr\.\.(\w+)”, where “\w+” matches one or more word-constituent characters. The parentheses around the last part are used to extract the state names.
Internally, the function substitutes the value of pvarName for “<var>”, and “(\w+)” for “<state>”. If this regular expression does not capture the state names, a better expression can be supplied, but the state name should be the first sub-expression marked with parentheses. Note also that the period has a special meaning in regular expressions, so it needs to be quoted, and that the backslash itself needs to be quoted in R strings.
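The template substitution can be tried out in base R as in the sketch below. The column names in cols are hypothetical stand-ins that follow the ACED naming pattern, and the two sub() calls are an assumed reconstruction of the substitution described here, not the package's internal code.

```r
# Hypothetical column names following the naming convention.
cols <- c("SubjID", "P.cr..H", "P.cr..M", "P.cr..L", "P.sg..H")

# Template as written in R source: each regex backslash is doubled.
template <- "P\\.<var>\\.\\.<state>"

# fixed = TRUE treats both the pattern and the replacement literally,
# so the backslash in "(\\w+)" survives the substitution.
rx <- sub("<var>", "cr", template, fixed = TRUE)
rx <- sub("<state>", "(\\w+)", rx, fixed = TRUE)

# Columns that hold the probability vector for "cr".
pvec_cols <- grep(rx, cols, value = TRUE)

# The first parenthesized sub-expression gives the state names.
states <- sub(rx, "\\1", pvec_cols)
```

Printing rx with cat(rx) shows the expression with single backslashes, which is the form the regular expression engine sees.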
The functions pvecTable and catTable return a matrix with rows corresponding to the original data, and columns to the states of the variable.
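For a factor variable, a matrix of this shape can be illustrated with the same kind of contrasts matrix that appears as the default for cc. The sketch below uses a made-up factor f and is only an assumption about what such a matrix looks like, not a copy of catTable.

```r
# Made-up observations of a three-level factor.
f <- factor(c("b", "a", "c", "a"))

# With contrasts = FALSE this is an identity matrix labeled by level.
cc <- contrasts(f, contrasts = FALSE)

# One row per observation, one column per state; each row is a 0/1 indicator.
ind <- cc[as.character(f), , drop = FALSE]
ind
```

Each row sums to one, so a matrix like this can be treated in the same way as a block of probability-vector columns.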
The function expTable produces an array whose dimensions correspond to the states of the probability and factor variables.
Russell Almond
Observable Analysis paper (in preparation).
mutualInformation, gkGamma, table, contrasts, einsum
data(ACED)
ACED.joined <- dplyr::inner_join(ACED.scores,ACED.items,by="SubjID")
head(pvecTable(ACED.joined,"cr",regx="P\\.cr\\.\\.<state>"))
head(catTable(ACED.joined,"tCommonRatio1a"))
expTable(ACED.joined,"cr","tCommonRatio1a",
pvecregex="P\\.<var>\\.\\.<state>")