emplogit: Empirical logit


emplogit R Documentation

Empirical logit

Description

Calculates the empirical logit for binomial data, i.e., data that consist of "hits" (1s) and "misses" (0s). The formula for the empirical logit requires the sample size, which may be supplied to this function in any one of three ways:

  • the raw data (which implicitly indicate sample size)

  • the number of hits and total number of observations

  • the proportion of hits and total number of observations

A proportion of hits or a count of hits alone is insufficient to determine the empirical logit, because neither indicates the sample size.

Usage

emplogit(hits, n, proportion, rawdata, na.rm = FALSE)

Arguments

hits

frequency count of observed "hits" or 1s.

n

total number of observations.

proportion

proportion of observed "hits" or 1s.

rawdata

a vector of binomial data (hits and misses) for which the empirical logit should be calculated. Raw data may be provided in any of three forms:

  • a numeric vector (0 for miss, 1 for hit)

  • a logical vector (FALSE for miss, TRUE for hit)

  • a factor vector (the first level of the factor is a "miss" and all other levels are a "hit", following the behavior of binomial family objects; see family)

na.rm

a logical value indicating whether NA values should be stripped from the raw data before the computation proceeds. Only relevant when providing the raw data.

Details

The empirical logit is an approximation of the logit that is useful when the proportion of hits is 0 or 1 (for which the logit is undefined) or when the proportion of hits is very close to 0 or 1 (for which the logit is unstable), as is sometimes the case with empirical data.
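As a rough illustration, the adjustment can be sketched with the standard definition of the empirical logit, log((hits + 0.5) / (n - hits + 0.5)). That emplogit() uses exactly this constant is an assumption here, not something this page states, and the helper name emplogit_sketch is hypothetical and not part of the package.

```r
# Hypothetical helper, not the package's emplogit(): the standard empirical
# logit, which adds 0.5 to both the hit and miss counts (assumed definition).
emplogit_sketch <- function(hits, n) {
  log((hits + 0.5) / (n - hits + 0.5))
}

# The same value reached from each of the three kinds of input
raw <- c(0, 0, 1, 0)                              # raw data: 1 hit in 4 observations
from_raw  <- emplogit_sketch(sum(raw), length(raw))
from_hits <- emplogit_sketch(hits = 1, n = 4)
from_prop <- emplogit_sketch(hits = 0.25 * 4, n = 4)

# Defined even when every observation is a hit, where the plain logit is infinite
all_hits <- emplogit_sketch(hits = 50, n = 50)    # log(50.5 / 0.5)
plain    <- qlogis(50 / 50)                       # Inf
```

Under this definition, a proportion must be combined with n to recover the count of hits, which is why a proportion or a count alone cannot determine the result.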

As the sample size increases, the empirical logit converges to the logit.
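A small sketch of this convergence, holding the proportion of hits fixed while the sample size grows. It assumes the standard 0.5-adjusted definition of the empirical logit; emplogit_sketch is a hypothetical stand-in, not the package function.

```r
# Hypothetical stand-in for the empirical logit (assumed 0.5 adjustment)
emplogit_sketch <- function(hits, n) log((hits + 0.5) / (n - hits + 0.5))

p <- 0.9                        # fixed proportion of hits
n <- c(10, 100, 1000, 10000)    # increasing sample sizes

# Absolute difference between the empirical logit and the plain logit
gap <- abs(emplogit_sketch(p * n, n) - qlogis(p))
gap                             # shrinks toward 0 as n grows
```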

The empirical logit requires aggregating over multiple observations. Thus, in general, it may be less advantageous than logistic regression, which allows the modeling of individual observations using a logit link function. However, using the empirical logit is necessary or advisable when the proportion of hits closely approaches (or is) 0 or 1.

Because the empirical logit aggregates over multiple observations, the original number of observations is not reflected in the empirical logit itself once it has been calculated and is "lost." Thus, when using the empirical logit in an analysis where the number of observations differs across cells, it is suggested to incorporate the number of observations back into the model in another way, namely by performing a weighted regression. For more information, see the provided references (esp. Barr, 2008, p. 470).
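A minimal sketch of such a weighted regression with lm(). The counts are made up for illustration, the 0.5-adjusted definition of the empirical logit is assumed, and the weights (the reciprocal of the approximate variance 1/(hits + 0.5) + 1/(n - hits + 0.5)) are one common choice that should be checked against the references before use.

```r
# Illustrative counts only (made up for this sketch), one row per cell
cells <- data.frame(
  condition = c(0, 0, 1, 1),
  hits      = c(3, 18, 9, 40),
  n         = c(10, 40, 10, 40)
)

# Empirical logit per cell (assumed standard definition with a 0.5 adjustment)
cells$elog <- with(cells, log((hits + 0.5) / (n - hits + 0.5)))

# Approximate variance of each empirical logit; smaller cells are noisier
cells$v <- with(cells, 1 / (hits + 0.5) + 1 / (n - hits + 0.5))

# Weighted least squares: cells with smaller variance get more weight
fit <- lm(elog ~ condition, data = cells, weights = 1 / v)
coef(fit)
```

Without the weights, a cell of 10 observations would influence the fit as much as a cell of 40, even though its empirical logit is estimated far less precisely.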

Value

value of the empirical logit.

References

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: Wiley.

Barr, D. J. (2008). Analyzing 'visual world' eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59, 457-474.

McCullagh, P., & Nelder, J. (1989). Generalized linear models. London: Chapman and Hall.

See Also

logit for the unadjusted logit.

Examples

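# Proportion of hits plus the total number of observations
emplogit(proportion = .71, n = 100)

# Count of hits plus the total; here every observation is a hit
my.hits <- 50
emplogit(hits = my.hits, n = 50)

# Raw data as a factor: 'no' (the first level) is a miss, 'yes' is a hit
my.emplogit <- emplogit(rawdata = as.factor(c('no','no','yes','no')))
my.emplogit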

sfraundorf/psycholing documentation built on April 23, 2022, 2:50 a.m.