est_prop: Estimate proportions and standard errors
In edwardlavender/utils.add: A Compilation of Additional Utilities for Day-to-Day Coding in R

est_prop

R Documentation

Estimate proportions and standard errors

Description

This function estimates proportions and their associated standard errors/confidence intervals from binary/binomial observations. These can be supplied as a binary vector of successes and failures or as integers that define the number of successes and failures. The function calculates the observed and expected proportion of successes, along with their standard errors and confidence intervals, and returns these estimates in a dataframe.

Usage

est_prop(x, y = NULL, accuracy = 0.001, ...)

Arguments

`x`	A binary vector of successes and failures or an integer that defines the number of successes.
`y`	(optional) An integer that defines the number of failures.
`accuracy`	A number that defines the accuracy to which estimates are rounded. This can be suppressed with `accuracy = NULL`.
`...`	Additional arguments (none implemented).
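
One way to picture the `accuracy` argument is as rounding each estimate to the nearest multiple of that value. The sketch below is an assumption about the behaviour, not the package's implementation, and `round_to()` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper (not part of utils.add): round to the nearest
# multiple of `accuracy`, or return the input unchanged if accuracy is NULL.
round_to <- function(x, accuracy = 0.001) {
  if (is.null(accuracy)) {
    return(x)
  }
  round(x / accuracy) * accuracy
}

p <- 0.804321
round_to(p)                   # rounded to the nearest 0.001
round_to(p, accuracy = 0.01)  # rounded to the nearest 0.01
round_to(p, accuracy = NULL)  # rounding suppressed
```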

Details

If n is the number of trials and n_1 is the number of successes, the estimated proportion is \hat{p} = n_1/n, with standard error SE = \sqrt{\hat{p}(1-\hat{p})/n}. If n_1 \leq 5 or n - n_1 \leq 5, a 'correction' for small sample sizes is applied instead: \hat{p} = (n_1 + 2)/(n + 4), with standard error SE = \sqrt{\hat{p}(1-\hat{p})/(n + 4)}. In both cases, 95 percent confidence intervals are given by \hat{p} \pm t_{0.975,n-1}SE (Gelman et al. 2021).
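
The formulas above can be sketched in a few lines of base R. This is a minimal illustration written from the description in this section, not the package's source code: `sketch_est_prop()` and its argument names are hypothetical, and truncating the bounds to [0, 1] is assumed from the 'truncate_lower' and 'truncate_upper' columns described under Value.

```r
# Hypothetical helper (not part of utils.add): applies the formulas in Details
# to a count of successes and a count of failures.
sketch_est_prop <- function(n_success, n_failure) {
  n <- n_success + n_failure
  p_obs <- n_success / n
  # The small-sample correction applies when either count is <= 5
  correction <- n_success <= 5 || n_failure <= 5
  if (correction) {
    p_hat <- (n_success + 2) / (n + 4)
    se <- sqrt(p_hat * (1 - p_hat) / (n + 4))
  } else {
    p_hat <- p_obs
    se <- sqrt(p_hat * (1 - p_hat) / n)
  }
  # 95 percent interval from the t distribution with n - 1 degrees of freedom
  t_crit <- stats::qt(0.975, df = n - 1)
  lower <- p_hat - t_crit * se
  upper <- p_hat + t_crit * se
  data.frame(
    n = n, p_obs = p_obs, p_hat = p_hat, se = se,
    lower_ci = max(lower, 0),  # assumed truncation at zero
    upper_ci = min(upper, 1),  # assumed truncation at one
    correction = correction
  )
}

res_small <- sketch_est_prop(3, 7)      # correction applied
res_large <- sketch_est_prop(800, 200)  # no correction
```

With 3 successes in 10 trials the corrected estimate is (3 + 2)/(10 + 4) rather than 3/10, whereas with 800 successes in 1000 trials the estimate is simply the observed proportion.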

Value

The function returns a dataframe with the following columns:

`n`	The total number of samples.
`n_success`	The total number of successes.
`n_failure`	The total number of failures.
`p_obs`	The empirical probability of success.
`p_hat`	The estimated probability of success.
`se`	The standard error.
`lower_ci`	The lower confidence bound.
`upper_ci`	The upper confidence bound.
`ci`	The confidence interval.
`truncate_lower`, `truncate_upper`	Boolean variables that define whether or not the lower/upper confidence bound has been truncated at zero/one.
`correction`	A boolean variable that defines whether or not the estimates have been calculated using the correction for small samples (see Details).

The dataframe also contains an 'se_prob_param' attribute, which is a list of the arguments supplied to the function.

Author(s)

Edward Lavender

References

Gelman, A. et al. (2021) Regression and Other Stories. Cambridge: Cambridge University Press, pp. 51-53.

Examples

# Estimate proportions from a vector of successes and failures
est_prop(sample(c(0, 1), 10, replace = TRUE))
# Compare estimated proportions from a larger sample size
est_prop(sample(c(0, 1), 1e4, replace = TRUE))
# Manipulate the simulated probability of success and failure
x <- sample(c(0, 1), 1e4, replace = TRUE, prob = c(0.2, 0.8))
est_prop(x)
# Estimate proportions from a count of the number of successes and failures
est_prop(sum(x == 1), sum(x == 0))
# Adjust the accuracy of the results
est_prop(x, accuracy = NULL)
est_prop(x, accuracy = 0.01)

edwardlavender/utils.add documentation built on Dec. 14, 2024, 8:11 a.m.