The goal of nterval is to provide an approach for calculating the minimum required sample size for estimation of population coverage intervals (e.g., k-sigma intervals).
Install the package using devtools:
# To install the most recent stable release of the package from GitHub
devtools::install_github("dygobeng/nterval@*release")
# To install the latest (development) version of the package from GitHub
devtools::install_github("dygobeng/nterval")
Before exploring an example, a few definitions:
The target coverage is the expected proportion of a Normally-distributed population that should be contained within the k-sigma parametric sample interval (e.g., a 3-sigma sample interval is expected to cover 99.73% of Normally-distributed observations).
The sample coverage is the proportion of a Normally-distributed population that is contained within the k-sigma interval estimated using a sample from the population.
The upper and lower proximity limits define the distance from the target coverage that an individual sample's coverage can fall and still be considered "reasonably close" to the target. These bounds are useful for dialing in an acceptable:
producer's risk, by ensuring that the lower proximity limit does not move so low as to result in an unacceptable false positive rate (probability of flagging common cause variation), and
consumer's risk, by ensuring that the upper proximity limit does not move so high as to result in an unacceptable false negative rate (probability of failing to flag special cause variation).
The reliability is the proportion of individual sample coverages expected to fall within the proximity limits.
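To make these definitions concrete, the following base R sketch simulates them for a standard Normal population. It is an illustration under stated assumptions, not the package's implementation: the helper names `sample_coverage()` and `estimate_reliability()`, the number of simulations, and the sample sizes are all hypothetical choices made for this example.

```r
# Illustrative sketch only; not the nterval implementation.
set.seed(2024)

k <- 2
proximity_range <- c(0.93, 0.97)

# Target coverage: proportion of a Normal population within mean +/- k * sd.
target_coverage <- 2 * pnorm(k) - 1

# Sample coverage: true proportion of a standard Normal population that falls
# inside the k-sigma interval estimated from one sample of size n.
sample_coverage <- function(n, k) {
  x <- rnorm(n)
  lower <- mean(x) - k * sd(x)
  upper <- mean(x) + k * sd(x)
  pnorm(upper) - pnorm(lower)
}

# Reliability: share of simulated sample coverages inside the proximity limits.
estimate_reliability <- function(n, k, proximity_range, n_sim = 2000) {
  coverages <- replicate(n_sim, sample_coverage(n, k))
  mean(coverages >= proximity_range[1] & coverages <= proximity_range[2])
}

reliability_n50 <- estimate_reliability(50, k, proximity_range)
reliability_n500 <- estimate_reliability(500, k, proximity_range)
```

Comparing `reliability_n50` with `reliability_n500` shows the estimated reliability rising as the sample size grows, which is the relationship a sample size search relies on.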
A researcher would like to determine the minimum sample size required to ensure that a 2-sigma sample interval (with a target coverage of 95.45% when calculated using data that can be reasonably approximated by a Normal distribution) will cover no less than 93% and no more than 97% of the population with 70% reliability.
We'll use the find_n_ksigma() function to determine the required sample size. The main arguments accept values for k (based on the target coverage), the proximity_range, and the targeted reliability (NOTE: The seed argument is used for reproducibility):
library(nterval)
find_n_ksigma( proximity_range = c(0.93, 0.97), reliability = 0.7, k = 2, seed = 12345 )
The required sample size increases with the target reliability and decreases as the proximity range widens:
# Increase targeted reliability to 80%
find_n_ksigma( proximity_range = c(0.93, 0.97), reliability = 0.8, k = 2, seed = 12345 )
# Increase width of proximity range to 92% - 98%
find_n_ksigma( proximity_range = c(0.92, 0.98), reliability = 0.7, k = 2, seed = 12345 )
There are some cases in which pre-specifying k may generate larger sample sizes, for example when the proximity limits are located asymmetrically about the target coverage. In these situations, there is evidence to suggest that setting k closer to the midpoint of the proximity range may yield smaller sample sizes without loss of coverage. As such, if a value for k is not specified, it is set based on the midpoint of proximity_range. Compare the following function executions:
# Setting k = 2 (95.45% coverage)
find_n_ksigma( proximity_range = c(0.93, 0.96), reliability = 0.7, k = 2, seed = 12345 )
# Allowing k to be set to the mid-point of the proximity range (0.945)
find_n_ksigma( proximity_range = c(0.93, 0.96), reliability = 0.7, seed = 12345 )
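As a rough illustration of what "setting k based on the midpoint" could mean, the sketch below converts between k and Normal coverage in base R. It assumes the default k is the value whose target coverage equals the midpoint of the proximity range; the package may derive it differently, and the helper names `coverage_from_k()` and `k_from_coverage()` are hypothetical.

```r
# Illustrative sketch only; the mapping used inside nterval is an assumption.

# Coverage of a Normal population within mean +/- k * sd, and its inverse.
coverage_from_k <- function(k) 2 * pnorm(k) - 1
k_from_coverage <- function(coverage) qnorm((1 + coverage) / 2)

proximity_range <- c(0.93, 0.96)
midpoint <- mean(proximity_range)        # 0.945
k_mid <- k_from_coverage(midpoint)       # about 1.92

# With k = 2 the target coverage sits above the midpoint, close to the
# upper proximity limit of 0.96.
coverage_k2 <- coverage_from_k(2)        # about 0.9545
```

Under this assumption, the asymmetric range 93% to 96% corresponds to a k slightly below 2, which centers the target coverage between the two limits.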
Setting verbose = TRUE provides real-time information on the progress of the bisection algorithm.
find_n_ksigma( proximity_range = c(0.93, 0.97), reliability = 0.7, k = 2, seed = 12345, verbose = TRUE )
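For readers unfamiliar with bisection, here is a minimal generic sketch of a bisection search for the smallest sample size that meets a criterion. How the package applies bisection (its search bounds, stopping rule, and progress messages) is not described here, so the function `bisect_min_n()`, its arguments, and the stand-in criterion are all hypothetical.

```r
# Illustrative sketch only; not the nterval implementation.

# Smallest integer n in [lower, upper] for which meets_target(n) is TRUE,
# assuming meets_target() is FALSE below some threshold and TRUE from it onward.
bisect_min_n <- function(meets_target, lower = 2, upper = 10000, verbose = FALSE) {
  if (!meets_target(upper)) stop("Target not met even at the upper bound.")
  while (lower < upper) {
    mid <- floor((lower + upper) / 2)
    ok <- meets_target(mid)
    if (verbose) message("n = ", mid, ": target ", if (ok) "met" else "not met")
    # Keep the half of the range that must still contain the smallest valid n.
    if (ok) upper <- mid else lower <- mid + 1
  }
  lower
}

# Deterministic stand-in criterion for illustration: target met once n >= 137.
n_min <- bisect_min_n(function(n) n >= 137)
```

In a real search the criterion would compare a simulated reliability with the target, so it is noisy rather than strictly monotone; fixing a seed, as the seed argument does, keeps such a search reproducible.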
Setting plot = TRUE generates a histogram of the sample coverages, highlighting the acceptable coverage region between the proximity limits.
find_n_ksigma( proximity_range = c(0.93, 0.97), reliability = 0.7, k = 2, seed = 12345, plot = TRUE )