nterval 0.1.1

Lifecycle: experimental

The goal of nterval is to provide an approach for calculating the minimum required sample size for estimation of population coverage intervals (e.g., k-sigma intervals).

Installation

Install the package using devtools:

# To install the most recent stable release of the package from GitHub
devtools::install_github("dygobeng/nterval@*release")

# To install the latest (development) version of the package from GitHub
devtools::install_github("dygobeng/nterval")

Definitions

Before exploring an example, a few definitions:

Example

A researcher would like to determine the minimum sample size required to ensure, with 70% reliability, that a 2-sigma sample interval covers no less than 93% and no more than 97% of the population. (A 2-sigma interval has a targeted coverage of 95.44% when calculated using data that can be reasonably approximated by a Normal distribution.)
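The link between k and the targeted coverage follows from the Normal distribution: an interval of mean ± k standard deviations covers 2Φ(k) − 1 of the population. A minimal base R sketch of that arithmetic (the helper name `ksigma_coverage` is hypothetical and not part of nterval):

```r
# Population coverage of mean +/- k * sd under a Normal distribution.
# `ksigma_coverage` is an illustrative helper, not an nterval function.
ksigma_coverage <- function(k) 2 * pnorm(k) - 1

ksigma_coverage(2)
#> [1] 0.9544997
```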

We’ll use the find_n_ksigma() function to determine the required sample size. The main arguments accept values for k (based on the target coverage), the proximity_range, and the targeted reliability (NOTE: The seed argument is used for reproducibility):

library(nterval)
find_n_ksigma(
  proximity_range = c(0.93, 0.97),
  reliability = 0.7,
  k = 2,
  seed = 12345
)
#> $sample_size
#> [1] 53
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.700928

The required sample size increases with the target reliability and decreases as the proximity range widens:

# Increase targeted reliability to 80%
find_n_ksigma(
  proximity_range = c(0.93, 0.97),
  reliability = 0.8,
  k = 2,
  seed = 12345
)
#> $sample_size
#> [1] 79
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.8002

# Increase width of proximity range to 92% - 98%
find_n_ksigma(
  proximity_range = c(0.92, 0.98),
  reliability = 0.7,
  k = 2,
  seed = 12345
)
#> $sample_size
#> [1] 25
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.703056

There are some cases in which pre-specifying k may lead to a larger sample size than necessary, for example when the proximity limits are located asymmetrically about the target coverage. In these situations, there is evidence to suggest that choosing k so that the target coverage lies closer to the midpoint of the proximity range may yield smaller sample sizes without loss of coverage. As such, if a value for k is not specified, it is set based on the midpoint of proximity_range. Compare the following function executions:

# Setting k = 2 (95.44% coverage)
find_n_ksigma(
  proximity_range = c(0.93, 0.96),
  reliability = 0.7,
  k = 2,
  seed = 12345
)
#> $sample_size
#> [1] 143
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.70016

# Allowing k to be derived from the midpoint of the proximity range (coverage 0.945)
find_n_ksigma(
  proximity_range = c(0.93, 0.96),
  reliability = 0.7,
  seed = 12345
)
#> $sample_size
#> [1] 114
#> 
#> $k_constant
#> [1] 1.918876
#> 
#> $reliability_hat
#> [1] 0.701558
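The k_constant reported when k is omitted appears to match inverting the Normal coverage formula at the midpoint of the proximity range. The sketch below shows that inversion in base R. It assumes this is how the default is derived, which the README does not state explicitly, and the helper name `coverage_to_k` is hypothetical:

```r
# Invert coverage = 2 * pnorm(k) - 1 to recover k from a target coverage.
# `coverage_to_k` is an illustrative helper, not an nterval function.
coverage_to_k <- function(coverage) qnorm((1 + coverage) / 2)

midpoint <- mean(c(0.93, 0.96))    # 0.945
k_midpoint <- coverage_to_k(midpoint)
k_midpoint                          # compare with the k_constant shown above
```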

Setting verbose = TRUE provides real-time information on the progress of the bisection algorithm.

find_n_ksigma(
  proximity_range = c(0.93, 0.97),
  reliability = 0.7,
  k = 2,
  seed = 12345,
  verbose = TRUE
)
#> ℹ Checking k = 2
#> ℹ Round:1  a:3  b:500
#> ℹ Round:2  a:3  b:252
#> ℹ Round:3  a:3  b:128
#> ℹ Round:4  a:3  b:66
#> ℹ Round:5  a:35  b:66
#> ℹ Round:6  a:51  b:66
#> ℹ Round:7  a:51  b:59
#> ℹ Round:8  a:51  b:55
#> ℹ Round:9  a:51  b:53
#> ℹ Round:10  a:52  b:53
#> $sample_size
#> [1] 53
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.700928
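The trace above is consistent with a bisection over the sample size in which the upper bound b moves down whenever the midpoint meets the target reliability and the lower bound a moves up otherwise. The following base R sketch illustrates that idea under stated assumptions: reliability is estimated by simulating standard Normal samples, the midpoint is rounded up, and the starting bounds are 3 and 500. The function names are hypothetical and this is not the package's actual implementation, so its result will not necessarily equal 53.

```r
# Monte Carlo estimate of reliability: the share of simulated samples whose
# k-sigma interval covers a population fraction inside proximity_range.
estimate_reliability <- function(n, k, proximity_range, n_sim = 5000) {
  x <- matrix(rnorm(n * n_sim), nrow = n_sim)   # one N(0, 1) sample per row
  m <- rowMeans(x)
  s <- sqrt(rowSums((x - m)^2) / (n - 1))       # row-wise sample sd
  coverage <- pnorm(m + k * s) - pnorm(m - k * s)
  mean(coverage >= proximity_range[1] & coverage <= proximity_range[2])
}

# Bisection on the integer sample size; assumes the target is met at b.
bisect_sample_size <- function(proximity_range, reliability, k,
                               a = 3, b = 500) {
  while (b - a > 1) {
    mid <- ceiling((a + b) / 2)
    if (estimate_reliability(mid, k, proximity_range) >= reliability) {
      b <- mid    # target met: try smaller sample sizes
    } else {
      a <- mid    # target missed: need a larger sample
    }
  }
  b
}

set.seed(12345)
n_sketch <- bisect_sample_size(c(0.93, 0.97), reliability = 0.7, k = 2)
n_sketch
```

Because each reliability estimate is itself noisy, the sketch can land a few units above or below the package's answer; raising `n_sim` narrows that spread at the cost of run time.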

Setting plot = TRUE generates a histogram of the sample coverages, highlighting the acceptable coverage region between the proximity limits.

find_n_ksigma(
  proximity_range = c(0.93, 0.97),
  reliability = 0.7,
  k = 2,
  seed = 12345,
  plot = TRUE
)
#> $sample_size
#> [1] 53
#> 
#> $k_constant
#> [1] 2
#> 
#> $reliability_hat
#> [1] 0.700928
#> 
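#> $reliability_plot
#> [Figure: histogram of sample coverages with the acceptable region between the proximity limits highlighted]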

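For readers without the package at hand, a comparable picture can be approximated in base R. The sketch below simulates sample coverages for n = 53 and k = 2 from standard Normal data and marks the proximity limits on a histogram. It is an illustration under those assumptions, not the plotting code used by find_n_ksigma(), and the variable names are hypothetical.

```r
set.seed(12345)
n <- 53
k <- 2
limits <- c(0.93, 0.97)
n_sim <- 5000

# Simulate samples (one per row) and compute each k-sigma interval's
# true population coverage under N(0, 1).
x <- matrix(rnorm(n * n_sim), nrow = n_sim)
m <- rowMeans(x)
s <- sqrt(rowSums((x - m)^2) / (n - 1))
sample_coverage <- pnorm(m + k * s) - pnorm(m - k * s)

# Histogram of sample coverages with the proximity limits marked.
hist(sample_coverage, breaks = 50,
     main = "Simulated coverage of 2-sigma sample intervals",
     xlab = "Population coverage")
abline(v = limits, lty = 2)

# Share of simulated intervals that fall inside the acceptable region.
reliability_sketch <- mean(sample_coverage >= limits[1] &
                             sample_coverage <= limits[2])
reliability_sketch
```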
dygobeng/nterval documentation built on March 22, 2022, 6:40 p.m.