Sample Size for a One- or Two-Sample t-Test, Assuming Lognormal Data

Description

Compute the sample size necessary to achieve a specified power for a one- or two-sample t-test, given the ratio of means, coefficient of variation, and significance level, assuming lognormal data.

Usage

  tTestLnormAltN(ratio.of.means, cv = 1, alpha = 0.05, power = 0.95, 
    sample.type = ifelse(!is.null(n2), "two.sample", "one.sample"), 
    alternative = "two.sided", approx = FALSE, n2 = NULL, round.up = TRUE, 
    n.max = 5000, tol = 1e-07, maxiter = 1000)

Arguments

ratio.of.means

numeric vector specifying the ratio of the first mean to the second mean. When sample.type="one.sample", this is the ratio of the population mean to the hypothesized mean. When sample.type="two.sample", this is the ratio of the mean of the first population to the mean of the second population. This argument has no default in the usage above and must be supplied.

cv

numeric vector of positive value(s) specifying the coefficient of variation. When sample.type="one.sample", this is the population coefficient of variation. When sample.type="two.sample", this is the coefficient of variation for both the first and second population. The default value is cv=1.

alpha

numeric vector of numbers between 0 and 1 indicating the Type I error level associated with the hypothesis test. The default value is alpha=0.05.

power

numeric vector of numbers between 0 and 1 indicating the power associated with the hypothesis test. The default value is power=0.95.

sample.type

character string indicating whether to compute the sample size based on a one-sample or two-sample hypothesis test. When sample.type="one.sample", the computed sample size is based on a hypothesis test for a single mean. When sample.type="two.sample", the computed sample size is based on a hypothesis test for the difference between two means. The default value is sample.type="one.sample" unless the argument n2 is supplied.

alternative

character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (the default), "greater", and "less".

approx

logical scalar indicating whether to compute the power based on an approximation to the non-central t-distribution. The default value is FALSE.

n2

numeric vector of sample sizes for group 2. The default value is NULL, in which case it is assumed that the sample sizes for groups 1 and 2 are equal. This argument is ignored when sample.type="one.sample". Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are not allowed.

round.up

logical scalar indicating whether to round up the values of the computed sample size(s) to the next largest integer. The default value is TRUE.

n.max

positive integer greater than 1 indicating the maximum sample size when sample.type="one.sample" or the maximum sample size for group 1 when sample.type="two.sample". The default value is n.max=5000.

tol

numeric scalar indicating the tolerance to use in the uniroot search algorithm. The default value is tol=1e-7.

maxiter

positive integer indicating the maximum number of iterations to pass to the uniroot function. The default value is maxiter=1000.

Details

If the arguments ratio.of.means, cv, alpha, power, and n2 are not all the same length, they are replicated to be the same length as the length of the longest argument.
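The recycling rule described above can be illustrated with a minimal base R sketch. The helper `recycle_args` is hypothetical and is not part of EnvStats; it only mimics the stated behavior of stretching every argument to the length of the longest one.

```r
# Hypothetical helper (not an EnvStats function): recycle every supplied
# argument to the length of the longest argument.
recycle_args <- function(...) {
  args <- list(...)
  n <- max(lengths(args))               # length of the longest argument
  lapply(args, rep_len, length.out = n) # repeat each argument to length n
}

recycled <- recycle_args(ratio.of.means = 1.5, cv = c(0.5, 1, 2), alpha = 0.05)
str(recycled)
```

Here the scalar values of ratio.of.means and alpha are each repeated three times to pair with the three values of cv.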

Formulas for the power of the t-test for lognormal data for specified values of the sample size, ratio of means, and Type I error level are given in the help file for tTestLnormAltPower. The function tTestLnormAltN uses the uniroot search algorithm to determine the required sample size(s) for specified values of the power, ratio of means, coefficient of variation, and Type I error level.
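The general idea of a uniroot search for the sample size can be sketched in base R. This is not the EnvStats implementation. The helper `lnorm_n_sketch` and its argument `upper` are hypothetical, and the sketch assumes that the test is carried out on log-transformed data, with a log-scale standard deviation of sqrt(log(1 + cv^2)) and a log-scale shift of log(ratio.of.means). It handles only the two-sided alternative and relies on stats::power.t.test for the power.

```r
# Hypothetical helper (not an EnvStats function): a rough sample-size search
# for a two-sided t-test on log-transformed lognormal data.
lnorm_n_sketch <- function(ratio.of.means, cv = 1, alpha = 0.05, power = 0.95,
                           type = "one.sample", upper = 1000) {
  # Assumption: log-scale standard deviation implied by the CV
  sdlog <- sqrt(log(1 + cv^2))
  # Assumption: the shift on the log scale is the log of the ratio of means
  delta <- abs(log(ratio.of.means))

  # Achieved power minus requested power at a (possibly fractional) n
  power_gap <- function(n) {
    stats::power.t.test(n = n, delta = delta, sd = sdlog, sig.level = alpha,
                        type = type, alternative = "two.sided",
                        strict = TRUE)$power - power
  }

  # uniroot needs the gap to change sign between 2 and 'upper'; it stops with
  # an error if n = 2 already reaches the requested power
  root <- stats::uniroot(power_gap, interval = c(2, upper))$root
  ceiling(root)  # round up to a whole number of observations
}

n_one <- lnorm_n_sketch(ratio.of.means = 1.5, power = 0.9)
n_two <- lnorm_n_sketch(ratio.of.means = 2, type = "two.sample")
c(one.sample = n_one, two.sample = n_two)
```

The results of such a sketch can be compared with the output of tTestLnormAltN for the same inputs, though exact agreement is not guaranteed because the power formulas may differ in detail.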

Value

When sample.type="one.sample", or sample.type="two.sample" and n2 is not supplied (so equal sample sizes for each group are assumed), tTestLnormAltN returns a numeric vector of sample sizes. When sample.type="two.sample" and n2 is supplied, tTestLnormAltN returns a list with two components called n1 and n2, specifying the sample sizes for each group.
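Because the return value is either a numeric vector or a list, downstream code for the two-sample case may need to handle both shapes. A minimal sketch follows, using a hypothetical helper `total_n` (not part of EnvStats) and the sample sizes reported in the Examples section, so it runs without calling tTestLnormAltN.

```r
# Hypothetical helper (not an EnvStats function): total sample size for a
# two-sample design from either return shape of tTestLnormAltN.
total_n <- function(res) {
  if (is.list(res)) {
    res$n1 + res$n2   # n2 was supplied: list with components n1 and n2
  } else {
    2 * res           # equal group sizes: numeric vector of per-group sizes
  }
}

total_n(39)                       # equal group sizes of 39 each
total_n(list(n1 = 54, n2 = 30))   # second group constrained to 30
```

This helper applies only to two-sample results; for sample.type="one.sample" the returned vector is already the total sample size.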

Note

See tTestLnormAltPower.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

See tTestLnormAltPower.

See Also

tTestLnormAltPower, tTestLnormAltRatioOfMeans, plotTTestLnormAltDesign, LognormalAlt, t.test, Hypothesis Tests.

Examples

  # Look at how the required sample size for the one-sample test increases with 
  # increasing required power:

  seq(0.5, 0.9, by = 0.1) 
  # [1] 0.5 0.6 0.7 0.8 0.9 

  tTestLnormAltN(ratio.of.means = 1.5, power = seq(0.5, 0.9, by = 0.1)) 
  # [1] 19 23 28 36 47

  #----------

  # Repeat the last example, but compute the sample size based on the approximate 
  # power instead of the exact power:

  tTestLnormAltN(ratio.of.means = 1.5, power = seq(0.5, 0.9, by = 0.1), approx = TRUE) 
  # [1] 19 23 29 36 47

  #==========

  # Look at how the required sample size for the two-sample t-test decreases with 
  # increasing ratio of means:

  seq(1.5, 2, by = 0.1) 
  #[1] 1.5 1.6 1.7 1.8 1.9 2.0 

  tTestLnormAltN(ratio.of.means = seq(1.5, 2, by = 0.1), sample.type = "two") 
  #[1] 111  83  65  54  45  39

  #----------

  # Look at how the required sample size for the two-sample t-test decreases with 
  # increasing values of Type I error:

  tTestLnormAltN(ratio.of.means = 1.5, alpha = c(0.001, 0.01, 0.05, 0.1), 
    sample.type = "two") 
  #[1] 209 152 111  92

  #----------

  # For the two-sample t-test, compare the total sample size required to detect a 
  # ratio of means of 2 for equal sample sizes versus the case when the sample size 
  # for the second group is constrained to be 30.  Assume a coefficient of variation 
  # of 1, a 5% significance level, and 95% power.  Note that for the case of equal 
  # sample sizes, a total of 78 samples (39 + 39) are required, whereas when n2 is 
  # constrained to be 30, a total of 84 samples (54 + 30) are required.

  tTestLnormAltN(ratio.of.means = 2, sample.type = "two") 
  #[1] 39 

  tTestLnormAltN(ratio.of.means = 2, n2 = 30) 
  #$n1: 
  #[1] 54 
  #
  #$n2: 
  #[1] 30

  #==========

  # The guidance document Soil Screening Guidance: Technical Background Document 
  # (USEPA, 1996c, Part 4) discusses sampling design and sample size calculations 
  # for studies to determine whether the soil at a potentially contaminated site 
  # needs to be investigated for possible remedial action. Let 'theta' denote the 
  # average concentration of the chemical of concern.  The guidance document 
  # establishes the following goals for the decision rule (USEPA, 1996c, p.87):
  #
  #     Pr[Decide Don't Investigate | theta > 2 * SSL] = 0.05
  #
  #     Pr[Decide to Investigate | theta <= (SSL/2)] = 0.2
  #
  # where SSL denotes the pre-established soil screening level.
  #
  # These goals translate into a Type I error of 0.2 for the null hypothesis
  #
  #     H0: [theta / (SSL/2)] <= 1
  #
  # and a power of 95% for the specific alternative hypothesis
  #
  #     Ha: [theta / (SSL/2)] = 4
  #
  # Assuming a lognormal distribution and the above values for Type I error and 
  # power, determine the required sample sizes associated with various values of 
  # the coefficient of variation for the one-sample test.  Based on these calculations, 
  # you need to take at least 6 soil samples to satisfy the requirements for the 
  # Type I and Type II errors when the coefficient of variation is 2.

  cv <- c(0.5, 1, 2)
  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alpha = 0.2, 
    alternative = "greater") 

  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     2      3      6 

  #----------

  # Repeat the last example, but use the approximate power calculation instead of the 
  # exact. Using the approximate power calculation, you need 7 soil samples when the 
  # coefficient of variation is 2 (because the approximation underestimates the 
  # true power).

  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alpha = 0.2, 
    alternative = "greater", approx = TRUE) 

  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     3      5      7

  #----------

  # Repeat the last example, but use a Type I error of 0.05.

  N <- tTestLnormAltN(ratio.of.means = 4, cv = cv, alternative = "greater", 
    approx = TRUE) 

  names(N) <- paste("CV=", cv, sep = "")
  N
  #CV=0.5   CV=1   CV=2 
  #     4      6     12

  #==========

  # Reproduce the second column of Table 2 in van Belle and Martin (1993, p.167).

  tTestLnormAltN(ratio.of.means = 1.10, cv = seq(0.1, 0.8, by = 0.1), 
    power = 0.8, sample.type = "two.sample", approx = TRUE) 
  #[1]  19  69 150 258 387 533 691 856

  #==========

  # Clean up
  #---------
  rm(cv, N)
