T Interval

Table of Contents

  1. T Statistic a. Definitions and Terminology
  2. Confidence Limits
  3. Study Design Considerations
  4. Example: Protein Intake Among Teenage Girls a. Sample Size b. Margin of Error c. Standard Deviation d. Significance e. One Sided Intervals
  5. Study Design Derivations a. Sample Size b. Standard Deviation c. Significance d. One Sided Intervals
  6. References

1. T Statistic

The $t$-statistic is a standardized measure of the magnitude of difference between a sample's mean and some known, non-random constant. It is similar to a $z$-statistic, but differs in that a $t$-statistic may be calculated without knowledge of the population variance.

1a. Definitions and Terminology

Let $\theta$ be a sample parameter from a sample with standard deviation $s$. Let $\theta_0$ be a constant, and $s_\theta = s/\sqrt{n}$ be the standard error of the parameter $\theta$. $t$ is defined: $$t = \frac{\theta - \theta_0}{s_\theta} = \frac{\theta - \theta_0}{\frac{s}{\sqrt{n}}}$$

2. Confidence Limits

The confidence interval for $\theta$ is written: $$\theta \pm t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

The value of the expression on the right is often referred to as the margin of error, and we will refer to this value as $$E = t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

3. Study Design Considerations

I'm not sure what I had intended to write here. Probably ideas about when to estimate $n$, and when to estimate other parameters. But that may be a discussion for a separate vignette, as it applies to almost every function in this package.

4. Example: Protein Intake Among Teenage Girls

The following example is taken from the Wiley text, page 188:

A health department nutritionist, wishing to conduct a survey among a population of teenage girls to determine their average daily protein intake (measured in grams), is seeking the advice of a biostatistician relative to the sample size that should be taken.

Let us assume that the nutritionis would like an interval about 10 grams wide; that is, the estimate shoudl be within about 5 grams of the population mean in either direction. In other words, a margin of error of 5 grams is desired. Let us also assume that a confidence coefficient of .95 is decided on and that, from past experience, the nutrionist feels that the population standard deviation is probably about 20 grams.

Note: the results in our example will differ slightly from the results in the text as we are using the t-interval instead of the z-interval.

4a. Sample Size

An immediate solution to the request can be provided using

library(StudyPlanning)
interval_t1(E=5, s=20, alpha=.05)

Suppose, however, that we aren't sure that the standard deviation is 20, but could be anywhere from 10 to 25. In this case, we may wish to explore the effect the changing standard deviation will have on the sample size. We'll also explore the 90% and 99% confidence intervals as well.

library(StudyPlanning)
library(ggplot2)
SampSize <- interval_t1(E=5, s=seq(10, 25, by=1), alpha=c(.10, .05, .01))
ggplot(SampSize, aes(x=s, y=n, colour=factor(alpha))) + geom_line()

A larger value of the standard deviation has less of an impact on the 90% confidence interval than it does on the 99% confidence interval.

4b. Margin of Error

In this instance, let's consider the possibility that the nutritionist can only afford to sample 50 girls in the population. What is the margin of error that nutritionist can reasonably expect to capture in this sample?

library(StudyPlanning)
library(ggplot2)
SampSize <- interval_t1(E=NULL, n=50, s=seq(10, 25, by=1), alpha=c(.10, .05, .01))
ggplot(SampSize, aes(x=s, y=E, colour=factor(alpha))) + geom_line()

Using the 95% confidence interval with the initial estimate of the standard deviation ($s=20$), the researcher can expect to estimate the protein intake to within r SampSize$E[SampSize$s==20 & SampSize$alpha==.05] grams. With a standard deviation of $s=15$, the margin of error would reduce to r SampSize$E[SampSize$s==15 & SampSize$alpha==.05]

4c. Standard Deviation

A more likely scenario is that we know we may be able to afford to sample 75 girls, and we know that we wish to estimate the protein intake to within 5 grams. What we may not know is the standard deviation of protein intake. In this case, we may prefer to explore the magnitude of the standard deviation. If a very small standard deviation is required, the study may not be feasible.

library(StudyPlanning)
library(ggplot2)
SampSize <- interval_t1(E=seq(3, 7, by=.01), n=75, s=NULL, alpha=c(.10, .05, .01))
ggplot(SampSize, aes(x=E, y=s, colour=factor(alpha))) + geom_line()

Based on these results, we may conclude that we can estimate the protein intake to within 5 grams as long as the observed standard deviation is less than about 25.

4d. Significance

Suppose we know we can afford to sample 75 girls, and we expect the standard deviation to be between 15 and 25 (based on a confidence interval from a previous study). Let us consider the level of confidence we can have in our estimate based on these parameters.

library(StudyPlanning)
library(ggplot2)
SampSize <- interval_t1(E=c(5, 7.5, 10), n=75, s=15:25, alpha=NULL)
ggplot(SampSize, aes(x=s, y=1-alpha, colour=factor(E))) + geom_line() + 
  xlab("Confidence")

The results show that our confidence is very high when we estimate to within 10 grams. If we are estimating to within 5 grams, our confidence my dip below 95%, but we can expect it to remain above 90%.

4e. One Sided Intervals

If a one-sided confidence interval is desired, we only need to multiply our desired alpha by 2. Thus, for a one-sided confidence interval of the original example, only 46 subjects are needed (as shown below).

library(StudyPlanning)
interval_t1(E=5, s=20, alpha=.05*2)

5. Study Design Derivations

5a. Sample Size

$$E = t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$ $$\frac{E}{t_{1-\alpha/2}} = \frac{s}{\sqrt{n}}$$ $$\frac{E}{t_{1-\alpha/2} \cdot s} = \frac{1}{\sqrt{n}}$$ $$\frac{t_{1-\alpha/2} \cdot s}{E} = \sqrt{n}$$ $$\frac{t_{1-\alpha/2}^2 \cdot s^2}{E^2} = n$$

Since $t_{1-\alpha/2}$ depends on the value of $n$, this is not a problem that is easily reduced to a solution. Many texts encourage using $z_{1-alpha/2}$ as a substitute, but we're using computers here, so we can probably do a little better. Instead, if we write the last line as: $$\frac{t_{1-\alpha/2}^2 \cdot s^2}{E^2} - n = 0$$ $$\big(\frac{t_{1-\alpha/2}^2 \cdot s^2}{E^2} - n\big)^2 = 0$$

We now have a quadratic equation. We'll use the optimize function in the stats package to find a best solution for $n$.

Consider when we have $n=25$, $s=4$ and $\alpha=.05$. The value of $E$ here is $$E = t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}} = 2.063899 \cdot 4/5 = 1.651119$$.

Now let's rewrite the problem to solve for $n$ using optimize.

fn <- function(n) (qt(1-.05/2, n-1)^2 * 4^2 / 1.651119^2 - n)^2
optimize(fn, c(0, 100))

On the other hand, using the $z$ approximation yields

qnorm(1-.05/2)^2 * 4^2 / 1.651119^2

which is two subjects short of what we would actually need. n_t1samp_interval uses the optimize function and searches over the values 0 to 1,000,000,000.

5b. Standard Deviation

$$E = t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$ $$\frac{E}{t_{1-\alpha/2}} = \frac{s}{\sqrt{n}}$$ $$\frac{E \cdot \sqrt{n}}{t_{1-\alpha/2}} = s$$

5c. Significance

$$E = t_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$ $$\frac{E \cdot \sqrt{n}}{s} = t_{1-\alpha/2}$$ $$\Phi_t\Big(\frac{E \cdot \sqrt{n}}{s}\Big) = \Phi_t(t_{1-\alpha/2})$$ $$\Phi_t\Big(\frac{E \cdot \sqrt{n}}{s}\Big) = 1 - \frac{\alpha}{2}$$ $$1 - \cdot \Phi_t\Big(\frac{E \cdot \sqrt{n}}{s}\Big) = \frac{\alpha}{2}$$ $$2 \cdot \Big[1 - \Phi_t\Big(\frac{E \cdot \sqrt{n}}{s}\Big)\Big] = \alpha$$

6. References

Daniel Wayne W., Biostatistics: A Foundation For Analysis in the Health Sciences, John Wiley & Sons, Inc., New York, 4th ed. 2005, (Chapter 6)



nutterb/StudyPlanning documentation built on May 24, 2019, 10:51 a.m.