One sample Z-tests for a proportion
In distributions3: Probability Distributions as S3 Objects

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In this vignette, we work through an example Z-test a proportion, and point out a number of points where you might get stuck along the way.

Problem setup

Let's suppose that a student is interesting in estimating what percent of professors in their department watches Game of Thrones. They go to office hours and ask each professor and it turns out 17 out of 62 professors in their department watch Game of Thrones. Several of the faculty think Game of Thrones is a board game.

We can imagine that the data is a bunch of zeros and ones, where the $i^{th}$ data point, $x_i$ is one if professor $i$ watches Game of Thrones, and zero otherwise. So the full dataset might look something like:

set.seed(27)

x <- rep(0, 62)
ones <- sample(62, 17)
x[ones] <- 1
dput(x)

[ \begin{align} & 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, \ & 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, \ & 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 \end{align} ]

But it is much easier to just remember that there are 17 ones and 45 zeros.

Assumption checking

Before we can do a Z-test, we need to make check if we can reasonably treat the mean of this sample as normally distributed. The data is definitely not from a normal distribution since it's only zeros and ones, so we need to check if the central limit theorem kicks in.

Most of the time we would check if there were 30 data points or more, but for a proportion, we do something slightly different. When data is binary, like we have here, the central limit theorem kicks in slower than usual. The standard thing to check is whether

$n \cdot \pi > 5$
$n \cdot (1 - \pi) > 5$

Where $n$ is the sample size (62 in our case) and $\pi$ is the sample average. Note that some textbooks might use $p$ rather than $\pi$. In our case we have $\pi = 17 / 62$, and

$62 \cdot 17 / 62 = 17$
$62 \cdot (1 - 17 / 62) = 45$

So we're good to go.

Null hypothesis and test statistic

Let's test the null hypothesis that, on average, twenty percent of professors what Game of Thrones. The corresponding null hypothesis is

[ H_0: \pi = 0.2 \qquad H_A: \pi \neq 0.2 ]

First we need to calculate our Z-statistic. Remember that the Z-statistic for proportion is

[ Z = \frac{\pi - \pi_0}{\sqrt{\frac{\pi_0 (1 - \pi_0)}{n}}} \sim \mathrm{Normal}(0, 1) ]

Calculating p-values

In R this looks like:

n <- 62
pi <- 17 / 62
pi_0 <- 0.2

# calculate the z-statistic
z_stat <- (pi - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)
z_stat

To calculate a two-sided p-value, we need to find

[ \begin{align} P(|Z| \ge |1.46|) &= P(Z \ge 1.46) + P(Z \le -1.46) \ &= 1 - P(Z \le 1.46) + P(Z \le -1.46) \ &= 1 - \Phi(1.46) + \Phi(-1.46) \end{align} ]

To do this we need to c.d.f. of a standard normal

library(distributions3)

Z <- Normal(0, 1)  # make a standard normal r.v.
1 - cdf(Z, 1.46) + cdf(Z, -1.46)

Note that we saved z_stat above so we could have also done

1 - cdf(Z, abs(z_stat)) + cdf(Z, -abs(z_stat))

which is slightly more accurate since there is no rounding error.

So our p-value is about 0.14. You should verify this with a Z-table. Note that you should get the same value from cdf(Z, 1.46) and looking up 1.46 on a Z-table.

You may also have seen a different formula for the p-value of a two-sided Z-test, which makes use of the fact that the normal distribution is symmetric:

[ \begin{align} P(|Z| \ge |1.46|) &= 2 \cdot P(Z \le -|1.46|) &= 2 \cdot \Phi(-1.46) \end{align} ]

Using this formula we get the same result:

2 * cdf(Z, -1.46)

One-sided Z-tests

Finally, sometimes we are interest in one sided Z-tests. For the test

[ \begin{align} H_0: \pi \le 0.2 \qquad H_A: \pi > 0.2 \end{align} ]

the p-value is given by

[ P(Z > 1.46) ]

which we calculate with

1 - cdf(Z, 1.46)

For the test

[ H_0: \pi \ge 0.2 \qquad H_A: \pi < 0.2 ]

the p-value is given by

[ P(Z < 1.46) ]

which we calculate with

cdf(Z, 1.46)

Any scripts or data that you put into this service are public.

distributions3 documentation built on Sept. 30, 2024, 9:37 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

distributions3
Probability Distributions as S3 Objects

One sample Z-tests for a proportion
In distributions3: Probability Distributions as S3 Objects

Problem setup

Assumption checking

Null hypothesis and test statistic

Calculating p-values

One-sided Z-tests

Try the distributions3 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

distributions3 Probability Distributions as S3 Objects

One sample Z-tests for a proportion In distributions3: Probability Distributions as S3 Objects

Problem setup

Assumption checking

Null hypothesis and test statistic

Calculating p-values

One-sided Z-tests

Try the distributions3 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

distributions3
Probability Distributions as S3 Objects

One sample Z-tests for a proportion
In distributions3: Probability Distributions as S3 Objects