# One sample sign tests

## Problem setup

Sometimes you want to do a Z-test or a T-test, but for some reason these tests are not appropriate. Your data may be skewed, or from a distribution with outliers, or non-normal in some other important way. In these circumstances a sign test is appropriate.

For example, suppose you wander around Times Square and ask strangers for their salaries. Incomes are typically very skewed, and you might get a sample like:

$$
8478, 21564, 36562, 176602, 9395, 18320, 50000, 2, 40298, 39, 10780, 2268583, 3404930
$$

If we look at a QQ plot, we see there are massive outliers:

    incomes <- c(8478, 21564, 36562, 176602, 9395, 18320, 50000, 2, 40298, 39, 10780, 2268583, 3404930)

    qqnorm(incomes)
    qqline(incomes)

Luckily, the sign test only requires independent samples for valid inference (as a consequence, it has low power).

## Null hypothesis and test statistic

The sign test allows us to test whether the median of a distribution equals some hypothesized value. Let's test whether our data is consistent with a median of 50,000, which is close-ish to the median income in the U.S. if memory serves. That is

$$
H_0: m = 50{,}000 \qquad H_A: m \neq 50{,}000
$$

where $m$ stands for the population median. The test statistic, with its distribution under $H_0$, is then

$$
B = \sum_{i=1}^n 1_{(50{,}000, \infty)} (x_i) \sim \mathrm{Binomial}(N, 0.5)
$$

Here $B$ is the number of data points observed that are strictly greater than the hypothesized median, and $N$ is the sample size after exact ties with the hypothesized median have been removed. Forgetting to remove exact ties is a very frequent mistake when students do this test in classes I TA.
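To make the tie-removal step concrete, here is a minimal sketch in base R. The helper `sign_counts()` and the `toy` vector are made up for illustration; neither is part of distributions3 or of the income example. The sketch counts the observations strictly above a hypothesized median `m0`, and separately counts how many observations survive once exact ties with `m0` are dropped.

```r
# Hypothetical helper (not part of distributions3): returns the sign test
# counts for a numeric sample `x` and a hypothesized median `m0`.
sign_counts <- function(x, m0) {
  x <- x[!is.na(x)]          # ignore missing values
  c(
    B    = sum(x > m0),      # observations strictly above m0
    N    = sum(x != m0),     # sample size after removing exact ties
    ties = sum(x == m0)      # observations exactly equal to m0
  )
}

# Made-up toy data with two exact ties at the hypothesized median of 2.
toy <- c(1, 2, 2, 3, 5, 8)

counts <- sign_counts(toy, m0 = 2)
counts
```

With the two ties removed, the reference distribution for this toy sample would be $\mathrm{Binomial}(4, 0.5)$ rather than $\mathrm{Binomial}(6, 0.5)$, which is exactly the difference that forgetting the tie-removal step hides.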
If we sort the income data we can see that $B = 3$ and $N = 12$ in our case:

    sort(incomes)

We can verify this with R as well:

    b <- sum(incomes > 50000)
    b

    n <- sum(incomes != 50000)
    n

## Calculating p-values

To calculate a two-sided p-value, we need to find

\begin{align}
  2 \cdot \min(P(B \ge 3), P(B \le 3)) = 2 \cdot \min(1 - P(B \le 2), P(B \le 3))
\end{align}

To do this we need the c.d.f. of a binomial random variable:

    library(distributions3)

    X <- Binomial(n, 0.5)
    2 * min(cdf(X, b), 1 - cdf(X, b - 1))

In practice computing the c.d.f. of binomial random variables is rather tedious and there aren't great shortcuts for small samples. If you got a question like this on an exam, you'd want to use the binomial p.m.f. repeatedly, like this:

\begin{align}
  P(B \le 3)
  &= P(B = 0) + P(B = 1) + P(B = 2) + P(B = 3) \\
  &= \binom{12}{0} 0.5^0 0.5^{12} + \binom{12}{1} 0.5^1 0.5^{11} + \binom{12}{2} 0.5^2 0.5^{10} + \binom{12}{3} 0.5^3 0.5^9
\end{align}

Finally, sometimes we are interested in one-sided sign tests. For the test

\begin{align}
  H_0: m \le 50{,}000 \qquad H_A: m > 50{,}000
\end{align}

the p-value is given by

$$
P(B \ge 3) = 1 - P(B \le 2)
$$

which we calculate with

    1 - cdf(X, b - 1)

For the test

$$
H_0: m \ge 50{,}000 \qquad H_A: m < 50{,}000
$$

the p-value is given by

$$
P(B \le 3)
$$

which we calculate with

    cdf(X, b)

## Using the binom.test() function

To verify results we can use the `binom.test()` function from base R. The `x` argument gets the value of $B$, `n` the value of $N$, and `p = 0.5` for a test of the median. That is, for $H_0 : m = 50{,}000$ we would use

    binom.test(3, n = 12, p = 0.5)

For $H_0 : m \le 50{,}000$

    binom.test(3, n = 12, p = 0.5, alternative = "greater")

For $H_0 : m \ge 50{,}000$

    binom.test(3, n = 12, p = 0.5, alternative = "less")


All of these results agree with our manual computations, which is reassuring.
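The agreement can also be checked programmatically rather than by eye. The sketch below uses only base R (`dbinom()`, `binom.test()`, `all.equal()`) so it runs without distributions3. It hard-codes the counts from the income example ($B = 3$, $N = 12$), and the names `manual` and `reference` are introduced here purely for illustration. The tail probabilities are built by summing the binomial p.m.f. term by term, in the spirit of the exam-style calculation.

```r
# Counts from the income example: 3 incomes above 50,000 out of 12 non-tied.
b <- 3
n <- 12

# Sum the binomial p.m.f. term by term to get the two tail probabilities.
p_lower <- sum(dbinom(0:b, size = n, prob = 0.5))  # P(B <= 3)
p_upper <- sum(dbinom(b:n, size = n, prob = 0.5))  # P(B >= 3)

# Two-sided p-value: double the smaller tail, capped at 1.
p_two_sided <- min(1, 2 * min(p_lower, p_upper))

manual <- c(two.sided = p_two_sided, greater = p_upper, less = p_lower)

# The same three p-values as reported by binom.test().
reference <- vapply(
  names(manual),
  function(alt) binom.test(b, n = n, p = 0.5, alternative = alt)$p.value,
  numeric(1)
)

# TRUE when the manual and binom.test() p-values match up to tolerance.
all.equal(manual, reference)
```

One caveat on the two-sided case: doubling the smaller tail matches `binom.test()` here because the $\mathrm{Binomial}(N, 0.5)$ null distribution is symmetric. For a null probability other than 0.5 the two approaches can give different two-sided p-values.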


distributions3 documentation built on Sept. 3, 2019, 5:06 p.m.