knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
Sign and Mood's Median Test in R
smmr
can be found on r-universe:
options(repos = c( ken = 'https://nxskok.r-universe.dev', CRAN = 'https://cloud.r-project.org' )) install.packages("smmr")
You can also install smmr
from GitHub with:
# install.packages("devtools") devtools::install_github("nxskok/smmr")
This is a test for the population median from a single sample. It can be used when testing for a mean is problematic, for example when the sample indicates that the population distribution is skewed, and inference for the mean does not make sense.
Consider the disp
values from the mtcars
data set:
library(tidyverse) ggplot(mtcars, aes(x=disp)) + geom_histogram(bins=5)
This distribution is skewed to the right, and we might be hesitant about using a t-test for the mean.
Could the median disp
value be 150?
library(smmr) sign_test(mtcars, disp, 150)
The two-sided P-value is 0.215, so there is no evidence that the population median differs from 150. The output also permits an assessment of whether the median is less than 150 (P-value 0.945) or greater than 150 (P-value 0.108). The output also includes a count of the number of data values above (20) and below (12) the hypothesized median. This is not an uneven enough split to be able to reject a null hypothesis that the population median is 150.
A 95% confidence interval for the population median can be found by taking all the population medians that would not be rejected at alpha of 1-0.95=0.05:
ci_median(mtcars, disp)
The null median of 150 that we failed to reject is (predictably) inside this confidence interval.
The sign test and its corresponding confidence interval can also be used with matched-pair data, by applying them to the one sample of differences, testing a null median difference of zero.
To compare several medians (testing the null hypothesis that they are all equal, against the alternative that they are not all equal), we can use Mood's median test. For example, in mtcars
, is the median disp
the same for all numbers of cylinders?
median_test(mtcars, disp, cyl)
Definitely not, with P-value 0.00000196. To find which numbers of cylinders have different median values of disp
, we can run Mood's median tests for each pair of groups, adjusting for multiple testing using Bonferroni:
pairwise_median_test(mtcars, disp, cyl)
All three numbers of cylinders have different median disp
values, the largest of the adjusted P-values being 0.003.
A look back at the table of values above and below in the output for median_test
reveals that all 11 of the 4-cylinder cars have a disp
less than the median of all the cars, while all 14 of the 8-cylinder cars have a disp
value greater than this overall median.
The sign test and Mood's median test are often derided as lacking power compared to the signed rank, rank-sum and Kruskal-Wallis tests. However, the latter three tests come with extra assumptions: the signed-rank test assumes a symmetric distribution, and the other two tests assume equal spreads among the groups. When we have doubts about using a t-test or an ANOVA, we may well doubt these assumptions as well, and I think it is better to use a test that does not make these assumptions at all.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.