knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(hypeR)

Hypothesis testing is one of the core skills of any data scientist. In this package (hypeR) we learn how to obtain simple statistics about data and perform a student's t-test to test a simple hypothesis. Why is this useful? We aimed to create a package in which you could easily take in a data set and determine whether to accept or reject the null. The package does not require two inputs of data sets. Instead, you can compare your data set to a mean that you choose.

Example of Use

Suppose we are given the heights (in inches) of the following students in a class.

heights <- c(72.6, 57.2, 67.6, 64.8, 59.3, 72.0, 67.2, 61.3, 64.7, 59.6)

We are told that the average height of a student in the class is equal to 70 inches, the average height of all students in a school. We test the null hypothesis that the average height of the class is 70 by performing a t-test.

First, we can obtain the t-statistic for the situation where the heights are given and the mean is assumed to be 70.

t <- t_test(data = heights, mean_0 = 70)
print(t)

To interpret the t-statistic, we obtain the p-value, the probability of obtaining the above statistic given that (assuming that) the null hypothesis is true^[Devore, J. (n.d.). Probability and statistics for engineering and the sciences.]. If the probability is low (below the usual significance level of 5%), this means it is unlikely that the data was obtained under our null hypothesis, so we reject it. To compute the p-value and make the comparison, use

hyp_test(data = heights, mean_0 = 70, alpha = 0.05)

Finally, we obtain a 95% confidence interval.

conf_int(data = heights)

We see that this confidence interval does not contain the null hypothesis mean of 70. Under the null hypothesis, this would happen about 5% of the time.



UBC-MDS/hypeR documentation built on May 22, 2019, 2:26 p.m.