knitr::opts_chunk$set( collapse = TRUE, comment = "##" )
In this vignette we explain how to use the functions available in the stests
package to hypothesis testing. Additionally, we compare our functions with the built-in functions t.test
and var.test
from stats
package.
The functions covered in this vignette are:
z_test
: function to perform z-test using values.z.test
: function to perform z-test using vectors. t_test
: function to perform t-test using values.t.test
: built-in R function from stats
package to perform t-test using vectors.var_test
: function to perform $\chi^2$-test or F-test using values.var.test
: fuction that generalizes the var.test
function from stats
package. This function uses vectors.print.htest
: generic function to print objects with htest
class.plot.htest
: generic function to plot objects with htest
class.stests
packageAny user can download the stests
package from GitHub using the next code:
if (!require('devtools')) install.packages('devtools') devtools::install_github('fhernanb/stests', force=TRUE)
Next, the package must be loaded into the current session using:
library(stests)
The objective in this type of problems is to test $H_0: \mu = \mu_0$ against:
using the information of a random sample $X_1, X_2, \ldots, X_n$ from a Normal population with known variance $\sigma^2$.
z_test
functionThe z_test
function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.
A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance.
In this example we have: $n=100$, $\bar{x}=71.8$, $\sigma^2=79.21$. The objective is to test the next hypothesis:
Where $\mu$ representes the True average life span today (years).
The code to perform the test with $\alpha=0.05$ significance is:
test1 <- z_test(meanx=71.8, nx=100, sigma2=79.21, mu=70, alternative='greater') test1
We can observe that the output provides a p-value and a confidence interval for the population mean that will help the user in the decision making of the hypothesis test.
If we want to depict the p-value for the test, we could save the result in an object and then use the plot.htest
function as follows:
plot(test1, shade.col='firebrick1', col='firebrick1')
z.test
functionThe z.test
function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.
A dynamic cone penetrometer (DCP) is used for measuring material resistance to penetration (mm/blow) as a cone is driven into pavement or subgrade. Suppose that for a particular application it is required that the true average DCP value for a certain type of pavement be less than 30. The pavement will not be used unless there is conclusive evidence that the specification has been met. Let's state and test the appropriate hypotheses using the following data:
In this example we have: $n=52$. The objective is to test the next hypothesis:
where $\mu$ represents the True average DCP value for a certain type of pavement.
In this example we don't know if the random sample comes from a normal population, but as $n=52$ and $\sigma^2$ is unkown, we can use the z-test with $\bar{x}=28.76$, $s^2=150.422$.
The code to perform the test with $\alpha=0.05$ significance is:
x<-c(14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8, 17.8, 18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0, 20.8, 20.8, 21.0,21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0, 30.0, 31.6, 31.7, 31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0, 36.7, 40.0, 40.0, 41.3, 41.7, 47.5, 50.0, 51.0, 51.8, 54.4, 55.0, 57.0) test2 <- z.test(x=x, sigma2=var(x), mu=30, alternative='less') test2 plot(test2, shade.col='firebrick1', col='firebrick1')
The objective in this type of problems is to test $H_0: \mu = \mu_0$ against:
using the information of a random sample $X_1, X_2, \ldots, X_n$ from a Normal population with unknown variance $\sigma^2$.
Also, we can test $H_0: \mu_x-\mu_y = \delta_0$ against:
using the information of two random samples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, both from populations with normal distributions and with unknown variances $\sigma_1^2$,$\sigma_2^2$.
t_test
functionThe t_test
function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.
Past experience indicates that the time required for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. If a random sample of 20 high school seniors took an average of 33.1 minutes to complete this test with a standard deviation of 4.3 minutes, test the hypothesis, at the 0.05 level of significance, that $\mu=35$ minutes against the alternative that $\mu<35$ minutes.
In this example we have: $n=20$, $\bar{x}=33.1$, $s^2=18.49$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
test3 <- t_test(meanx=33.1, varx=18.49, nx=20, mu=35, alternative='less') test3 plot(test3, shade.col='firebrick1', col='firebrick1')
An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average of 81 with a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.
In this example we have: $n_1=12$, $\bar{x}_1=85$, $s_1^2=16$, $n_2=10$, $\bar{x}_2=81$, $s_2^2=25$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
test4 <- t_test(meanx=85, varx=16, nx=12, meany=81, vary=25, ny=10, alternative='greater', var.equal=TRUE, mu=2) test4 plot(test4, shade.col='firebrick1', col='firebrick1')
t.test
functionThe t.test
function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.
Glycerol is a major by-product of ethanol fermentation in wine production and contributes to the sweetness, body, and fullness of wines. The article "A Rapid and Simple Method for Simultaneous Determination of Glycerol, Fructose, and Glucose in Wine" (American J. of Enology and Viticulture, 2007: 279-283) includes the following observations on glycerol concentration (mg/mL) for samples of standard-quality (uncertified) white wines:
Does the sample data suggest that true average concentration is something other than the desired value? (Assume that the population distribution of glycerol concentration is normal)
In this example we have: $n=5$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
x <- c(2.67, 4.62, 4.14, 3.81, 3.83) test5 <- t.test(x=x, sigma2=var(x), mu=4, alternative='two.sided') test5 plot(test5, shade.col='firebrick1', col='firebrick1')
The objective in this type of problems is to test $H_0: \sigma^2 = \sigma^2_0$ against:
using the information of a random sample $X_1, X_2, \ldots, X_n$ from a population with normal distribution.
var_test
functionThe var_test
function can be used to solve textbook's problems in which the information is summarized with values. In next examples we show the utility of this function.
The purpose of a study by Wilkins et al. (A-28) was to measure the effectiveness of recombinant human growth hormone (rhGH) on children with total body surface area burns > 40 percent. In this study, 16 subjects received daily injections at home of rhGH. At baseline, the researchers wanted to know the current levels of insulin-like growth factor (IGF-I) prior to administration of rhGH. The sample variance of IGF-I levels (in ng/ml) was 670.81. We wish to know if we may conclude from these data that the population variance is not 600.
In this example we have: $n=16$, $s^2=670.81$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
test6 <- var_test(varx=670.81, nx=16, null.value=600, alternative='two.sided') test6 plot(test6, shade.col='firebrick1', col='firebrick1')
var.test
functionThe var.test
function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.
The content of containers of a particular lubricant is known to be normally distributed with a variance of 0.03 liter. Test the hypothesis that $\sigma^2 = 0.03$ against the alternative that $\sigma^2 \neq 0.03$ for the random sample of 10 containers:
In this example we have: $n=10$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
x <- c(10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8) test7 <- var.test(x=x, null.value=0.03, alternative='two.sided') test7 plot(test7, shade.col='firebrick1', col='firebrick1')
The objective in this type of problems is to test $H_0: \sigma_1^2 = \sigma_2^2$ against:
using the information of two random samples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, both from populations with normal distributions and with unknown variances $\sigma_x^2$,$\sigma_y^2$.
var_test
functionThe var_test
function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.
On the basis of data reported in the article "Serum Ferritin in an Elderly Population" (J. of Gerontology, 1979: 521-524), the authors concluded that the ferritin distribution in the elderly had a smaller variance than in the younger adults. (Serum ferritin is used in diagnosing iron deficiency.) For a sample of 28 elderly men, the sample standard deviation of serum ferritin (mg/L) was 52.6; for 26 young men, the sample standard deviation was 84.2. Does this data support the conclusion as applied to men?
In this example we have: $n_x=28$, $s_x^2=2766.76$, $n_y=26$, $s_y^2=7089.64$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.01$ significance is:
test8 <- var_test(varx=2766.76, nx=28, vary=7089.64, ny=26, null.value = 1, alternative='less',conf.level=0.99) test8 plot(test8, shade.col='firebrick1', col='firebrick1')
var.test
functionThe var.test
function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.
An experiment was conducted to compare the alcohol content of soy sauce on two different production lines. Production was monitored eight times a day.The data are shown here:
Assume both populations are normal. It is suspected that production line 1 is not producing as consistently as production line 2 in terms of alcohol content. Test the hypothesis that $\sigma_1^2 = \sigma_2^2$ against the alternative that $\sigma_1^2 \neq \sigma_2^2$.
In this example we have: $n_1=8$, $n_2=8$. The objective is to test the next hypothesis:
The code to perform the test with $\alpha=0.05$ significance is:
line1 <- c(0.48, 0.39, 0.42, 0.52, 0.40, 0.48, 0.52, 0.52) line2 <- c(0.38, 0.37, 0.39, 0.41, 0.38, 0.39, 0.40, 0.39) test9 <- var.test(x=line1, y=line2, null.value = 1, alternative='two.sided') test9 plot(test9, shade.col='firebrick1', col='firebrick1')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.