knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##"
)


1) Introduction

In this vignette we explain how to use the functions available in the stests package to hypothesis testing. Additionally, we compare our functions with the built-in functions t.test and var.test from stats package.

The functions covered in this vignette are:


2) stests package

Any user can download the stests package from GitHub using the next code:

if (!require('devtools')) install.packages('devtools')
devtools::install_github('fhernanb/stests', force=TRUE)

Next, the package must be loaded into the current session using:

library(stests)

2.1) z-test for population mean $\mu$

The objective in this type of problems is to test $H_0: \mu = \mu_0$ against:

using the information of a random sample $X_1, X_2, \ldots, X_n$ from a Normal population with known variance $\sigma^2$.


2.1.1) z-test using z_test function

The z_test function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.

Example 10.3 (Walpole, 2012)

A random sample of 100 recorded deaths in the United States during the past year showed an average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level of significance.


In this example we have: $n=100$, $\bar{x}=71.8$, $\sigma^2=79.21$. The objective is to test the next hypothesis:


$H_0: \mu = 70$ vs $H_a: \mu > 70$

Where $\mu$ representes the True average life span today (years).

The code to perform the test with $\alpha=0.05$ significance is:

test1 <- z_test(meanx=71.8, nx=100, sigma2=79.21, mu=70, alternative='greater')
test1

We can observe that the output provides a p-value and a confidence interval for the population mean that will help the user in the decision making of the hypothesis test.

If we want to depict the p-value for the test, we could save the result in an object and then use the plot.htest function as follows:

plot(test1, shade.col='firebrick1', col='firebrick1')


2.1.2) z-test using z.test function

The z.test function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.

Example 8.8 (Devore, 2016)

A dynamic cone penetrometer (DCP) is used for measuring material resistance to penetration (mm/blow) as a cone is driven into pavement or subgrade. Suppose that for a particular application it is required that the true average DCP value for a certain type of pavement be less than 30. The pavement will not be used unless there is conclusive evidence that the specification has been met. Let's state and test the appropriate hypotheses using the following data:


14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8, 17.8, 18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0, 20.8, 20.8, 21.0, 21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0, 30.0, 31.6, 31.7, 31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0, 36.7, 40.0, 40.0, 41.3, 41.7, 47.5, 50.0, 51.0, 51.8, 54.4, 55.0, 57.0


In this example we have: $n=52$. The objective is to test the next hypothesis:

$H_0: \mu = 30$ vs $H_a: \mu < 30$

where $\mu$ represents the True average DCP value for a certain type of pavement.

In this example we don't know if the random sample comes from a normal population, but as $n=52$ and $\sigma^2$ is unkown, we can use the z-test with $\bar{x}=28.76$, $s^2=150.422$.


The code to perform the test with $\alpha=0.05$ significance is:

x<-c(14.1, 14.5, 15.5, 16.0, 16.0, 16.7, 16.9, 17.1, 17.5, 17.8, 17.8, 
     18.1, 18.2, 18.3, 18.3, 19.0, 19.2, 19.4, 20.0, 20.0, 20.8, 20.8, 
     21.0,21.5, 23.5, 27.5, 27.5, 28.0, 28.3, 30.0, 30.0, 31.6, 31.7, 
     31.7, 32.5, 33.5, 33.9, 35.0, 35.0, 35.0, 36.7, 40.0, 40.0, 41.3, 
     41.7, 47.5, 50.0, 51.0, 51.8, 54.4, 55.0, 57.0)
test2 <- z.test(x=x, sigma2=var(x), mu=30, alternative='less')
test2
plot(test2, shade.col='firebrick1', col='firebrick1')


2.2) t-test for population mean $\mu$ and for the difference in means $\mu_1-\mu_2$

The objective in this type of problems is to test $H_0: \mu = \mu_0$ against:

using the information of a random sample $X_1, X_2, \ldots, X_n$ from a Normal population with unknown variance $\sigma^2$.


Also, we can test $H_0: \mu_x-\mu_y = \delta_0$ against:

using the information of two random samples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, both from populations with normal distributions and with unknown variances $\sigma_1^2$,$\sigma_2^2$.


2.2.1) t-test using t_test function

The t_test function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.

Excercise 10.29 (Walpole, 2012): test on a single sample

Past experience indicates that the time required for high school seniors to complete a standardized test is a normal random variable with a mean of 35 minutes. If a random sample of 20 high school seniors took an average of 33.1 minutes to complete this test with a standard deviation of 4.3 minutes, test the hypothesis, at the 0.05 level of significance, that $\mu=35$ minutes against the alternative that $\mu<35$ minutes.


In this example we have: $n=20$, $\bar{x}=33.1$, $s^2=18.49$. The objective is to test the next hypothesis:


$H_0: \mu = 35$ vs $H_a: \mu < 35$


$\mu=$ True average time required for high school seniors to complete a standardized test (minutes)


The code to perform the test with $\alpha=0.05$ significance is:

test3 <- t_test(meanx=33.1, varx=18.49, nx=20, mu=35, alternative='less')
test3
plot(test3, shade.col='firebrick1', col='firebrick1')


Example 10.6 (Walpole, 2012): test on two samples with unknown but equal variances

An experiment was performed to compare the abrasive wear of two different laminated materials. Twelve pieces of material 1 were tested by exposing each piece to a machine measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear was observed. The samples of material 1 gave an average (coded) wear of 85 units with a sample standard deviation of 4, while the samples of material 2 gave an average of 81 with a sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the abrasive wear of material 1 exceeds that of material 2 by more than 2 units? Assume the populations to be approximately normal with equal variances.


In this example we have: $n_1=12$, $\bar{x}_1=85$, $s_1^2=16$, $n_2=10$, $\bar{x}_2=81$, $s_2^2=25$. The objective is to test the next hypothesis:


$H_0: \mu_1-\mu_2 = 2$ vs $H_a: \mu_1-\mu_2 > 2$


$\mu_1=$ True average abrasive wear of material 1 (units)
$\mu_2=$ True average abrasive wear of material 2 (units)


The code to perform the test with $\alpha=0.05$ significance is:

test4 <- t_test(meanx=85, varx=16, nx=12,
                meany=81, vary=25, ny=10,
                alternative='greater', var.equal=TRUE, mu=2)
test4
plot(test4, shade.col='firebrick1', col='firebrick1')


2.2.2) t-test using t.test function

The t.test function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.

Example 8.9 (Devore, 2016)

Glycerol is a major by-product of ethanol fermentation in wine production and contributes to the sweetness, body, and fullness of wines. The article "A Rapid and Simple Method for Simultaneous Determination of Glycerol, Fructose, and Glucose in Wine" (American J. of Enology and Viticulture, 2007: 279-283) includes the following observations on glycerol concentration (mg/mL) for samples of standard-quality (uncertified) white wines:


2.67, 4.62, 4.14, 3.81, 3.83


Does the sample data suggest that true average concentration is something other than the desired value? (Assume that the population distribution of glycerol concentration is normal)


In this example we have: $n=5$. The objective is to test the next hypothesis:


$H_0: \mu = 4$ vs $H_a: \mu \neq 4$


$\mu=$ True average glycerol concentration (mg/mL)


The code to perform the test with $\alpha=0.05$ significance is:

x <- c(2.67, 4.62, 4.14, 3.81, 3.83)
test5 <- t.test(x=x, sigma2=var(x), mu=4, alternative='two.sided')
test5
plot(test5, shade.col='firebrick1', col='firebrick1')


2.3) $\chi^2$-test for population variance $\sigma^2$

The objective in this type of problems is to test $H_0: \sigma^2 = \sigma^2_0$ against:

using the information of a random sample $X_1, X_2, \ldots, X_n$ from a population with normal distribution.


2.3.1) $\chi^2$-test using var_test function

The var_test function can be used to solve textbook's problems in which the information is summarized with values. In next examples we show the utility of this function.

Example 7.7.1 (Wayne & Chad, 2013)

The purpose of a study by Wilkins et al. (A-28) was to measure the effectiveness of recombinant human growth hormone (rhGH) on children with total body surface area burns > 40 percent. In this study, 16 subjects received daily injections at home of rhGH. At baseline, the researchers wanted to know the current levels of insulin-like growth factor (IGF-I) prior to administration of rhGH. The sample variance of IGF-I levels (in ng/ml) was 670.81. We wish to know if we may conclude from these data that the population variance is not 600.


In this example we have: $n=16$, $s^2=670.81$. The objective is to test the next hypothesis:


$H_0: \sigma^2 = 600$ vs $H_a: \sigma^2 \neq 600$


$\sigma^2=$ True population variance of IGF-I levels


The code to perform the test with $\alpha=0.05$ significance is:

test6 <- var_test(varx=670.81, nx=16, null.value=600, alternative='two.sided')
test6
plot(test6, shade.col='firebrick1', col='firebrick1')


2.3.2) $\chi^2$-test using var.test function

The var.test function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.

Exercise 10.67 (Walpole, 2012)

The content of containers of a particular lubricant is known to be normally distributed with a variance of 0.03 liter. Test the hypothesis that $\sigma^2 = 0.03$ against the alternative that $\sigma^2 \neq 0.03$ for the random sample of 10 containers:


10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8


In this example we have: $n=10$. The objective is to test the next hypothesis:


$H_0: \sigma^2 = 0.03$ vs $H_a: \sigma^2 \neq 0.03$


$\sigma^2=$ True population variance of container content of a particular lubricant


The code to perform the test with $\alpha=0.05$ significance is:

x <- c(10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8)
test7 <- var.test(x=x, null.value=0.03, alternative='two.sided')
test7
plot(test7, shade.col='firebrick1', col='firebrick1')


2.4) F-test for equality of variances

The objective in this type of problems is to test $H_0: \sigma_1^2 = \sigma_2^2$ against:

using the information of two random samples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$, both from populations with normal distributions and with unknown variances $\sigma_x^2$,$\sigma_y^2$.


2.4.1) F-test using var_test function

The var_test function can be used for students or instructors to solve textbook's problems in which the information is summarized with values. In the next examples we show how to use this function and it's utility.

Example 9.14 (Devore, 2016)

On the basis of data reported in the article "Serum Ferritin in an Elderly Population" (J. of Gerontology, 1979: 521-524), the authors concluded that the ferritin distribution in the elderly had a smaller variance than in the younger adults. (Serum ferritin is used in diagnosing iron deficiency.) For a sample of 28 elderly men, the sample standard deviation of serum ferritin (mg/L) was 52.6; for 26 young men, the sample standard deviation was 84.2. Does this data support the conclusion as applied to men?


In this example we have: $n_x=28$, $s_x^2=2766.76$, $n_y=26$, $s_y^2=7089.64$. The objective is to test the next hypothesis:

$H_0: \sigma_x^2 = \sigma_y^2$ vs $H_a: \sigma_x^2 < \sigma_y^2$


$\sigma_x^2=$ True population variance of serum ferritin in elderly men
$\sigma_y^2=$ True population variance of serum ferritin in young men


The code to perform the test with $\alpha=0.01$ significance is:

test8 <- var_test(varx=2766.76, nx=28, vary=7089.64, ny=26, 
                  null.value = 1, alternative='less',conf.level=0.99)
test8
plot(test8, shade.col='firebrick1', col='firebrick1')


2.4.2) F-test using var.test function

The var.test function can be used to solve problems in which we have the raw data as a vector. In the next examples we show how to use this function.

Exercise 10.77 (Walpole, 2012)

An experiment was conducted to compare the alcohol content of soy sauce on two different production lines. Production was monitored eight times a day.The data are shown here:


Production line 1: 0.48, 0.39, 0.42, 0.52, 0.40, 0.48, 0.52, 0.52

Production line 2: 0.38, 0.37, 0.39, 0.41, 0.38, 0.39, 0.40, 0.39


Assume both populations are normal. It is suspected that production line 1 is not producing as consistently as production line 2 in terms of alcohol content. Test the hypothesis that $\sigma_1^2 = \sigma_2^2$ against the alternative that $\sigma_1^2 \neq \sigma_2^2$.


In this example we have: $n_1=8$, $n_2=8$. The objective is to test the next hypothesis:

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \neq \sigma_2^2$


$\sigma_1^2=$ True population variance of alcohol content in Production line 1
$\sigma_2^2=$ True population variance of alcohol content in Production line 2


The code to perform the test with $\alpha=0.05$ significance is:

line1 <- c(0.48, 0.39, 0.42, 0.52, 0.40, 0.48, 0.52, 0.52)
line2 <- c(0.38, 0.37, 0.39, 0.41, 0.38, 0.39, 0.40, 0.39)
test9 <- var.test(x=line1, y=line2, null.value = 1, alternative='two.sided')
test9
plot(test9, shade.col='firebrick1', col='firebrick1')


3) References




fhernanb/stests documentation built on March 29, 2025, 10:36 a.m.