Use Case"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(sprtt)

Note

Overview

The sprtt package is a sequential probability ratio tests toolbox (sprtt). This vignette describes an exemplary use case to improve the understanding of the package and the sequential t-test.

Other recommended vignettes cover:

Use Case

A team of researchers wants to know if the stress of the employees increased in the company itFlow AG over the last year. The Human Resources department has gathered data from a new self-care tool they implemented for the employees. It is difficult to predict how many employees will participate at the second measurement point (one year after the baseline). The team suggests using a sequential t-test instead of a Students t-test, because the sequential procedure can reduce the required sample size and is stopped right after the decision for one of the hypotheses is made. Furthermore, the sequential t-test is able to gather evidence for both the alternative and the null hypothesis.

Hypothesis

The researchers know that the company has received more orders than in the year before. Thus, they expect an increase in self-reported stress.

Data Analysis

The parameters of the sequential t-test are specified as follows:

The HR department receives the new data piece by piece and passes it directly on to the researchers. The test is performed for the first time and starts with the first two data points.

# first data from the Human Resources department ---
# current sample size
n_person <- 2

# get data
df <- df_stress[1:n_person,]

# print data
df

# sequential t-test
results <- seq_ttest(df$one_year_stress,
                     df$baseline_stress,
                     alpha = alpha,
                     power = power,
                     d = d,
                     paired = paired,
                     alternative = alternative,
                     verbose = FALSE)

# print results: console output
results

The decision from the first test is:

results@decision

As a result, the researchers take one more data point and run the test again.

# new data from the Human Resources department ---
# get one more person
n_person <- n_person + 1
df <- df_stress[1:n_person,]

# print new data
df

# sequential t-test
results <- seq_ttest(df$one_year_stress,
                     df$baseline_stress,
                     alpha = alpha,
                     power = power,
                     d = d,
                     paired = paired,
                     alternative = alternative,
                     verbose = FALSE)

# print results
results@decision

This process is repeated until the decision is made for one of the two hypotheses. To simulate this sequential process, a while-loop embraces the sequential t-test function in the code below, which will not stop until one of the hypotheses is accepted or the maximum of the data is reached.

# define the starting point
decision <- "continue sampling"
n_person <- 3

# simulation of the sequential procedure
while(decision == "continue sampling") {
  # get the current data
  df <- df_stress[1:n_person,]

  # run the sequential test and save the results
  results <- seq_ttest(df$one_year_stress,
                       df$baseline_stress,
                       alpha = alpha,
                       power = power,
                       d = d,
                       paired = paired,
                       alternative = alternative)
  # save the current desicion
  decision <- results@decision

  # add a new person
  n_person <- n_person + 1

  # break if the maximum of the data is reached
  if (n_person > nrow(df_stress)) {
    break
  }
}

# console output
results

The while-loop comes to an end after r n_person - 1 data points.

Report Results

# Required results for the report

# likelihood ratio (LR)
LR <- round(results@likelihood_ratio, digits = 2)
LR

# sample size (N) = degrees of freedom +2 (two-samples) or +1 (one-sample & paired)
N <- results@df + 1
N

# baseline stress (M and SD)
mean_t1 <- round(mean(df$baseline_stress), digits = 2)
mean_t1
sd_t1 <- round(sd(df$baseline_stress), digits = 2)
sd_t1

# after one year stress (M and SD)
mean_t2 <- round(mean(df$one_year_stress), digits = 2)
mean_t2
sd_t2 <- round(sd(df$one_year_stress), digits = 2)
sd_t2

# NOT INCLUDED IN THE PACKAGE 

# calculate effect size: Cohen´s d
d_results <- effsize::cohen.d(df$one_year_stress,
                 df$baseline_stress,
                 paired = TRUE)
d <- round(d_results$estimate, digits = 2)
d

# confidence intervall
d_lower <- round(d_results$conf.int[[1]], digits = 2)
d_lower
d_upper <- round(d_results$conf.int[[2]], digits = 2)
d_upper

Starting at N = 2, the test stops sampling at a total sample size of N = r N with LR~r N~ = r LR. This ratio indicates that the data are about 20 times more likely under H~1~ than under H~0~. Thus, we accept the alternative hypothesis: The perceived stress at the second measurement (M = r mean_t2, SD = r sd_t2) is higher than one year ago at the baseline measurement (M = r mean_t1, SD = r sd_t1), Cohen`s d = r d, 95% CI [r d_lower, r d_upper].[^1]^,^[^2]

References

[^1]: The citation style was taken from @schnuerch2020a.

[^2]: Note that this estimate of Cohen's d as well as the CI are based on the assumption of a fixed sample size and, thus, might be biased toward an overestimation of the true effect size. See for details @schnuerch2020a.



Try the sprtt package in your browser

Any scripts or data that you put into this service are public.

sprtt documentation built on July 9, 2023, 6:14 p.m.