In doomlab/learnSTATS: Introduction to Statistics (ANLY 500)

library(learnr)
library(learnSTATS)
library(faux)
data("ttest_data")
library(MOTE)
library(pwr)
library(reshape)

ttest_data <- na.omit(ttest_data)

ttest_data <- sim_df(ttest_data, #data frame
                     sample(50:100, 1), #how many of each group
                     between = "gender")

# Scores outside the valid 0-100 range are set to missing
ttest_data$PAL_cell[ttest_data$PAL_cell > 100 | ttest_data$PAL_cell < 0] <- NA
ttest_data$PAL_acc[ttest_data$PAL_acc > 100 | ttest_data$PAL_acc < 0] <- NA

ttest_data <- na.omit(ttest_data)

ttest_data <- ttest_data[ , -1]

knitr::opts_chunk$set(echo = FALSE)

Comparing Two Means

The simplest experiment compares two groups (the independent variable) on a continuous dependent variable. This design can be analyzed with a t-test! In this section, we return to simple, special cases of regression. The learning objectives are:

What is a t-test, and how can we use it in data analytics?
What are independent and dependent t-tests?

Independent t-Test Video

The following videos serve as the lecture for this material and are included here as part of the flow of the course. You can view the lecture notes within R using vignette("Comparing-Means", "learnSTATS"). If you are in class, you can skip these pages and go on to the assignment.

Dependent t-Test Video

Exercises

In this next section, you will answer questions using the R code blocks provided. Be sure to use the solution option to see the answer if you need it!

Please enter your name for submission. If you do not need to submit, just type anything you'd like in this box.

question_text(
  "Student Name:",
  answer("Your Name", correct = TRUE),
  incorrect = "Thanks!",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Experiment Design

Title: Estimation of physical activity levels using cell phone questionnaires: A comparison with accelerometry for evaluation of between-subject and within-subject variations

Abstract: Physical activity promotes health and longevity. From a business perspective, healthier employees are more likely to report to work, miss fewer days, and cost less for health insurance. Your business wants to encourage healthy lifestyles in an affordable way through health care incentive programs. The use of telecommunication technologies such as cell phones is highly interesting in this respect. In an earlier report, we showed that physical activity level (PAL) assessed using a cell phone procedure agreed well with corresponding estimates obtained using the doubly labeled water method. However, our earlier study indicated high within-subject variation in relation to between-subject variations in PAL using cell phones, but we could not assess if this was a true variation of PAL or an artifact of the cell phone technique. Objective: Our objective was to compare within- and between-subject variations in PAL by means of cell phones with corresponding estimates using an accelerometer. In addition, we compared the agreement of daily PAL values obtained using the cell phone questionnaire with corresponding data obtained using an accelerometer.

Dataset

Gender: male and female subjects were examined in this experiment.
PAL_cell: average physical activity values for the cell phone accelerometer (range 0-100).
PAL_acc: average physical activity values for the hand held accelerometer (range 0-100).

library(learnSTATS)
data("ttest_data")
head(ttest_data)

Independent t-Test

Our hypothesis is that there are gender differences in the cell phone measurement of physical activity level. In our data screening, we found that the two groups had equal variances, so we can perform an independent t-test to assess this question.
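The equal-variance screening mentioned above is not shown in this tutorial. As one possible sketch, base R's var.test() runs an F test that compares two group variances. The data frame demo below is simulated purely for illustration and is an assumption, not the tutorial's ttest_data.

```r
# Hypothetical stand-in data: two groups drawn with the same spread
set.seed(42)
demo <- data.frame(
  gender = rep(c("female", "male"), each = 60),
  PAL_cell = c(rnorm(60, mean = 60, sd = 10), rnorm(60, mean = 50, sd = 10))
)

# F test of the ratio of the two group variances (stats package, base R)
var_check <- var.test(PAL_cell ~ gender, data = demo)

var_check$estimate  # ratio of variances; values near 1 suggest similar spread
var_check$p.value   # a large p-value gives no evidence against equal variances
```

If the variances looked clearly unequal, setting var.equal = FALSE in t.test() would request the Welch correction instead.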

First, let's calculate and save the mean, standard deviation, and sample size (length) of each group so that we can use those values to calculate the effect size. Save these values as M, STDEV, and N.

M <- tapply(ttest_data$PAL_cell, ttest_data$gender, mean)
STDEV <- tapply(ttest_data$PAL_cell, ttest_data$gender, sd)
N <- tapply(ttest_data$PAL_cell, ttest_data$gender, length)

M;STDEV;N

Next, let's examine if there are group differences with our t-test.

M <- tapply(ttest_data$PAL_cell, ttest_data$gender, mean)
STDEV <- tapply(ttest_data$PAL_cell, ttest_data$gender, sd)
N <- tapply(ttest_data$PAL_cell, ttest_data$gender, length)

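# Independent groups are the default for the formula interface;
# recent versions of R reject a `paired` argument here.
t.test(PAL_cell ~ gender,
       data = ttest_data,
       var.equal = TRUE)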
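To see where the t value comes from, the equal-variance (pooled) statistic can be rebuilt from the same three summaries saved as M, STDEV, and N. This is a minimal sketch on simulated data; demo and its group means are hypothetical values chosen for illustration.

```r
# Hypothetical stand-in data with unequal group sizes
set.seed(1)
demo <- data.frame(
  gender = rep(c("female", "male"), times = c(55, 65)),
  PAL_cell = c(rnorm(55, mean = 62, sd = 9), rnorm(65, mean = 55, sd = 9))
)

M <- tapply(demo$PAL_cell, demo$gender, mean)
STDEV <- tapply(demo$PAL_cell, demo$gender, sd)
N <- tapply(demo$PAL_cell, demo$gender, length)

# Pooled variance weights each group's variance by its degrees of freedom
sp2 <- ((N[1] - 1) * STDEV[1]^2 + (N[2] - 1) * STDEV[2]^2) / (N[1] + N[2] - 2)

# t = difference in means divided by its standard error
t_manual <- as.numeric((M[1] - M[2]) / sqrt(sp2 * (1 / N[1] + 1 / N[2])))
df_manual <- as.numeric(N[1] + N[2] - 2)
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)  # two-sided p-value

# Built-in version for comparison
t_builtin <- t.test(PAL_cell ~ gender, data = demo, var.equal = TRUE)

c(manual = t_manual, builtin = as.numeric(t_builtin$statistic))
```

The two values should agree, which shows that the test depends only on the group means, standard deviations, and sample sizes.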

question_text(
  "Is there a significant p < .05 difference in the means?",
  answer("Yes, p < .05", correct = TRUE),
  incorrect = "Yes, we would reject the null hypothesis.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Next, let's get the effect size for this difference in means using MOTE.

M <- tapply(ttest_data$PAL_cell, ttest_data$gender, mean)
STDEV <- tapply(ttest_data$PAL_cell, ttest_data$gender, sd)
N <- tapply(ttest_data$PAL_cell, ttest_data$gender, length)

library(MOTE)
effect <- d.ind.t(m1 = M[1], m2 = M[2],
        sd1 = STDEV[1], sd2 = STDEV[2],
        n1 = N[1], n2 = N[2], a = .05)

effect$d
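As a cross-check on the MOTE output, Cohen's d for two independent groups can be computed by hand. The sketch below assumes the pooled standard deviation is used as the denominator, which is my understanding of d.ind.t; the vectors female and male are simulated, hypothetical data.

```r
# Hypothetical scores for two independent groups
set.seed(2)
female <- rnorm(55, mean = 62, sd = 9)
male <- rnorm(65, mean = 55, sd = 9)

m1 <- mean(female); m2 <- mean(male)
sd1 <- sd(female);  sd2 <- sd(male)
n1 <- length(female); n2 <- length(male)

# Pooled standard deviation across the two groups
spooled <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))

# Cohen's d: the mean difference expressed in pooled-SD units
d_manual <- (m1 - m2) / spooled

# The same value can be recovered from the pooled-variance t statistic
t_val <- as.numeric(t.test(female, male, var.equal = TRUE)$statistic)
d_from_t <- t_val * sqrt(1 / n1 + 1 / n2)

c(d_manual = d_manual, d_from_t = d_from_t)
```

By Cohen's common benchmarks, values near 0.2, 0.5, and 0.8 are usually described as small, medium, and large.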

question_text(
  "What size is the difference in means?",
  answer("Large", correct = TRUE),
  incorrect = "This effect size is large.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Last, we should figure out how many participants we might need if we replicated this study and expected the same effect size (although it is quite large, so we may not see an effect that big again).

M <- tapply(ttest_data$PAL_cell, ttest_data$gender, mean)
STDEV <- tapply(ttest_data$PAL_cell, ttest_data$gender, sd)
N <- tapply(ttest_data$PAL_cell, ttest_data$gender, length)
effect <- d.ind.t(m1 = M[1], m2 = M[2],
        sd1 = STDEV[1], sd2 = STDEV[2],
        n1 = N[1], n2 = N[2], a = .05)

library(pwr)

pwr.t.test(n = NULL, 
           d = effect$d, 
           sig.level = .05,
           power = .80, 
           type = "two.sample", 
           alternative = "two.sided")
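Base R offers a similar calculation through power.t.test(), which can serve as a sanity check on the pwr result. The sketch below uses hypothetical effect sizes (Cohen's benchmark values), not the effect size from this data set, and sets sd = 1 so that delta is already in standardized units.

```r
# Hypothetical effect sizes spanning small, medium, and large benchmarks
d_values <- c(small = 0.2, medium = 0.5, large = 0.8)

# Solve for n (per group) at 80% power and alpha = .05, two-sided
n_per_group <- sapply(d_values, function(d) {
  res <- power.t.test(delta = d, sd = 1,
                      sig.level = .05, power = .80,
                      type = "two.sample",
                      alternative = "two.sided")
  ceiling(res$n)  # round up: you cannot recruit a fraction of a person
})

n_per_group      # participants needed in each group
n_per_group * 2  # total participants across both groups
```

The required sample size drops quickly as the effect size grows, which is why a very large effect needs only a handful of participants per group.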

question_text(
  "How many participants would I need in this study?",
  answer("Eight", correct = TRUE),
  incorrect = "You would need four in each group, so eight total.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Dependent t-Test

Let's try those same steps again to examine the hypothesis that there are differences between the cell phone and handheld accelerometer results.

First, let's calculate and save the mean, standard deviation, and sample size (length) of each measurement so that we can use those values to calculate the effect size. Save these values as M2, STDEV2, and N2. Don't forget! Repeated measures data should be restructured into long format. You can rename the columns, but the solution shows the answer without this step.

library(reshape)
longdata <- melt(ttest_data, 
                id.vars = c("gender"), 
                measure.vars = c("PAL_cell", "PAL_acc"))
M2 <- tapply(longdata$value, longdata$variable, mean)
STDEV2 <- tapply(longdata$value, longdata$variable, sd)
N2 <- tapply(longdata$value, longdata$variable, length)

M2;STDEV2;N2
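If the reshape package is not available, the same wide-to-long restructuring can be sketched with base R's stack(). The data frame demo_wide is simulated and hypothetical; the column names value and variable are chosen only to mirror what melt() produces.

```r
# Hypothetical wide data: one row per person, one column per measurement
set.seed(3)
demo_wide <- data.frame(
  gender = rep(c("female", "male"), each = 40),
  PAL_cell = rnorm(80, mean = 58, sd = 10),
  PAL_acc = rnorm(80, mean = 55, sd = 10)
)

# stack() puts the selected columns into one column of values
# plus a factor recording which column each value came from
stacked <- stack(demo_wide[c("PAL_cell", "PAL_acc")])
demo_long <- cbind(gender = demo_wide$gender, stacked)
names(demo_long) <- c("gender", "value", "variable")

# Same summaries as in the exercise, now by measurement type
M2 <- tapply(demo_long$value, demo_long$variable, mean)
STDEV2 <- tapply(demo_long$value, demo_long$variable, sd)
N2 <- tapply(demo_long$value, demo_long$variable, length)

M2; STDEV2; N2
```

Each person now contributes two rows, so the long data has twice as many rows as the wide data.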

Are there group differences in the types of accelerometers?

longdata <- melt(ttest_data, 
                id.vars = c("gender"), 
                measure.vars = c("PAL_cell", "PAL_acc"))
M2 <- tapply(longdata$value, longdata$variable, mean)
STDEV2 <- tapply(longdata$value, longdata$variable, sd)
N2 <- tapply(longdata$value, longdata$variable, length)

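# Recent versions of R reject `paired` in the formula interface,
# so pass the two sets of scores directly (rows stay in the same order).
t.test(longdata$value[longdata$variable == "PAL_cell"],
       longdata$value[longdata$variable == "PAL_acc"],
       paired = TRUE)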
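A dependent t-test is the same thing as a one-sample t-test on the difference scores, which explains why the equal-variance option is not needed. The sketch below demonstrates this on simulated, hypothetical data in which each person has both measurements.

```r
# Hypothetical paired scores: the cell phone value tracks the accelerometer value
set.seed(4)
n <- 80
PAL_acc <- rnorm(n, mean = 55, sd = 10)
PAL_cell <- PAL_acc + rnorm(n, mean = 2, sd = 6)

# Paired test on the two columns
paired_res <- t.test(PAL_cell, PAL_acc, paired = TRUE)

# One-sample test on the differences against zero
diffs <- PAL_cell - PAL_acc
diff_res <- t.test(diffs, mu = 0)

# The statistic by hand: mean difference over its standard error
t_manual <- mean(diffs) / (sd(diffs) / sqrt(n))

c(paired = as.numeric(paired_res$statistic),
  one_sample = as.numeric(diff_res$statistic),
  manual = t_manual)
```

All three values should match, with n - 1 degrees of freedom because only one set of difference scores is analyzed.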

question_text(
  "Is there a significant p < .05 difference in the means?",
  answer("Yes, p < .05", correct = TRUE),
  incorrect = "Yes, we would reject the null hypothesis.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

There are several different effect sizes for dependent t. Let's use the version that divides the mean difference by the average of the two standard deviations (d.dep.t.avg) to calculate the effect size.

longdata <- melt(ttest_data, 
                id.vars = c("gender"), 
                measure.vars = c("PAL_cell", "PAL_acc"))
M2 <- tapply(longdata$value, longdata$variable, mean)
STDEV2 <- tapply(longdata$value, longdata$variable, sd)
N2 <- tapply(longdata$value, longdata$variable, length)

effect2 <- d.dep.t.avg(m1 = M2[1], m2 = M2[2],
                 sd1 = STDEV2[1], sd2 = STDEV2[2],
                 n = N2[1], a = .05)

effect2$d
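To make the choice among dependent-t effect sizes concrete, the sketch below computes two of them by hand on simulated, hypothetical data. It assumes that the "average" version divides the mean difference by the mean of the two standard deviations, which is my understanding of d.dep.t.avg; the second version, often written d_z, divides by the standard deviation of the difference scores.

```r
# Hypothetical paired scores
set.seed(5)
n <- 80
PAL_acc <- rnorm(n, mean = 55, sd = 10)
PAL_cell <- PAL_acc + rnorm(n, mean = 2, sd = 6)

m1 <- mean(PAL_cell); m2 <- mean(PAL_acc)
sd1 <- sd(PAL_cell);  sd2 <- sd(PAL_acc)

# d based on the average of the two standard deviations
d_avg <- (m1 - m2) / ((sd1 + sd2) / 2)

# d_z based on the standard deviation of the difference scores
diffs <- PAL_cell - PAL_acc
d_z <- mean(diffs) / sd(diffs)

c(d_avg = d_avg, d_z = d_z)
```

The two values can differ noticeably when the measurements are strongly correlated, so it helps to report which version was used.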

question_text(
  "What size is the difference in means?",
  answer("Small!", correct = TRUE),
  incorrect = "This effect size is small!",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

When we see a significant result, but a small effect, we know our sample size helped us have the power to find that small difference. How many participants did I actually need to find this small result?

longdata <- melt(ttest_data, 
                id.vars = c("gender"), 
                measure.vars = c("PAL_cell", "PAL_acc"))
M2 <- tapply(longdata$value, longdata$variable, mean)
STDEV2 <- tapply(longdata$value, longdata$variable, sd)
N2 <- tapply(longdata$value, longdata$variable, length)
effect2 <- d.dep.t.avg(m1 = M2[1], m2 = M2[2],
                 sd1 = STDEV2[1], sd2 = STDEV2[2],
                 n = N2[1], a = .05)

pwr.t.test(n = NULL, 
           d = effect2$d, 
           sig.level = .05,
           power = .80, 
           type = "paired", 
           alternative = "two.sided")
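The link between sample size and the power to detect a small effect can also be viewed from the other direction: given a number of pairs, how much power did we have? The sketch below uses base R's power.t.test() with a hypothetical small effect (0.2) and hypothetical sample sizes, not the values from this data set. With type = "paired", n is the number of pairs and sd refers to the difference scores.

```r
# Hypothetical small standardized effect
d_small <- 0.2

# Power achieved at several hypothetical numbers of pairs
pairs <- c(50, 100, 150, 200)
achieved_power <- sapply(pairs, function(n_pairs) {
  power.t.test(n = n_pairs, delta = d_small, sd = 1,
               sig.level = .05, type = "paired",
               alternative = "two.sided")$power
})
names(achieved_power) <- pairs
round(achieved_power, 2)

# Solving for n instead gives the pairs needed for 80% power
needed <- power.t.test(delta = d_small, sd = 1,
                       sig.level = .05, power = .80,
                       type = "paired",
                       alternative = "two.sided")$n
ceiling(needed)
```

Power rises steadily with the number of pairs, which is why a large sample can flag even a small difference as significant.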

question_text(
  "How many participants would I need in this study?",
  answer("I would need over 100.", correct = TRUE),
  incorrect = "My results suggest 113 pairs, which is 113 people. Your simulation might be different!",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Submit

On this page, you will create the submission for your instructor (if necessary). Please copy this report and submit using a Word document or paste into the text window of your submission page. Click "Generate Submission" to get your work!