In doomlab/learnSTATS: Introduction to Statistics (ANLY 500)

library(learnr)
library(learnSTATS)
library(faux)
library(MOTE)
library(pwr)
library(ez)
library(ggplot2)
cleanup <- theme(panel.grid.major = element_blank(), 
                panel.grid.minor = element_blank(), 
                panel.background = element_blank(), 
                axis.line.x = element_line(color = "black"),
                axis.line.y = element_line(color = "black"),
                legend.key = element_rect(fill = "white"),
                text = element_text(size = 15))

##simulated data
between <- list(participant_type = c(advanced = "Advanced", 
                                     experienced = "Experienced", 
                                     intermediate = "Intermediate",
                                     novice = "Novice",
                                     postgraduate = "Postgraduate"))

##population mean competence for each participant type
mu <- data.frame(
  advanced = 85,
  experienced = 75,
  intermediate = 72, 
  novice = 65,
  postgraduate = 80
  )

anova_data <- sim_design(between = between, 
                          n = sample(50:90, 1), 
                          mu = mu, 
                          sd = 10, r = .2,
                          interactive = FALSE,
                          empirical = FALSE, 
                          plot = FALSE)

colnames(anova_data)[3] <- "competence"

anova_data <- anova_data[ , -1]

anova_data$participant_type <- factor(anova_data$participant_type, 
                                     levels = c("novice", "intermediate", "advanced", "postgraduate", "experienced"))

knitr::opts_chunk$set(echo = FALSE)

Analysis of Variance

The ANalysis Of VAriance (ANOVA) is our last section this semester. ANOVAs are designed to test whether a categorical variable predicts a continuous dependent variable. We will extend your knowledge from the simple two-group comparison in the t-test to multiple groups in the one-way between subjects ANOVA. The learning outcomes are:

Getting started with the Analysis of Variance
Understanding variance versus t-tests
One-Way Between Subjects
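
To see how the ANOVA extends the t-test, here is a minimal sketch on made-up data (the group labels, means, and seed are assumptions for illustration, not part of the course dataset). With only two groups and pooled variance, the F value from the ANOVA should equal the squared t value from the independent t-test.

```r
# Simulated two-group data; the values are illustrative only
set.seed(42)
scores <- c(rnorm(30, mean = 70, sd = 10), rnorm(30, mean = 76, sd = 10))
group <- factor(rep(c("a", "b"), each = 30))

# Independent t-test with pooled (equal) variances
t_value <- unname(t.test(scores ~ group, var.equal = TRUE)$statistic)

# One-way between subjects ANOVA on the same data
f_value <- summary(aov(scores ~ group))[[1]][["F value"]][1]

# The two tests agree: t squared equals F
all.equal(t_value^2, f_value)
```

Once there are three or more groups, there is no single t value to square, which is why we switch to the F test.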

ANOVA Video Part 1

The following videos serve as the lecture for this material and are included here as part of the flow of the course. You can view the lecture notes within R using vignette("ANOVA", "learnSTATS"). If you are attending class, you can skip these pages and go on to the assignment.

ANOVA Video Part 2

ANOVA Video Part 3

Exercises

In this next section, you will answer questions using the R code blocks provided. Be sure to use the solution option to see the answer if you need it!

Please enter your name for submission. If you do not need to submit, just type anything you'd like in this box.

question_text(
  "Student Name:",
  answer("Your Name", correct = TRUE),
  incorrect = "Thanks!",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Experiment Design

Abstract: How do university training and subsequent practical experience affect expertise in data science? To answer this question we developed methods to assess data science knowledge and the competence to formulate answers, construct code to solve problems, and create reports of outcomes. In the cross-sectional study, undergraduate students, trainees in a certified postgraduate data science curriculum, and data scientists with more than 10 years of experience were tested (novice, intermediate, and advanced university students, postgraduate trainees, and experienced data scientists). We discuss the results against the background of expertise research and the training of data scientists. Important factors for the continuing professional development of data scientists are proposed.

Dataset

Participant type: novice students, intermediate students, advanced university students, postgraduate trainees, and experienced data scientists
Competence: an average score of data science knowledge and competence based on a knowledge test and short case studies.

library(learnSTATS)
data("anova_data")
head(anova_data)


Hypothesis Testing

Let's examine if there are any group differences in our data scientists using the ezANOVA function. Remember to first add a participant number!

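#ezANOVA needs an id column for wid, so number the participants first
anova_data$partno <- seq_len(nrow(anova_data))
#one-way between subjects ANOVA; Levene's test is returned with the output
ezANOVA(data = anova_data,
        dv = competence,
        between = participant_type,
        wid = partno,
        type = 3, 
        detailed = TRUE)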

question_text(
  "Do you meet the assumption for homogeneity?",
  answer("Yes!", correct = TRUE),
  incorrect = "Yes, because p is greater than .001 - our cut off for data screening.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
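
If you are curious what a Levene-style test is doing, here is a rough base R sketch on made-up data. It assumes the median-centred form of the test (an ANOVA on each score's absolute distance from its group median); which centring ezANOVA uses is not stated here, so treat this as an illustration of the idea rather than a reproduction of its output. The data frame sim and its values are hypothetical.

```r
# Simulated three-group data with equal spread; values are illustrative only
set.seed(1)
sim <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 40)),
  score = c(rnorm(40, 65, 10), rnorm(40, 72, 10), rnorm(40, 85, 10))
)

# Absolute deviation of each score from its own group median
sim$abs_dev <- abs(sim$score - ave(sim$score, sim$group, FUN = median))

# An ANOVA on those deviations asks whether spread differs by group
levene <- anova(lm(abs_dev ~ group, data = sim))
levene_p <- levene[["Pr(>F)"]][1]

# Data screening rule from this course: the assumption is met if p > .001
levene_p > .001
```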

question_text(
  "Is there a significant difference between groups?",
  answer("Yes!", correct = TRUE),
  incorrect = "Yes, the p value for our F test is less than .05, our alpha for hypothesis testing.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
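
The F test compares variance between groups to variance within groups. As a sketch of that arithmetic, the block below builds the sums of squares by hand on made-up data (sim and its values are hypothetical) and checks the result against aov. In the detailed ezANOVA output, the SSn and SSd columns play the role of the between and within sums of squares, and in a one-way between subjects design their ratio SSn / (SSn + SSd) gives eta squared.

```r
# Simulated three-group data; values are illustrative only
set.seed(2)
sim <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 40)),
  score = c(rnorm(40, 65, 10), rnorm(40, 72, 10), rnorm(40, 85, 10))
)

grand_mean <- mean(sim$score)
group_mean <- ave(sim$score, sim$group)  # each row gets its group mean

# Sums of squares between groups and within groups
ss_between <- sum((group_mean - grand_mean)^2)
ss_within <- sum((sim$score - group_mean)^2)

# Degrees of freedom: groups - 1, and participants - groups
df_between <- nlevels(sim$group) - 1
df_within <- nrow(sim) - nlevels(sim$group)

# F is the ratio of the two mean squares
f_by_hand <- (ss_between / df_between) / (ss_within / df_within)

# Eta squared: proportion of total variance explained by group
eta_sq <- ss_between / (ss_between + ss_within)

# Compare to the built in ANOVA
f_aov <- summary(aov(score ~ group, data = sim))[[1]][["F value"]][1]
all.equal(f_by_hand, f_aov)
```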

Power

The effect size for this study is really large. Let's say we expect this study to replicate at only half that effect size. How many participants would we need to power that study at .80?

library(pwr)
#calculate f_eta

#calculate power

library(pwr)
#calculate f_eta: my eta squared was about .30, so half of that is .30 / 2 = .15
f_eta <- sqrt(.15 / (1-.15))
#calculate power
pwr.anova.test(k = 5, n = NULL, f = f_eta,
               sig.level = .05, power = .80)
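
To see where that sample size comes from, here is a base R sketch of the same calculation. It assumes the usual noncentral F approach to ANOVA power, with noncentrality k * n * f^2 and degrees of freedom k - 1 and k * (n - 1); the helper power_for_n is a name made up for this example and is not part of the pwr package. The answer should land close to the roughly 15 per group mentioned in the feedback for this question.

```r
k <- 5                # number of groups
eta_sq <- .30 / 2     # half of the original eta squared
f_eta <- sqrt(eta_sq / (1 - eta_sq))  # convert eta squared to Cohen's f

# Hypothetical helper: power of a one-way ANOVA with n people per group
power_for_n <- function(n, k, f, alpha = .05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  ncp <- k * n * f^2                  # noncentrality parameter
  f_crit <- qf(1 - alpha, df1, df2)   # cut off F under the null
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# Try a range of group sizes and keep the first one that reaches .80
n_grid <- 2:100
power_grid <- sapply(n_grid, power_for_n, k = k, f = f_eta)
n_needed <- n_grid[which(power_grid >= .80)[1]]

n_needed       # participants per group
n_needed * k   # total participants
```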

question_text(
  "How many participants do you need?",
  answer("15 times 5", correct = TRUE),
  incorrect = "You need about 15 participants per group times 5 groups, so about 75 total.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)

Post Hoc Test

Run post hoc independent t-tests twice: once with no correction and once with a Bonferroni correction. Remember, for a real analysis, you would only run one type of post hoc. This question should show you how each post hoc corrects for Type I error by changing the p-values. We will make a graph in a minute to help us understand these results.

#no correction: raw p-values for every pair of groups
pairwise.t.test(anova_data$competence,
                anova_data$participant_type,
                p.adjust.method = "none", 
                paired = FALSE, 
                var.equal = TRUE)

#Bonferroni: the same tests with p-values adjusted for the number of comparisons
pairwise.t.test(anova_data$competence,
                anova_data$participant_type,
                p.adjust.method = "bonferroni", 
                paired = FALSE, 
                var.equal = TRUE)

question_text(
  "What do you see that the correction is doing?",
  answer("Any good answer.", correct = TRUE),
  incorrect = "There are a lot of right answers here. But you should see that the p-values are larger with the correction because they are adjusted for running so many tests.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
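
The Bonferroni correction itself is simple arithmetic: each p-value is multiplied by the number of tests and capped at 1. Here is a small sketch with made-up p-values (raw_p is hypothetical, not output from our data) for the 10 pairwise comparisons you get from 5 groups.

```r
# Five groups give choose(5, 2) = 10 pairwise comparisons
n_groups <- 5
n_tests <- choose(n_groups, 2)

# Hypothetical uncorrected p-values, one per comparison
raw_p <- c(.001, .004, .012, .030, .049, .080, .200, .350, .600, .900)

# Bonferroni by hand: multiply by the number of tests, cap at 1
by_hand <- pmin(1, raw_p * n_tests)

# The same correction using the built in function
built_in <- p.adjust(raw_p, method = "bonferroni")

all.equal(by_hand, built_in)

# Count how many comparisons stay under alpha = .05 before and after
sum(raw_p < .05)
sum(by_hand < .05)
```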

Trend Analysis

Participant type is coded in a roughly ordered manner - each successive group represents more experience and time spent in the field. A trend analysis depends on that order, so you should always check that your levels are in order before coding this analysis. An example of that code is provided, and these levels should already be in order.

Add in the code for the trend analysis.

levels(anova_data$participant_type)

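#check the order of the levels first
levels(anova_data$participant_type)
k <- 5 #number of groups
#copy the factor so the original column keeps its default contrasts
anova_data$part <- anova_data$participant_type
#polynomial contrasts: linear, quadratic, cubic, quartic
contrasts(anova_data$part) <- contr.poly(k)
output2 <- aov(competence ~ part, data = anova_data)
summary.lm(output2)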

question_text(
  "Is there a significant trend?",
  answer("Yes, quartic.", correct = TRUE),
  incorrect = "Yes, but it's a quartic trend - let's make a graph to see what it looks like.",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
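
It can help to look at the weights that contr.poly creates. The sketch below prints them for five groups and applies them to the population means used to simulate this dataset (65, 72, 85, 80, 75 in level order). Your sampled data will differ somewhat from those population values, so treat the numbers as a guide to the pattern, not as the answer for your output.

```r
# Polynomial contrast weights for five ordered groups
poly_weights <- contr.poly(5)
colnames(poly_weights)   # linear, quadratic, cubic, quartic
round(poly_weights, 2)

# Each set of weights sums to zero and the sets are orthogonal
round(colSums(poly_weights), 10)
round(crossprod(poly_weights), 10)

# Population means in level order: novice, intermediate, advanced,
# postgraduate, experienced (taken from the simulation setup)
pop_means <- c(65, 72, 85, 80, 75)

# Weighted sums of the means: one value per trend
trend_estimates <- drop(crossprod(poly_weights, pop_means))
round(trend_estimates, 2)
```

A positive linear value means scores rise across the groups overall, while a negative quadratic value means the pattern bends downward after a peak.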

Graph the Groups

In order to understand our trend analysis, let's try making a graph of the results. Use the original participant_type column to make this graph.

Once you have the graph, you should see that this trend may not make sense, even though it is significant - it actually appears that the data are curvilinear (quadratic) - competence peaks at advanced and then decreases.

bargraph <- ggplot(anova_data, aes(participant_type, competence))
bargraph +
  theme_classic() + #cleanup is loaded too 
  stat_summary(fun = mean,
               geom = "bar", 
               fill = "white", 
               color = "black") +
  stat_summary(fun.data = mean_cl_normal,
               geom = "errorbar",
               position = "dodge",
               width = .2) +
  xlab("Level of Study of Participant") +
  ylab("Average Competence Rating") +
  coord_cartesian(ylim = c(0,100)) +
  scale_x_discrete(labels = c("Novice", "Intermediate", "Advanced", "Post-Graduate", "Experienced"))
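
The error bars in this graph are 95% confidence intervals around each group mean. To my understanding, mean_cl_normal gets them from the Hmisc package using the t distribution; the sketch below does the same arithmetic in base R on made-up data (sim and its values are hypothetical) so you can see what each bar represents.

```r
# Simulated three-group data; values are illustrative only
set.seed(3)
sim <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 40)),
  score = c(rnorm(40, 65, 10), rnorm(40, 72, 10), rnorm(40, 85, 10))
)

# Mean and 95% confidence interval for each group
ci_table <- do.call(rbind, lapply(split(sim$score, sim$group), function(x) {
  se <- sd(x) / sqrt(length(x))                   # standard error
  margin <- qt(.975, df = length(x) - 1) * se     # t based margin
  data.frame(mean = mean(x),
             lower = mean(x) - margin,
             upper = mean(x) + margin)
}))

ci_table

# Check one group against the interval reported by t.test
t.test(sim$score[sim$group == "g1"])$conf.int
```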

Submit

On this page, you will create the submission for your instructor (if necessary). Please copy this report and submit using a Word document or paste into the text window of your submission page. Click "Generate Submission" to get your work!

encoder_logic()

encoder_ui()

doomlab/learnSTATS documentation built on June 9, 2022, 12:54 a.m.
