library(learnr) library(learnSTATS) library(faux) library(MOTE) library(pwr) library(ez) library(ggplot2) cleanup <- theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.line.x = element_line(color = "black"), axis.line.y = element_line(color = "black"), legend.key = element_rect(fill = "white"), text = element_text(size = 15)) ##simulated data between <- list(participant_type = c(advanced = "Advanced", experienced = "Experienced", intermediate = "Intermediate", novice = "Novice", postgraduate = "Postgraduate")) ##pattern is money, then time, so keep the numbers larger in the first one mu <- data.frame( advanced = 85, experienced = 75, intermediate = 72, novice = 65, postgraduate = 80 ) anova_data <- sim_design(between = between, n = sample(50:90, 1), mu = mu, sd = 10, r = .2, interactive = FALSE, empirical = FALSE, plot = FALSE) colnames(anova_data)[3] <- "competence" anova_data <- anova_data[ , -1] anova_data$participant_type <- factor(anova_data$participant_type, levels = c("novice", "intermediate", "advanced", "postgraduate", "experienced")) knitr::opts_chunk$set(echo = FALSE)
The ANalysis Of VAriance is our last section this semester. ANOVAs are designed for understanding categorical variables predicting continuous dependent variables. We will extend your knowledge from simple two-groups in the t-test to multiple groups in one-way between subjects ANOVAs. The learning outcomes are:
The following videos are provided as a lecture for the class material. The lectures are provided here as part of the flow for the course. You can view the lecture notes within R using vignette("ANOVA", "learnSTATS")
. You can skip these pages if you are in class to go on to the assignment.
In this next section, you will answer questions using the R code blocks provided. Be sure to use the solution
option to see the answer if you need it!
Please enter your name for submission. If you do not need to submit, just type anything you'd like in this box.
question_text( "Student Name:", answer("Your Name", correct = TRUE), incorrect = "Thanks!", try_again_button = "Modify your answer", allow_retry = TRUE )
Abstract: How do university training and subsequent practical experience affect expertise in data science? To answer this question we developed methods to assess data science knowledge and the competence to formulate answers, construct code to problem solve, and create reports of outcomes. In the cross-sectional study, undergraduate students, trainees in a certified postgraduate data science curriculum, and data scientists with more than 10 years of experience were tested (novice, intermediate, and advanced university students, postgraduate trainees, and experienced data scientists). We discuss the results against the background of expertise research and the training of data scientist. Important factors for the continuing professional development of data scientists are proposed.
library(learnSTATS) data("anova_data") head(anova_data)
head(anova_data)
Let's examine if there are any group differences in our data scientists using the ezANOVA
function. Remember to first add a participant number!
anova_data$partno <- 1:nrow(anova_data) ezANOVA(data = anova_data, dv = competence, between = participant_type, wid = partno, type = 3, detailed = T)
question_text( "Do you meet the assumption for homogeneity?", answer("Yes!", correct = TRUE), incorrect = "Yes, because p is greater than .001 - our cut off for data screening.", try_again_button = "Modify your answer", allow_retry = TRUE )
question_text( "Is there a significant difference between groups?", answer("Yes!", correct = TRUE), incorrect = "Yes, the p value for our F test is less than .05, our alpha for hypothesis testing.", try_again_button = "Modify your answer", allow_retry = TRUE )
The effect size for this study is really large. Let's say we expect this study to replicate at only half that effect size. How many participants do I need to power that study?
library(pwr) #calculate f_eta #calculate power
library(pwr) #calculate f_eta my eta was .3 so .3/2 f_eta <- sqrt(.15 / (1-.15)) #calculate power pwr.anova.test(k = 5, n = NULL, f = f_eta, sig.level = .05, power = .80)
question_text( "How many participants do you need?", answer("15 times 5", correct = TRUE), incorrect = "You need 15ish participants times 5 groups, so 75 total.", try_again_button = "Modify your answer", allow_retry = TRUE )
Run a post hoc independent t-test with no correction and a Bonferroni correction. Remember, for a real analysis, you would only run one type of post hoc. This question should show you how each post hoc corrects for type 1 error by changing the p-values. We will make a graph in a minute to help us understand these results.
pairwise.t.test(anova_data$competence, anova_data$participant_type, p.adjust.method = "none", paired = F, var.equal = T) pairwise.t.test(anova_data$competence, anova_data$participant_type, p.adjust.method = "bonferroni", paired = F, var.equal = T)
question_text( "What do you see that the correction is doing?", answer("Any good answer.", correct = TRUE), incorrect = "There are a lot of right answers here. But you should see that the p-values are larger with the correction because they are fixed for running so many tests.", try_again_button = "Modify your answer", allow_retry = TRUE )
Participant type is coded in a slightly continuous manner - each group represents more experience and time spent in the field. You should always check that your levels
are in order before coding this analysis. An example of that code is provided, and these levels should be in order.
Add in the code for the trend analysis.
levels(anova_data$participant_type)
levels(anova_data$participant_type) k = 5 anova_data$part <- anova_data$participant_type contrasts(anova_data$part) <- contr.poly(k) output2 <- aov(competence ~ part, data = anova_data) summary.lm(output2)
question_text( "Is there a significant trend?", answer("Yes, quartic.", correct = TRUE), incorrect = "Yes, but it's a quartic trend - let's make a graph to see what it looks like.", try_again_button = "Modify your answer", allow_retry = TRUE )
In order to understand our trend analysis, let's try making a graph of the results. Use the original participant_type
column to make this graph.
Once you have the graph, you should see that this trend may not make sense, even though significant - it actually appears that the data is curvilinear (quadratic) - it peaks at advanced and then decreases.
bargraph <- ggplot(anova_data, aes(participant_type, competence)) bargraph + theme_classic() + #cleanup is loaded too stat_summary(fun = mean, geom = "bar", fill = "white", color = "black") + stat_summary(fun.data = mean_cl_normal, geom = "errorbar", position = "dodge", width = .2) + xlab("Level of Study of Participant") + ylab("Average Competence Rating") + coord_cartesian(ylim = c(0,100)) + scale_x_discrete(labels = c("Novice", "Intermediate", "Advanced", "Post-Graduate", "Experienced"))
On this page, you will create the submission for your instructor (if necessary). Please copy this report and submit using a Word document or paste into the text window of your submission page. Click "Generate Submission" to get your work!
encoder_logic()
encoder_ui()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.