```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(echo = TRUE)

# Libraries necessary for this vignette
library(rio)
library(flextable)
library(dplyr)
library(tidyr)
library(semanticprimeR)

set.seed(84930)
```
## The survival processing effect
Data provided by: Röer, Bell, & Buchner (2013)
The data come from a conceptual replication study on the survival processing effect. The survival processing effect refers to the finding that rating words according to their relevance in a survival-related scenario leads to better retention than processing words in a number of other fictional scenarios. Participants were randomly assigned to one of the rating scenarios (survival, afterlife, moving). The to-be-rated words were presented individually in a random order on the computer screen. Each word remained on the screen for five seconds. Participants rated the words by clicking on a 5-point scale that ranged from completely useless (1) to very useful (5), which was displayed right below the word.
Participants were students at Heinrich Heine University Düsseldorf, Germany, who were paid for participating or received course credit. Their ages ranged from 18 to 55 years. The to-be-rated words consisted of 30 typical members of 30 categories drawn from the updated Battig and Montague norms (Van Overschelde, Rawson, & Dunlosky, 2004).
The data are included within this vignette. We drop the scenario column because the standard deviation and mean of the item ratings were identical across scenarios. We also add a participant number column to keep this script consistent with the other vignettes.
```{r}
DF <- import("data/roer_data.xlsx")

# Drop the scenario column
drops <- c("Scenario")
DF <- DF[, !(names(DF) %in% drops)]

# Add a participant number column
DF <- cbind(Participant_Number = 1:nrow(DF), DF)

str(DF)
```
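If you want to verify for yourself that the item means and standard deviations match across scenarios, a sketch of that check on the raw file might look like the following (hypothetical code, not part of the original vignette; it assumes the Scenario column is still present in data/roer_data.xlsx before it is dropped):

```{r, eval = FALSE}
# Hypothetical check on the raw import, before dropping Scenario:
# summarize each item's mean and SD separately by scenario
raw <- import("data/roer_data.xlsx")
raw %>%
  pivot_longer(cols = -Scenario, names_to = "item", values_to = "score") %>%
  group_by(Scenario, item) %>%
  summarize(mean = mean(score), sd = sd(score), .groups = "drop")
```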
Publication: no official publication; see the citation below.

Citation: Röer, J. P., Bell, R., & Buchner, A. (2013). Is the survival-processing memory advantage due to richness of encoding? *Journal of Experimental Psychology: Learning, Memory, and Cognition, 39*, 1294-1302.

Keywords: usefulness; survival processing; adaptive memory; richness of encoding

License: CC BY-NC

Location of data collection: Düsseldorf, Germany
```{r}
metadata <- import("data/roer_metadata.xlsx")

flextable(metadata) %>% autofit()
```
```{r}
DF_long <- pivot_longer(DF, cols = -c(Participant_Number)) %>%
  dplyr::rename(item = name, score = value)

flextable(head(DF_long)) %>% autofit()
```
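As a quick check on the reshaped data (a small illustrative addition), each item should now contribute one score per participant:

```{r}
# The number of ratings per item should equal the number of participants
DF_long %>%
  count(item) %>%
  head()
```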
What is the usual standard error for the data that could be considered for our stopping rule, using the 40th percentile of the item standard errors?
```{r}
# Individual item standard errors
SE <- tapply(DF_long$score, DF_long$item, function(x) {
  sd(x) / sqrt(length(x))
})
SE

# 40th percentile of the item standard errors
cutoff <- quantile(SE, probs = .40)
cutoff

# We could also use the cutoff score function in semanticprimeR
cutoff_score <- calculate_cutoff(
  population = DF_long,
  grouping_items = "item",
  score = "score",
  minimum = min(DF_long$score),
  maximum = max(DF_long$score)
)
cutoff_score$cutoff
```
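As a sanity check (an illustrative addition, not part of the original analysis), the `tapply()`-based standard errors should match a `dplyr`-based computation of the same quantity:

```{r}
# Recompute the item standard errors with dplyr and compare
SE_check <- DF_long %>%
  group_by(item) %>%
  summarize(se = sd(score) / sqrt(n())) %>%
  pull(se)

all.equal(sort(unname(SE)), sort(SE_check))
```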
Using the 40th percentile as a guide, we find that `r round(cutoff, digits = 3)` is our target standard error for an accurately measured item.

To estimate the minimum sample size, we should figure out what number of participants it would take to bring 80%, 85%, 90%, and 95% of the item standard errors below our critical score of `r round(cutoff, digits = 3)`.
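Because the standard error of the mean is the standard deviation divided by the square root of n, the cutoff can also be translated into a rough back-of-the-envelope sample size for an item with a typical standard deviation (a quick sketch for intuition, not part of the stopping-rule procedure itself):

```{r}
# Solving cutoff = sd / sqrt(n) for n gives n = (sd / cutoff)^2
typical_sd <- mean(tapply(DF_long$score, DF_long$item, sd))
ceiling((typical_sd / unname(cutoff))^2)
```

The simulations below refine this intuition by resampling at many sample sizes rather than assuming a single typical standard deviation.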
```{r}
# Sequence of sample sizes to try
nsim <- 10 # small for CRAN
samplesize_values <- seq(20, 500, 5)

# Create a blank table to save the simulated values in
sim_table <- matrix(NA,
                    nrow = length(samplesize_values) * nsim,
                    ncol = length(unique(DF_long$item)))

# Make it a data frame
sim_table <- as.data.frame(sim_table)

# Add a place for sample size values
sim_table$sample_size <- NA

iterate <- 1
for (p in 1:nsim) {
  # Loop over sample sizes
  for (i in 1:length(samplesize_values)) {
    # Temporary data frame that samples and summarizes
    temp <- DF_long %>%
      group_by(item) %>%
      sample_n(samplesize_values[i], replace = TRUE) %>%
      summarize(se = sd(score) / sqrt(length(score)))

    colnames(sim_table)[1:length(unique(DF_long$item))] <- temp$item
    sim_table[iterate, 1:length(unique(DF_long$item))] <- temp$se
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p

    iterate <- iterate + 1
  }
}

final_sample <- sim_table %>%
  pivot_longer(cols = -c(sample_size, nsim)) %>%
  dplyr::rename(item = name, se = value) %>%
  group_by(sample_size, nsim) %>%
  summarize(percent_below = sum(se <= cutoff) / length(unique(DF_long$item))) %>%
  ungroup() %>%
  # Then summarize down, averaging percents across simulations
  dplyr::group_by(sample_size) %>%
  summarize(percent_below = mean(percent_below)) %>%
  dplyr::arrange(percent_below) %>%
  ungroup()

flextable(final_sample %>% head()) %>% autofit()
```
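To see how quickly items drop below the cutoff as the sample grows, a simple base-R plot of the simulation summary can help (a hypothetical visualization, not part of the original vignette):

```{r}
# Proportion of item SEs below the cutoff, by simulated sample size
final_sample %>%
  arrange(sample_size) %>%
  with(plot(sample_size, percent_below,
            type = "b", pch = 19,
            xlab = "Sample size",
            ylab = "Proportion of item SEs below cutoff"))
abline(h = c(.80, .90), lty = 2) # common decision thresholds
```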
```{r}
final_table <- calculate_correction(
  proportion_summary = final_sample,
  pilot_sample_size = DF_long %>%
    group_by(item) %>%
    summarize(sample_size = n()) %>%
    ungroup() %>%
    summarize(avg_sample = mean(sample_size)) %>%
    pull(avg_sample),
  proportion_variability = cutoff_score$prop_var
)

flextable(final_table) %>% autofit()
```
Based on these simulations, we can decide our minimum sample size is likely close to `r final_table %>% arrange(percent_below, corrected_sample_size) %>% filter(percent_below >= 80) %>% slice_head() %>% pull(corrected_sample_size) %>% round()`.
In this example, we could set our maximum sample size to capture 90% of items, which would equate to `r final_table %>% arrange(percent_below, corrected_sample_size) %>% filter(percent_below >= 90) %>% slice_head() %>% pull(corrected_sample_size) %>% round()` participants.
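If you would like the corrected sample sizes for all four thresholds at once, a small convenience sketch reusing the same pipeline is shown below (it assumes each threshold is actually reached within the simulated range of sample sizes):

```{r}
# Corrected sample size at each stopping threshold
sapply(c(80, 85, 90, 95), function(percent) {
  final_table %>%
    arrange(percent_below, corrected_sample_size) %>%
    filter(percent_below >= percent) %>%
    slice_head() %>%
    pull(corrected_sample_size) %>%
    round()
})
```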
We should consider any data loss or other issues related to survival processing when thinking about the final sample size requirements.