knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Vignette Setup:

knitr::opts_chunk$set(echo = TRUE)

# Set a random seed
set.seed(43043423)
library(rio)
library(flextable)
library(dplyr)
library(tidyr)
library(psych)
library(ggplot2)
library(reshape)
library(semanticprimeR)

Project/Data Title:

Emerging Adulthood Measured at Multiple Institutions 2: The Next Generation (EAMMi2)

Data provided by: Joe McFall

Project/Data Description:

Collaborators from 32 academic institutions, primarily in the United States, collected data from emerging adults (N = 4220 raw; N = 3134 after processing). Participants completed self-report measures assessing markers of adulthood, the IDEA inventory of dimensions of emerging adulthood, subjective well-being, mindfulness, belonging, self-efficacy, disability identity, somatic health, perceived stress, perceived social support, social media use, political affiliation, beliefs about the American dream, interpersonal transgressions, narcissism, interpersonal exploitativeness, beliefs about marriage, and demographics.

Methods Description:

Project organizers recruited contributors through social media (Facebook & Twitter) and listserv invitations (Society of Personality and Social Psychology, Society of Teaching Psychology).

Data Location:

https://osf.io/qtqpb/

# import the pilot data and keep only the markers of adulthood (moa) items
EAMMi2 <- import("data/mcfall_data.sav.zip") %>% 
  select(starts_with("moa1#"), starts_with("moa2#"))
str(EAMMi2)

Date Published:

2018-01-09

Dataset Citation:

Grahe, J. E., Chalk, H. M., Cramblet Alvarez, L. D., Faas, C., Hermann, A., McFall, J. P., & Molyneux, K. (2018, January 10). EAMMi2 Public Data. Retrieved from: https://osf.io/x7mp2/.

Keywords:

self-report, emerging adulthood

Use License:

CC-By Attribution 4.0 International

Geographic Description - City/State/Country of Participants:

Mostly United States, but any English speaker could complete the study.

Column Metadata:

EAMMi2metadata <- import("data/mcfall_metadata.csv")
flextable(EAMMi2metadata) %>% autofit()

# keep only complete cases
EAMMi2 <- EAMMi2[complete.cases(EAMMi2), ]

# reshape to long format and take 50 responses per item as our pilot data
EAMMi2long <- EAMMi2 %>% pivot_longer(cols = everything()) %>% 
  dplyr::rename(item = name, score = value) %>% 
  group_by(item) %>% 
  sample_n(size = 50)

flextable(head(EAMMi2long)) %>% autofit()

AIPE Analysis:

Stopping Rule

What is the usual standard error for the data that could be considered for our stopping rule?

# standard error for each item in the pilot data
SE <- tapply(EAMMi2long$score, EAMMi2long$item, function (x) { sd(x)/sqrt(length(x)) })
min(SE)
quantile(SE, probs = .4)
max(SE)

# use the 40th percentile of the item SEs as our stopping-rule cutoff
cutoff <- quantile(SE, probs = .4)

# we can also use semanticprimeR's calculate_cutoff() function
cutoff_score <- calculate_cutoff(population = EAMMi2long,
                                 grouping_items = "item",
                                 score = "score",
                                 minimum = min(EAMMi2long$score),
                                 maximum = max(EAMMi2long$score))
cutoff_score$cutoff
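
As a quick visual check (a sketch, not part of the original analysis), you can plot the pilot item SEs against this cutoff using the already-loaded ggplot2:

# optional: where does the cutoff fall within the pilot SE distribution?
ggplot(data.frame(SE = as.numeric(SE)), aes(x = SE)) +
  geom_histogram(bins = 15) +
  geom_vline(xintercept = cutoff, linetype = "dashed") +
  labs(x = "Pilot item standard error", y = "Number of items") +
  theme_classic()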

The items have a range of r min(SE) to r max(SE). We could use the 40th percentile SE = r cutoff as our critical value for our stopping rule, given the manuscript results. We could also set the SE to that of a specific item if we do not believe we have representative pilot data in this example. You should also consider the scale when estimating these values (i.e., 1-7 scales will have smaller estimates than 1-100 scales). Note that calculate_cutoff() also returns prop_var, which we pass to calculate_correction() below.
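
To see why the scale matters, here is a minimal sketch with made-up data (not from EAMMi2): stretching the same responses onto a wider scale inflates the SE proportionally.

# hypothetical scores on a 1-7 scale
x7 <- sample(1:7, size = 50, replace = TRUE)
# the same responses stretched onto a roughly 1-100 scale
x100 <- x7 * (100 / 7)
sd(x7) / sqrt(length(x7))     # smaller SE
sd(x100) / sqrt(length(x100)) # about 14 times larger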

Minimum Sample Size

To estimate the minimum sample size, we need to figure out how many participants it would take for 80% of the item SEs to fall below our critical value of r cutoff.

# number of simulated samples per sample size (kept small for CRAN)
nsim <- 10
# sequence of sample sizes to try
samplesize_values <- seq(20, 200, 5)

# create a blank table for us to save the values in 

sim_table <- matrix(NA, 
                    nrow = length(samplesize_values)*nsim, 
                    ncol = length(unique(EAMMi2long$item)))

# make it a data frame
sim_table <- as.data.frame(sim_table)

# add a place for sample size values 
sim_table$sample_size <- NA

iterate <- 1

for (p in 1:nsim){
  # loop over sample sizes
  for (i in 1:length(samplesize_values)){

    # temp dataframe that samples and summarizes
    temp <- EAMMi2long %>% 
      group_by(item) %>% 
      sample_n(samplesize_values[i], replace = TRUE) %>% 
      summarize(se = sd(score)/sqrt(length(score))) 

    colnames(sim_table)[1:length(unique(EAMMi2long$item))] <- temp$item
    sim_table[iterate, 1:length(unique(EAMMi2long$item))] <- temp$se
    sim_table[iterate, "sample_size"] <- samplesize_values[i]
    sim_table[iterate, "nsim"] <- p

    iterate <- iterate + 1
  }
}

final_sample <- 
  sim_table %>% 
  pivot_longer(cols = -c(sample_size, nsim)) %>% 
  group_by(sample_size, nsim) %>% 
  summarize(percent_below = sum(value <= cutoff)/length(unique(EAMMi2long$item))) %>% 
  ungroup() %>% 
  # then average the percent below the cutoff across simulations
  dplyr::group_by(sample_size) %>% 
  summarize(percent_below = mean(percent_below)) %>% 
  dplyr::arrange(percent_below) %>% 
  ungroup()

flextable(final_sample %>% head()) %>% autofit()

# correct the simulated proportions for pilot sample size and variability
final_table <- calculate_correction(
  proportion_summary = final_sample,
  # average pilot sample size across items (50 by construction above)
  pilot_sample_size = EAMMi2long %>% group_by(item) %>% 
    summarize(sample_size = n()) %>% ungroup() %>% 
    summarize(avg_sample = mean(sample_size)) %>% pull(avg_sample),
  proportion_variability = cutoff_score$prop_var
  )

flextable(final_table) %>% 
  autofit()

Based on these simulations, we can decide our minimum sample size is likely close to r final_table %>% arrange(percent_below, corrected_sample_size) %>% filter(percent_below >= 80) %>% slice_head() %>% pull(corrected_sample_size) %>% round() participants.

Maximum Sample Size

In this example, we could set our maximum sample size for 90% power, which would equate to r final_table %>% arrange(percent_below, corrected_sample_size) %>% filter(percent_below >= 90) %>% slice_head() %>% pull(corrected_sample_size) %>% round() participants.

Final Sample Size

In any estimate of sample size, you should also consider the potential for missing and/or unusable data due to any other exclusion criteria in your study (e.g., attention checks, speeding, getting the answer right). Another important note is that these estimates are driven by the number of items: fewer items would require smaller sample sizes to achieve minimum power. Note: several redundant variables (e.g., reverse-coded items) and/or not-useful variables (various checks) were omitted.
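
For example, if you anticipated roughly 10% unusable data (an assumed rate, for illustration only), you could inflate the corrected minimum sample size accordingly:

# hypothetical: inflate the corrected minimum n for an assumed 10% exclusion rate
n_min <- final_table %>% 
  arrange(percent_below, corrected_sample_size) %>% 
  filter(percent_below >= 80) %>% 
  slice_head() %>% 
  pull(corrected_sample_size) %>% 
  round()
ceiling(n_min / (1 - .10)) # recruit this many to end with about n_min usable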


