knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>"
  )
  library(car)
  library(dplyr)
  library(emmeans)
  library(ggplot2)
  library(ggpubr)
  library(ggrepel)
  library(psych)
  library(readr)
  library(reshape)
  library(rstatix)
  library(tibble)

This is the documentation of DBERlibR, which stands for Discipline-Based Education Research library in R. The package runs R scripts to clean the data, merge/bind multiple data sets (as necessary), check the assumptions of a specific statistical technique (as necessary), and run the main assessment data analysis all at once. The outputs contain a sample interpretation of the results for the convenience of users. Users need to prepare the data file(s) as instructed and type a function in the R console (with the data file name(s)) to conduct a specific data analysis.

Users need to install the most up-to-date version of R before installing the DBERlibR package. Some error messages (e.g., “returned errors in R saying it couldn’t find particular files”) identified while testing the package were due to users running an older version of R (prior to 3.0). DBERlibR depends on car, dplyr, emmeans, ggplot2, ggpubr, ggrepel, psych, readr, reshape, rstatix, and tibble. Loading the package automatically loads all of these dependencies. All of these packages are available on CRAN at http://CRAN.R-project.org/. The results presented in this vignette were obtained using R version 4.2.2; car 3.1-1; dplyr 1.0.10; emmeans 1.8.3; ggplot2 3.4.0; ggpubr 0.4-0; ggrepel 0.9-1; psych 2.2-5; readr 2.1-2; reshape 0.8.9; rstatix 0.7-0; and tibble 3.1.8. Users can view detailed descriptions of the package by running 'help(package = "DBERlibR")' in the R console.

Load DBERlibR, Set Working Directory, and Save Data File(s) in that Directory

Load DBERlibR by typing/copying the line below in the R console and hitting the Enter key.

library(DBERlibR)

Two different ways are available for users to interact with the functions in DBERlibR. The first is to use the R console to provide arguments to run specific functions. In this case, users need to set the "Working Directory" under the "Session" menu in RStudio if they don't want to provide a path to the data file they need to load; setting the working directory is not necessary if users provide a path to the data file. The second is to use a graphical user interface by running 'gui()' in the R console; a Shiny app will pop up that allows users to interact with the functions graphically.

Sample data files for users to test the functions and view examples are downloaded to users' computers when DBERlibR is installed; run 'system.file("extdata", package = "DBERlibR")' in the console panel, and users will see the path to the downloaded data folder (i.e., extdata).

system.file("extdata", package = "DBERlibR")

Users can download all data files manually from https://github.com/HelikarLab/DBERlibR/tree/main/inst/extdata
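For instance, the sketch below (an illustration, assuming DBERlibR is installed) lists the bundled sample files and builds a full path to one of them ("data_treat_pre.csv"), which can then be passed to any function in place of a bare file name:

  # List the bundled sample data files
  list.files(system.file("extdata", package = "DBERlibR"))
  # Build a full path to one sample file so no working directory needs to be set
  pre_file <- file.path(system.file("extdata", package = "DBERlibR"),
                        "data_treat_pre.csv")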

Data Preparation

The minimum requirement for using DBERlibR is a user-provided assessment data file (ADF) that contains graded assessment data for all study subjects. The ADF should include only subject identifiers and assessment items; no other data, such as demographic information, should be included in the ADF. Users should save other data in a separate file along with the identifiers (see the demographic data format below). The assessment data should be binary (i.e., 1 for correct answers, 0 for incorrect answers). The ADF can have missing values (DBERlibR treats these automatically). The ADF should be in CSV (comma-separated values) format.

  knitr::include_graphics("data_format.png")

As shown in the data table above, the questions are named Q1, Q2, Q3, and so on, but users can name the questions differently; just make sure to use the same question (column) names across different data sets (e.g., treatment group pre-test data, treatment group post-test data, control group pre-test data, and control group post-test data).

Skipped answers in the data file(s) (i.e., blank cells in the data frame above) are treated as incorrect in this package. Too many skipped answers may skew the results of data analysis. Users can exclude students with too many skipped answers by defining what counts as "too many": simply provide a percentage as the m_cutoff argument in the function, as sketched below.
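As a rough illustration of this cleaning step (an assumption about the logic, not the package's exact internal code), the cutoff can be thought of as removing students whose share of blank cells reaches the m_cutoff value:

  # Hypothetical sketch: 'adf' stands for an assessment data frame whose first column is the id
  m_cutoff <- 0.15                               # default cutoff (15 percent)
  skipped_share <- rowMeans(is.na(adf[ , -1]))   # share of skipped items per student
  adf_clean <- adf[skipped_share < m_cutoff, ]   # keep students below the cutoff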

If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data = "answer_keys.csv", to the function. The figure below shows the ADF before the transformation.

  knitr::include_graphics("data_format_multiple.png")

The figure below demonstrates how the answer keys should be arranged in a CSV file to be used as an argument.

  knitr::include_graphics("data_format_keys.png")

If users have students' demographic information and want to examine differences between demographic subgroups (e.g., male vs. female), they should save those demographic data in a separate file along with the id (the same id as in the assessment data file(s)). The data format should look like the one below. DO NOT convert the subgroup labels to numeric codes; the results will be hard to read if numeric codes are used.

  knitr::include_graphics("data_format_demo.png")

Item Analysis Function: "item_analysis(score_csv_data, m_cutoff = 0.15, m_choice = FALSE, key_csv_data = NULL)"

The item analysis function requires users to type a data file name as shown in the sample code below. Users can input any of the data file names (e.g., "data_treat_pre.csv", "data_treat_post.csv", "data_ctrl_pre.csv", or "data_ctrl_post.csv") as they become available for item analysis.

An example without a file path: item_analysis(score_csv_data = "data_treat_pre.csv", m_cutoff = 0.1). If you provide only a data file name, as shown in the function call above, the folder where the file is saved should be set as the working directory. Alternatively, you can provide the path to the folder in the argument.

Alternative example with a file path: item_analysis(score_csv_data = "C:/Users/csong/documents/data_treat_pre.csv", m_cutoff = 0.1). The 'm_cutoff = 0.1' in the function indicates removing students with 10 percent or more skipped answers (the default is 15 percent: m_cutoff = 0.15).

An example with multiple-choice answers and without a file path: item_analysis(score_csv_data = "data_treat_pre_m.csv", m_cutoff = 0.15, m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys (e.g., "answer_keys.csv" in the sample code above; to avoid confusion between the binary format and the multiple-choice format, the ADF name in this example has been changed to "data_treat_pre_m.csv" for the purpose of illustration).

Or, users can use a graphical user interface (GUI) to provide the information for all arguments in the function by running 'gui()' in the R console. The GUI is a Shiny app that allows users to input all the information without typing code. In the Shiny app, users can select a function from the drop-down menu, load the ADF by exploring and clicking on a file, and provide additional information (i.e., m_cutoff, m_choice, and key_csv_data, as necessary) through the graphical interface.
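A runnable example (a sketch, assuming DBERlibR is installed) using the bundled sample pre-test data is shown below; the full path is built with system.file(), so no working directory needs to be set. (The indented code that follows it illustrates the steps the function performs internally.)

  # Run item analysis on the bundled sample pre-test data
  pre_file <- file.path(system.file("extdata", package = "DBERlibR"),
                        "data_treat_pre.csv")
  item_analysis(score_csv_data = pre_file, m_cutoff = 0.15)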

  # Reading
  data_original <- read_csv("data_treat_pre.csv",col_types = cols())


  # Calculate average scores
  data_original[is.na(data_original)]= 0
  # colnames(data_original) <- paste( colnames(data_original), file_name, sep = "_")
  data_original <- data_original %>% mutate(avg_score = rowMeans(data_original[,-1]))

  # ggplots: difficulty index
  difficulty_data <- data_original[,c(-1)] %>% colMeans() %>% t() %>% as.data.frame() %>% t()
  difficulty_data <- reshape::melt(round(difficulty_data,2), id.vars=c("Q_id"))  %>% dplyr::select(-X2)
  colnames(difficulty_data) = c("q.number","difficulty_index")
  plot_difficulty<- ggplot(difficulty_data, aes(x= difficulty_index , y= reorder(q.number,difficulty_index))) +
    geom_point() +
    geom_vline(xintercept = 0.2, color ="red")+
    geom_hline(yintercept = difficulty_data$q.number[length(data_original)-1],color="blue") +
    ggtitle("Difficulty Plot")+
    xlab("Proportion")+ ylab("Question Item Number") +theme_minimal() + geom_text_repel(aes(label = round(difficulty_index,2)))

  # Create data subset: difficulty index lower than 0.2
  toodifficultitems <- subset(difficulty_data, difficulty_index < 0.2)
  toodifficultitems_nrow <- nrow(toodifficultitems)

  # ggplots: discrimination index
  discrimination_data <- cor(data_original[,-1]) %>% as.data.frame()
  plot_discrimination <- ggplot(discrimination_data, aes(x=avg_score , y= reorder (colnames(discrimination_data),avg_score))) + geom_point() +
    geom_vline(xintercept = 0.2, color ="red")+
    xlab("Relationship Coefficient") +
    ylab("Question Item Number") + ggtitle("Discrimination Plot") +
    theme_minimal() + geom_text_repel(aes(label = round(avg_score,2)))

  # Create data subset: discrimination index lower than 0.2
  qnumber <- colnames(discrimination_data)
  discrimination_index <- round(discrimination_data$avg_score,2)
  discrimination_df <- data.frame(qnumber,discrimination_index)
  nondiscriminantitems <- subset(discrimination_df, discrimination_index < 0.2)
  nondiscriminantitems_nrow <- nrow(nondiscriminantitems)

The function automatically reads and cleans the data (e.g., deleting students with too many skipped answers, converting missing values to "0"), calculates difficulty and discrimination indices, and displays all results in the Console and Plots panels of RStudio.

Item difficulty refers to the proportion of students who correctly answered the item, scaled 0 to 1. Ideal values are slightly higher than midway between chance (i.e., 1 divided by the number of choices) and a perfect score (1) for the item. Commonly, values below 0.2 are considered to indicate "very difficult" items, which should be reviewed for possibly confusing language, removed from the test, and/or identified as areas for re-instruction to improve the test instrument. In addition to improving the test instrument, the results of the item difficulty analysis point to areas of weakness in students' knowledge to be taken into consideration when improving teaching modules. The difficulty plot is shown below.

  # Print difficulty data results
  cat("Item Analysis Results - Difficulty", sep="\n")
  print(difficulty_data)
  knitr::include_graphics("itemanalysis_difficulty_plot_treat_pre.png")
  cat("Refer to 'Difficulty Plot' in the 'Plots' panel.")
  if (toodifficultitems_nrow > 0) {
    cat("As seen in the difficulty plot, the following question items present a difficulty plot lower than:", sep="\n")
    print(toodifficultitems$Q_id)
  } else {
    cat("As seen in the difficulty plot, none of the difficulty indixes was found to be lower than 0.2.", sep="\n")
  }

The proportions in the generated difficulty plot above are ordered by size, and the plot displays a vertical red line representing the threshold of 0.2 so that users can instantly find the question items that warrant attention for improvement. The plot above does not contain any item whose proportion is below 0.2.

Meanwhile, item discrimination represents the relationship between how well students did on the item and their total test performance, ranging from 0 to 1. The closer to 1, the better; values of 0.2 or higher are generally considered acceptable. Items with discrimination values near or less than zero should be removed from the test; for example, negative values indicate that students who did poorly on the test overall did better on that item than those who did well, which doesn't make sense. Removing the items with low values, especially negative ones, may need to be considered, or the instruction related to those items should be improved to increase the values. The discrimination plot is shown below.

    # Print discrimination data results
  cat("Item Analysis Results - Discrimination:", sep="\n")
  print(discrimination_df)
  knitr::include_graphics("itemanalysis_discrimination_plot_treat_pre.png")
  cat("Refer to 'Discrimination Plot' in the 'Plots' panel", sep="\n")
  if (nondiscriminantitems_nrow > 0) {
    cat("As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:", sep="\n")
    print(nondiscriminantitems$qnumber)
  } else {
    cat("As seen in the discrimination plot, none of the discrimination indixes was found to be lower than 0.2", sep="\n")
  }

The relationship coefficients in the generated discrimination plot are ordered by size, and the plot displays a vertical red line representing the threshold of 0.2 so that users can instantly find the question items that warrant attention for improvement. The plot above presents 12 question items whose coefficients are less than 0.2.

Paired Samples Data Analysis Function: "paired_samples(pre_csv_data, post_csv_data, m_cutoff = 0.15, m_choice = FALSE, key_csv_data = NULL)"

An ideal situation for examining the effect of teaching modules on student performance (i.e., pre-post difference) is to have a control group to compare with the treatment/intervention group. However, it is often difficult to get the control group data. If that is the case, users can use this function to examine the difference between pre-test and post-test scores of the treatment group.

An example without a file path: paired_samples(pre_csv_data = "data_treat_pre.csv", post_csv_data = "data_treat_post.csv", m_cutoff = 0.1). If you provide only data file names, as shown in the function call above, the folder where the files are saved should be set as the working directory. Alternatively, you can provide the path to the folder in the argument.

Alternative example with a file path: paired_samples(pre_csv_data = "C:/Users/csong/documents/data_treat_pre.csv", post_csv_data = "C:/Users/csong/documents/data_treat_post.csv", m_cutoff = 0.1). The 'm_cutoff = 0.1' in the function indicates removing students with 10 percent or more skipped answers (the default is 15 percent).

An example with multiple-choice answers and without a file path: paired_samples(pre_csv_data = "data_treat_pre.csv", post_csv_data = "data_treat_post.csv", m_cutoff = 0.1, m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys.

You can provide all arguments through a graphical user interface by running 'gui()' in the R console.
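A runnable example (a sketch, assuming DBERlibR is installed) using the bundled sample data files:

  # Paired samples analysis of the treatment group's pre- and post-test data
  data_dir <- system.file("extdata", package = "DBERlibR")
  paired_samples(pre_csv_data  = file.path(data_dir, "data_treat_pre.csv"),
                 post_csv_data = file.path(data_dir, "data_treat_post.csv"),
                 m_cutoff = 0.15)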

The function automatically cleans the data sets (e.g., converting missing values to "0"), merges pre-post data sets, checks assumptions, and then runs the Paired Samples T-test (parametric) and Wilcoxon Signed-Rank test (nonparametric) to help users examine the difference between pre-post scores. The outputs of the function are as follows.

First, a summary of pre-/post-test scores' descriptive statistics will be presented in the R console.

  # read pre-post data sets (of the treatment group)
  data_treat_pre <- read_csv("data_treat_pre.csv")
  data_treat_post<- read_csv("data_treat_post.csv")
  n <- as.numeric(length(data_treat_pre[,-1]))
  n=n-1
  # replace skipped answers with "0"
  data_treat_pre[is.na(data_treat_pre)]= 0
  data_treat_post[is.na(data_treat_post)]= 0
  # change column names to indicate their origin (pre/post)
  colnames(data_treat_pre) <- paste( colnames(data_treat_pre), "pre", sep = "_")
  colnames(data_treat_post) <- paste( colnames(data_treat_post), "post", sep = "_")
  # calculate average scores
  data_treat_pre <- data_treat_pre %>% mutate(avg_score_pre = round(rowMeans(data_treat_pre[,-1]),3))
  data_treat_post <- data_treat_post %>% mutate(avg_score_post = round(rowMeans(data_treat_post[,-1]),3))
  # merge pre/post data and generate group code (treat=1, control=0)
  treat_data_merged<-merge(data_treat_pre, data_treat_post, by.x = "id_pre",  by.y = "id_post")
  names(treat_data_merged)[names(treat_data_merged) == 'id_pre'] <- "id"
  treat_data_merged <- treat_data_merged %>% mutate(avg_diff=avg_score_post-avg_score_pre)
  Mean_Differences <- treat_data_merged$avg_diff
  # get descriptive statistics for the average of pre-test scores
  avg_score_df <- cbind(treat_data_merged$avg_score_pre, treat_data_merged$avg_score_post)
  df_Long <- melt(avg_score_df)
  names(df_Long) <- c("id", "Time", "Score")
  # Name Time(group) -> 1=Pre, 2=Post
  df_Long$Time <- factor(df_Long$Time,levels=c(1,2),labels=c("Pre", "Post"))
  # Descriptive Statistics
  df_Long %>% group_by(Time) %>% get_summary_stats(Score, type="common")

Then, boxplots will be presented in the 'Plots' panel in RStudio to help users visually inspect the descriptive statistics.

  knitr::include_graphics("pairedsamples_boxplots.png")

Second, the function runs scripts to check the assumptions (no outliers and normal distribution of mean differences) to be satisfied for using the paired samples t-test and presents the results in the R console. For example, users can see the result of the Shapiro-Wilk normality test in the R console (followed by a brief interpretation of the result: e.g., "the assumption of normality by group has NOT been met (p<0.05)") and the histogram and normal Q-Q plot of the mean differences, as shown below.

  # Results of checking normality
  treat_data_merged %>% shapiro_test(avg_diff)
  knitr::include_graphics("pairedsamples_histogram.png")

If the sample size is greater than 50, it is better to refer to the Q-Q plots displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes.

  knitr::include_graphics("pairedsamples_qqplots.png")

Third, the main paired samples t-test result is presented in the R console, followed by a brief interpretation of the result: e.g., "The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p<0.001)", as shown below.

  # run Paired Samples T-test (two-sided)
  t.test(treat_data_merged$avg_score_pre, treat_data_merged$avg_score_post, mu=0, alt="two.sided", paired=T, conf.level=0.95)

Fourth, if the sample size is considered too small (e.g., less than 15), or the data fail to satisfy the normality assumption for using the parametric paired samples t-test, the function automatically runs the Wilcoxon signed-rank test (a non-parametric version of the paired samples t-test) and presents its results in the R console, as shown below. Users won't see the non-parametric test result if they can use the parametric test result, since the parametric test is more powerful than its non-parametric alternative.

  # run Wilcoxon Signed Rank Test
  wilcox.test(treat_data_merged$avg_score_pre, treat_data_merged$avg_score_post, paired=T, conf.level=0.95)

Finally, the function runs the McNemar test to examine the difference between pre-post scores for individual test items. This information is useful for determining which items may have been significantly impacted by the treatment/intervention (e.g., teaching modules). As shown in the plot below, the function generates a plot of the p-values (significance of the pre-post difference) for individual test items, ordered by p-value. The plot displays a vertical line showing the threshold of statistical significance (p=0.05), so that users can easily find test question items for which the difference between pre-post scores is statistically significant. A brief summary of the plot is provided in the R console for the convenience of users: e.g., "the plot shows the statistical significance of the difference between pre-post scores of the test items 'Q4,' 'Q7,' and 'Q17.'"

  knitr::include_graphics("pairedsamples_individual_items.png")

Independent Samples Data Analysis Function: "independent_samples(treat_csv_data, ctrl_csv_data, m_cutoff = 0.15, m_choice = FALSE, key_csv_data = NULL)"

Ideally, both pre-test and post-test data are needed to examine the difference between the treatment and the control group. However, it is often difficult to get pre-test data. If that is the case, users can use this function to examine the difference between the two (treatment vs. control) groups. Please make sure to name the data files accurately (i.e., “data_treat_post.csv” and “data_ctrl_post.csv”) and have them saved in the working directory.

An example without a file path: independent_samples(treat_csv_data = "data_treat_post.csv", ctrl_csv_data = "data_ctrl_post.csv", m_cutoff = 0.1). If you provide only data file names, as shown in the function call above, the folder where the files are saved should be set as the working directory. Alternatively, you can provide the path to the folder in the argument.

Alternative example with a file path: independent_samples(treat_csv_data = "C:/Users/csong/documents/data_treat_post.csv", ctrl_csv_data = "C:/Users/csong/documents/data_ctrl_post.csv", m_cutoff = 0.1). The 'm_cutoff = 0.1' in the function indicates removing students with 10 percent or more skipped answers (the default is 15 percent; m_cutoff = 0.15).

An example with multiple-choice answers and without a file path: independent_samples(treat_csv_data = "data_treat_post.csv", ctrl_csv_data = "data_ctrl_post.csv", m_cutoff = 0.1, m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys.

You can provide all arguments through a graphical user interface by running 'gui()' in the R console.
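A runnable example (a sketch, assuming DBERlibR is installed) using the bundled sample data files:

  # Independent samples analysis of the treatment and control groups' post-test data
  data_dir <- system.file("extdata", package = "DBERlibR")
  independent_samples(treat_csv_data = file.path(data_dir, "data_treat_post.csv"),
                      ctrl_csv_data  = file.path(data_dir, "data_ctrl_post.csv"),
                      m_cutoff = 0.15)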

This function automatically cleans the data sets (e.g., converting missing values to "0"), binds treatment-control group data sets, checks assumptions, and then runs the Independent Samples T-test (parametric) and Mann–Whitney U test (nonparametric) to help users examine the difference between the groups. The outputs from this function are as follows.

  # Read the treatment and control group post-test data sets
  data_treat_post<- read_csv("data_treat_post.csv")
  data_ctrl_post<- read_csv("data_ctrl_post.csv")
  # Replace skipped answers with "0"
  data_treat_post[is.na(data_treat_post)]= 0
  data_ctrl_post[is.na(data_ctrl_post)]= 0
  # Change column names
  colnames(data_treat_post) <- paste( colnames(data_treat_post), "post", sep = "_")
  colnames(data_ctrl_post) <- paste( colnames(data_ctrl_post), "post", sep = "_")
  # Calculate average scores and generate group variable
  data_treat_post <- data_treat_post %>%
    mutate(avg_score_post = round(rowMeans(data_treat_post[,-1]),3)) %>%
    mutate(datagroup=1)
  data_ctrl_post <- data_ctrl_post %>%
    mutate(avg_score_post = round(rowMeans(data_ctrl_post[,-1]),3)) %>%
    mutate(datagroup=0)
  treat_post_average <- describe(data_treat_post$avg_score_post)
  ctrl_post_average <- describe(data_ctrl_post$avg_score_post)
  # Bind treat/control data
  group_data_binded <- rbind(data_treat_post, data_ctrl_post)
  # Name datagroup -> 0=control, 1=treatment
  group_data_binded$datagroup<-factor(group_data_binded$datagroup,levels=c(0,1),labels=c("Control", "Treatment"))

First, a summary of treatment/control groups' descriptive statistics will be presented in the R console.

  # Get descriptive statistics
  group_data_binded %>% group_by(datagroup) %>% get_summary_stats(avg_score_post, type = "mean_sd")

Then, boxplots will be presented in the 'Plots' panel in RStudio to help users visually inspect the descriptive statistics.

  knitr::include_graphics("independentsamples_boxplots.png")

Second, the function runs scripts to check the assumptions (no outliers, normality for each group) to be satisfied for using the independent samples t-test (parametric) and presents the results in the R console. Users will see the result of the Shapiro-Wilk normality test in the R console (followed by a brief interpretation of the result: e.g., "Interpretation: the assumption of normality has been met (p>0.05 for each group)"), as shown below.

  # Check Normality
  group_data_binded %>% group_by(datagroup) %>% shapiro_test(avg_score_post)
  cat("## Interpretation: the assumption of equality of variances has been met (p>0.05)", sep="\n")

If the sample size is greater than 50, it is better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes.

  knitr::include_graphics("independentsamples_qqplots.png")

Then, the function proceeds to check the assumption of equal variances between two independent groups and shows the test results with an interpretation of the result, as shown below.

  # Check Equal Variances
  group_data_binded %>% levene_test(avg_score_post ~ datagroup)
  cat("## Interpretation: the assumption of equality of variances has been met (p>0.05)", sep="\n")

Third, the main independent samples t-test result is presented in the R console, followed by a brief interpretation of the result: "The treatment group's average score was 0.66, and the control group's average score was 0.59. The Independent Samples T-Test showed that the group difference was statistically significant (p=0.003).", as shown below. The function presents the results with equal variances assumed or not assumed, based on the test result of equal variances between two groups.

  # Run independent samples t-test (two-sided, non-equal variances)
  t.test(group_data_binded$avg_score_post~group_data_binded$datagroup, mu=0, alt="two.sided", conf=0.95, var.eq=F, paired=F)

Finally, if either the sample size is too small (e.g., less than 15) or the data fails to satisfy the normality assumption, then the function further runs scripts for the Mann-Whitney U test (a non-parametric version of the independent samples t-test) and presents its results in the R console, as shown below (Note: users won't be able to see the non-parametric test result if the data satisfies the normality assumption since the parametric test is more powerful than its non-parametric alternative).

  # Run Mann-Whitney U Test
  wilcox.test(group_data_binded$avg_score_post~group_data_binded$datagroup)

One-way ANCOVA Function: "one_way_ancova(treat_pre_csv_data, treat_post_csv_data, ctrl_pre_csv_data, ctrl_post_csv_data, m_cutoff = 0.15, m_choice = FALSE, key_csv_data = NULL)"

The function "one_way_ancova()" is to conduct a one-way analysis of covariance (ANCOVA). The ANCOVA is an extension of the one-way ANOVA to incorporate a covariate variable into the analytic model. The inclusion of this covariate (pre-test scores in this function), which is linearly related to the dependent variable (e.g., post-test scores), into the analysis increases the ability to detect differences between groups of an independent variable (treatment vs. control group in this function).

An example without a file path: one_way_ancova(treat_pre_csv_data = "data_treat_pre.csv", treat_post_csv_data = "data_treat_post.csv", ctrl_pre_csv_data = "data_ctrl_pre.csv", ctrl_post_csv_data = "data_ctrl_post.csv", m_cutoff = 0.1). If you provide only data file names, as shown in the function call above, the folder where the files are saved should be set as the working directory. Alternatively, you can provide the path to the folder in the argument.

Alternative example with a file path: one_way_ancova(treat_pre_csv_data = "C:/Users/csong/documents/data_treat_pre.csv", treat_post_csv_data = "C:/Users/csong/documents/data_treat_post.csv", ctrl_pre_csv_data = "C:/Users/csong/documents/data_ctrl_pre.csv", ctrl_post_csv_data = "C:/Users/csong/documents/data_ctrl_post.csv", m_cutoff = 0.1). The 'm_cutoff = 0.1' in the function indicates removing students with 10 percent or more skipped answers (the default is 15 percent; m_cutoff = 0.15).

An example with multiple-choice answers and without a file path: one_way_ancova(treat_pre_csv_data = "data_treat_pre.csv", treat_post_csv_data = "data_treat_post.csv", ctrl_pre_csv_data = "data_ctrl_pre.csv", ctrl_post_csv_data = "data_ctrl_post.csv", m_cutoff = 0.1, m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys.

You can provide all arguments through a graphical user interface by running 'gui()' in the R console.
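A runnable example (a sketch, assuming DBERlibR is installed) using the bundled sample data files:

  # One-way ANCOVA with all four bundled sample data files
  data_dir <- system.file("extdata", package = "DBERlibR")
  one_way_ancova(treat_pre_csv_data  = file.path(data_dir, "data_treat_pre.csv"),
                 treat_post_csv_data = file.path(data_dir, "data_treat_post.csv"),
                 ctrl_pre_csv_data   = file.path(data_dir, "data_ctrl_pre.csv"),
                 ctrl_post_csv_data  = file.path(data_dir, "data_ctrl_post.csv"),
                 m_cutoff = 0.15)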

This function automatically merges pre-post data sets, binds treatment-control data sets, runs scripts to check the assumptions of one-way ANCOVA, and then runs the main one-way ANCOVA all at once. The outputs from this function are as follows.

  # Read all data sets
  data_treat_pre <- read_csv("data_treat_pre.csv")
  data_treat_post<- read_csv("data_treat_post.csv")
  data_ctrl_pre <- read_csv("data_ctrl_pre.csv")
  data_ctrl_post<- read_csv("data_ctrl_post.csv")

  # Replace skipped answers with "0"
  data_treat_pre[is.na(data_treat_pre)]= 0
  data_treat_post[is.na(data_treat_post)]= 0
  data_ctrl_pre[is.na(data_ctrl_pre)]= 0
  data_ctrl_post[is.na(data_ctrl_post)]= 0

  # Create data sets for descriptive statistics
  data_c_pre <- data_ctrl_pre %>% mutate(avg_score = round(rowMeans(data_ctrl_pre[,-1]),3))
  data_c_post <- data_ctrl_post %>% mutate(avg_score = round(rowMeans(data_ctrl_post[,-1]),3))
  data_t_pre <- data_treat_pre %>% mutate(avg_score = round(rowMeans(data_treat_pre[,-1]),3))
  data_t_post <- data_treat_post %>% mutate(avg_score = round(rowMeans(data_treat_post[,-1]),3))

  # Change column names to indicate their origin (pre/post)
  colnames(data_treat_pre) <- paste( colnames(data_treat_pre), "pre", sep = "_")
  colnames(data_treat_post) <- paste( colnames(data_treat_post), "post", sep = "_")
  colnames(data_ctrl_pre) <- paste( colnames(data_ctrl_pre), "pre", sep = "_")
  colnames(data_ctrl_post) <- paste( colnames(data_ctrl_post), "post", sep = "_")

  # Calculate average scores
  data_treat_pre <- data_treat_pre %>% mutate(avg_score_pre = round(rowMeans(data_treat_pre[,-1]),3))
  data_treat_post <- data_treat_post %>% mutate(avg_score_post = round(rowMeans(data_treat_post[,-1]),3))
  data_ctrl_pre <- data_ctrl_pre %>% mutate(avg_score_pre = round(rowMeans(data_ctrl_pre[,-1]),3))
  data_ctrl_post <- data_ctrl_post %>% mutate(avg_score_post = round(rowMeans(data_ctrl_post[,-1]),3))

  # Merge pre/post data and generate group code (treat=1, control=0)
  treat_data_merged<-merge(data_treat_pre, data_treat_post, by.x = "id_pre",  by.y = "id_post")
  names(treat_data_merged)[names(treat_data_merged) == 'id_pre'] <- "id"
  treat_data_merged <- treat_data_merged %>% mutate(datagroup=1)

  ctrl_data_merged<-merge(data_ctrl_pre, data_ctrl_post, by.x = "id_pre",  by.y = "id_post")
  names(ctrl_data_merged)[names(ctrl_data_merged) == 'id_pre'] <- "id"
  ctrl_data_merged <- ctrl_data_merged %>% mutate(datagroup=0)

  # Bind treat/control data
  full_data_binded <- rbind(ctrl_data_merged, treat_data_merged)

  # Name datagroup -> 0=control, 1=treatment
  full_data_binded$datagroup<-factor(full_data_binded$datagroup,levels=c(0,1),labels=c("Control", "Treatment"))
  # Inspect the model diagnostic metrics
  model <- lm(avg_score_post ~ avg_score_pre + datagroup, data = full_data_binded)
  model_metrics <- broom::augment(model) %>% dplyr::select(-.hat, -.sigma, -.fitted)

First, the function presents descriptive statistics and boxplots of all four groups, as shown below.

  # Descriptive Statistics
  pre_descriptive <- group_by (full_data_binded, datagroup) %>%
    summarize(mean=mean(avg_score_pre), sd=sd(avg_score_pre), min=min(avg_score_pre), max=max(avg_score_pre))
  post_descriptive <- group_by (full_data_binded, datagroup) %>%
    summarize(mean=mean(avg_score_post), sd=sd(avg_score_post), min=min(avg_score_post), max=max(avg_score_post))

  data_c_pre <- data_c_pre %>% mutate(datagroup=1)
  data_c_post <- data_c_post %>% mutate(datagroup=2)
  data_t_pre <- data_t_pre %>% mutate(datagroup=3)
  data_t_post <- data_t_post %>% mutate(datagroup=4)

  df_Long <- rbind(data_c_pre, data_c_post, data_t_pre, data_t_post)
  df_Long <- df_Long %>% select(avg_score, datagroup)
  names(df_Long) <- c("Average_Score", "Group")
  # Name Time(group) -> 1=Control-Pre, 2=Control-Post, 3=Treatment-Pre, 4=Treatment-Post
  df_Long$Group <- factor(df_Long$Group,levels=c(1,2,3,4),labels=c("Control-Pre", "Control-Post", "Treatment-Pre", "Treatment-Post"))

  print(pre_descriptive)
  print(post_descriptive)
  knitr::include_graphics("onewayancova_boxplots.png")

Second, the function runs scripts to check all assumptions to be satisfied and presents the results so that users can confidently interpret the results of the one-way ANCOVA. The first assumption checked and presented is linearity; the plot below can be used to check it. If users see a linear relationship between the covariate (i.e., pre-test scores for this analysis) and the dependent variable (i.e., post-test scores for this analysis) for both the treatment and control groups in the plot, then they can say this assumption has been met.

  knitr::include_graphics("onewayancova_linearity_check_scatter_plot.png")
  cat("## Interpretation: if you are seeing a liner relationship between the covariate (i.e., pre-test scores for this analysis) and dependent variable (i.e., post-test scores for this analysis) for both treatment and control group in the plot, then you can say this assumption has been met or the data has not violated this assumption of linearity. If your relationships are not linear, you have violated this assumption, and an ANCOVA is not a siutable analysis. However, you might be able to coax your data to have a linear relationship by transforming the covariate, and still be able to run an ANCOVA.", sep="\n")

Then, the assumption of normality of residuals is checked, and its result (with an interpretation) is presented, as shown below.

  # Check normality of Residuals
  cat("# Normality of Residuals:", sep="\n")
  norm.all.aov <- aov(avg_score_post ~ datagroup, data=full_data_binded)
  shapiro.test(norm.all.aov$residuals)
  cat("## Interpretation: the assumption of normality by group has been met (p>0.05).", sep="\n")
  Residuals <- norm.all.aov$residuals
  knitr::include_graphics("onewayancova_histogram.png")

Users need to visually examine the histogram (above) and normal Q-Q plot (below) as well to confirm the Shapiro-Wilk normality test result, especially when the sample size is large because the Shapiro-Wilk test becomes very sensitive to a minor deviation from normality at a larger sample size.

  knitr::include_graphics("onewayancova_qqplots.png")

Then, the outputs present the result of checking the homogeneity of variances. The function runs the Levene test and reports its result with an interpretation, as shown below.

  # Check homogeneity of variances
  leveneTest(avg_score_post ~ datagroup, full_data_binded)
  cat("## Interpretation: the assumption of equality of error variances has been met (p>0.05).", sep="\n")

Then, the function checks the data to see if any outlier exists and reports variables if an outlier is found, as shown below.

  # Check outlier
  outlier_variables <- model_metrics %>%  dplyr::filter(abs(.std.resid) > 3)  %>% as.data.frame()
  print(outlier_variables)
  cat("# Outliers: No outlier has been found.", sep="\n")

Then, the last assumption of homogeneity of regression line slopes is checked, and its results (with an interpretation) are reported, as shown below.

  # Check homogeneity of regression line slopes
  lineslopes <- (lm(avg_score_post ~ avg_score_pre + datagroup + avg_score_pre:datagroup, data = full_data_binded))
  summary(lineslopes)
  cat("## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).", sep="\n")

Third, upon presenting all results of checking the assumptions, the function generates the ANOVA table (type II tests) for users to examine the significance of the group variable.

  # Run one-way ANCOVA
  res.aov <- (full_data_binded %>%  rstatix::anova_test(avg_score_post ~ datagroup + avg_score_pre))
  print(res.aov)

The function runs a post-hoc analysis to generate estimated (or adjusted) marginal means to compare between the groups and displays a summary statement of the outputs (see the results below).

  # Display the adjusted means (a.k.a., estimated marginal means) for each group
  emms <- emmeans_test(full_data_binded,
                       avg_score_post ~ datagroup,
                       covariate = avg_score_pre,
                       p.adjust.method = "bonferroni",
                       conf.level=0.95,
                       detailed=TRUE)
  posthoc_emm <- get_emmeans(emms) %>% as.data.frame()
  print(posthoc_emm)
    if (res.aov$p[1] < 0.05) {
    cat(paste0("--> A sample summary of the outputs/results above: The difference of post-test scores between the treatment and control groups turned out to be significant with pre-test scores being controlled: F(1,",res.aov$DFd[1],")=",res.aov$F[1],", p=",res.aov$p[1]," (effect size=",res.aov$ges[1],"). The adjusted marginal mean of post-test scores of the treatment group (",round(posthoc_emm$emmean[2],2),", SE=,",round(posthoc_emm$se[2],2),") was significantly different from that of the control group (",round(posthoc_emm$emmean[1],2),", SE=,",round(posthoc_emm$se[1],2),")."), sep="\n")
  } else {
    cat(paste0("--> A sample summary of the outputs/results above: The difference of post-test scores between the treatment and control groups turned out to be insignificant with pre-test scores being controlled: F(1,",res.aov$DFd[1],")=",res.aov$F[1],", p=",res.aov$p[1]," (effect size=",res.aov$ges[1],"). The adjusted marginal mean of post-test scores of the treatment group (",round(posthoc_emm$emmean[2],2),", SE=,",round(posthoc_emm$se[2],2),") was not significantly different from that of the control group (",round(posthoc_emm$emmean[1],2),", SE=,",round(posthoc_emm$se[1],2),")."), sep="\n")
  }

One-way Repeated Measures ANOVA Function: "one_way_repeated_anova(treat_pre_csv_data, treat_post_csv_data, treat_post2_csv_data, m_cutoff = 0.15, m_choice = FALSE, key_csv_data = NULL)"

This function can be used when users collect data from the same students repeatedly at three different time points (e.g., pre-test, post-test, and second post-test) and want to examine the significance of the changes over time. The function automatically merges pre, post, and post2 data sets, runs the One-way repeated measures ANOVA with assumptions check, and then displays outputs all at once.

An example without a file path: one_way_repeated_anova(treat_pre_csv_data = "data_treat_pre.csv", treat_post_csv_data = "data_treat_post.csv", treat_post2_csv_data = "data_treat_post2.csv", m_cutoff = 0.1). If you provide only data file names, as shown in the function call above, the folder where the files are saved should be set as the working directory. Alternatively, you can provide the path to the folder in the argument.

Alternative example with a file path: one_way_repeated_anova(treat_pre_csv_data = "C:/Users/csong/documents/data_treat_pre.csv", treat_post_csv_data = "C:/Users/csong/documents/data_treat_post.csv", treat_post2_csv_data = "C:/Users/csong/documents/data_treat_post2.csv", m_cutoff = 0.1). The 'm_cutoff = 0.1' in the function indicates removing students with 10 percent or more skipped answers (the default is 15 percent; m_cutoff = 0.15).

An example with multiple-choice answers and without a file path: one_way_repeated_anova(treat_pre_csv_data = "data_treat_pre.csv", treat_post_csv_data = "data_treat_post.csv", treat_post2_csv_data = "data_treat_post2.csv", m_cutoff = 0.1, m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys.

You can provide all arguments through a graphical user interface by running 'gui()' in the R console.
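A runnable example (a sketch, assuming DBERlibR is installed and that the second post-test sample file is available in the extdata folder):

  # One-way repeated measures ANOVA with pre, post, and post2 sample data
  data_dir <- system.file("extdata", package = "DBERlibR")
  one_way_repeated_anova(treat_pre_csv_data   = file.path(data_dir, "data_treat_pre.csv"),
                         treat_post_csv_data  = file.path(data_dir, "data_treat_post.csv"),
                         treat_post2_csv_data = file.path(data_dir, "data_treat_post2.csv"),
                         m_cutoff = 0.15)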

This function automatically merges pre-post-post2 data sets, checks the assumptions of the one-way repeated measures ANOVA, and then runs the main one-way repeated measures ANOVA all at once. The outputs from this function are as follows.

  # Read all data sets
  data_treat_pre <- read_csv("data_treat_pre.csv", show_col_types = FALSE)
  data_treat_post<- read_csv("data_treat_post.csv", show_col_types = FALSE)
  data_treat_post2 <- read_csv("data_treat_post2.csv", show_col_types = FALSE)

  # Replace skipped answers with "0"
  data_treat_pre[is.na(data_treat_pre)]= 0
  data_treat_post[is.na(data_treat_post)]= 0
  data_treat_post2[is.na(data_treat_post2)]= 0

  # Change column names to indicate their origin (pre/post/post2)
  colnames(data_treat_pre) <- paste(colnames(data_treat_pre), "pre", sep = "_")
  colnames(data_treat_post) <- paste(colnames(data_treat_post), "post", sep = "_")
  colnames(data_treat_post2) <- paste(colnames(data_treat_post2), "post2", sep = "_")

  # Calculate average scores
  data_treat_pre <- data_treat_pre %>% mutate(avg_score_pre = round(rowMeans(data_treat_pre[,-1]),3))
  data_treat_post <- data_treat_post %>% mutate(avg_score_post = round(rowMeans(data_treat_post[,-1]),3))
  data_treat_post2 <- data_treat_post2 %>% mutate(avg_score_post2 = round(rowMeans(data_treat_post2[,-1]),3))

  # Merge pre/post/post2 data
  data_treat_prepost<-merge(data_treat_pre, data_treat_post, by.x = "id_pre",  by.y = "id_post")
  treat_data_merged<-merge(data_treat_prepost, data_treat_post2, by.x = "id_pre",  by.y = "id_post2")
  names(treat_data_merged)[names(treat_data_merged) == 'id_pre'] <- "id"

First, the function presents descriptive statistics and boxplots of all three data sets, as shown below.

  # Convert Data Frame to a Long Format & Define the Variable
  avg_score_df <- cbind(treat_data_merged$avg_score_pre, treat_data_merged$avg_score_post, treat_data_merged$avg_score_post2)
  df_Long <- melt(avg_score_df)
  names(df_Long) <- c("id", "Time", "Score")
  # Name Time(group) -> 1=Pre, 2=Post1, 3=Post2
  df_Long$id <- factor(df_Long$id)
  df_Long$Time <- factor(df_Long$Time,levels=c(1,2,3),labels=c("Pre", "Post1", "Post2"))

  # Descriptive Statistics
  descriptive <- df_Long %>% group_by(Time) %>% get_summary_stats(Score, type="common")
  print(descriptive)
  knitr::include_graphics("onewayrepeatedanova_boxplots.png")

Second, the function runs scripts to check all assumptions to be satisfied and presents the results for users to confidently interpret the results of one-way repeated measures ANOVA. The first assumption checked is no outliers; the result is presented along with an interpretation, as shown below.

  outliers <- df_Long %>% group_by(Time) %>% identify_outliers(Score)
  cat("## Interpretation: No extreme outlier was identified in your data.", sep="\n")

Then, the assumption of normality of residuals is checked, and its result (with an interpretation) is presented, as shown below.

  # Check normality of Residuals
  res.aov <- aov(Score~Time, data=df_Long)
  shapiro.test(resid(res.aov))
  cat("--> Interpretation: the residuals were normally distributed (p>0.05).", sep="\n")

Users need to visually examine the histogram and normal Q-Q plot as well to confirm the Shapiro-Wilk normality test result, especially when the sample size is large (e.g., greater than 50) because the Shapiro-Wilk test becomes very sensitive to a minor deviation from normality at a larger sample size.

  knitr::include_graphics("onewayrepeatedanova_histogram.png")
  knitr::include_graphics("onewayrepeatedanova_qqplots.png")
  cat("--> Interpretation: if all the points fall in the plots above approximately along the reference line, users can assume normality.", sep="\n")

Third, the function runs the one-way repeated measures ANOVA and presents its results, as shown below. The assumption of sphericity is checked as part of the computation of this ANOVA test: Mauchly's test is run internally to assess the sphericity assumption, and the Greenhouse-Geisser sphericity correction is automatically applied to factors violating the assumption of sphericity.

  res.aov <- anova_test(data=df_Long, dv=Score, wid=id, within=Time)
  result <- get_anova_table(res.aov)
  print(result)
  if (result$p<0.001) {
    cat(paste0("--> Interpretation: The average test score at different time points of the intervention are  statistically different: F(",result$DFn," ",result$DFd,")=",result$F,", p<0.001, eta2(g)=",result$ges,"."), sep="\n")
  } else if (result$p<0.01) {
    cat(paste0("--> Interpretation: The average test score at different time points of the intervention are  statistically different: F(",result$DFn," ",result$DFd,")=",result$F,", p<0.01, eta2(g)=",result$ges,"."), sep="\n")
  } else if (result$p<0.05) {
    cat(paste0("--> Interpretation: The average test score at different time points of the intervention are  statistically different: F(",result$DFn," ",result$DFd,")=",result$F,", p<0.05, eta2(g)=",result$ges,"."), sep="\n")
  } else {
    cat(paste0("--> Interpretation: The average test score at different time points of the intervention are not statistically different: F(",result$DFn," ",result$DFd,")=",result$F,", p>0.05, eta2(g)=",result$ges,"."), sep="\n")
  }

Then, if the result above turns out to be significant, the function proceeds to conduct pairwise comparisons and presents the results with an interpretation, as shown below.

  pwc <- df_Long %>% pairwise_t_test(Score~Time, paired=TRUE, p.adjust.method="bonferroni")
  print(pwc)
  # Between Pre and Post
  if (pwc$p.adj[2]<0.001) {
    if((descriptive$mean[2]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly smaller than the average pre-test score (p.adj<0.001)."), sep="\n")
    }
  } else if (pwc$p.adj[2]<0.01) {
    if((descriptive$mean[2]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post1-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.01)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post1-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly smaller than the average pre-test score (p.adj<0.01)."), sep="\n")
    }
  } else if (pwc$p.adj[2]<0.05) {
    if((descriptive$mean[2]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post1-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.05)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post1-test score (",descriptive$mean[2],") are significantly different. The average post-test score is significantly smaller than the average pre-test score (p.adj<0.05)."), sep="\n")
    }
  } else {
      cat(paste0("--> Interpretation for 1: The average pre-test score (",descriptive$mean[1],") and the average post1-test score (",descriptive$mean[2],") are not significantly different (p.adj>0.05)."), sep="\n")
  }
  # Between Post and Post2
  if (pwc$p.adj[1]<0.001) {
    if((descriptive$mean[3]-descriptive$mean[2])>0) {
      cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001)."), sep="\n")
    } else {
    cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average post-test score (p.adj<0.001)."), sep="\n")
    }
  } else if (pwc$p.adj[1]<0.01) {
    if((descriptive$mean[3]-descriptive$mean[2])>0) {
      cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.01)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average post-test score (p.adj<0.01)."), sep="\n")
    }
  } else if (pwc$p.adj[1]<0.05) {
    if((descriptive$mean[3]-descriptive$mean[2])>0) {
      cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.05)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average post-test score (p.adj<0.05)."), sep="\n")
    }
  } else {
    cat(paste0("--> Interpretation for 2: The average post1-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are not significantly different (p.adj>0.05)."), sep="\n")
  }
  # Between Pre and Post2
  if (pwc$p.adj[3]<0.001) {
    if((descriptive$mean[3]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[1],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average pre-test score (p.adj<0.001)."), sep="\n")
    }
  } else if (pwc$p.adj[3]<0.01) {
    if((descriptive$mean[3]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[1],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.01)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[2],") and the average Post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average pre-test score (p.adj<0.01)."), sep="\n")
    }
  } else if (pwc$p.adj[3]<0.05) {
    if((descriptive$mean[3]-descriptive$mean[1])>0) {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[1],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.05)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[2],") and the average post2-test score (",descriptive$mean[3],") are significantly different. The average post2-test score is significantly smaller than the average pre-test score (p.adj<0.05)."), sep="\n")
    }
  } else {
    cat(paste0("--> Interpretation for 3: The average pre-test score (",descriptive$mean[1],") and the average post2-test score (",descriptive$mean[3],") are not significantly different (p.adj>0.05)."), sep="\n")
  }

Fourth, if the normality assumption is violated, the function continues to run the Friedman test, which is the non-parametric version of the parametric one-way repeated measures ANOVA. Although the one-way repeated measures ANOVA is known to be robust to a slight violation of normality, users may want or need to refer to the result of the non-parametric Friedman test. (The following is just an illustration of what users will see when the normality assumption is violated; users won't see this when testing the function with the provided sample data, because that data satisfies the assumption.)

  friedman_result <- friedman.test(avg_score_df)
  friedman_pwc <- df_Long %>% wilcox_test(Score~Time, paired=TRUE, p.adjust.method="bonferroni")
  print(friedman_result)
    if (friedman_result$p.value<0.001) {
      cat(paste0("--> Interpretation: the median test score is significantly different at the different time points during the intervention (p<0.001)."), sep="\n")
    } else if (friedman_result$p.value<0.01) {
      cat(paste0("--> Interpretation: the median test score is significantly different at the different time points during the intervention (p<0.01)."), sep="\n")
    } else if (friedman_result$p.value<0.05) {
      cat(paste0("--> Interpretation: the median test score is significantly different at the different time points during the intervention (p<0.05)."), sep="\n")
    } else {
      cat(paste0("--> Interpretation: the median test score is not significantly different at the different time points during the intervention (p>0.05)."), sep="\n")
    }

Then, the result of pairwise comparisons with the non-normal data is presented with an interpretation.

  friedman_result <- friedman.test(avg_score_df)
  friedman_pwc <- df_Long %>% wilcox_test(Score~Time, paired=TRUE, p.adjust.method="bonferroni")
  print(friedman_pwc)
    # Between Pre and post
    if (friedman_pwc$p.adj[2]<0.001) {
      if((descriptive$median[2]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly greater than the median pre-test score (p.adj<0.001)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly smaller than the median pre-test score (p.adj<0.001)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[2]<0.01) {
      if((descriptive$median[2]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly greater than the median pre-test score (p.adj<0.01)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly smaller than the median pre-test score (p.adj<0.01)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[2]<0.05) {
      if((descriptive$median[2]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly greater than the median pre-test score (p.adj<0.05)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are significantly different. The median post-test score is significantly smaller than the median pre-test score (p.adj<0.05)."), sep="\n")
      }
    } else {
      cat(paste0("--> Interpretation for 1: the median pre-test score (",descriptive$median[1],") and the median post-test score (",descriptive$median[2],") are not significantly different (p.adj>0.05)."), sep="\n")
    }
    # Between Post and Post2
    if (friedman_pwc$p.adj[1]<0.001) {
      if((descriptive$median[3]-descriptive$median[2])>0) {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median post-test score (p.adj<0.001)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median post-test score (p.adj<0.001)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[1]<0.01) {
      if((descriptive$median[3]-descriptive$median[2])>0) {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median post-test score (p.adj<0.01)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median post-test score (p.adj<0.01)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[1]<0.05) {
      if((descriptive$median[3]-descriptive$median[2])>0) {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median post-test score (p.adj<0.05)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median post-test score (p.adj<0.05)."), sep="\n")
      }
    } else {
      cat(paste0("--> Interpretation for 2: the median post-test score (",descriptive$median[2],") and the median post2-test score (",descriptive$median[3],") are not significantly different (p.adj>0.05)."), sep="\n")
    }
    # Between Pre and Post2
    if (friedman_pwc$p.adj[3]<0.001) {
      if((descriptive$median[3]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median pre-test score (p.adj<0.001)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median pre-test score (p.adj<0.001)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[3]<0.01) {
      if((descriptive$median[3]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median pre-test score (p.adj<0.01)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median pre-test score (p.adj<0.01)."), sep="\n")
      }
    } else if (friedman_pwc$p.adj[3]<0.05) {
      if((descriptive$median[3]-descriptive$median[1])>0) {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly greater than the median pre-test score (p.adj<0.05)."), sep="\n")
      } else {
        cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are significantly different. The median post2-test score is significantly smaller than the median pre-test score (p.adj<0.05)."), sep="\n")
      }
    } else {
      cat(paste0("--> Interpretation for 3: the median pre-test score (",descriptive$median[1],") and the median post2-test score (",descriptive$median[3],") are not significantly different (p.adj>0.05)."), sep="\n")
    }

Analysis of Demographic Group Differences Function: "demo_group_diff(score_csv_data, group_csv_data, m_cutoff = 0.15, group_name, m_choice = FALSE, key_csv_data = NULL)"

The 'demo_group_diff()' function requires users to type data file names as shown in the sample code below. Users can input any of the data file names (e.g., "data_treat_pre.csv", "data_treat_post.csv") as they become available for the analysis of demographic group differences.

An example without a file path: demo_group_diff(score_csv_data = "data_treat_pre.csv", group_csv_data = "demographic_data.csv", m_cutoff = 0.1, group_name = "grade"). If you provide only data file names, as shown in the function above, the folder where the files are saved should be set as the working directory. Alternatively, you can provide the path to the folder in the arguments, as shown in the next example.

An alternative example with a file path: demo_group_diff(score_csv_data = "C:/Users/csong/documents/data_treat_pre.csv", group_csv_data = "C:/Users/csong/documents/demographic_data.csv", m_cutoff = 0.1, group_name = "grade"). The 'm_cutoff = 0.1' argument indicates removing students with 10 percent or more skipped answers (the default is 15 percent; m_cutoff = 0.15).

An example with multiple-choice answers and without a file path: demo_group_diff(score_csv_data = "data_treat_pre.csv", group_csv_data = "demographic_data.csv", m_cutoff = 0.1, group_name = "grade", m_choice = TRUE, key_csv_data = "answer_keys.csv"). If the ADF contains multiple-choice answers (i.e., before being transformed into binary data with 1 for 'correct' and 0 for 'incorrect'), users must provide this information by adding the argument m_choice = TRUE. Additionally, users should provide the answer keys for transforming multiple-choice answers into binary data by adding another argument, key_csv_data, with NULL replaced by a CSV file containing the answer keys.
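
The same example calls are shown below as runnable R code; the file names and folder path are the illustrative placeholders used in the examples above.

  # Example without a file path (the working directory must contain the files)
  demo_group_diff(score_csv_data = "data_treat_pre.csv",
                  group_csv_data = "demographic_data.csv",
                  m_cutoff = 0.1, group_name = "grade")

  # Example with a file path (the path is an illustrative placeholder)
  demo_group_diff(score_csv_data = "C:/Users/csong/documents/data_treat_pre.csv",
                  group_csv_data = "C:/Users/csong/documents/demographic_data.csv",
                  m_cutoff = 0.1, group_name = "grade")

  # Example with multiple-choice answers and answer keys
  demo_group_diff(score_csv_data = "data_treat_pre.csv",
                  group_csv_data = "demographic_data.csv",
                  m_cutoff = 0.1, group_name = "grade",
                  m_choice = TRUE, key_csv_data = "answer_keys.csv")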

You can provide all arguments through a graphical user interface by running 'gui()' in the R console.

This function automatically merges the demographic variables into the assessment dataset and then runs the analysis of variance (ANOVA), with assumption checks, to examine demographic sub-group differences all at once.

In addition to the criterion for handling skipped answers (i.e., missing values), this function asks users to choose a demographic variable to analyze. Users will see the demographic variable names with numeric codes assigned (e.g., gender = 1, grade = 2). When asked to "Enter the number assigned to the demographic variable name you want to analyze," type the number in the R console and hit the 'Enter' key. The function runs all scripts in the back-end and presents the results automatically. The results from this function are as follows.

  # Read the assessment and demographic data files
    data_original <- read_csv("data_treat_pre.csv",col_types = cols())
    demographic_data <- read_csv("demographic_data.csv", show_col_types = FALSE)

  # Replace skipped answers with "0"
  data_original[is.na(data_original)] <- 0

  # Calculate average scores
  data_original <- data_original %>% mutate(avg_score = round(rowMeans(data_original[,-1]),3))

  # Merge assessment data and demographic data
  data_original <- merge(data_original, demographic_data, by.x = "id",  by.y = "id")

  # Take demographic variable name and create a subset of the dataframe with average score and demographic data
  # group_name <- readline(prompt="Enter demographic variable name: ")
  data_original <- select(data_original, c('avg_score', 'grade'))
  names(data_original) <- c("average_score", "group")

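  # Fit the parametric models: a one-way ANOVA with Tukey HSD post-hoc comparisons,
  # plus a Welch ANOVA with Games-Howell post-hoc comparisons for unequal variances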
  one_way <- aov(average_score ~ factor(group), data=data_original)
  pwc <- TukeyHSD(one_way)
  one_way2 <- data_original %>% welch_anova_test(average_score ~ group)
  pwc2 <- data_original %>% games_howell_test(average_score ~ group)

First, the function presents summary statistics for each demographic sub-group along with boxplots, as shown below.

# Descriptive statistics
  descriptive <- data_original %>% group_by(group) %>% get_summary_stats(average_score, type = "mean_sd")
  print(descriptive)
  knitr::include_graphics("demogroupdiff_boxplots.png")

Second, the function continues to test the assumptions for the parametric one-way ANOVA. The first assumption checked is the normality of residuals; the Shapiro-Wilk test is performed, and its result is presented with an interpretation, as shown below.

  # Check normality of residuals
  shapirotest <- shapiro.test(resid(one_way))
  qqplots <- ggqqplot(residuals(one_way),title = "Normal Q-Q Plot of Residuals")
  Residuals <- residuals(one_way)
  print(shapirotest)
  if (shapirotest$p.value > 0.05) {
    cat("## Interpretation: the assumption of normality by group has been met (p>0.05).", sep="\n")
  } else {
    cat("## Interpretation: the assumption of normality by group has NOT been met (p<0.05). Although ANOVA is robust to a violation of the assumption of normality of residuals, you may want to mention this violation in you report. For example, you can say: 'The data has slightly violated the assumption of normality of residuals, but ANCOVA is known to be robust to this violation (so it's not a serious issue).", sep="\n")
  }

If the sample size is greater than 50, it would be better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes. The plots are presented below with guidance for interpretation.

  knitr::include_graphics("demogroupdiff_histogram.png")
  knitr::include_graphics("demogroupdiff_qqplots.png")

The second assumption checked is homogeneity of variance. The result is presented with an interpretation and a visual representation, as shown below.

  levene_result <- leveneTest(average_score ~ factor(group), data=data_original)
    print(levene_result)
  if (levene_result$`Pr(>F)`[1] > 0.05) {
    cat("## Interpretation: the assumption of equality of variances has been met (p>0.05).", sep="\n")
  } else {
    cat("## Interpretation: the assumption of equality of variances has NOT been met (p<0.05).", sep="\n")
  }

Third, upon presenting the results of the assumption checks, the function proceeds to run the one-way ANOVA with equal variances 'assumed' or 'not assumed', depending on the result of the homogeneity-of-variance test above. The result is then presented with an interpretation, as shown below.

  if (levene_result$`Pr(>F)`[1]>0.05) {
  cat("Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)", sep="\n")
  print(summary(one_way))
  cat("--> Interpretation: the difference among the demographic sub-groups is not significant (P>0.05).", sep="\n")
  cat("Pairwide Comparisons (Equal variances assumed)", sep="\n")
  } else {
  cat("Results of One-way ANOVA: Group Difference(s) (Parametric: Unequal variances assumed)")
  message("Since the assumption of equal variances has NOT been satisfied, the Welch one-way test and the Games-Howell post-hoc test results are presented below.")
  print(summary(one_way2))
  cat("--> Interpretation: the difference among the demographic sub-groups is not significant (P>0.05).", sep="\n")
  }

Then, the function runs scripts to conduct pairwise comparisons and presents the result, as shown below.

  if (levene_result$`Pr(>F)`[1]>0.05) {
    cat("Pairwide Comparisons (Equal variances assumed)", sep="\n")
    print(pwc)
  } else {
    message("Since the assumption of equal variances has NOT been satisfied, the Welch one-way test and the Games-Howell post-hoc test results are presented below.")
    cat("Pairwide Comparisons (Unequal variances assumed)", sep="\n")
    print(pwc2)
  }

Fourth, in case the normality assumption is violated, the function continues to run the Kruskal-Wallis test, which is the non-parametric version of the parametric one-way ANOVA. Although ANOVA is known to be robust to a slight violation of normality, users may want or need to refer to the result of the non-parametric Kruskal-Wallis test. The following is just an illustration of what users will see when the normality assumption is violated; users won't see this when testing the function with the provided sample data because the data satisfies the assumption.

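  # Non-parametric alternative: Kruskal-Wallis test with pairwise Wilcoxon tests,
  # p-values adjusted with the Benjamini-Hochberg ("BH") method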
  np_one_way <- kruskal.test(average_score ~ group, data=data_original)
  np_pwc <- pairwise.wilcox.test(data_original$average_score, data_original$group, p.adjust.method = "BH")
  cat("Results of Kruskal_Wallis Test (Non-parametric)", sep="\n")
  print(np_one_way)
  cat("Interpretation: The difference among demographic sub-groups is not significant (p>0.05).", sep="\n")
  print(np_pwc)

