knitr::opts_chunk$set(message=FALSE, warning=FALSE, echo=FALSE, results = 'asis')
if (!requireNamespace('pander', quietly = TRUE)) {
  message("\npander is not installed. If there are any issues with tables ",
          "in the final output, try `install.packages('pander')`.")
}
as_table <- function(x, pander = requireNamespace('pander', quietly = TRUE), ...) {
  if (pander && !requireNamespace('pander', quietly = TRUE)) pander <- FALSE
  if (pander) {
    x <- as.data.frame(x)
    pander::pandoc.table(x, split.tables = Inf, ...)
  } else {
    knitr::kable(x, row.names = FALSE, ...)
  }
}

embolden <- function(x, type = 'markdown') {
  switch(
    type,
    'markdown' = paste0('**', x, '**'),
    'html' = paste0('<strong>', x, '</strong>')
  )
}

if (!exists('report_options')) report_options <- list()
stopifnot(exists('mctd'))
mctd <- requires(mctd, c('item.score', 'item.analysis', 'alpha', 'irt_models'))

\clearpage

Introduction

This generated report provides the analytical framework proposed in the paper "An Analytic Framework for Evaluating the Validity of Concept Inventory Claims" [@Jorion2015] from the University of Chicago, supplemented with additional statistical routines based on Classical Test Theory. Within this report you will find plots, graphs, and tables intended to support an analysis of a multiple-choice test according to the framework proposed in Jorion’s paper.

Test Overview and Descriptions

Answer Key

as_table(mctd$AnswerKey)

\clearpage

Overall Score Histogram

plotOverallHistogram(mctd)

Option Selection by Item

The following table presents the percentage of students selecting each option by item.

summarizeSelectedOptions(
  mctd,
  include_columns = c('Title', 'Answer', 'Concept'),
  questions_as_row_names = FALSE,
  as_percentage = TRUE,
  correct_vs_incorrect = FALSE
) %>% as_table

Table: Option selection by item

Classical Test Theory

Summary

The following tables provide common statistical parameters used in Classical Test Theory (CTT). Each parameter is defined below; a minimal computational sketch of these quantities follows the definitions.

Cronbach Alpha
:   The coefficient of internal reliability, indicating how closely related the set of items are as a group.

Cronbach Alpha without item (WOI)
:   The Cronbach Alpha calculated for the test without including the item of interest.

Subscale Alpha
:   The Cronbach Alpha for the subscale or concept group. The value of alpha is influenced by test length, so it is expected that a low number of items per subscale will result in a lower subscale alpha value.

Difficulty Index
:   Measures the proportion of students who answered the test item accurately. Values close to 1 indicate less difficult items (more students answered the item correctly), while values close to 0 indicate more difficult items.

Discrimination Index
:   Measures the ability of the item to discriminate between high- and low-scoring students. Positive values indicate that students who scored well on the overall test tended to answer this question correctly, while students who scored poorly tended to answer it incorrectly. Negative values indicate the opposite -- low-scoring students were more likely to answer the question correctly, while high-scoring students tended to choose the wrong answer -- and suggest that the item should be reviewed. Values near zero suggest the item does not differentiate between high- and low-performing students.

Item Variance
:   Measures the spread among item responses.

Point-Biserial Correlation Coefficient (PBCC)
:   Measures the Pearson correlation between a dichotomous variable, in this case the dichotomously scored item (correct/incorrect), and a continuous variable, in this case the overall test score.

Modified Point-Biserial Correlation Coefficient (Modified PBCC)
:   The PBCC computed with the item of interest excluded from the overall test score.
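
The following is a minimal computational sketch of the statistics defined above. It assumes `scores` is a respondents-by-items matrix of dichotomous (0/1) item scores and uses the common upper/lower 27% convention for the discrimination index; the names `scores` and `ctt_sketch` are illustrative and are not objects created by this report, which computes these values with `summarizeCTT()`.

```r
# Illustrative only: a compact sketch of the CTT statistics defined above.
# Assumes `scores` is a respondents-by-items matrix of 0/1 item scores.
ctt_sketch <- function(scores, tail = 0.27) {
  total <- rowSums(scores)                 # overall test score per respondent
  k     <- ncol(scores)
  # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total)
  alpha <- k / (k - 1) * (1 - sum(apply(scores, 2, var)) / var(total))
  difficulty <- colMeans(scores)           # proportion answering each item correctly
  variance   <- apply(scores, 2, var)      # item variance
  # Discrimination index: proportion correct in the top-scoring group minus
  # proportion correct in the bottom-scoring group.
  hi <- total >= quantile(total, 1 - tail)
  lo <- total <= quantile(total, tail)
  discrimination <- colMeans(scores[hi, , drop = FALSE]) -
    colMeans(scores[lo, , drop = FALSE])
  # PBCC: correlation of each item score with the overall score; the modified
  # PBCC removes the item itself from the overall score first.
  pbcc          <- apply(scores, 2, function(item) cor(item, total))
  pbcc_modified <- vapply(seq_len(k), function(i) {
    cor(scores[, i], total - scores[, i])
  }, numeric(1))
  list(alpha = alpha,
       items = data.frame(difficulty, variance, discrimination,
                          pbcc, pbcc_modified))
}
```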

\newpage

Test Summary

summarizeCTT(mctd) %>% 
  select(-Measure) %>% 
  as_table(digits = 4, round = 4)

Table: Classical Test Theory Summary

Test Summary by Concept Group

as_table(summarizeCTT(mctd, 'concept', digits.round = 3), digits = 4, round = 4)

Table: Classical Test Theory Summary by Concept Group

\newpage \blandscape

Test Summary by Item

as_table(summarizeCTT(mctd, 'item', digits.round = 3), digits = 4, round = 4)

Table: Classical Test Theory Summary by Item

\elandscape \clearpage

Item Discrimination

The scatter plots below compare the three measures of item discrimination with the item difficulty. Dotted guidelines indicate the recommended ranges for each index. Note that the full difficulty index range is from 0 to 1 and the full range of the discrimination indices is from -1 to 1 --- although a discrimination index (or PBCC or Modified PBCC) below 0.2 is not recommended.

decorate <- function(g) {
  g + theme(
    panel.border = element_rect(size=1,linetype="solid",color="black",fill=NA),
    plot.margin = unit(rep(0.5, 4), 'cm')
  )
}
gridExtra::grid.arrange(
  top=grid::textGrob("Discrimination Analysis", gp=grid::gpar(fontsize=20)),
  decorate(plotDiscriminationDifficulty(mctd, "conventional", show_gridlines = FALSE)),
  decorate(plotDiscriminationDifficulty(mctd, "pbcc", show_gridlines = FALSE)),
  decorate(plotDiscriminationDifficulty(mctd, "pbcc_modified", show_gridlines = FALSE)),
  ncol = 2
)

\clearpage

Overall Score vs. Question Score

The following plot compares the respondents' performance on a single item (correct or incorrect) to their overall score on the test. The plots are organized by concept group, and within each subplot, the boxplots display the range of overall test scores among the respondents who correctly and incorrectly answered each question.

Intuitively, a question with very little overlap between the boxplots of the correct and incorrect groups discriminates well between high- and low-performing students. Questions whose boxplots mostly overlap do not differentiate between students as effectively.

Additionally, the range of each boxplot indicates whether the question is correctly (or incorrectly) answered by students with a wide range of overall performance or more consistently by students of a particular overall ability.

Generally, it is best for the boxplot of the correct group to be mostly above the boxplot of the incorrect group. Questions that have complete overlap between the two boxplots should be reviewed.

plotTestScoreByQuestion(mctd, facet_by_concept = TRUE) +
  ggtitle("Overall Test Score vs. Question Score (Correct/Incorrect)")

\clearpage

Distractor Analysis

if (!is.null(report_options[['distractor.pct']])) {
  distractor.pct <- report_options$distractor.pct
} else distractor.pct <- 0.33

distractor.data <- summarizeDistractors(mctd, distractor.pct)
distractor.data.counts <- distractor.data %>% 
  filter(Question == distractor.data[1, 'Question']) %>% 
  group_by(Group) %>% 
  summarize(total = sum(count))

# For turning a percentile fraction into an ordinal label, e.g. 0.33 -> "33rd"
percentile_abbreviation <- c('th', 'st', 'nd', 'rd', rep('th', 6))
units_digit <- function(x) round(x %% 10, 0)
pct_to_text <- function(x) {
  paste0(round(x * 100, 0), percentile_abbreviation[units_digit(x*100) + 1])
}

The following plot and table compare the percentage of all respondents who select a given option for each item. These tables allow the test administrator to analyze the performance of item options and to determine whether the choice of distractor options reveals information about misconceptions in students' knowledge. Respondents are grouped into the upper and lower `r pct_to_text(distractor.pct)` percentiles by overall test score. For this report, there were `r distractor.data.counts %>% filter(Group == 'high') %>% .$total` respondents in the upper `r pct_to_text(distractor.pct)` percentile and `r distractor.data.counts %>% filter(Group == 'low') %>% .$total` respondents in the lower `r pct_to_text(distractor.pct)` percentile. Percentages are calculated relative to the total number of respondents, in this case `r nrow(mctd$Test.complete)` students.
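
The percentile grouping works roughly as sketched below; `summarizeDistractors()` performs this internally, so the objects here (in particular the assumption that `mctd$item.score` is a respondents-by-items 0/1 matrix) are illustrative only.

```r
# Illustrative grouping of respondents into upper/lower percentile groups.
total_scores <- rowSums(mctd$item.score)  # assumes a 0/1 scored item matrix
high_group <- total_scores >= quantile(total_scores, 1 - distractor.pct)
low_group  <- total_scores <= quantile(total_scores, distractor.pct)
```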

plotDistractors(mctd, distractor.pct) + 
  ggtitle('Distractor Analysis')

\clearpage

n_options <- length(unique(distractor.data$Option))
col_justifications <- paste(paste0(rep('c', 2), collapse = ''), paste0(rep('r', n_options*2), collapse = ''), sep = '')
distractor.data %>% 
  mutate(pct = sprintf("%0.2f", pct*100), 
         pct = ifelse(Correct, embolden(pct), pct), 
         Group = c('high' = 'H', 'low' = 'L')[Group], 
         OptionGroup = paste(Option, Group, sep = '')) %>% 
  select(Question, Title, OptionGroup, pct) %>% 
  reshape2::dcast(Question + Title ~ OptionGroup, value.var = 'pct') %>% 
  as_table(justify = col_justifications)

Table: Percentage of total respondents ($N=$ `r nrow(mctd$Test.complete)`) from the upper (High, $N=$ `r distractor.data.counts %>% filter(Group == 'high') %>% .$total`) and lower (Low, $N=$ `r distractor.data.counts %>% filter(Group == 'low') %>% .$total`) `r pct_to_text(distractor.pct)` percentiles who chose each item option. The percentage of students choosing the correct option for each item is highlighted in bold.

\clearpage

Item Review Recommendations

Review Recommendations Criteria

Alpha
:   If Cronbach's Alpha for the test with the item deleted is less than the alpha coefficient for the whole test, then the recommendation is to Keep the item.

Jorion
:   If the Difficulty Index is between 0.3 and 0.9, and the Discrimination Index is greater than 0.2, then the recommendation is to Keep the item.

Versatile
:   This recommendation is based on the Difficulty Index and PBCC and provides a range of recommendations from Remove to Review through Keep, favoring positive PBCC values near to or greater than 0.3 and higher difficulty values. The criteria for this recommendation are based on the criteria published by @Sleeper2011, reproduced below.

Stringent
:   If the Difficulty Index is between 0.3 and 0.9, and the Point-Biserial Correlation Coefficient is greater than 0.3, then the recommendation is to Keep the item.
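
The Alpha, Jorion, and Stringent rules above can be sketched as follows. This is illustrative only: the vectors `difficulty`, `discrimination`, `pbcc`, `alpha_woi`, and the scalar `alpha_total` are assumed inputs (for example, as produced by `summarizeCTT()` or the CTT sketch earlier in this report), the range boundaries are assumed inclusive, and the report itself uses `recommendItemActions()`.

```r
# Illustrative keep/review flags for the Alpha, Jorion, and Stringent rules.
# All inputs are assumed per-item vectors (alpha_total is a scalar).
keep_alpha     <- alpha_woi < alpha_total
keep_jorion    <- difficulty >= 0.3 & difficulty <= 0.9 & discrimination > 0.2
keep_stringent <- difficulty >= 0.3 & difficulty <= 0.9 & pbcc > 0.3
```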

"Versatile" Recommendation Criteria

The Versatile recommendation criteria are based on criteria published by @Sleeper2011. The table below reproduces the source material that is unfortunately no longer available online.

| Difficulty Score (%) | PBCC $[0.3, 1.0]$ | PBCC $[0.15, 0.3)$ | PBCC $[0.0, 0.15)$ | PBCC $[-1, 0)$ |
|----------------------|-------------------|--------------------|--------------------|----------------|
| $[0, 30]$            | Review            | Review/Remove      | Remove             | Remove         |
| $(30, 50]$           | Keep (Tough)      | Review              | Review/Remove      | Remove         |
| $(50, 80]$           | Keep              | Keep                | Review/Keep        | Review         |
| $(80, 100]$          | Keep              | Keep                | Keep (Easy)        | Review         |

Table: Versatile recommendation criteria from @Sleeper2011
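
The lookup in the table above could be expressed as in the sketch below. This is illustrative only and is not the implementation used by `recommendItemActions()`; `difficulty` is a proportion in $[0, 1]$ and `pbcc` is the point-biserial correlation coefficient.

```r
# Map (difficulty, PBCC) pairs to the Versatile recommendations tabulated above.
versatile_recommendation <- function(difficulty, pbcc) {
  recs <- matrix(c(
    "Review",       "Review/Remove", "Remove",        "Remove",
    "Keep (Tough)", "Review",        "Review/Remove", "Remove",
    "Keep",         "Keep",          "Review/Keep",   "Review",
    "Keep",         "Keep",          "Keep (Easy)",   "Review"
  ), nrow = 4, byrow = TRUE)
  # Rows: difficulty in [0, 0.3], (0.3, 0.5], (0.5, 0.8], (0.8, 1]
  row <- cut(difficulty, breaks = c(0, 0.3, 0.5, 0.8, 1),
             include.lowest = TRUE, labels = FALSE)
  # Columns (left to right): PBCC in [0.3, 1], [0.15, 0.3), [0, 0.15), [-1, 0)
  col <- 5 - cut(pbcc, breaks = c(-1, 0, 0.15, 0.3, 1), right = FALSE,
                 include.lowest = TRUE, labels = FALSE)
  recs[cbind(row, col)]
}

versatile_recommendation(0.45, 0.35)  # "Keep (Tough)"
```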

\clearpage

Review Recommendations Table

recommendItemActions(mctd, include_columns = c("Title", "Concept"), digits.round = 2) %>% as_table()

Table: Recommendations for each test item based on the criteria described above.

\clearpage

Item Response Theory

if (!is.null(report_options[['irt_model_choice']])) {
  pl_number <- report_options$irt_model_choice %>% as.integer
  flag_model_chosen <- TRUE
} else {
  flag_model_chosen <- FALSE
  pl_number <- which(mctd$irt_models$AIC == min(mctd$irt_models$AIC)) %>% 
    names() %>% substr(start = 3, stop = 3) %>% as.integer
}
number_words <- c('one', 'two', 'three')
pl_name <- paste0('PL', pl_number)
# Check that chosen PL actually worked and is available
pl_changed <- FALSE
if (is.null(mctd$irt_models[[pl_name]])) {
  pl_others <- c(1:3)[-pl_number]
  pl_new <- c()
  # Fall back to the first alternative model that was successfully fit
  for (i in pl_others) {
    if (!length(pl_new) && !is.null(mctd$irt_models[[paste0("PL", i)]])) {
      pl_new <- i
      pl_changed <- TRUE
      warning("Changed IRT parameter choice to ", pl_new)
    }
  }
  pl_old <- pl_number
  pl_number <- pl_new
  pl_name <- paste0('PL', pl_number)
}

Model Summary

`r ifelse(flag_model_chosen, 'The model selected by the user for', "One-, two- and three-parameter logistic models were fit to the test results data. The model chosen for the remainder of")` this analysis was the `r number_words[pl_number]`-parameter logistic model, which had `r ifelse(flag_model_chosen, 'an', 'the lowest')` AIC of $`r mctd$irt_models$AIC[pl_number] %>% round(1)`$.

if (pl_changed) {
  cat('\n*Note that the ', pl_old, '-parameter model failed to fit, so the ',
      pl_number, '-parameter model was used instead.*\n', sep = '')
}

Model Parameters

irt_help_text <- list(
  paste(
    "Difficulty\n:   ",
    "The difficulty parameter, \\(\\beta\\), sometimes",
    "called the threshold parameter, describes the difficulty of a given item.",
    "It is the only parameter estimated in the 1-PL (Rasch) model.\n\n"
  ),
  paste(
    "Discrimination\n:   ",
    "The discrimination parameter, \\(\\alpha\\),",
    "reflects the effectiveness of the item in differentiating between high- and",
    "low-performing students. This parameter is estimated in the 2-PL model, in",
    "addition to difficulty.\n\n"
  ),
  paste(
    "Guessing\n:   ",
    "The guessing parameter, \\(\\gamma\\), is included in the",
    "3-PL model, in addition the previous parameters, and reflects the influence",
    "of guessing for each item.\n\n"
  ),
  paste(
    "Prob.\n:   ",
    "The probability column gives the probability that an average",
    "student will correctly answer the item, i.e.",
    "\\(\\mathrm{P}(x_i = 1 \\vert z = 0)\\).\n\n"
  ),
  # Discrimination description for Rasch model
  paste(
    "Discrimination\n:  ",
    "In the 1-PL Rasch model, the discrimination parameter is assumed to be equivalent across all items.",
    "This assumption leads to consistent ICC curves where more difficult questions are always less easy",
    "for all students. When the discrimination parameter is allowed to vary, for two items of similar",
    "difficulty one item can be both easier for low-performing students and harder for high-performing",
    "students when compared with the second item (or vice-versa).\n\n"
  )
)
# Print out IRT model parameter text
do.call('cat', irt_help_text[c(1:pl_number, if(pl_number == 1) 5, 4)])
summarizeIRT(mctd, pl_number) %>% as_table()
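
For reference, these parameters combine in the conventional form of the three-parameter logistic item response function shown below, where $z$ is the latent ability. The 2-PL model fixes $\gamma_i = 0$, and the 1-PL (Rasch) model additionally constrains $\alpha$ to be equal across items. (This is the textbook form; the internal parameterization of the fitted `ltm` models may differ.)

$$\mathrm{P}(x_i = 1 \mid z) = \gamma_i + (1 - \gamma_i)\,\frac{1}{1 + e^{-\alpha_i\,(z - \beta_i)}}$$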

Item Characteristic Curves

icc_group <- ifelse(is.null(report_options$icc_group), 'concept', report_options$icc_group)
if (icc_group == 'concept' && length(unique(mctd$AnswerKey$Concept)) < 2) icc_group <- 'question'

if (icc_group == 'concept') {
  n.concept <- length(unique(mctd$AnswerKey$Concept))
  plot.rows <- ifelse(n.concept > 4, ceiling(n.concept/2), 4)
  plot.rows <- ifelse(n.concept <= 2, 3, plot.rows)
  plot.cols <- ifelse(n.concept > 4, 2, 1)
  par(mfrow = c(plot.rows, plot.cols))
  for (concept in unique(mctd$AnswerKey$Concept)) {
    questions <- mctd$AnswerKey %>%
      mutate(n = 1:nrow(.)) %>%
      filter(Concept == concept) %>% .$n
    plot_title <- paste0("Item Characteristic Curves: Concept \"", concept, '"')
    switch(pl_name,
           'PL1' = ltm::plot.rasch(mctd$irt_models[['PL1']], 
                                   type = "ICC", items = questions,
                                   main = plot_title),
           'PL2' = ltm::plot.ltm(mctd$irt_models[['PL2']], 
                                 type = "ICC", items = questions,
                                 main = plot_title),
           'PL3' = ltm::plot.tpm(mctd$irt_models[['PL3']], 
                                 type = "ICC", items = questions,
                                 main = plot_title)
    )
  }
} else {
  n.question <- nrow(mctd$AnswerKey)
  par(mfrow = c(ceiling(n.question/6), 3))
  for (item in 1:n.question) {
    plot_title <- paste0("Item Characteristic Curves: \"", mctd$AnswerKey[item, 'Title'] , '"')
    switch(pl_name,
           'PL1' = ltm::plot.rasch(mctd$irt_models[['PL1']], 
                                   type = "ICC", items = item,
                                   main = plot_title),
           'PL2' = ltm::plot.ltm(mctd$irt_models[['PL2']], 
                                 type = "ICC", items = item,
                                 main = plot_title),
           'PL3' = ltm::plot.tpm(mctd$irt_models[['PL3']], 
                                 type = "ICC", items = item,
                                 main = plot_title)
    )
  }
}
par(mfrow = c(1,1))

\clearpage

Introductory Factor Analysis

Tetrachoric Plot

The following plot shows the item-by-item tetrachoric correlation for all questions in the test. The tetrachoric correlation estimates the correlation between two variables whose measurement is artificially dichotomized but whose underlying joint distribution is a bivariate normal distribution. In the case of Item Response Theory, the tetrachoric correlation is seen as the correlation between the response to two items when "each item is assumed to represent an underlying ability which is reflected as a probability of responding correctly to the item and the items are coded as correct or incorrect" [@PersonalityProject].

Because the tetrachoric correlation matrix represents item correlations for all test items, the structure of this matrix directly corresponds to the structure of the underlying latent variables measured by the test. This is --- at a very high level --- the goal of factor analysis. For more information, the Personality Project webpage provides excellent resources. The tetrachoric correlation plot is included here for visual inspection of the underlying structure, as this matrix will be used in the factor analysis that follows.
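
As a point of reference, this matrix can be computed directly with the `psych` package. The sketch below is illustrative: `scores` (a respondents-by-items matrix of 0/1 item scores) is an assumed stand-in for the scored data that `plotTetrachoric()` uses internally.

```r
# Illustrative: compute the item-by-item tetrachoric correlation matrix.
tet <- psych::tetrachoric(scores)  # `scores` is an assumed 0/1 score matrix
round(tet$rho, 2)                  # the correlation matrix plotted below
```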

plotTetrachoric(mctd, TRUE, TRUE)

Scree Plot

scree_factors <- plotScree(mctd, TRUE)

One method for determining the number of factors or components in the tetrachoric correlation matrix of the test responses is to examine the scree plot of the eigenvalues of the correlation matrix. Typically, the analyst looks for a sharp break in the slope of the line connecting the eigenvalues. In parallel analysis, the scree of the observed data is compared with that of a random data matrix of the same size, and the suggested number of factors/components is the number whose eigenvalues exceed the corresponding eigenvalues of the random data.

Parallel analysis for the test results in this report suggests that the number of factors is `r scree_factors['nfact']` and the number of components is `r scree_factors['ncomp']`.
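
A parallel analysis along these lines can be run directly with `psych::fa.parallel()`. The sketch below is illustrative (the exact call wrapped by `plotScree()` is an assumption), again using `scores` as an assumed 0/1 score matrix.

```r
# Illustrative parallel analysis on the tetrachoric correlations.
pa <- psych::fa.parallel(scores, cor = "tet", fa = "both")
pa$nfact  # suggested number of factors
pa$ncomp  # suggested number of components
```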

flag_nfactors_chosen_by_user <- FALSE
if (!is.null(report_options[['efa.nfactors']])) {
  flag_nfactors_chosen_by_user <- TRUE
  efa.nfactors <- report_options$efa.nfactors
  if (efa.nfactors == 0) efa.nfactors <- length(unique(mctd$AnswerKey$Concept))
  else if (efa.nfactors == -1) efa.nfactors <- scree_factors['nfact']
} else efa.nfactors <- length(unique(mctd$AnswerKey$Concept))
if (efa.nfactors <= 10) cat("\n\\clearpage\n")

Exploratory Factor Analysis

efa.cut         <- ifelse(!is.null(report_options[['efa.cut']]), report_options$efa.cut, 0.3)
efa.n.obs       <- nrow(mctd$Test.complete)
efa.rotate      <- ifelse(!is.null(report_options[['efa.rotate']]), report_options$efa.rotate, 'varimax')
efa.rotate.text <- paste0("`'", efa.rotate, "'`")
efa.fm          <- ifelse(!is.null(report_options[['efa.fm']]), report_options$efa.fm, 'minres')
efa.fm.text     <- paste0("`'", efa.fm, "'`")

The table below presents the factor loadings, where `r efa.nfactors` factors were explored using the `fa()` function from the `psych` package (see @pkg:psych for more information on the options available for this function). In this report, the EFA used the `r efa.rotate.text` rotation method and the `r efa.fm.text` factoring method. Loadings with absolute value less than `r efa.cut` were suppressed.
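
For reference, a call to `psych::fa()` with these options would look roughly like the sketch below; `summarizeEFA()` wraps a call along these lines (the exact wrapping is an assumption), and `tet_rho` stands in for the tetrachoric correlation matrix from the previous section.

```r
# Illustrative EFA call using the options reported above.
efa <- psych::fa(tet_rho,
                 nfactors = efa.nfactors,
                 n.obs    = efa.n.obs,
                 rotate   = efa.rotate,
                 fm       = efa.fm)
print(efa$loadings, cutoff = efa.cut)  # suppress loadings below the cutoff
```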

if (efa.nfactors > 10) cat("\n\\newpage\n\\blandscape\n\n")
justification <- paste(c('cc', rep('r', efa.nfactors)), collapse = '')
summarizeEFA(mctd,
         cut      = efa.cut,
         nfactors = efa.nfactors, 
         n.obs    = efa.n.obs,
         rotate   = efa.rotate,
         fm       = efa.fm) %>% 
  mutate_if(is.numeric, function(x) ifelse(is.na(x), '-', sprintf('%0.3f', x))) %>%
  as_table(pander = FALSE, align = justification)
cat("\nTable: Exploratory Factor Analysis with ", 
    efa.nfactors, " factors ",
    "using `", efa.rotate.text, "` rotation and ",
    "`", efa.fm.text, "` factoring.\n",
    sep = '')
if (efa.nfactors > 10) cat("\n\\elandscape\n\n")

\clearpage

References


