rapidsplit: A very fast algorithm for computing stratified permutation-based split-half reliability

View source: R/rapidsplit.R

rapidsplit R Documentation

rapidsplit

Description

A very fast algorithm for computing stratified permutation-based split-half reliability.

Usage

rapidsplit(
  data,
  subjvar,
  aggvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  splits = 6000L,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL, fixedpenalty =
    600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE
)

## S3 method for class 'rapidsplit'
print(x, goal_r = 0.8, ...)

## S3 method for class 'rapidsplit'
plot(
  x,
  type = c("average", "minimum", "maximum", "random", "all"),
  show.labels = TRUE,
  ...
)

rapidsplit.chunks(
  data,
  subjvar,
  aggvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  splits = 6000L,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL, fixedpenalty =
    600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE,
  split.chunksize = 10000L,
  sample.chunksize = 200L
)

Arguments

data

Dataset, a data.frame.

subjvar

Subject ID variable name, a character.

aggvar

Name of variable whose values to aggregate, a character. Examples include reaction times and error rates.

diffvars

Names of variables that determine which conditions need to be subtracted from each other, character.

stratvars

Additional variables that the splits should be stratified by; a character.

subscorevar

Name of a variable identifying subgroups within a participant's data from which separate scores should be computed; a character. To compute a participant's final score, these subscores are averaged together. A typical use case is the D-score of the Implicit Association Test (IAT).

splits

Number of split-halves to average, an integer. It is recommended to use around 5000.

aggfunc

The function by which to aggregate the variable defined in aggvar; can be "means", "medians", or a custom function (a function object, not a function name). A custom function must take a numeric vector and return a single value.
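
For instance, a trimmed mean satisfies this requirement (a sketch; trimmedmean is a hypothetical helper, not part of the package):

```r
# A custom aggregation function must map a numeric vector to one value,
# e.g. a 20% trimmed mean, which discards the extremes before averaging:
trimmedmean <- function(x) mean(x, trim = 0.2)
trimmedmean(c(1, 2, 3, 4, 100))  # → 3
```

It could then be passed as aggfunc = trimmedmean (the function object itself, not the string "trimmedmean").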

errorhandling

A list with 4 named items, used to replace error trials with the block mean of correct responses plus a fixed penalty, as in the IAT D-score. The items are: type, either "none" for no error replacement or "fixedpenalty" to replace error trials as described; errorvar, the name of a logical variable marking incorrect responses as TRUE; fixedpenalty, the penalty to add to the block mean; and blockvar, the name of the block variable.

standardize

Whether to divide the scores by the subject's SD; a logical. Regardless of whether error penalization is used, this standardization is based on the unpenalized SD of correct and incorrect trials, as in the IAT D-score.

include.scores

Include all individual split-half scores?

verbose

Display progress bars? Defaults to TRUE.

check

Check input for possible problems?

x

rapidsplit object to print or plot.

goal_r

A goal reliability value, which will be used to compute the required test size.

...

Ignored.

type

Character argument indicating what should be plotted. By default, this plots the random split whose correlation is closest to the average. However, this can also plot the random split with the "minimum" or "maximum" split-half correlation, or any "random" split. With "all", every computed split is plotted together in one figure.

show.labels

Should participant IDs be shown above their points in the scatterplot? Defaults to TRUE and is ignored when type is "all".

split.chunksize, sample.chunksize

Sizes of the chunks into which the splits and the sample are divided, for more memory-efficient computation. These values have no bearing on the result.

Details

The order of operations (with optional steps between brackets) is:

  • Splitting

  • (Replacing error trials within block within split)

  • Computing aggregates per condition (per subscore) per person

  • Subtracting conditions from each other

  • (Dividing the resulting (sub)score by the SD of the data used to compute that (sub)score)

  • (Averaging subscores together into a single score per person)

  • Computing the covariances of scores from one half with scores from the other half for every split

  • Computing the variances of scores within each half for every split

  • Computing the average split-half correlation with the average variances and covariance across all splits, using corStatsByColumns()

  • Applying the Spearman-Brown formula to the absolute correlation using spearmanBrown(), and restoring the original sign after

cormean() was used to aggregate correlations in previous versions of this package and in the associated manuscript, but the method based on (co)variance averaging was found to be more accurate. This method was suggested by Prof. John Christie of Dalhousie University.
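
As a sketch of the final step above: the Spearman-Brown formula for a half-test correlation r is 2r / (1 + r), applied here to the absolute correlation with the sign restored afterwards. The snippet below is an illustrative standalone reimplementation, not the package's own spearmanBrown():

```r
# Sign-preserving Spearman-Brown correction, as described in Details.
# (Illustrative sketch; the package's spearmanBrown() may differ.)
spearman_brown <- function(r) 2 * r / (1 + r)
sb_signed <- function(r) sign(r) * spearman_brown(abs(r))

sb_signed(0.5)   # → 0.6666667
sb_signed(-0.5)  # → -0.6666667
```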

Value

A list containing

  • r: the averaged reliability.

  • ci: the 95% confidence intervals.

  • allcors: a vector with the reliability of each iteration.

  • nobs: a vector with (1) the number of participants and (2) the average number of values per participant.

  • rcomponents: a list containing the mean variance of the scores of both halves, as well as their mean covariance.

  • scores: the individual participants' scores in each split-half, contained in a list with two matrices (only included if include.scores is TRUE).
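
Given a returned object, these elements can be accessed by name in the usual way (a sketch with mock values; a real object comes from rapidsplit()):

```r
# Mock result list using the element names documented above;
# a real object is produced by rapidsplit().
frel <- list(r = 0.85, ci = c(0.80, 0.90), nobs = c(60, 240))
frel$r     # averaged reliability
frel$ci    # 95% confidence interval
frel$nobs  # participants; average values per participant
```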

Note

  • The rapidsplit() function can use a lot of memory in one go. If you are computing the reliability of a large dataset or have little RAM, it may pay off to use rapidsplit.chunks() instead.

  • It is currently unclear whether it is better to pre-process your data before or after splitting it. If you are computing the IAT D-score, you can therefore use errorhandling and standardize to perform these two steps after splitting, or you can process your data before splitting and forgo these two options.

Author(s)

Sercan Kahveci

References

Kahveci, S., Bathke, A. C., & Blechert, J. (2024). Reaction-time task reliability is more accurately computed with permutation-based split-half correlations than with Cronbach's alpha. Psychonomic Bulletin & Review. doi:10.3758/s13423-024-02597-y

Examples


data(foodAAT)
# Reliability of the double difference score:
# {RT(push food)-RT(pull food)} - {RT(push object)-RT(pull object)}

frel<-rapidsplit(data=foodAAT,
                 subjvar="subjectid",
                 diffvars=c("is_pull","is_target"),
                 stratvars="stimid",
                 aggvar="RT",
                 splits=100)
                 
print(frel)

plot(frel,type="average")

           
# Compute a single random split-half reliability of the error rate
rapidsplit(data=foodAAT,
           subjvar="subjectid",
           aggvar="error",
           splits=1,
           aggfunc="means")

# Compute the reliability of an IAT D-score
data(raceIAT)
rapidsplit(data=raceIAT,
           subjvar="session_id",
           diffvars="congruent",
           subscorevar="blocktype",
           aggvar="latency",
           errorhandling=list(type="fixedpenalty",errorvar="error",
                              fixedpenalty=600,blockvar="block_number"),
           splits=10,
           standardize=TRUE)


# Compute the reliability of mean RT
# in subsets of 200 splits and 100 participants per run
rapidsplit.chunks(data=foodAAT,
                  subjvar="subjectid",
                  aggvar="RT",
                  splits=400,
                  split.chunksize=200,
                  sample.chunksize=50)


rapidsplithalf documentation built on April 15, 2026, 5:06 p.m.