rapidsplit    R Documentation
Description

A very fast algorithm for computing stratified permutation-based split-half reliability.

Usage
rapidsplit(
  data,
  subjvar,
  aggvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  splits = 6000L,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE
)
## S3 method for class 'rapidsplit'
print(x, goal_r = 0.8, ...)
## S3 method for class 'rapidsplit'
plot(
  x,
  type = c("average", "minimum", "maximum", "random", "all"),
  show.labels = TRUE,
  ...
)
rapidsplit.chunks(
  data,
  subjvar,
  aggvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  splits = 6000L,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL,
                       fixedpenalty = 600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE,
  split.chunksize = 10000L,
  sample.chunksize = 200L
)
Arguments

data
    Dataset, a data.frame.

subjvar
    Subject ID variable name, a character string.

aggvar
    Name of the variable whose values are to be aggregated, a character string.

diffvars
    Names of variables that determine which conditions need to be subtracted from each other; a character vector.

stratvars
    Additional variables that the splits should be stratified by; a character vector.

subscorevar
    Name of a variable identifying subsets of each participant's data (e.g., IAT blocks) from which separate subscores are computed and then averaged into a single score; a character string.

splits
    Number of split-halves to average, an integer. Defaults to 6000.

aggfunc
    The function by which to aggregate the variable defined in aggvar; either "means" or "medians".

errorhandling
    A list with 4 named items, used to replace error trials with the block mean of correct responses plus a fixed penalty, as in the IAT D-score. The 4 items are type ("none" or "fixedpenalty"), errorvar (the name of the variable marking error trials), fixedpenalty (the penalty added to the block mean; defaults to 600), and blockvar (the name of the variable identifying blocks).

standardize
    Whether to divide scores by the subject's SD; a logical.

include.scores
    Include all individual split-half scores? A logical.

verbose
    Display progress bars? Defaults to TRUE.

check
    Check the input for possible problems? A logical.

x
    An object of class rapidsplit.

goal_r
    A goal reliability value, which will be used to compute the required test size.

...
    Ignored.

type
    Character argument indicating what should be plotted. By default, this plots the random split whose correlation is closest to the average. It can also plot the split with the minimum or maximum correlation, a random split, or all splits at once.

show.labels
    Should participant IDs be shown above their points in the scatterplot? Defaults to TRUE.

split.chunksize, sample.chunksize
    Sizes of the chunks into which the splits and the participant sample are divided, for more memory-efficient computation. This has no bearing on the result.
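The required test size reported by print() for a given goal_r follows from the Spearman-Brown prophecy formula, n = goal_r (1 - r) / (r (1 - goal_r)). A minimal sketch with a hypothetical helper (not a package function):

```r
# Spearman-Brown prophecy: by what factor must the test be lengthened
# to reach a goal reliability? (Hypothetical helper, not part of the package.)
required_length_factor <- function(r, goal_r) {
  goal_r * (1 - r) / (r * (1 - goal_r))
}

# A test with split-half reliability .60 must be about 2.67x as long
# to reach a reliability of .80:
required_length_factor(r = 0.60, goal_r = 0.80)
```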
Details

The order of operations (optional steps in parentheses) is:

1. Splitting
2. (Replacing error trials, within block within split)
3. Computing aggregates per condition (per subscore) per person
4. Subtracting conditions from each other
5. (Dividing the resulting (sub)score by the SD of the data used to compute that (sub)score)
6. (Averaging subscores together into a single score per person)
7. Computing the covariance of scores from one half with scores from the other half, for every split
8. Computing the variance of scores within each half, for every split
9. Computing the average split-half correlation from the averaged variances and covariances across all splits, using corStatsByColumns()
10. Applying the Spearman-Brown formula to the absolute correlation using spearmanBrown(), and restoring the original sign afterwards

cormean() was used to aggregate correlations in previous versions of this package and in the associated manuscript, but the method based on (co)variance averaging was found to be more accurate. This was suggested by prof. John Christie of Dalhousie University.
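The core of these steps can be sketched for a single split in base R. This is a minimal illustration under simplified assumptions (no stratification, error handling, or subscores, and a plain correlation instead of (co)variance averaging across splits); all object names are invented:

```r
set.seed(1)
# Toy data: 40 simulated subjects with 20 reaction-time trials each
dat <- data.frame(
  subject = rep(1:40, each = 20),
  rt = 500 + rep(rnorm(40, sd = 30), each = 20) + rnorm(800, sd = 50)
)

# One random split: per subject, assign half the trials to each half
# and aggregate each half with the mean
split_scores <- function(dat) {
  per_subj <- split(dat$rt, dat$subject)
  t(vapply(per_subj, function(rt) {
    in_h1 <- sample(length(rt)) <= length(rt) / 2
    c(h1 = mean(rt[in_h1]), h2 = mean(rt[!in_h1]))
  }, numeric(2)))
}

scores <- split_scores(dat)                # one row per subject
r  <- cor(scores[, "h1"], scores[, "h2"])  # split-half correlation
sb <- sign(r) * 2 * abs(r) / (1 + abs(r))  # Spearman-Brown on |r|, sign restored
```

Averaging many such splits, on the (co)variance scale as described above, yields the permutation-based estimate.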
Value

A list containing:

r: the averaged reliability.

ci: the 95% confidence intervals.

allcors: a vector with the reliability of each iteration.

nobs: a vector with (1) the number of participants and (2) the average number of values per participant.

rcomponents: a list containing the mean variance of the scores of both halves, as well as their mean covariance.

scores: the individual participants' scores in each split-half, contained in a list with two matrices (only included if requested with include.scores).
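To illustrate how the averaged (co)variance components relate to the reported reliability, here is a hypothetical computation with made-up variance and covariance values (not package code):

```r
# Suppose these are the mean variance of each half's scores and their
# mean covariance, averaged across all splits (made-up values):
mean_var1 <- 2400
mean_var2 <- 2500
mean_cov  <- 1800

# Average split-half correlation from the averaged components
r_half <- mean_cov / sqrt(mean_var1 * mean_var2)

# Spearman-Brown correction to full-test length
r_full <- 2 * abs(r_half) / (1 + abs(r_half))
```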
Note

The rapidsplit() function can use a lot of memory in one go. If you are computing the reliability of a large dataset or have little RAM, it may pay off to use rapidsplit.chunks() instead.

It is currently unclear whether it is better to pre-process your data before or after splitting it. If you are computing the IAT D-score, you can therefore use errorhandling and standardize to perform these two steps after splitting, or you can process your data before splitting and forgo these two options.
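As an illustration of the fixed-penalty replacement described above (each error trial's RT becomes its block's mean correct RT plus the penalty), here is a minimal base-R sketch on toy data; all names are invented:

```r
set.seed(2)
# Toy data: 2 blocks of 10 trials each, with some error trials
dat <- data.frame(
  block = rep(1:2, each = 10),
  rt    = round(runif(20, 400, 900)),
  error = rbinom(20, 1, 0.2)
)

# Mean correct RT per block (errors set to NA and excluded from the mean)
block_correct_mean <- ave(ifelse(dat$error == 1, NA, dat$rt), dat$block,
                          FUN = function(x) mean(x, na.rm = TRUE))

# Replace error-trial RTs with block mean of correct trials + 600 ms penalty
dat$rt_fixed <- ifelse(dat$error == 1, block_correct_mean + 600, dat$rt)
```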
Author(s)

Sercan Kahveci
References

Kahveci, S., Bathke, A. C., & Blechert, J. (2024). Reaction-time task reliability is more accurately computed with permutation-based split-half correlations than with Cronbach's alpha. Psychonomic Bulletin & Review. doi:10.3758/s13423-024-02597-y
Examples

data(foodAAT)
# Reliability of the double difference score:
# {RT(push food)-RT(pull food)} - {RT(push object)-RT(pull object)}
frel <- rapidsplit(data = foodAAT,
                   subjvar = "subjectid",
                   diffvars = c("is_pull", "is_target"),
                   stratvars = "stimid",
                   aggvar = "RT",
                   splits = 100)
print(frel)
plot(frel,type="average")
# Compute a single random split-half reliability of the error rate
rapidsplit(data = foodAAT,
           subjvar = "subjectid",
           aggvar = "error",
           splits = 1,
           aggfunc = "means")
# Compute the reliability of an IAT D-score
data(raceIAT)
rapidsplit(data = raceIAT,
           subjvar = "session_id",
           diffvars = "congruent",
           subscorevar = "blocktype",
           aggvar = "latency",
           errorhandling = list(type = "fixedpenalty", errorvar = "error",
                                fixedpenalty = 600, blockvar = "block_number"),
           splits = 10,
           standardize = TRUE)
# Compute the reliability of mean RT
# in chunks of 200 splits and 50 participants per run
rapidsplit.chunks(data = foodAAT,
                  subjvar = "subjectid",
                  aggvar = "RT",
                  splits = 400,
                  split.chunksize = 200,
                  sample.chunksize = 50)