by_split: Calculate split scores per participant
In splithalfr: Estimate Split-Half Reliabilities

by_split

R Documentation

Description

Calculates split scores by applying fn_score to the subsets of data identified by participants. It provides a range of additional arguments for different splitting methods and for parallel processing. To learn more about writing scoring algorithms for use with splithalfr, see the included vignettes. by_split is modeled after the by function and accepts similar values for its first three arguments (data, INDICES, FUN). For more information about the different methods for splitting data, see get_split_indexes_from_stratum. For more information about stratification, see split_df.
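
As a rough illustration of the idea described above, the following base R sketch splits each participant's rows at random into two halves and applies a scoring function to each half. It is not the package's implementation; the data, `sum_score`, and `split_once` are hypothetical names made up for this sketch, and it assumes every participant has at least two rows.

```r
# Toy long-format data: 3 participants x 6 items (hypothetical values)
set.seed(1)
toy <- data.frame(
  id = rep(1:3, each = 6),
  item = rep(1:6, times = 3),
  answer = sample(0:3, 18, replace = TRUE)
)

# Scoring function: receives one part of a participant's data, returns one number
sum_score <- function(d) sum(d$answer)

# Hypothetical helper: split each participant's rows at random into two halves
# and score each half (assumes at least two rows per participant)
split_once <- function(data, participants, fn_score) {
  parts <- lapply(split(data, participants), function(d) {
    idx <- sample(nrow(d), size = floor(nrow(d) / 2))
    data.frame(
      score_1 = fn_score(d[idx, , drop = FALSE]),
      score_2 = fn_score(d[-idx, , drop = FALSE])
    )
  })
  cbind(participant = names(parts), do.call(rbind, parts))
}

toy_scores <- split_once(toy, toy$id, sum_score)
toy_scores
```

Because the two halves together contain all of a participant's rows, the two sum scores add up to the participant's full-data sum score.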

Usage

by_split(
  data,
  participants,
  fn_score,
  stratification = NULL,
  replications = 1,
  method = c("random", "odd_even", "first_second"),
  replace = FALSE,
  split_p = 0.5,
  subsample_p = 1,
  subsample_n = NULL,
  careful = TRUE,
  match_participants = FALSE,
  ncores = detectCores(),
  seed = NULL,
  verbose = TRUE
)

Arguments

`data`	(data frame) data frame containing data to score. Data should be in long format, with one row per combination of participant and trial or item.
`participants`	(vector) Vector that identifies participants in `data`.
`fn_score`	(function) This function receives split data and should return a single number.
`stratification`	(vector) Vector that identifies which subsets of `data` should be split separately (denoted as strata in splitting functions) in order to ensure they are evenly distributed between parts. By default, the dataset of a participant forms a single stratum.
`replications`	(numeric) Number of split replications for which scores are calculated.
`method`	(character) Splitting method. Note that the `first_second` and `odd_even` splitting methods will only deliver a valid split with the default settings for the other arguments (`split_p = 0.5, replace = FALSE, subsample_p = 1`).
`replace`	(logical) If TRUE, stratum is sampled with replacement.
`split_p`	(numeric) Desired length of both parts, expressed as a proportion of the length of the data per participant. If `split_p` is larger than 1 and `careful` is FALSE, then parts are automatically sampled with replacement.
`subsample_p`	(numeric) Subsample a proportion of `stratum` before splitting. See Figure 1 of Pronk et al. (2023) <doi:10.3758/s13428-022-01885-6>.
`subsample_n`	(numeric) Subsample a number of participants before splitting.
`careful`	(logical) If TRUE, stop with an error when called with arguments that may yield unexpected splits.
`match_participants`	(logical) Default FALSE. If FALSE, the split-halves are newly randomized for each replication and participant. If TRUE, the split-halves are newly randomized for each replication, but within a replication the same randomization is applied across participants. If the order of rows in each participant's dataset denotes similar observations (such as items in a questionnaire), `match_participants` can be set to TRUE to ensure that, per replication, the same items are assigned to each part of the split-halves across participants. If `method` is "odd_even" or "first_second", splits are based on row number, so `match_participants` generally has little effect. If TRUE, each stratum should have the same number of rows, as checked via `check_strata`.
`ncores`	(integer) Number of CPU cores to use. By default, all available CPU cores are used. If 1, split replications are executed serially (via `lapply`). If greater than 1, split replications are executed in parallel (via `parLapply`).
`seed`	(integer) When split replications are executed in parallel, `seed` can be used to specify a random seed from which a random seed for each worker is generated via `clusterSetRNGStream`.
`verbose`	(logical) If TRUE, reports progress. Note that progress across split replications is not displayed when these are executed in parallel.
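
The row-number-based splitting methods mentioned for `method` and `match_participants` can be pictured with a small base R sketch. This is only a conceptual illustration and not the package's code; the variable names are hypothetical, and which rows end up in which part is an assumption made for the sketch.

```r
# Row indexes of a hypothetical stratum with 8 rows
rows <- seq_len(8)

# "odd_even": odd rows form one part, even rows the other
# (assignment of odd rows to the first part is an assumption)
odd_part <- rows[rows %% 2 == 1]
even_part <- rows[rows %% 2 == 0]

# "first_second": first half of the rows versus second half
first_part <- rows[rows <= length(rows) / 2]
second_part <- rows[rows > length(rows) / 2]

# "random" with matched participants, conceptually: draw one set of row
# indexes per replication and reuse it for every participant
set.seed(42)
shared_idx <- sample(rows, size = length(rows) / 2)
shared_rest <- setdiff(rows, shared_idx)
```

Reusing `shared_idx` for every participant only makes sense when each participant's rows are in the same order and of the same length, which is why matched splits require strata with equal numbers of rows.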

Value

(data frame) Returns a data frame with a column for participant, a column replication that counts split replications, and columns score_1 and score_2 containing the score that fn_score calculated for each part.
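
To show how a data frame of this shape can be used, here is a minimal base R sketch that computes a Spearman-Brown corrected correlation between score_1 and score_2 per replication and then averages it. The data are made up, the column name `participant` is an assumption based on the description above, and the package's own coefficient functions are not used here.

```r
# Hypothetical data shaped like the return value described above:
# 4 participants, 2 replications (values are made up)
split_scores_toy <- data.frame(
  participant = rep(c("a", "b", "c", "d"), times = 2),
  replication = rep(1:2, each = 4),
  score_1 = c(10, 14, 7, 12, 11, 13, 8, 12),
  score_2 = c(11, 13, 8, 10, 10, 15, 6, 11)
)

# Spearman-Brown corrected correlation between the two parts, per replication
sb_per_replication <- sapply(
  split(split_scores_toy, split_scores_toy$replication),
  function(d) {
    r <- cor(d$score_1, d$score_2)
    2 * r / (1 + r)
  }
)

# Average the coefficient across replications
sb_mean <- mean(sb_per_replication)
```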

Examples

# N.B. This example uses R script from the vignette: "rapi_sum"
library(splithalfr)
data("ds_rapi", package = "splithalfr")
# Convert to long format
ds_long <- reshape(
  ds_rapi,
  varying = paste("V", 1 : 23, sep = ""),
  v.names = "answer",
  direction = "long",
  idvar = "twnr",
  timevar = "item"
)
# Function for RAPI sum score
rapi_fn_score <- function (data) {
  return (sum(data$answer))
}
# Calculate scores on full data
by(
  ds_long,
  ds_long$twnr,
  rapi_fn_score
)
# Permutation split, one replication, items matched across participants
split_scores <- by_split(
  ds_long,
  ds_long$twnr,
  rapi_fn_score,
  ncores = 1,
  match_participants = TRUE
)
# Mean Flanagan-Rulon coefficient across splits
fr <- mean(split_coefs(split_scores, flanagan_rulon))

splithalfr documentation built on June 8, 2025, 10 a.m.