by_split: Calculate split scores per participant

by_split R Documentation

Calculate split scores per participant

Description

Calculates split scores by applying fn_score to subsets of data, one subset per participant as identified via participants. Additional arguments select the splitting method and support parallel processing. To learn more about writing scoring algorithms for use with the splithalfr package, see the included vignettes. by_split is modeled after the by function, accepting similar values for the first three arguments (data, INDICES, FUN). For more information about different methods for splitting data, see get_split_indexes_from_stratum. For more information about stratification, see split_df.

Usage

by_split(
  data,
  participants,
  fn_score,
  stratification = NULL,
  replications = 1,
  method = c("random", "odd_even", "first_second"),
  replace = FALSE,
  split_p = 0.5,
  subsample_p = 1,
  subsample_n = NULL,
  careful = TRUE,
  match_participants = FALSE,
  ncores = detectCores(),
  seed = NULL,
  verbose = TRUE
)

Arguments

data

(data frame) data frame containing data to score. Data should be in long format, with one row per combination of participant and trial or item.

participants

(vector) Vector that identifies participants in data.

fn_score

(function) Receives full or split sets of data and should return a single number.
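
As an illustration of this contract, the following is a minimal sketch of a scoring function in base R. The data frame ds_one and the function sum_fn_score are hypothetical names introduced here; the only requirement taken from the description above is that the function accepts a data set and returns a single number.

```r
# Hypothetical long-format data for one participant: 6 items with answers
ds_one <- data.frame(
  item = 1:6,
  answer = c(2, 0, 1, 3, 1, 2)
)

# Scoring function: takes a data frame, returns a single number
sum_fn_score <- function(data) {
  sum(data$answer)
}

# The same function can score the full set or one part of a split
full_score <- sum_fn_score(ds_one)
part_score <- sum_fn_score(ds_one[1:3, ])
```

Because the function only sees the rows it is handed, it should not assume a fixed number of rows.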

stratification

(vector) Vector that identifies which subsets of data should be split separately (denoted as strata in splitting functions) in order to ensure they are evenly distributed between parts. By default, the dataset of a participant forms a single stratum.
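
The idea of splitting each stratum separately can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: the data frame ds_one, its condition column, and the seed are hypothetical.

```r
# Hypothetical data for one participant: two conditions with 4 trials each
ds_one <- data.frame(
  condition = rep(c("congruent", "incongruent"), each = 4),
  rt = c(510, 480, 530, 495, 610, 640, 590, 620)
)

set.seed(7)

# Draw half of the rows within each stratum (here: each condition)
rows_by_stratum <- split(seq_len(nrow(ds_one)), ds_one$condition)
part_1_rows <- unlist(lapply(rows_by_stratum, function(rows) {
  sample(rows, length(rows) / 2)
}))

part_1 <- ds_one[part_1_rows, ]
part_2 <- ds_one[-part_1_rows, ]

# Each part now holds the same number of trials per condition
table(part_1$condition)
table(part_2$condition)
```

Without stratification, a random split could by chance place most trials of one condition in the same part.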

replications

(numeric) Number of replications for which split scores are calculated.

method

(character) Splitting method. Note that the first_second and odd_even splitting methods only deliver a valid split with default settings for the other arguments (split_p = 0.5, replace = FALSE, subsample_p = 1).
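
The three methods can be pictured as different ways of dividing row numbers into two parts. The base R sketch below is an assumption about what each method name implies, not the package's own code; in particular, which part receives the odd rows or the first half is chosen arbitrarily here.

```r
n <- 8
rows <- seq_len(n)
half <- n %/% 2

# "odd_even": odd-numbered rows in one part, even-numbered rows in the other
odd_even <- list(
  part_1 = rows[rows %% 2 == 1],
  part_2 = rows[rows %% 2 == 0]
)

# "first_second": first half of the rows versus second half
first_second <- list(
  part_1 = rows[seq_len(half)],
  part_2 = rows[-seq_len(half)]
)

# "random": shuffle the rows, then cut the shuffled order in half
set.seed(1)
shuffled <- sample(rows)
random_split <- list(
  part_1 = sort(shuffled[seq_len(half)]),
  part_2 = sort(shuffled[-seq_len(half)])
)
```

Only the random method yields a different split on each replication, which is why the other two are tied to the default settings.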

replace

(logical) If TRUE, stratum is sampled with replacement.

split_p

(numeric) Desired length of both parts, expressed as a proportion of the length of the data per participant. If split_p is larger than 1 and careful is FALSE, then parts are automatically sampled with replacement.

subsample_p

(numeric) Subsample a proportion of stratum before splitting.

subsample_n

(numeric) Subsample a number of participants before splitting.

careful

(logical) If TRUE, stop with an error when called with arguments that may yield unexpected splits.

match_participants

(logical) Default FALSE. If FALSE, the split-halves are newly randomized for each replication and participant. If TRUE, the split-halves are newly randomized for each replication, but within a replication the same randomization is applied across participants. If the order of rows of datasets per participant denotes similar observations (such as items in a questionnaire), match_participants can be set to TRUE to ensure that, per replication, the same items are assigned to each part of the split-halves across participants. If method is "odd_even" or "first_second", splits are based on row number, so match_participants generally has little effect. If TRUE, each stratum should have the same number of rows, as checked via check_strata.
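
What it means to apply the same randomization across participants can be sketched in base R. This is an illustration under assumptions: the data frame ds, the seed, and the sum score are hypothetical, and the package may draw its indexes differently.

```r
set.seed(42)
n_items <- 6

# Hypothetical questionnaire data: 2 participants, same 6 items in the same order
ds <- data.frame(
  participant = rep(c("a", "b"), each = n_items),
  item = rep(seq_len(n_items), times = 2),
  answer = c(1, 0, 2, 1, 3, 0, 2, 2, 1, 0, 1, 3)
)

# One random draw of item positions, reused for every participant
part_1_rows <- sort(sample(seq_len(n_items), n_items / 2))

scores <- do.call(rbind, lapply(split(ds, ds$participant), function(d) {
  data.frame(
    participant = d$participant[1],
    score_1 = sum(d$answer[part_1_rows]),
    score_2 = sum(d$answer[-part_1_rows])
  )
}))
```

Since part_1_rows is drawn once, both participants have the same items in part 1, which only makes sense when every participant has the same number of rows in the same order.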

ncores

(integer) By default, all available CPU cores are used. If 1, split replications are executed serially (via lapply). If greater than 1, split replications are executed in parallel (via parLapply).

seed

(integer) When split replications are executed in parallel, seed can be used to specify a random seed from which random seeds are generated for each worker via clusterSetRNGStream.
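
The effect of seeding workers via clusterSetRNGStream can be sketched with the parallel package alone. The helper run_once, the cluster size of 2, and the seed value 123 are assumptions made for this illustration; how by_split itself sets up its cluster is not shown here.

```r
library(parallel)

# Hypothetical helper: start a 2-worker cluster, seed it, draw random numbers
run_once <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  # Gives each worker its own reproducible random number stream
  clusterSetRNGStream(cl, iseed = seed)
  unlist(parLapply(cl, 1:4, function(i) runif(1)))
}

draws_a <- run_once(123)
draws_b <- run_once(123)

# Same seed and same number of workers give the same draws
identical(draws_a, draws_b)
```

Calling set.seed in the main session does not control random numbers generated on the workers, which is why a separate seeding mechanism is used for parallel runs.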

verbose

(logical) If TRUE, reports progress. Note that progress across split replications is not displayed when these are executed in parallel.

Value

(data frame) Returns a data frame with a column participant, a column replication that counts split replications, and columns score_1 and score_2 holding the scores calculated for each part via fn_score.
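
To show how a data frame of this shape can be processed, the sketch below computes a Spearman-Brown adjusted correlation per replication in base R. The values in split_scores are invented for illustration and were not produced by by_split; only the column names follow the description above.

```r
# Hypothetical output: 4 participants, 2 replications
split_scores <- data.frame(
  participant = rep(1:4, times = 2),
  replication = rep(1:2, each = 4),
  score_1 = c(3, 5, 2, 6, 4, 5, 1, 6),
  score_2 = c(4, 5, 1, 7, 3, 6, 2, 5)
)

# Per replication: correlate the two parts across participants,
# then apply the Spearman-Brown formula 2r / (1 + r)
coefs <- sapply(split(split_scores, split_scores$replication), function(d) {
  r <- cor(d$score_1, d$score_2)
  2 * r / (1 + r)
})

# Average the coefficient across replications
mean_coef <- mean(coefs)
```

Each replication contributes one coefficient, so coefs has as many elements as there are replications.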

Examples

# N.B. This example uses R script from the vignette: "rapi_sum"
data("ds_rapi", package = "splithalfr")
# Convert to long format
ds_long <- reshape(
  ds_rapi,
  varying = paste("V", 1 : 23, sep = ""),
  v.names = "answer",
  direction = "long",
  idvar = "twnr",
  timevar = "item"
)
# Function for RAPI sum score
rapi_fn_score <- function (data) {
  return (sum(data$answer))
}
# Calculate scores on full data
by(
  ds_long,
  ds_long$twnr,
  rapi_fn_score
)
# Permutation split, one iteration, items matched across participants
split_scores <- by_split(
  ds_long,
  ds_long$twnr,
  rapi_fn_score,
  ncores = 1,
  match_participants = TRUE
)
# Mean Flanagan-Rulon coefficient across splits
fr <- mean(split_coefs(split_scores, flanagan_rulon))

splithalfr documentation built on Sept. 15, 2023, 1:08 a.m.