Split: Splitting observations into non-overlapping sets
In sharp: Stability-enHanced Approaches using Resampling Procedures

Split

R Documentation

Description

Generates a list of `length(tau)` non-overlapping sets of observation IDs.

Usage

Split(data, family = NULL, tau = c(0.5, 0.25, 0.25))

Arguments

`data`	vector or matrix of data. In regression, this should be the outcome data.
`family`	type of regression model. This argument is defined as in `glmnet`. Possible values include `"gaussian"` (linear regression), `"binomial"` (logistic regression), `"multinomial"` (multinomial regression), and `"cox"` (survival analysis).
`tau`	vector of the proportions of observations allocated to each of the sets. Its length determines the number of sets.

Details

With categorical outcomes (i.e. when the `family` argument is set to `"binomial"`, `"multinomial"` or `"cox"`), the split is stratified: the proportion of observations from each category in each of the sets approximately matches that in the full sample.
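The stratified split described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation used in sharp: `stratified_split` is a hypothetical helper that takes a vector of category labels `y`, assumes `tau` contains positive proportions summing to 1, and uses `round()` on cumulative proportions to decide how many observations of each category go to each set. Passing a constant vector (e.g. `rep(1, n)`) reduces it to a plain, non-stratified random split.

```r
# Hypothetical helper (not part of sharp): stratified split of observation
# IDs into non-overlapping sets with proportions tau.
stratified_split <- function(y, tau = c(0.5, 0.25, 0.25)) {
  # Assumption: tau holds positive proportions that sum to 1
  stopifnot(all(tau > 0), abs(sum(tau) - 1) < 1e-8)
  y <- as.vector(y)  # drop dimensions, e.g. a one-column outcome matrix
  ids <- rep(list(integer(0)), length(tau))
  for (level in unique(y)) {
    # Observation IDs belonging to this category, in random order
    members <- which(y == level)
    members <- members[sample.int(length(members))]
    # Cumulative cut points: set k receives positions starts[k]..cuts[k]
    cuts <- round(cumsum(tau) * length(members))
    starts <- c(0, head(cuts, -1)) + 1
    for (k in seq_along(tau)) {
      if (cuts[k] >= starts[k]) {
        ids[[k]] <- c(ids[[k]], members[starts[k]:cuts[k]])
      }
    }
  }
  # Sort IDs within each set for readability
  lapply(ids, sort)
}

# Example: binary outcome with roughly 30% cases
set.seed(1)
y <- rbinom(100, size = 1, prob = 0.3)
ids <- stratified_split(y)
lapply(ids, function(x) table(factor(y[x], levels = 0:1)))
```

Because each category is cut at the same cumulative proportions, every set receives close to a `tau[k]` share of each category; rounding means the match is only approximate for small categories.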

Value

A list of length `length(tau)`, where each element is a vector of observation IDs and no ID appears in more than one set.
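To confirm the non-overlap property on a returned list, a check along these lines could be used. `is_non_overlapping` is a hypothetical helper written for illustration, not a function exported by sharp; it only assumes the input is a list of ID vectors.

```r
# Hypothetical helper (not part of sharp): TRUE if no observation ID
# appears in more than one element of the list.
is_non_overlapping <- function(ids) {
  all_ids <- unlist(ids)
  # anyDuplicated() returns 0 when there are no duplicates
  anyDuplicated(all_ids) == 0
}

is_non_overlapping(list(1:3, 4:5))
is_non_overlapping(list(1:3, 3:5))
```

With sharp loaded, the same check could be applied directly to the output, e.g. `is_non_overlapping(Split(data = simul$ydata))`.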

Examples

library(sharp)

# Splitting into 3 sets (default tau = c(0.5, 0.25, 0.25))
simul <- SimulateRegression()
ids <- Split(data = simul$ydata)
lapply(ids, length)

# Balanced splits with respect to a binary variable
simul <- SimulateRegression(family = "binomial")
ids <- Split(data = simul$ydata, family = "binomial")
lapply(ids, FUN = function(x) {
  table(simul$ydata[x, ])
})

sharp documentation built on April 11, 2025, 5:44 p.m.