balancedataset: balance a data set according to some grouping factor(s)
In gobbios/cfp: Christof's functions

balancedataset

R Documentation

balance a data set according to some grouping factor(s)

Description

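selects the same number of rows for each level of one grouping factor, or for each combination of levels of two grouping factors, and returns both the selected and the unselected rows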

Usage

balancedataset(xdata, whattobalance, n = NULL)

Arguments

`xdata`	a `data.frame`
`whattobalance`	a character vector with one or two column names. The corresponding columns are typically either factor or character.
`n`	integer, the number of cases to select for each factor level (or combination of factor levels). If `NULL` (the default), the largest possible number is used.
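
Before choosing `n`, it can help to check how many cases each group contains. The following base R sketch uses made-up toy data (the column names are illustrative only) and assumes that the largest usable `n` equals the size of the smallest group, which the examples on this page suggest but the documentation does not state outright.

```r
# Toy data with every ID/context combination present (illustrative only)
toy <- data.frame(ID = c("a", "a", "a", "a", "a", "b", "b", "b"),
                  context = c("U", "U", "U", "V", "V", "U", "V", "V"))

# Cases per level when balancing over one factor
counts_one <- table(toy$context)

# Cases per combination when balancing over two factors
counts_two <- table(toy$ID, toy$context)

# Assumption: the smallest group sets the upper limit for n
max_n_one <- min(counts_one)
max_n_two <- min(counts_two)
```

Here `max_n_one` is 4 and `max_n_two` is 1, because the combination b/U occurs only once.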

Details

The function balances over either one or two factors, so `whattobalance` must name one or two columns.

If `n` exceeds the largest possible number, which is limited by the smallest group, the function issues a warning and resets `n` to that number, i.e. it behaves as if `n = NULL` (the default).
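
The behaviour described above can be approximated in base R. This is a minimal sketch and not the cfp implementation: the name `balance_sketch` is hypothetical, random sampling within groups is an assumption (suggested by the use of `set.seed` in the examples), and combinations that never occur in the data are simply ignored.

```r
# Hypothetical sketch, NOT the cfp source: sample n rows per group
balance_sketch <- function(xdata, whattobalance, n = NULL) {
  stopifnot(is.data.frame(xdata),
            length(whattobalance) %in% 1:2,
            all(whattobalance %in% names(xdata)))

  # Row positions per observed level (or combination of levels)
  rows_by_group <- split(seq_len(nrow(xdata)), xdata[whattobalance],
                         drop = TRUE)

  # The smallest group limits how many rows can be taken from every group
  max_n <- min(lengths(rows_by_group))
  if (is.null(n)) {
    n <- max_n
  } else if (n > max_n) {
    warning("n is larger than the smallest group; n reset to ", max_n)
    n <- max_n
  }

  # Sample positions within each group; indexing via sample.int() avoids
  # the special behaviour of sample() for a single number
  sel <- unlist(lapply(rows_by_group, function(r) {
    r[sample.int(length(r), n)]
  }), use.names = FALSE)
  sel <- sort(sel)
  unsel <- setdiff(seq_len(nrow(xdata)), sel)

  list(seldata = xdata[sel, , drop = FALSE],
       unseldata = xdata[unsel, , drop = FALSE],
       sel = sel, unsel = unsel, factors = whattobalance)
}

# Usage: groups of size 5, 3 and 7 are cut down to 3 rows each
set.seed(1)
d <- data.frame(g = rep(c("a", "b", "c"), times = c(5, 3, 7)), v = 1:15)
res <- balance_sketch(d, "g")
table(res$seldata$g)
```

The sketch returns a list shaped like the one documented under Value, so the selected and unselected rows together always make up the full data set.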

Value

a list with 5 items

$seldata the subset of xdata with the selected rows
$unseldata the subset of xdata with the rows that were not selected
$sel the row indices of the selected rows
$unsel the row indices of the rows not selected
$factors the balance factor(s) (= whattobalance)

Author(s)

Christof Neumann

Examples

set.seed(123)
xdata <- data.frame(ID = sample(letters[1:4], 30, replace = TRUE),
                    context = sample(LETTERS[21:22], 30, replace = TRUE),
                    var1 = rnorm(30), var2 = rnorm(30))
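# number of cases per combination of ID and context
table(xdata$ID, xdata$context)
# one factor: select 2 or 3 cases per level of context
balancedataset(xdata = xdata, whattobalance = c("context"), n = 2)$seldata
balancedataset(xdata = xdata, whattobalance = c("context"), n = 3)$seldata
# n = NULL (the default): select the largest possible number per level
balancedataset(xdata = xdata, whattobalance = c("context"))$seldata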

# with two factors
balancedataset(xdata = xdata, whattobalance = c("context", "ID"), n = 1)$seldata

# one combination occurs only once (d/V): row 27 has to be in each data set
table(xdata$ID, xdata$context)
x <- sapply(1:50, function(X){
  row.names(balancedataset(xdata = xdata, whattobalance = c("context", "ID"))$seldata)
})
table(x)

gobbios/cfp documentation built on April 11, 2022, 2:22 a.m.