stratrs: Perform stratified random sampling to balance outcomes

stratrs    R Documentation

Perform stratified random sampling to balance outcomes

Description

Performs stratified random sampling so that the outcome is balanced across the shards.

Usage

stratrs(y, C=5, P=0)

Arguments

y

The binary/categorical/continuous outcome.

C

The number of shards to break the data set into.

P

For a continuous outcome, the number of segments, defined by quantiles, into which the range of y is divided. P=20 seems to work reasonably well.
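
The quantile-based segmentation of a continuous outcome can be pictured with base R alone. The following is a minimal sketch and not the package's internal code: the variable names and the use of cut() are assumptions made for illustration.

```r
set.seed(1)
y <- rnorm(1000)   # a continuous outcome (simulated for illustration)
P <- 20            # number of quantile segments

# P + 1 break points at equally spaced probabilities 0, 1/P, ..., 1
breaks <- quantile(y, probs = seq(0, 1, length.out = P + 1))

# Label each observation with the segment (1..P) that contains it
segment <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(segment)     # roughly length(y) / P observations per segment
```

Each segment can then be treated as a stratum, in the same way as a level of a categorical outcome.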

Details

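To fit BART to large data sets, the data are randomly divided into C shards, and each shard should be balanced with respect to the outcome. For binary and categorical outcomes, this function assigns observations to shards by stratified random sampling, with the outcome levels as strata. For continuous outcomes, the strata are the P quantile-based segments of the outcome's range.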
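
The idea of balancing shards with respect to the outcome can be sketched in base R. This is an illustration of stratified random assignment under assumed details, not the source of stratrs: the helper name assign_shards is hypothetical, and the package may distribute observations differently.

```r
# Hypothetical helper: stratified random assignment of observations to C shards.
assign_shards <- function(y, C = 5) {
  shard <- integer(length(y))
  for (level in unique(y)) {
    idx <- which(y == level)              # observations in this stratum
    idx <- idx[sample.int(length(idx))]   # shuffle within the stratum
    # Deal the shuffled observations out to shards 1..C in turn
    shard[idx] <- rep_len(seq_len(C), length(idx))
  }
  shard
}

set.seed(12)
y <- rbinom(25000, 1, 0.1)
shard <- assign_shards(y, C = 5)
table(shard, y)   # each shard holds nearly the same count of 0s and of 1s
```

Because every stratum is dealt out separately, the shard counts within any one outcome level differ by at most one in this sketch.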

Value

A vector with one element per observation in y, giving the shard to which that observation is assigned.

See Also

rs.pbart

Examples

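set.seed(12)
## binary outcome: about 10% events
x <- rbinom(25000, 1, 0.1)
a <- stratrs(x)
## each of the default C=5 shards should hold a similar share of events
table(a, x)
## categorical outcome: Poisson counts capped at 5
z <- pmin(rpois(25000, 0.8), 5)
b <- stratrs(z)
table(b, z)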

BART documentation built on March 31, 2023, 5:17 p.m.