ordGroupBoot: Minimally Group an Ordinal Variable So Bootstrap Samples Will...

View source: R/ordGroupBoot.r

ordGroupBootR Documentation

Minimally Group an Ordinal Variable So Bootstrap Samples Will Contain All Distinct Values

Description

When bootstrapping models for ordinal Y when Y is fairly continuous, it is frequently the case that one or more bootstrap samples will not include one or more of the distinct original Y values. When fitting an ordinal model (including a Cox PH model), this means that an intercept cannot be estimated, and the parameter vectors will not align over bootstrap samples. To prevent this from happening, some grouping of Y may be necessary. The ordGroupBoot function uses cutGn() to group Y so that the minimum number in any group is guaranteed to not exceed a certain integer m. ordGroupBoot tries a range of m and stops at the lowest m such that either all B tested bootstrap samples contain all the original distinct values of Y (if B>0), or that the probability that a given sample of size n with replacement will contain all the distinct original values exceeds aprob (B=0). This probability is computed approximately using an approximation to the probability of complete sample coverage from the coupon collector's problem and is quite accurate for our purposes.

Usage

ordGroupBoot(
  y,
  B = 0,
  m = 7:min(15, floor(n/3)),
  what = c("mean", "factor", "m"),
  aprob = 0.9999,
  pr = TRUE
)

Arguments

y

a numeric vector

B

number of bootstrap samples to test, or zero to use a coverage probability approximation

m

range of minimum group sizes to test; the default range is usually adequate

what

specifies that either the mean y in each group should be returned, a factor version of this with interval endpoints in the levels, or the computed value of m should be returned

aprob

minimum coverage probability sought

pr

set to FALSE to not print the computed value of the minimum m satisfying the needed condition

Value

a numeric vector corresponding to y but grouped, containing eithr the mean of y in each group or a factor variable representing grouped y, either with the minimum m that satisfied the required sample covrage

Author(s)

Frank Harrell

See Also

cutGn()

Examples

set.seed(1)
x <- c(1:6, NA, 7:22)
ordGroupBoot(x, m=5:10)
ordGroupBoot(x, m=5:10, B=5000, what='factor')

harrelfe/Hmisc documentation built on Feb. 10, 2025, 4:09 p.m.