build_autostrata: Build Autostrata object

View source: R/auto_stratify.R

build_autostrata    R Documentation

Build Autostrata object

Description

Internal helper; not meant to be called externally. Given the arguments to auto_stratify, it builds the prognostic scores and returns the analysis set, the prognostic scores, the pilot set, the prognostic model, and the outcome string. Its main job is to determine which type of prognosis input was supplied and handle it accordingly.

Usage

build_autostrata(
  data,
  treat,
  prognosis,
  outcome,
  pilot_fraction,
  pilot_size,
  pilot_sample,
  group_by_covariates
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

  1. vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.

  2. a formula for fitting a prognostic model

  3. an already-fit prognostic score model
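As a rough illustration of how the three input types could be told apart, here is a minimal base R sketch. The helper name classify_prognosis is hypothetical and is not part of stratamatch; the checks shown are an assumption about one reasonable way to dispatch, not the package's actual code.

```r
# Hypothetical helper (not a stratamatch function): report which of the
# three documented input types a `prognosis` argument appears to be.
classify_prognosis <- function(prognosis) {
  if (inherits(prognosis, "formula")) {
    "formula"   # a formula for fitting a prognostic model
  } else if (is.numeric(prognosis)) {
    "scores"    # a vector of precomputed prognostic scores
  } else {
    "model"     # assumed to be an already-fit model object
  }
}

# Usage with the built-in `cars` data set
classify_prognosis(dist ~ speed)
classify_prognosis(c(0.2, 0.7, 0.4))
classify_prognosis(lm(dist ~ speed, data = cars))
```

The fall-through branch treats anything that is neither a formula nor numeric as a fitted model; a stricter version might check for a usable predict method before accepting it.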

outcome

string giving the name of the column with outcome information. Required if prognosis is a vector of prognostic scores. Otherwise it will be inferred from the prognostic formula or model.

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1 in auto_stratify)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in the pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified, because balancing by covariates may result in deviations from the desired size. If pilot_size is specified, pilot_fraction is ignored.

pilot_sample

a data.frame of held-aside samples for building the prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.
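The precedence among the three pilot arguments described above (pilot_sample over pilot_size over pilot_fraction) can be sketched as follows. The function resolve_pilot_source is a hypothetical name used only for illustration; it is not part of stratamatch, and the NULL defaults are an assumption.

```r
# Hypothetical helper (not a stratamatch function): report which pilot
# argument takes effect, following the documented precedence.
resolve_pilot_source <- function(pilot_fraction = 0.1,
                                 pilot_size = NULL,
                                 pilot_sample = NULL) {
  if (!is.null(pilot_sample)) {
    return("pilot_sample")   # a supplied pilot set overrides everything
  }
  if (!is.null(pilot_size)) {
    return("pilot_size")     # an explicit size overrides the fraction
  }
  "pilot_fraction"           # otherwise fall back to the fraction
}

# Usage
resolve_pilot_source()
resolve_pilot_source(pilot_size = 200)
resolve_pilot_source(pilot_size = 200, pilot_sample = data.frame(x = 1:3))
```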

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.
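To make the idea of stratified pilot sampling concrete, here is a small base R sketch on toy data. It assumes the pilot set is drawn only from controls (treat == 0) and that the same fraction is sampled within each level of a single categorical covariate; the data, the column name group, and the sampling code are all illustrative assumptions rather than the package's implementation.

```r
set.seed(1)

# Toy data: 80 controls followed by 20 treated units, with a
# two-level categorical covariate that alternates between rows.
toy <- data.frame(
  treat = rep(c(0, 1), times = c(80, 20)),
  group = rep(c("a", "b"), times = 50)
)

controls <- toy[toy$treat == 0, ]
pilot_fraction <- 0.25

# Sample the same fraction of controls within each covariate level.
idx_by_group <- split(seq_len(nrow(controls)), controls$group)
pilot_idx <- unlist(
  lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), size = round(pilot_fraction * length(idx)))]
  }),
  use.names = FALSE
)

pilot_set <- controls[pilot_idx, ]
# Everything not drawn into the pilot set remains for analysis.
analysis_set <- toy[!(rownames(toy) %in% rownames(pilot_set)), ]

table(pilot_set$group)
```

Because the counts are rounded within each level, the total pilot size can differ slightly from the requested size when levels are unevenly sized, which matches the caveat given for pilot_size.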

Value

a list of: analysis set, prognostic scores, pilot set, prognostic model, and outcome string

See Also

auto_stratify


stratamatch documentation built on March 31, 2022, 9:07 a.m.