In nhejazi/haldensify: Highly Adaptive Lasso Conditional Density Estimation

format_long_hazards

R Documentation

Generate Augmented Repeated Measures Data for Pooled Hazards Regression

Description

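Convert observed intervention values A and baseline covariates W into the augmented, long-format (repeated measures) dataset used to fit a pooled hazards regression for conditional density estimation.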

Usage

format_long_hazards(
  A,
  W,
  wts = rep(1, length(A)),
  grid_type = c("equal_range", "equal_mass"),
  n_bins = NULL,
  breaks = NULL
)

Arguments

`A`	A `numeric` vector (or similar) of the observed values of an intervention, one per observational unit of interest.
`W`	A `data.frame`, `matrix`, or similar giving the values of baseline covariates (potential confounders) for the same units whose intervention values are given in `A`, with rows in the same order as `A`.
`wts`	A `numeric` vector of observation-level weights. The default is to weight all observations equally.
`grid_type`	A `character` string indicating the strategy used to create bins along the observed support of the intervention `A`. Use "equal_range" for bins of equal width (see the documentation of `cut_interval`), or "equal_mass" for bins that each contain the same number of observations (see the documentation of `cut_number`).
`n_bins`	A `numeric` value giving the number of bins into which the support of `A` is divided, using the strategy selected by `grid_type`.
`breaks`	A `numeric` vector of break points to be used in dividing up the support of `A`. This is passed through the `...` argument to `cut.default` by `cut_interval` or `cut_number`.
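To make the two `grid_type` strategies concrete, the sketch below computes break points in base R. The helper `make_breaks()` is hypothetical and only illustrates the idea; it is not the code used by haldensify, which relies on `cut_interval` and `cut_number`. Equal-range breaks are evenly spaced between the minimum and maximum of `A`, while equal-mass breaks are empirical quantiles.

```r
# Hypothetical helper (not part of haldensify): base-R illustration of the
# "equal_range" and "equal_mass" grid strategies.
make_breaks <- function(A, n_bins, grid_type = c("equal_range", "equal_mass")) {
  grid_type <- match.arg(grid_type)
  if (grid_type == "equal_range") {
    # n_bins + 1 evenly spaced points from min(A) to max(A): equal-width bins
    seq(min(A), max(A), length.out = n_bins + 1)
  } else {
    # empirical quantiles: each bin holds (roughly) the same number of points
    unname(quantile(A, probs = seq(0, 1, length.out = n_bins + 1)))
  }
}

set.seed(34729)
A <- rexp(200)  # a skewed sample, so the two grids differ visibly
br_range <- make_breaks(A, n_bins = 5, grid_type = "equal_range")
br_mass <- make_breaks(A, n_bins = 5, grid_type = "equal_mass")

# Count observations per bin; include.lowest = TRUE keeps min(A) in bin 1
table(cut(A, br_range, include.lowest = TRUE))
table(cut(A, br_mass, include.lowest = TRUE))
```

With skewed data like this exponential sample, equal-range bins leave the upper tail sparsely populated, whereas equal-mass bins narrow where the data are dense and widen where they are sparse.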

Details

Generates an augmented (long format, or repeated measures) dataset that contains multiple records for each observation: one record for every discretized bin up to and including the bin in which that observation's value of A falls. The bins are defined by break points chosen over the support of A. This repeated measures dataset is suitable for estimating the discrete hazard of "failing" in a given bin (that is, the probability that A falls in that bin, given W and given that A did not fall in an earlier bin) using the highly adaptive lasso or another classification model.
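The expansion described above can be sketched in base R. The function `expand_hazards_long()` and its column names (`obs_id`, `bin_id`, `in_bin`, `wts`) are hypothetical, chosen for illustration; they are not haldensify's internals, and the sketch takes precomputed `breaks` instead of building a grid.

```r
# Hypothetical sketch (not haldensify's implementation) of the long-format
# expansion: each observation gets one row per bin up to its own bin.
expand_hazards_long <- function(A, W, breaks, wts = rep(1, length(A))) {
  W <- as.data.frame(W)
  # bin index of each observed A; all.inside = TRUE places a value equal to
  # the last break point in the final bin
  bin_of_A <- findInterval(A, breaks, all.inside = TRUE)
  rows <- lapply(seq_along(A), function(i) {
    k <- bin_of_A[i]
    data.frame(
      obs_id = i,
      bin_id = seq_len(k),
      # 1 only in the bin where A "fails"; 0 in every earlier bin
      in_bin = as.integer(seq_len(k) == k),
      # repeat this observation's covariates once per record
      W[rep(i, k), , drop = FALSE],
      wts = wts[i],
      row.names = NULL
    )
  })
  do.call(rbind, rows)
}

A <- c(0.2, 1.7, 2.9)
W <- data.frame(W1 = c(10, 20, 30))
breaks <- c(0, 1, 2, 3)
long <- expand_hazards_long(A, W, breaks)
long
```

Here the three observations fall in bins 1, 2, and 3, so they contribute 1, 2, and 3 records respectively, and each observation has exactly one record with `in_bin` equal to 1.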

Value

A list containing the break points used to divide the support of A into discrete bins, the length of each bin, and the reformatted data. The reformatted data is a data.table of repeated measures data with one row per observation and bin, containing an indicator of whether the observation fails in that bin (i.e., whether its value of A falls there), the bin ID, the observation ID, the values of W for that observation, and its observation-level weight.
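The bin lengths matter because hazards estimated on the long-format data are per bin. Under the standard pooled-hazards construction, the probability that A falls in bin k is the hazard in bin k times the probability of not failing in any earlier bin, and dividing by the bin length gives a piecewise-constant density value. The sketch below assumes that construction; `hazards_to_density()` is a hypothetical helper, and `haz` stands in for hazards predicted by a fitted classifier at one value of W.

```r
# Hypothetical helper: map per-bin hazards to a piecewise-constant density,
# assuming the standard pooled-hazards construction.
hazards_to_density <- function(haz, bin_length) {
  # probability of not failing in any earlier bin: prod_{j < k} (1 - h_j)
  surv_before <- c(1, cumprod(1 - haz))[seq_along(haz)]
  # P(A in bin k | W) divided by the bin width gives the density in bin k
  haz * surv_before / bin_length
}

haz <- c(0.25, 0.5, 1)   # last hazard is 1: A must fall in some bin
bin_length <- c(1, 1, 2)
dens <- hazards_to_density(haz, bin_length)
dens
sum(dens * bin_length)   # total probability over the binned support
```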

nhejazi/haldensify documentation built on Sept. 25, 2024, 2:32 p.m.