indiv_cohorts: Construct individualized cohorts

View source: R/indiv-cohorts.r

indiv_cohorts    R Documentation

Construct individualized cohorts

Description

An individualized cohort for a case x consists of the cases in the available corpus that are most similar, or most relevant, to x.

Usage

indiv_cohorts(
  data,
  new_data = NULL,
  simil_method = "cosine",
  threshold = NULL,
  cardinality = NULL,
  ties_method = "min",
  weight = NULL,
  .full_cohorts = FALSE
)

Arguments

data

A data frame, used as the corpus.

new_data

A data frame of index cases, or an integer vector of row names or numbers used to slice cases from data.

simil_method

A character value, passed to the method parameter of proxy::simil().

threshold

A numeric value that similarities between each index case and the cases in its individualized cohort must exceed.

cardinality

An integer value that bounds the size of each individualized cohort (up to rank ties).

ties_method

A character value, passed to the ties.method parameter of rank().

weight

The name of a weight object (a character value) or an object itself (of the form *_weight). If NULL, the default, no weights are calculated.

.full_cohorts

Logical; whether to retain a column of cohorts in addition to a column of their index sets with respect to data.
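
The phrase "up to rank ties" in the description of cardinality can be illustrated with base R alone. The sketch below is not the package's code: it assumes that cases are ranked from most to least similar with rank() and kept when their rank is at most cardinality, and the similarity values are made up for illustration.

```r
# Hypothetical similarities between one index case and six corpus cases
sims <- c(a = 0.95, b = 0.90, c = 0.90, d = 0.80, e = 0.80, f = 0.70)
cardinality <- 2L

# Rank from most to least similar under two tie-breaking rules
rank_min <- rank(-sims, ties.method = "min")
rank_max <- rank(-sims, ties.method = "max")

# Keep the cases whose rank falls within the bound (assumed selection rule)
cohort_min <- names(sims)[rank_min <= cardinality]
cohort_max <- names(sims)[rank_max <= cardinality]

cohort_min  # "a" "b" "c": the tie at rank 2 pushes the cohort past the bound
cohort_max  # "a": the tied cases receive rank 3 and are dropped
```

Under this assumed rule, "min" lets tied cases enter together, so a cohort may exceed cardinality, while "max" keeps the cohort at or below it.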

Details

The individualized cohort about an index case may be bounded in size (cardinality), restricted to cases whose similarity to the index case exceeds a minimum (threshold), or both. When the index cases are drawn from the corpus, they are excluded from their own cohorts.
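
The construction described here can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: cosine_simil() and sketch_cohorts() are hypothetical helpers, cosine similarity is computed by hand rather than through proxy::simil(), and the way the two bounds are combined (both must hold) is an assumption.

```r
# Cosine similarity between the rows of two numeric matrices
cosine_simil <- function(x, y) {
  x_unit <- x / sqrt(rowSums(x^2))  # scale each row to unit length
  y_unit <- y / sqrt(rowSums(y^2))
  x_unit %*% t(y_unit)
}

# Hypothetical helper: row numbers of the cohort about each index case
sketch_cohorts <- function(data, index_rows,
                           threshold = NULL, cardinality = NULL) {
  corpus <- as.matrix(data)
  sims <- cosine_simil(corpus[index_rows, , drop = FALSE], corpus)
  lapply(seq_along(index_rows), function(i) {
    s <- sims[i, ]
    # an index case drawn from the corpus is excluded from its own cohort
    s[index_rows[i]] <- NA
    keep <- !is.na(s)
    # similarities must exceed the threshold (strict inequality)
    if (!is.null(threshold)) keep <- keep & s > threshold
    # cohort size is bounded by rank, up to ties
    if (!is.null(cardinality)) {
      ranks <- rank(-s, ties.method = "min", na.last = "keep")
      keep <- keep & ranks <= cardinality
    }
    unname(which(keep))
  })
}

# Cohorts of (at least) three cases about rows 1 and 5 of `mtcars`
idx <- sketch_cohorts(mtcars[, c("cyl", "disp", "hp")], c(1L, 5L),
                      cardinality = 3L)
idx
```

Each element of idx holds row numbers of the corpus, so a cohort can be recovered by slicing the data with it.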

Value

A tibble with columns row (seq(nrow(new_data)) if new_data is a data frame, or new_data itself if it is a vector identifying rows of data), new_datum (each index case, formatted as a one-row data frame), idx (the row numbers in data of the cases in the individualized cohort), and, if .full_cohorts = TRUE, cohort (the individualized cohort, formatted as a data frame).
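
Because idx stores row numbers with respect to data, the cohort column is redundant with it, which is why it is optional. The sketch below shows the relationship using base R only; the index sets are invented for illustration and are not output of indiv_cohorts().

```r
# Hypothetical index sets, as might appear in the `idx` column
# for two index cases
idx <- list(c(2L, 6L, 11L), c(25L, 29L))

# Slice the corpus (here `mtcars`) by each index set to rebuild the cohorts
cohorts <- lapply(idx, function(i) mtcars[i, , drop = FALSE])

# Each cohort has one row per stored index
vapply(cohorts, nrow, integer(1))
```

Keeping only idx avoids storing a copy of every cohort, at the cost of slicing data again whenever a cohort is needed.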

Examples

# sample "new data" (testing data) from `mtcars`
# attach magrittr for the `%>%` pipe used below (assumes it is installed)
library(magrittr)
set.seed(0)
mtcars_new <- sample(seq(nrow(mtcars)), 5L)
# fix modeling formula
mtcars_form <- as.formula(mpg ~ cyl + disp + hp)
# fit linear model to training data
mtcars_mod <- lm(mtcars_form, mtcars[-mtcars_new, , drop = FALSE])
# construct a cohort for each testing datum (include full cohorts in result)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new, simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min", .full_cohorts = TRUE
)
# fit linear model to each cohort
# -+- this needs to be made into a standalone function -+-
mtcars_cohorts %>%
  dplyr::mutate(fit = purrr::map(cohort, ~ lm(mtcars_form, .x))) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() ->
  mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)
# construct a cohort for each testing datum (include indices only)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new, simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min"
)
# fit linear model to each cohort
mtcars_cohorts %>%
  # -+- how much memory is allocated here? -+-
  dplyr::mutate(fit = purrr::map(
    idx,
    ~ lm(mtcars_form, dplyr::slice(mtcars, .x))
  )) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() ->
  mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)

corybrunson/imtidy documentation built on Sept. 15, 2022, 1:11 a.m.