indiv_cohorts: Construct individualized cohorts
In corybrunson/imtidy: 'tidymodels' Extension for Individualized Models

View source: R/indiv-cohorts.r

Construct individualized cohorts

Description

An individualized cohort for a case x consists of the cases in the available corpus that are most similar (or most relevant) to x.
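To make "most similar" concrete, here is a minimal base-R sketch that uses cosine similarity, the default `simil_method`. It ranks the other rows of `mtcars`, restricted to three numeric columns, by their similarity to the first row. The helper `cosine_to_rows()` is hypothetical and only illustrates the idea. The package itself passes the similarity computation to `proxy::simil()`.

```r
# Hypothetical helper (not part of imtidy or proxy): cosine similarity
# between one index case `x` and every row of `corpus`.
cosine_to_rows <- function(x, corpus) {
  corpus <- as.matrix(corpus)
  x <- as.numeric(x)
  # dot product of each corpus row with x ...
  dots <- drop(corpus %*% x)
  # ... divided by the product of the two Euclidean norms
  dots / (sqrt(rowSums(corpus^2)) * sqrt(sum(x^2)))
}

corpus <- as.matrix(mtcars[, c("cyl", "disp", "hp")])
index_row <- 1L
# similarities of all other cases to the index case (Mazda RX4)
sims <- cosine_to_rows(corpus[index_row, ], corpus[-index_row, ])
# the five cases most similar to the index case
head(sort(sims, decreasing = TRUE), 5)
```

Mazda RX4 Wag has the same `cyl`, `disp`, and `hp` values as Mazda RX4, so its similarity is 1 (up to floating-point rounding) and it ranks first.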

Usage

indiv_cohorts(
  data,
  new_data = NULL,
  simil_method = "cosine",
  threshold = NULL,
  cardinality = NULL,
  ties_method = "min",
  weight = NULL,
  .full_cohorts = FALSE
)

Arguments

`data`	A data frame, used as the corpus.
`new_data`	A data frame of index cases, or an integer vector of row names or numbers used to slice cases from `data`.
`simil_method`	A character value, passed to the `method` parameter of `proxy::simil()`.
`threshold`	A numeric value; a case is included in an index case's individualized cohort only if its similarity to the index case exceeds this value.
`cardinality`	An integer upper bound on the size of each individualized cohort; a cohort may be larger when cases are tied in rank.
`ties_method`	A character value, passed to the `ties.method` parameter of `rank()` to determine how tied similarities are ranked.
`weight`	The name of a weight object (a character value) or an object itself (of the form `*_weight`). If `NULL`, the default, no weights are calculated.
`.full_cohorts`	Logical; whether to retain a column of cohorts in addition to a column of their index sets with respect to `data`.
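The interaction between `cardinality` and `ties_method` follows from how `rank()` handles ties. The base-R sketch below assumes that a cohort keeps the cases whose similarity rank is at most `cardinality`, which is how the argument descriptions read but not confirmed here. The similarity values are made up for illustration.

```r
# Made-up similarities of five corpus cases to one index case
sims <- c(a = 0.95, b = 0.90, c = 0.90, d = 0.80, e = 0.70)

# rank by decreasing similarity; with "min", tied cases share the lower rank
r_min <- rank(-sims, ties.method = "min")
r_min
# a b c d e
# 1 2 2 4 5

# with a cap of 2, both tied cases are kept, giving three cases
names(sims)[r_min <= 2]    # "a" "b" "c"

# with "first", ties are broken by position, so the cap is exact
r_first <- rank(-sims, ties.method = "first")
names(sims)[r_first <= 2]  # "a" "b"
```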

Details

The individualized cohort of an index case may be capped by size (`cardinality`), by a minimum similarity (`threshold`), or both. When the index cases are drawn from the corpus (that is, when `new_data` slices rows of `data`), each is excluded from its own cohort.
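The rules above can be sketched as a filter on a vector of similarities between one index case and every case in the corpus. `select_cohort()` is a hypothetical helper written for illustration and is not the package's implementation. In particular, the order in which it applies the self-exclusion, the threshold, and the size cap is an assumption.

```r
# Hypothetical helper (not part of imtidy): choose a cohort from the
# similarities `sims` of every corpus case to one index case.
select_cohort <- function(sims, self = NULL, threshold = NULL,
                          cardinality = NULL, ties_method = "min") {
  cand <- seq_along(sims)
  # an index case drawn from the corpus is excluded from its own cohort
  if (!is.null(self)) cand <- setdiff(cand, self)
  # keep only cases whose similarity exceeds the threshold
  if (!is.null(threshold)) cand <- cand[sims[cand] > threshold]
  # cap the cohort size by similarity rank (ties may exceed the cap)
  if (!is.null(cardinality) && length(cand) > 0L) {
    r <- rank(-sims[cand], ties.method = ties_method)
    cand <- cand[r <= cardinality]
  }
  cand  # row numbers in the corpus, analogous to the `idx` column
}

# made-up similarities of six corpus cases to case 3, itself in the corpus
sims <- c(0.91, 0.97, 1.00, 0.85, 0.95, 0.40)
select_cohort(sims, self = 3, threshold = 0.9, cardinality = 2)  # 2 5
select_cohort(sims, self = 3, threshold = 0.9)                   # 1 2 5
```

Case 3 has similarity 1 to itself but never enters its own cohort. Cases 4 and 6 fall below the threshold, and the cap of 2 then drops case 1, the least similar of the remaining cases.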

Value

A tibble with columns `row` (either `seq(nrow(new_data))` or `new_data`, depending on the form of `new_data`), `new_datum` (each index case formatted as a one-row data frame), `idx` (the row numbers in `data` of the constructed individualized cohort), and, optionally, `cohort` (the individualized cohort formatted as a data frame, included when `.full_cohorts = TRUE`).

Examples

# attach magrittr for the `%>%` pipe used below
library(magrittr)
# sample "new data" (testing data) from `mtcars`
set.seed(0)
mtcars_new <- sample(seq(nrow(mtcars)), 5L)
# fix modeling formula
mtcars_form <- as.formula(mpg ~ cyl + disp + hp)
# fit linear model to training data
mtcars_mod <- lm(mtcars_form, mtcars[-mtcars_new, , drop = FALSE])
# construct a cohort for each testing datum (include full cohorts in result)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new, simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min", .full_cohorts = TRUE
)
# fit linear model to each cohort
# -+- this needs to be made into a standalone function -+-
mtcars_cohorts %>%
  dplyr::mutate(fit = purrr::map(cohort, ~ lm(mtcars_form, .x))) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() ->
  mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)
# construct a cohort for each testing datum (include indices only)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new, simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min"
)
# fit linear model to each cohort
mtcars_cohorts %>%
  # -+- how much memory is allocated here? -+-
  dplyr::mutate(fit = purrr::map(
    idx,
    ~ lm(mtcars_form, dplyr::slice(mtcars, .x))
  )) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() ->
  mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)

corybrunson/imtidy documentation built on Sept. 15, 2022, 1:11 a.m.