View source: R/indiv-cohorts.r
indiv_cohorts | R Documentation |
An individualized cohort for a case x consists of the most similar, or relevant, cases to x in the available corpus.
indiv_cohorts(
  data,
  new_data = NULL,
  simil_method = "cosine",
  threshold = NULL,
  cardinality = NULL,
  ties_method = "min",
  weight = NULL,
  .full_cohorts = FALSE
)
data |
A data frame, used as the corpus. |
new_data |
A data frame of index cases, or an integer vector of row
names or numbers used to slice cases from |
simil_method |
A character value, passed to the |
threshold |
A numeric value that similarities between each index case and the cases in its individualized cohort must exceed. |
cardinality |
An integer value that bounds the size of each individualized cohort (up to rank ties). |
ties_method |
passed to the |
weight |
The name of a weight object (a character value) or an object
itself (of the form |
.full_cohorts |
Logical; whether to retain a column of cohorts in
addition to a column of their index sets with respect to |
The individualized cohort for an index case may be limited in size (cardinality) or by a minimum similarity (threshold). When the index cases are drawn from the corpus, they are excluded from their own cohorts.
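The selection rule described above can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the helpers `cosine_simil()` and `sketch_cohort()` are hypothetical names, all columns are assumed numeric, similarity is assumed to be cosine similarity, and the threshold and cardinality limits are assumed to apply jointly when both are given.

```r
# Hypothetical helper: cosine similarity between a vector `x`
# and every row of a numeric matrix `m`.
cosine_simil <- function(x, m) {
  as.vector(m %*% x) / (sqrt(rowSums(m^2)) * sqrt(sum(x^2)))
}

# Hypothetical helper: row numbers of a cohort for corpus row `i`.
sketch_cohort <- function(data, i, threshold = NULL, cardinality = NULL) {
  m <- as.matrix(data)
  s <- cosine_simil(m[i, ], m)
  # the index case is drawn from the corpus, so exclude it from its own cohort
  s[i] <- NA
  keep <- !is.na(s)
  # similarities must exceed the threshold
  if (!is.null(threshold)) keep <- keep & s > threshold
  if (!is.null(cardinality)) {
    # rank 1 is the most similar case; with "min" ties, tied cases share
    # the best rank, so a cohort can exceed `cardinality` when ranks tie
    r <- rank(-s, ties.method = "min", na.last = "keep")
    keep <- keep & r <= cardinality
  }
  unname(which(keep))
}

# cohort for the first row of `mtcars`, using three numeric predictors
idx <- sketch_cohort(
  mtcars[, c("cyl", "disp", "hp")], i = 1L,
  threshold = 0.9, cardinality = 10L
)
```

Here `idx` holds row numbers of `mtcars`, analogous to the `idx` column in the returned value; the index row itself never appears in it.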
A tibble with columns row (either seq(nrow(new_data)) or new_data, depending on whether new_data is a data frame or an integer vector), new_datum (each index case formatted as a one-row data frame), idx (the row numbers in data of the constructed individualized cohort), and, optionally, cohort (the individualized cohort, formatted as a data frame).
# the examples below use the %>% pipe
library(magrittr)

# sample "new data" (testing data) from the rows of `mtcars`
set.seed(0)
mtcars_new <- sample(seq(nrow(mtcars)), 5L)
# fix modeling formula
mtcars_form <- as.formula(mpg ~ cyl + disp + hp)
# fit linear model to training data
mtcars_mod <- lm(mtcars_form, mtcars[-mtcars_new, , drop = FALSE])

# construct a cohort for each testing datum (include full cohorts in result)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new,
  simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min",
  .full_cohorts = TRUE
)
# fit linear model to each cohort
# -+- this needs to be made into a standalone function -+-
mtcars_cohorts %>%
  dplyr::mutate(fit = purrr::map(cohort, ~ lm(mtcars_form, .x))) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() -> mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)

# construct a cohort for each testing datum (include indices only)
mtcars_cohorts <- indiv_cohorts(
  mtcars, new_data = mtcars_new,
  simil_method = "correlation",
  threshold = .9, cardinality = 10L, ties_method = "min"
)
# fit linear model to each cohort
mtcars_cohorts %>%
  # -+- how much memory is allocated here? -+-
  dplyr::mutate(fit = purrr::map(
    idx,
    ~ lm(mtcars_form, dplyr::slice(mtcars, .x))
  )) %>%
  dplyr::mutate(pred = purrr::map2_dbl(
    row, fit,
    ~ predict(.y, newdata = dplyr::slice(mtcars, .x))
  )) %>%
  print() -> mtcars_fits
# compare global predictions to individualized predictions
tibble::tibble(
  response = mtcars$mpg[mtcars_new],
  lm_pred = predict(mtcars_mod, dplyr::slice(mtcars, mtcars_new)),
  im_pred = mtcars_fits$pred
)
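The examples note that the per-cohort fitting step should become a standalone function. One possible shape for such a helper, in base R, is sketched below. The name `predict_by_cohort()` is hypothetical and not part of the package; it assumes only that its input has the `row` and `idx` columns described under the return value, and it is demonstrated on a mock input rather than on real output of `indiv_cohorts()`.

```r
# Hypothetical helper: fit one linear model per cohort (by row index into
# `data`) and predict the response at that cohort's index case.
predict_by_cohort <- function(cohorts, data, formula) {
  vapply(seq_len(nrow(cohorts)), function(k) {
    # fit on the cohort's rows only
    fit <- lm(formula, data = data[cohorts$idx[[k]], , drop = FALSE])
    # predict at the index case
    unname(predict(fit, newdata = data[cohorts$row[k], , drop = FALSE]))
  }, numeric(1))
}

# Mock input with `row` and `idx` columns; these cohorts are arbitrary
# row sets chosen for illustration, not similarity-based cohorts.
mock <- data.frame(row = c(1L, 5L))
mock$idx <- list(2:12, c(1:4, 6:15))

preds <- predict_by_cohort(mock, mtcars, mpg ~ cyl + disp + hp)
```

Working from `idx` rather than a stored `cohort` column avoids keeping a copy of every cohort in the result, at the cost of re-slicing `data` for each fit.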