group_pd: Partial dependence grouping

group_pd    R Documentation

Partial dependence grouping

Description

Groups feature values/levels by binning continuous/ordinal features and clustering nominal features. The partial dependence effect is used to group values/levels with similar behavior in a data-driven way.

Usage

group_pd(pd, ngroups)

group_pd_ckseg(pd, ngroups)

group_pd_ckmns(pd, ngroups)

Arguments

pd

Data frame containing the partial dependence effect as returned by get_pd.

ngroups

Integer specifying the number of groups.

Value

The data frame supplied in pd, returned as a tidy data frame (i.e., a "tibble" object) with three additional columns: xgrp, ygrp and wgrp. Column xgrp contains the feature groups, column ygrp the average partial dependence for the group and column wgrp the sum of observation counts for the group.
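
To make the meaning of the three added columns concrete, the base R sketch below builds them by hand for a toy partial dependence table. The input column names x, y and w, the numbers, and the group assignment grp are assumptions made for illustration only. The sketch also uses an unweighted mean for ygrp, which may differ from what the package does; it is not the package's implementation.

```r
# Toy partial dependence table; the column names x, y, w are assumed here.
pd <- data.frame(
  x = c(18, 19, 20, 21, 22, 23),              # feature values
  y = c(0.20, 0.19, 0.15, 0.14, 0.10, 0.11),  # partial dependence effect
  w = c(10, 20, 30, 40, 50, 60)               # observation counts
)

# Hypothetical group assignment (in practice chosen by the grouping algorithm).
grp <- c(1, 1, 2, 2, 3, 3)

# xgrp: a label for the group, here the range of x values that it covers.
lo <- tapply(pd$x, grp, min)
hi <- tapply(pd$x, grp, max)
pd$xgrp <- paste0('[', lo[as.character(grp)], ', ', hi[as.character(grp)], ']')

# ygrp: average partial dependence within the group (unweighted in this sketch).
pd$ygrp <- ave(pd$y, grp, FUN = mean)

# wgrp: total observation count within the group.
pd$wgrp <- ave(pd$w, grp, FUN = sum)

print(pd)
```

Every row keeps its original x, y and w, so the grouped values can be compared directly with the ungrouped effect.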

Functions

  • group_pd_ckseg: Grouping via Cksegs.1d.dp.

  • group_pd_ckmns: Grouping via Ckmeans.1d.dp.
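
The difference between the two variants can be sketched by calling the underlying Ckmeans.1d.dp package directly on toy values. The numbers below are hypothetical, and passing the observation counts as weights to Ckmeans.1d.dp is an assumption of this sketch rather than a statement about how group_pd_ckmns works internally. The calls are guarded so the snippet also runs when the package is not installed.

```r
# Toy partial dependence values (hypothetical numbers).
x <- 1:8                                                  # ordered feature values
y <- c(0.10, 0.11, 0.10, 0.30, 0.31, 0.29, 0.50, 0.52)    # partial dependence effect
w <- c(5, 10, 5, 20, 20, 10, 15, 15)                      # observation counts

if (requireNamespace('Ckmeans.1d.dp', quietly = TRUE)) {
  # Segmentation: y is split into segments along x, so each group is a
  # contiguous range of x. This suits binning a continuous/ordinal feature.
  seg <- Ckmeans.1d.dp::Cksegs.1d.dp(y = y, k = 3, x = x)

  # Clustering: groups depend only on the effect values, so levels can be
  # merged regardless of order. This suits a nominal feature.
  # Using w as weights is an assumption of this sketch.
  cls <- Ckmeans.1d.dp::Ckmeans.1d.dp(x = y, k = 3, y = w)

  print(seg$cluster)  # group index per x value
  print(cls$cluster)  # cluster index per effect value
}
```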

Examples

## Not run: 
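library(maidrr)    # provides mtpl_be, get_grid, get_pd and group_pd
library(magrittr)  # provides the %>% pipe
data('mtpl_be')
# Keep all columns as features except the identifier, response and other non-feature columns
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
set.seed(12345)
# Fit a Poisson GBM for the number of claims
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~',
                               paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)
# Prediction function: mean response-scale prediction over newdata
gbm_fun <- function(object, newdata) {
  mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
}
# Partial dependence of 'ageph', grouped into 5 groups
gbm_fit %>% get_pd(var = 'ageph',
                   grid = get_grid(var = 'ageph', data = mtpl_be),
                   data = mtpl_be,
                   subsample = 10000,
                   fun = gbm_fun) %>%
  group_pd(ngroups = 5)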

## End(Not run)

henckr/maidrr documentation built on July 27, 2023, 3:17 p.m.