optimal_ngroups: Optimal number of groups

View source: R/optimal_ngroups.R

optimal_ngroups R Documentation

Optimal number of groups

Description

Determine the optimal number of groups for a feature.

Usage

optimal_ngroups(
  pd,
  lambda,
  max_ngrps = 15,
  search_grid = seq_len(min(length(unique(pd$y)), max_ngrps))
)

Arguments

pd

Data frame containing the partial dependence effect as returned by get_pd.

lambda

The complexity parameter in the penalized loss function (see the accompanying research paper or the package vignette for details).

max_ngrps

Integer specifying the maximum number of groups into which a feature's values/levels may be grouped.

search_grid

Integer vector of candidate numbers of groups to evaluate. Defaults to every integer from 1 up to the number of unique values in pd$y, capped at max_ngrps.
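
The default for search_grid is computed from pd and max_ngrps. A minimal sketch of how that expression resolves, using a made-up data frame: the column x and all values here are hypothetical, and only the y column is implied by the Usage section.

```r
# Hypothetical partial dependence data frame; only the `y` column is
# referenced by the default `search_grid` expression shown in Usage.
pd <- data.frame(x = 1:6, y = c(0.10, 0.10, 0.12, 0.15, 0.15, 0.20))
max_ngrps <- 15

# Same expression as the default argument: one candidate per integer from 1
# up to the number of distinct effect values, capped at max_ngrps.
search_grid <- seq_len(min(length(unique(pd$y)), max_ngrps))
print(search_grid)

# Lowering the cap (here to 3) shortens the grid.
capped_grid <- seq_len(min(length(unique(pd$y)), 3))
print(capped_grid)
```

With four distinct values in y, the uncapped grid runs from 1 to 4, and a cap of 3 trims it to 1 through 3.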

Value

Integer specifying the optimal number of groups. When several candidate numbers of groups attain the lowest loss, the smallest one is returned.
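
The tie-breaking rule can be illustrated with a small sketch. The loss values below are invented for illustration, and this is not necessarily how optimal_ngroups is implemented internally; it only shows one way the "smallest value wins" behaviour can arise in base R.

```r
# Hypothetical penalized-loss values, one per candidate number of groups.
# These numbers are made up; they are not produced by maidrr.
search_grid <- 1:5
loss <- c(0.9, 0.4, 0.4, 0.7, 0.8)

# which.min() returns the index of the first minimum, so with an increasing
# grid a tie resolves to the smallest number of groups.
best_ngroups <- search_grid[which.min(loss)]
print(best_ngroups)
```

Here two and three groups share the lowest loss, and the smaller candidate is selected.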

Examples

## Not run: 
data('mtpl_be')
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
set.seed(12345)
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~',
                               paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)
# Prediction function: average predicted response over newdata
gbm_fun <- function(object, newdata) {
  mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
}
# Partial dependence of 'ageph', then the optimal number of groups
gbm_fit %>%
  get_pd(var = 'ageph',
         grid = 'ageph' %>% get_grid(data = mtpl_be),
         data = mtpl_be,
         subsample = 10000,
         fun = gbm_fun) %>%
  optimal_ngroups(lambda = 0.00001)

## End(Not run)

henckr/maidrr documentation built on July 27, 2023, 3:17 p.m.