segmentation: Data segmentation

View source: R/segmentation.R

segmentation    R Documentation

Data segmentation

Description

Segmentation of observations based on the grouping of feature effects.

Usage

segmentation(fx_vars, data, type, values, max_ngrps = 15)

Arguments

fx_vars

List of data frames containing the feature effects.

data

Data frame containing the original training data.

type

String specifying the type of segmentation. Options are:

'ngroups'

values gives the number of groups into which each feature is grouped.

'lambdas'

values gives the penalty parameter(s) lambda; the optimal number of groups is then determined by a penalized loss.

values

The values for ngroups or lambdas. This can be a single numeric value (the same value is used for all features in fx_vars) or a named numeric vector of length length(fx_vars) (for feature-specific values). In the latter case, the names must match the comment attributes of the data frames in fx_vars.

max_ngrps

Integer specifying the maximum number of groups that each feature's values/levels are allowed to be grouped into. Only used when determining the optimal number of groups via type = 'lambdas'.
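The two accepted forms of values, and the way max_ngrps caps a penalized choice of the number of groups, can be illustrated with a small base-R sketch. The helpers expand_values() and pick_ngroups() are hypothetical and not part of maidrr. The penalty lambda * log(k) is an assumption chosen for illustration only, and the loss values are made up rather than computed from a model.

```r
# Hypothetical helper (not part of maidrr): turn a scalar or a named vector
# into one value per feature, mirroring the two accepted forms of `values`.
expand_values <- function(values, fx_names) {
  if (length(values) == 1 && is.null(names(values))) {
    # A single unnamed number is reused for every feature.
    return(setNames(rep(values, length(fx_names)), fx_names))
  }
  # Otherwise a named vector with exactly one entry per feature is required.
  stopifnot(length(values) == length(fx_names),
            setequal(names(values), fx_names))
  values[fx_names]
}

# Hypothetical helper: choose the number of groups k in 1..max_ngrps that
# minimises loss(k) + lambda * log(k). The penalty form is an assumption.
pick_ngroups <- function(loss, lambda, max_ngrps = 15) {
  k <- seq_len(min(length(loss), max_ngrps))
  unname(k[which.min(loss[k] + lambda * log(k))])
}

fx_names <- c('ageph', 'bm', 'coverage')

same_for_all <- expand_values(5, fx_names)
per_feature  <- expand_values(c(bm = 8, ageph = 7, coverage = 2), fx_names)

# Toy loss values (made up) that shrink as the number of groups grows.
toy_loss <- c(10, 4, 2, 1.5, 1.4, 1.35)
k_small_penalty <- pick_ngroups(toy_loss, lambda = 0.1)
k_large_penalty <- pick_ngroups(toy_loss, lambda = 5)
k_capped        <- pick_ngroups(toy_loss, lambda = 0.1, max_ngrps = 3)
```

In this sketch a larger lambda favours fewer groups, and max_ngrps only bounds the search; it plays no role when the number of groups is supplied directly via type = 'ngroups'.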

Value

Data frame with the segmented data. The grouped features are added to the original data and have a trailing underscore in their name.
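The naming convention of the returned data frame can be pictured with a toy example. This is only a sketch of the output shape: the data and break points below are made up, and base-R cut() stands in for the grouping that maidrr derives from the feature effects.

```r
# Toy data standing in for the training data (values are made up).
toy <- data.frame(ageph = c(19, 25, 33, 47, 62, 80))

# Arbitrary break points, used only to illustrate the naming convention;
# maidrr derives its groups from the feature effects instead.
toy$ageph_ <- cut(toy$ageph, breaks = c(-Inf, 30, 60, Inf))

# Grouped features can be picked out by their trailing underscore.
grouped_cols <- grep('_$', names(toy), value = TRUE)
```

The original column ageph is kept, and its grouped counterpart ageph_ sits alongside it.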

Examples

## Not run: 
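library(maidrr)  # the package documented here: insights() and segmentation()
# The pipe %>% used below comes from magrittr; attach it if it is not already available.
data('mtpl_be')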
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
set.seed(12345)
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~',
                               paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)
# Prediction function: mean predicted response over all rows of newdata
gbm_fun <- function(object, newdata) {
  mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
}
gbm_fit %>% insights(vars = c('ageph', 'bm', 'coverage', 'fuel', 'bm_fuel'),
                     data = mtpl_be,
                     interactions = 'user',
                     pred_fun = gbm_fun) %>%
            segmentation(data = mtpl_be,
                         type = 'ngroups',
                         values = setNames(c(7, 8, 2, 2, 3), c('ageph', 'bm', 'coverage', 'fuel', 'bm_fuel')))

## End(Not run)

henckr/maidrr documentation built on July 27, 2023, 3:17 p.m.