feature_importance_permutation


View source: R/feature_importance_permutation.R

Description

Returns which columns are the most important in the fitted model. This is done by permuting the inputs and measuring the deterioration of the metric. The permutation importance of a feature is defined as the difference between the baseline metric and the metric obtained after permuting that feature's column. This implementation is not suitable for one-hot encoded categorical variables.
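In outline, the method can be sketched as follows (a minimal illustration of the idea, not the package's exact implementation; it assumes the model has a predict method and a simplified metric taking only actual and predicted):

permutation_importance_sketch <- function(data, model, actual, metric) {
  baseline <- metric(actual, predict(model, data))
  sapply(names(data), function(feature) {
    permuted <- data
    permuted[[feature]] <- sample(permuted[[feature]])  # break the feature-target link
    metric(actual, predict(model, permuted)) - baseline # deterioration = importance
  })
}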

Usage

feature_importance_permutation(
  data,
  model,
  actual,
  weight = rep(1, nrow(data)),
  metric = metric_rmse,
  nrounds = 10,
  seed = 666,
  ...
)

Arguments

data

dataframe - data from which the model can give predictions. For xgboost it must contain only the features used in the model, in the correct order. Often this dataset is the validation data.

model

model object - tested examples are lm, glm and xgboost

actual

vector[Numeric] - target to be predicted. Must be normalised by exposure

weight

vector[Numeric] - exposure for predictions

metric

function - of the admr::metric_* family (e.g. the default metric_rmse); must have arguments actual, predicted, weight (a sketch of a custom metric with this signature follows this list)

nrounds

integer - Number of times to permute each feature

seed

integer - random seed for permutations

...

OPTIONAL: arguments included but not defined above will be carried through to the metric
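For illustration, a hypothetical custom metric matching the required signature (metric_wmae is not part of admr):

metric_wmae <- function(actual, predicted, weight, ...) {
  # weighted mean absolute error; extra arguments from ... are ignored
  sum(weight * abs(actual - predicted)) / sum(weight)
}

Such a function can then be passed to the metric argument in place of the default.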

Value

dataframe with columns:
:col_index - position of the feature in data
:feature - name of the feature in data
:importance_mean - mean importance of the feature over the permutation rounds
:importance_sd - standard deviation of the importance; will be NA if nrounds = 1
This dataframe can be used to find the most important features in the model.
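For example, assuming fi holds the returned dataframe (a sketch):

fi[order(-fi$importance_mean), ]  # most important features first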

Examples

library(dplyr)
library(xgboost)

input_data <- data.frame(x1=runif(100, 0, 25), x2=runif(100, 0, 25), x3=runif(100, 0, 25)) %>%
  mutate(target=x1^2 * 0.01 + x2 + rnorm(n(), sd=5))

#LM
model_lm <- lm(target ~ poly(x1, 2) + x2, data=input_data)

feature_importance_permutation(data=input_data %>% select(-target), model=model_lm, actual=input_data[["target"]])

#GLM
model_glm <- glm(target ~ poly(x1, 2) + x2 + x3, data=input_data)

feature_importance_permutation(data=input_data %>% select(-target), model=model_glm, actual=input_data[["target"]])

#GBM
model_gbm <- xgboost(data = as.matrix(input_data %>% select(-target)), label=input_data[["target"]], nrounds=20, verbose = 0)

feature_importance_permutation(model=model_gbm,
                               data=input_data %>% select(-target),
                               actual=input_data[["target"]])
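
A sketch exercising the optional arguments (the weight vector here is purely illustrative; in practice actual should already be normalised by exposure):

feature_importance_permutation(data=input_data %>% select(-target),
                               model=model_glm,
                               actual=input_data[["target"]],
                               weight=runif(nrow(input_data), 0.5, 1.5),
                               nrounds=25,
                               seed=123)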
