feature_importance_permutation


View source: R/feature_importance_permutation.R

Description

Returns which columns are the most important in the fitted model. This is done by permuting the inputs and measuring the deterioration of the metric. The permutation importance of a feature is defined as the difference between the baseline metric and the metric obtained after permuting that feature's column. This implementation is not suitable for one-hot encoded categorical variables.
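In outline, the method can be sketched as follows (a minimal illustration of the idea, not the package's exact implementation; it assumes the model has a predict method and a simplified metric taking only actual and predicted):

permutation_importance_sketch <- function(data, model, actual, metric) {
  baseline <- metric(actual, predict(model, data))
  sapply(names(data), function(feature) {
    permuted <- data
    permuted[[feature]] <- sample(permuted[[feature]])  # break the feature-target link
    metric(actual, predict(model, permuted)) - baseline # deterioration = importance
  })
}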

Usage

feature_importance_permutation(
  data,
  model,
  actual,
  weight = rep(1, nrow(data)),
  metric = metric_rmse,
  nrounds = 10,
  seed = 666,
  ...
)

Arguments

data

dataframe - data from which the model can give predictions. For xgboost it must contain only the features used in the model, in the correct order. Often this dataset is the validation data.

model

model object - tested examples are lm, glm and xgboost

actual

vector[Numeric] - target to be predicted. Must be normalised by exposure

weight

vector[Numeric] - exposure for predictions

metric

function - of the admr::metric_* family (e.g. the default metric_rmse); must have arguments actual, predicted, weight (a sketch of a custom metric with this signature follows this list)

nrounds

integer - Number of times to permute each feature

seed

integer - random seed for permutations

...

OPTIONAL: arguments included but not defined above will be carried through to the metric
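For illustration, a hypothetical custom metric matching the required signature (metric_wmae is not part of admr):

metric_wmae <- function(actual, predicted, weight, ...) {
  # weighted mean absolute error; extra arguments from ... are ignored
  sum(weight * abs(actual - predicted)) / sum(weight)
}

Such a function can then be passed to the metric argument in place of the default.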

Value

dataframe with columns:
:col_index - position of the feature in data
:feature - name of the feature in data
:importance_mean - mean importance of the feature over the permutation rounds
:importance_sd - standard deviation of the importance; will be NA if nrounds = 1
This dataframe can be used to find the most important features in the model.
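For example, assuming fi holds the returned dataframe (a sketch):

fi[order(-fi$importance_mean), ]  # most important features first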

Examples

library(dplyr)
library(xgboost)

input_data <- data.frame(x1=runif(100, 0, 25), x2=runif(100, 0, 25), x3=runif(100, 0, 25)) %>%
  mutate(target=x1^2 * 0.01 + x2 + rnorm(n(), sd=5))

#LM
model_lm <- lm(target ~ poly(x1, 2) + x2, data=input_data)

feature_importance_permutation(data=input_data %>% select(-target), model=model_lm, actual=input_data[["target"]])

#GLM
model_glm <- glm(target ~ poly(x1, 2) + x2 + x3, data=input_data)

feature_importance_permutation(data=input_data %>% select(-target), model=model_glm, actual=input_data[["target"]])

#GBM
model_gbm <- xgboost(data = as.matrix(input_data %>% select(-target)), label=input_data[["target"]], nrounds=20, verbose = 0)

feature_importance_permutation(model=model_gbm,
                               data=input_data %>% select(-target),
                               actual=input_data[["target"]])
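
A sketch exercising the optional arguments (the weight vector here is purely illustrative; in practice actual should already be normalised by exposure):

feature_importance_permutation(data=input_data %>% select(-target),
                               model=model_glm,
                               actual=input_data[["target"]],
                               weight=runif(nrow(input_data), 0.5, 1.5),
                               nrounds=25,
                               seed=123)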
