knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 8,
  message = FALSE,
  warning = FALSE
)

Libraries

library(regionaldrivers)

Data loading

SSCT_data and SSCT_labelled_data are loaded with the package. Here is what 10 random rows of each look like:

SSCT_labelled_data %>% dplyr::sample_n(10) %>%
    knitr::kable(digits = 3, format = "html", caption = "SSCT_labelled_data") %>% 
    kableExtra::kable_styling(bootstrap_options = c("hover", "condensed")) %>% 
    kableExtra::scroll_box(width = "7in", height = "5in")
SSCT_data %>% dplyr::sample_n(10) %>%
    knitr::kable(digits = 3, format = "html", caption = "SSCT_data") %>% 
    kableExtra::kable_styling(bootstrap_options = c("hover", "condensed")) %>% 
    kableExtra::scroll_box(width = "7in", height = "5in")   

Data transformation

We transform the data to retain what we need for the analysis.

drivers_data <- cbind(SSCT_labelled_data, SSCT_data) %>%
  dplyr::select(-c("SiteID", "POINT_X", "POINT_Y"))

Feature importance

The function feature_importance() derives feature importance from drivers_data and the name of the target column, here ward. The filter_methods argument accepts filter methods from mlr. The function resamples the dataset .iters times. For this vignette, the number of iterations was lowered to 10 from the default value of 500.

class(drivers_data[["ward"]])
varimp <- feature_importance(
  drivers_data, "ward", 
  filter_methods = c("FSelector_information.gain", "randomForest_importance"),
  .iters = 10,
  .first = 10)
varimp$p
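
To make the idea of resampled feature importance concrete, here is a minimal base R sketch. It is not the implementation used by feature_importance(): the toy data, the bootstrap resampling scheme, and the f_score() helper (a one-way ANOVA F statistic standing in for an mlr filter method) are all assumptions chosen for illustration.

```r
set.seed(1)

# Toy data (illustrative only): x1 separates the two classes, x2 is pure noise
toy <- data.frame(
  ward = factor(rep(c("a", "b"), each = 50)),
  x1 = c(rnorm(50, mean = 0), rnorm(50, mean = 3)),
  x2 = rnorm(100)
)

# Hypothetical stand-in for a filter method:
# one-way ANOVA F statistic of a feature against the target classes
f_score <- function(data, feature, target) {
  fit <- stats::aov(data[[feature]] ~ data[[target]])
  summary(fit)[[1]][["F value"]][1]
}

iters <- 10
features <- setdiff(names(toy), "ward")

# One column of scores per resampling iteration (rows = features)
scores <- sapply(seq_len(iters), function(i) {
  resampled <- toy[sample(nrow(toy), replace = TRUE), ]
  sapply(features, function(f) f_score(resampled, f, "ward"))
})

# Average over iterations and rank the features
mean_scores <- sort(rowMeans(scores), decreasing = TRUE)
mean_scores
```

Averaging over several resampled copies of the data gives a ranking that depends less on any single draw, which is presumably why the package default uses many more iterations than the 10 used in this vignette.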

The function also works for numerical data using a regression task. This behavior is triggered by the class() of the target. If the target is not an integer, character or factor, feature_importance() attempts to create a regression task. Some methods, like FSelectorRcpp_information.gain, will warn you that they are discretizing numeric input.

class(drivers_data[["SLOPE"]])
varimp <- feature_importance(
  drivers_data, "SLOPE", 
  filter_methods = c("FSelector_information.gain", "randomForest_importance"),
  .iters = 10,
  .first = 10)
varimp$p
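
The class-based rule described above can be sketched in a few lines of base R. The helper guess_task_type() is hypothetical and is not part of regionaldrivers; it only mirrors the rule as stated in the text, so the package's actual checks may differ.

```r
# Hypothetical helper (not a regionaldrivers function):
# integer, character and factor targets are treated as classification targets,
# anything else falls through to regression
guess_task_type <- function(target) {
  if (is.integer(target) || is.character(target) || is.factor(target)) {
    "classification"
  } else {
    "regression"
  }
}

guess_task_type(factor(c("a", "b")))  # "classification"
guess_task_type(c(0.1, 0.2))          # "regression"
guess_task_type(1:3)                  # "classification" (integer vector)
guess_task_type(c(1, 2, 3))           # "regression" (stored as double)
```

Note the last two calls: under this rule, class labels stored as doubles would be treated as a regression target, so it may be worth checking class() of the target column before calling feature_importance(), as the vignette does.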


hrvg/regionaldrivers documentation built on June 20, 2021, 7:50 a.m.