knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7 )
This vignette is intended to be used as a supplementary tools to the introduction to the vivid
package vignette. The goal of this vignette is to show how vivid
handles a range of different response variables and to show a brief guide on how to create visualisations from the package to display varible importance and variable interactions (VIVI). To begin we show a standard response from a random forest regression model.
First, we load the required packages. vivid
to create the visualizations and mlr3
and mlr3learners
will be used to create our models.
library(vivid) library(mlr3) library(mlr3learners)
For this we will simulate some data from the Friedman benchmark problem 1 using the genFriedman()
function from vivid
, with 9 features and 300 observations. The output is created according to:
Create the data:
set.seed(1701) myData <- genFriedman(noFeatures = 9, noSamples = 300, sigma = 1, bins = NULL, seed = NULL)
Create an mlr3
ranger
random forest model:
set.seed(1701) taskCont <- TaskRegr$new(id = "Friedman", backend = myData, target = "y") set.seed(1701) lrnCont <- lrn("regr.ranger", importance = "permutation") set.seed(1701) modCont <- lrnCont$train(taskCont)
Create a matrix to be supplied to the plotting functions.
set.seed(1701) myMatrix <- vividMatrix(task = taskCont, model = modCont, gridSize = 100)
Create a heatmap of the results
plot(myMatrix, type = "heatMap")
Figure 1.0 shows a heatmap of VIVI values on the Friedman data. We can clearly see the variables $x_1$ to $x_5$ are the only important variables for predeicting $y$, with $x_4$ being the most important. We can also seee a strong interaction between $x_1$ and $x_2$ Create a partial dependence pairs plot (GPDP)
ggpdpPairs(taskCont, modCont, gridsize = 20)
Create a zenplot partial dependence plot (ZPDP)
pdpZenplot(taskCont, modCont, gridsize = 20)
Here we can use genFriedman
again to create data with a binary response. To do this we set the number of bins to be equal to 2. This will create (roughly) equal sized bins to split the response into. Setting a value greater than 1 essentially turns this into a classification problem where bins determines the number of classes.
Create the data:
set.seed(1701) myDataBin <- genFriedman(noFeatures = 9, noSamples = 300, sigma = 1, bins = 2, seed = NULL) myDataBin$y <- as.numeric(myDataBin$y)
Take a quick look at our data to see that the response has been split into 2 separate classes.
head(myDataBin)
Create an mlr3
ranger
random forest model:
set.seed(1701) taskBin <- TaskRegr$new(id = "Friedman", backend = myDataBin, target = "y") set.seed(1701) lrnBin <- lrn("regr.ranger", importance = "impurity") set.seed(1701) modBin <- lrnCont$train(taskCont)
Create a matrix to be supplied to the plotting functions.
set.seed(1701) myMatrixBin <- vividMatrix(task = taskBin, model = modBin, gridSize = 100)
Create a heatmap of the results
plot(myMatrixBin, type = "heatMap")
Create a partial dependence pairs plot (PDPP)
ggpdpPairs(taskBin, modBin, gridsize = 20)
Create a zenplot partial dependence plot (ZPDP)
pdpZenplot(taskBin, modBin, gridsize = 20)
For this we will use the iris dataset to show how to use vivd
when there is a multiclass categorical response.
To begin, we create our mlr3
model.
# mlr3 TASK set.seed(1701) ir_T <- TaskClassif$new(id = "iris", backend = iris, target = "Species") # learner set.seed(1701) lrn <- lrn("classif.ranger", importance = 'impurity') # model set.seed(1701) ir_M <- lrn$train(ir_T)
To create our vivid matrix of values, we have to specify the main variable of interest by use of the main
argument. In the example below, we choose to look at the species "virginica".
myMatrixMCR <- vividMatrix(ir_T, ir_M, gridsize = 100, main = "virginica")
Plot the results:
plot(myMatrixMCR, type = 'heatMap', title = "virginica")
Create a partial dependence pairs plot (PDPP)
ggpdpPairs(ir_T, ir_M, gridsize = 20)
Create a zenplot partial dependence plot (ZPDP)
pdpZenplot(ir_T, ir_M, gridsize = 20)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.