knitr::opts_chunk$set(echo = TRUE)
totalvis is a model-agnostic visualization tool used to summarize complex relationships in black-box models. The package relies on a PCA-based transformation of the training data to group correlated features and display their total effect on the response. At a high level, the method generates partial dependence plots (PDPs) for the principal components and provides interpretable insight into the importance of the features. This vignette demonstrates basic usage of totalvis by reproducing all figures used in the associated paper, as well as introducing a few additional components.
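To fix ideas before the examples, the following is a minimal hand-rolled sketch of the core idea: fit a model, run PCA on the training data, sweep the first principal component's score across its range, map each modified score matrix back to the feature space, and average the predictions. This illustrates the concept only and is not the package's implementation; the toy data, the lm model, and the centering/scaling choices are all assumptions made for the sketch.

## Conceptual sketch only; totalvis' internals may differ
set.seed(1)
X_toy <- MASS::mvrnorm(n = 200, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), nrow = 2))
colnames(X_toy) <- c("X1", "X2")
y_toy <- X_toy[, 1] + X_toy[, 2] + rnorm(200, 0, 0.1)
fit_toy <- lm(y_toy ~ ., data = data.frame(X_toy))

pca <- prcomp(X_toy, center = TRUE, scale. = TRUE)
grid <- seq(min(pca$x[, 1]), max(pca$x[, 1]), length.out = 20)
pd <- sapply(grid, function(z1) {
  Z_mod <- pca$x
  Z_mod[, 1] <- z1                            # pin every observation's PC1 score
  X_back <- Z_mod %*% t(pca$rotation)         # back to the (scaled) feature space
  X_back <- sweep(X_back, 2, pca$scale, "*")  # undo scaling
  X_back <- sweep(X_back, 2, pca$center, "+") # undo centering
  mean(predict(fit_toy, data.frame(X_back)))  # average prediction at this score
})
plot(grid, pd, type = "l", xlab = "PC1 score", ylab = "Average prediction")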
Install totalvis:

# devtools::install_github("nickseedorff/totalvis")
library(totalvis)
The vignette uses the following packages:
library(MASS)
library(randomForest)
library(MachineShop)
library(gbm)
library(nnet)
library(caret)
library(kknn)
PDPs and individual conditional expectation (ICE) curves serve as powerful tools for studying relationships in black-box models. A primary limitation of these methods is the assumption of low dependence between the covariate(s) of interest and the complement set. When features are correlated, both procedures may generate observations that lie far outside the joint distribution of the training data. An instance of this is shown below, where red indicates the data points that would be used to calculate predictions when setting X1 = 1.5 (as is done when assessing the marginal effect of X1). Additionally, PDPs and ICE curves are limited to low-dimensional views and may miss complex relationships. totalvis looks to mitigate these issues by grouping correlated features and interpreting their effect as a whole.
## Two highly correlated features
set.seed(2021)
X <- MASS::mvrnorm(n = 500, mu = c(2, 2), Sigma = matrix(c(1, 0.9, 0.9, 1), nrow = 2))
colnames(X) <- c("X1", "X2")

## Red points show the pseudo-observations created when pinning X1 = 1.5
plot(X)
points(rep(1.5, 500), X[, 2], col = "red")
Through three simulated examples, we display several characteristics of totalvis. The first example addresses the concept of a 'total effect', while the following simulations demonstrate the use of the partial_effects plot as a diagnostic tool.
This example demonstrates the 'total effect' notion of totalvis plots. Marginally, the covariates affect the outcome in a piece-wise linear fashion in separate regions of their support. totalvis looks to group the features based on their positive correlation and describe the total impact they have on the response.
## Generate data
set.seed(2021)
X <- mvrnorm(n = 1000, mu = c(1, 1.2), Sigma = matrix(c(1, 0.2, 0.2, 1), nrow = 2))
colnames(X) <- c("X1", "X2")
X_new <- cbind(X[, 1], as.numeric(X[, 1] > 0.5), as.numeric(X[, 2] < 0.5), X[, 2])

## Covariates have piece-wise linear behavior
y <- 2 * X_new[, 1] * (1 - X_new[, 2]) + X_new[, 2] + X_new[, 3] +
  2 * X_new[, 4] * (1 - X_new[, 3]) + rnorm(1000, 0, 0.2)

par(mfrow = c(1, 2))
plot(X[, 1], y, xlab = "X1")
plot(X[, 2], y, xlab = "X2")
par(mfrow = c(1, 1))

## Train the model
mod_frame <- ModelFrame(X, y)
knn_mod <- fit(mod_frame, model = KNNModel)
The PDPs correctly identify the marginal effects.
## Partial dependence of X1
plot(totalvis(knn_mod, X, feature = "X1", pc_num = NULL))

## Partial dependence of X2
plot(totalvis(knn_mod, X, feature = "X2", pc_num = NULL))
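For intuition, the marginal curve for X1 can also be hand-rolled: pin X1 at each grid value for every observation and average the predictions. This is a sketch of the standard PDP recipe, not the package's code, and it assumes MachineShop's predict() accepts a data frame of new observations.

## Hand-rolled PDP for X1 (sketch; not the package internals)
grid <- seq(min(X[, "X1"]), max(X[, "X1"]), length.out = 20)
pd_x1 <- sapply(grid, function(v) {
  X_mod <- X
  X_mod[, "X1"] <- v                             # pin X1 for every observation
  mean(predict(knn_mod, as.data.frame(X_mod)))   # average over the training rows
})
plot(grid, pd_x1, type = "l", xlab = "X1", ylab = "Average prediction")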
totalvis groups the features based on their correlation and presents a reasonable summary of their combined effect.
## Total effect plot
plot(totalvis(knn_mod, X), legend_cex = 0.9)
When using PCA to transform the training data, the association with the response is not taken into account. Thus, the principal components should be assessed to see if the covariates act constructively on the outcome. To assess this, we generate two correlated predictors that have opposing effects on the outcome and look to identify their contradictory behavior.
## Generate data
set.seed(2021)
X <- as.data.frame(mvrnorm(n = 1000, mu = c(2, 2), Sigma = matrix(c(1, 0.9, 0.9, 1), nrow = 2)))
colnames(X) <- c("X1", "X2")
y <- X[, 1] - X[, 2] + rnorm(1000, 0, 0.1)

## Train model
df <- data.frame(y, X)
gbm_mod <- gbm(y ~ ., data = df, n.trees = 200, distribution = "gaussian")
Due to the negating effects of the two covariates, the average prediction remains near 0 throughout the range of the principal component.
## Total effect plot
plot(totalvis(gbm_mod, X), legend_loc = "topleft")
Using a partial_effects plot, we take a deeper look at the dependence structure. This highlights the opposing effects of the two covariates, which appear to cancel each other out across the range of the principal component.
## Partial effects plot
plot(partial_effects(gbm_mod, X), legend_cex = 0.82)
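For intuition only, one way such a feature-level diagnostic could be constructed is sketched below: for each feature, shift only that feature by the amount its PC1 loading implies and track the average prediction. This is a loose reconstruction under assumed centering and scaling choices, not the package's algorithm.

## Loose sketch of a feature-level shift along PC1 (not the package's algorithm)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
z1 <- pca$x[, 1]
grid <- seq(min(z1), max(z1), length.out = 20)
eff <- sapply(colnames(X), function(j) {
  sapply(grid, function(z) {
    X_mod <- X
    ## move feature j alone to where a PC1 score of z would place it
    X_mod[, j] <- X_mod[, j] + (z - z1) * pca$rotation[j, 1] * pca$scale[j]
    mean(predict(gbm_mod, X_mod, n.trees = 200))
  })
})
matplot(grid, eff, type = "l", lty = 1, col = c("blue", "red"),
        xlab = "PC1 score", ylab = "Average prediction")
legend("topleft", legend = colnames(X), lty = 1, col = c("blue", "red"))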
In the final simulated example, we assess totalvis' ability to correctly identify inputs that are relevant to the response. To do this, we generate two correlated features, only one of which acts on the outcome. Additionally, this example demonstrates the use of totalvis in the classification setting.
## Generate data
set.seed(2021)
X <- mvrnorm(n = 1000, mu = c(2, 2), Sigma = matrix(c(1, 0.8, 0.8, 1), nrow = 2))
colnames(X) <- c("X1", "X2")
y <- as.factor((X[, 1] + rnorm(1000, 0, 0.1)) > 1.5)

## Train model
mod <- train(X, y, method = "nnet", trace = FALSE)
To relate the total effect of a group of predictors to a covariate of interest, we make use of the 'pin' argument, which offers a more interpretable x-axis. In this case, we present a 'pinplot' for X2, which is positively correlated with X1 but does not affect the outcome. As shown below, the 'pinplot' is not able to distinguish between the effects of correlated predictors.
## Pin plot
plot(totalvis(mod, X, pin = "X2", type = "classification"))
However, partial_effects is able to identify the relevant predictors in the model. For this example, we make use of the differenced = FALSE argument to add the partial_effects lines to the 'total effect' curve. At all points across the range of the principal component, the partial effect curve of X2 mirrors the total effect, suggesting that X2 has little influence on the predictions (because shifting X2 had little impact on the average prediction).
## Partial effects plot
diag_obj <- partial_effects(mod, X, type = "classification")
plot(diag_obj, differenced = FALSE, legend_loc = "right")
The application uses the Communities and Crime Unnormalized Data Set, which comes from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). Our study uses 101 of the predictors and the response variable 'murders per 100k population'. Many of the predictors are related, with 12% of the pairwise correlations above 0.5.
## Get data
crimes <- read.csv(paste0("https://archive.ics.uci.edu/ml/machine-learning-",
                          "databases/00211/CommViolPredUnnormalizedData.txt"),
                   header = FALSE)

## Add column names
crimes_descr <- readLines(paste0("http://archive.ics.uci.edu/ml/datasets/",
                                 "Communities+and+Crime+Unnormalized"))
crimes_descr <- strsplit(crimes_descr[grepl(pattern = "<br>@attribute",
                                            x = crimes_descr)], split = " ")
colnames(crimes) <- vapply(crimes_descr, FUN = function(rw) rw[[2]],
                           FUN.VALUE = "STRING")

## Replace "?" with NA, drop columns with missing values
crimes[crimes == "?"] <- NA
crimes <- crimes[, colSums(is.na(crimes)) == 0]

## Select a subset of predictors, convert all columns to numeric, add outcome
col_start <- which(names(crimes) == "pop")
end_start <- which(names(crimes) == "pctOfficDrugUnit")
X <- as.data.frame(sapply(crimes[, col_start:end_start], as.numeric))
outcome <- crimes$murdPerPop

## Train the model
set.seed(2021)
rf_mod <- randomForest(X, outcome)
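The stated correlation structure is easy to verify directly from the design matrix:

## Fraction of pairwise absolute correlations above 0.5 (roughly 12%)
cor_mat <- abs(cor(X))
mean(cor_mat[upper.tri(cor_mat)] > 0.5)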
Visual summaries (partial dependence and ICE plots) are often produced to interpret the effects of the most important features in black-box models. The example uses a randomForest model from the randomForest package, which has readily available measures of feature importance. We select one of the top features, 'pctKids2Par', for further exploration.
## Feature importance
varImpPlot(rf_mod, n.var = 10, main = "Feature Importance", pt.cex = 2)
In this instance, 'pctKids2Par' presents high correlation with several other predictors, limiting the appropriateness of PDPs and ICE curves.
## Correlation
cor_vals <- abs(cor(X[, "pctKids2Par"], X[, setdiff(colnames(X), "pctKids2Par")]))
hist(cor_vals, breaks = 30, xlab = "abs(correlation)",
     main = "Correlation with pctKids2Par")
num <- quantile(cor_vals, 0.8)
abline(v = num, col = "red", lwd = 3)
text(x = num + 0.18, y = 10, " 80th Percentile")
Due to high correlations with our feature of interest, we turn to totalvis to interpret groups of features through their total effect. totalvis accomplishes this by transforming the training data with PCA and then generating PDPs for the principal components. The top contributors to the linear combination are presented in the legend, where contribution is defined by the magnitudes of the loadings. In cases such as this, there is a clear latent factor captured by the top features: the income or economic status of a neighborhood. Using this, we can interpret the impact that neighborhood income level has on the response.
## totalvis plot
vis_object <- totalvis(rf_mod, X)
df <- plot(vis_object, num_load = 8, legend_cex = 1.0)
knitr::kable(df, row.names = FALSE, align = "lc")
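To see where the legend's loadings come from, PCA can also be run directly; note that totalvis' exact preprocessing (e.g., centering and scaling choices) may differ from this sketch.

## Inspect the largest-magnitude PC1 loadings (sketch; preprocessing assumed)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
head(sort(abs(pca$rotation[, 1]), decreasing = TRUE), 8)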
At this point, we associate the total effect with the covariate of interest using a 'pinplot'. Compared to a partial dependence plot of the same feature, the 'pinplot' displays a much larger range of predicted values.
## pinplot
plot(totalvis(rf_mod, X, pin = "pctKids2Par"))

## pdp
plot(totalvis(rf_mod, X, feature = "pctKids2Par", pc_num = NULL))
As previously shown, there are a variety of insights to be gleaned from the partial_effects plot. In this instance, shifting any covariate leads to an increase in the predicted value (except over a small range of pctPoverty), so the underlying assumptions of totalvis appear to be reasonable. Additionally, the magnitudes of the partial effect lines are consistent with the feature importance plot. For partial_effects, the default is to plot a standardized or 'differenced' version of the predictions.
## partial_effects (default)
diag_obj <- partial_effects(rf_mod, X, num_load = 8)
plot(diag_obj, legend_loc = "topleft", legend_cex = 0.72)
Given the package's reliance on PCA, we offer a wrapper for the screeplot function that works with several of the totalvis objects.
## Wrapper for the scree plot
scree_plot(totalvis(rf_mod, X))
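For reference, a similar diagnostic is available in base R, up to the preprocessing choices totalvis makes internally:

## Base R analogue (assumes centered and scaled PCA)
screeplot(prcomp(X, center = TRUE, scale. = TRUE), main = "Scree plot")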
As a final addition to the package, we offer simple ICE curves. By default, the curves are centered so that they share the same starting value, which highlights differences in trajectory. This can be changed with the center = FALSE argument.
## ICE plot (default is centered)
vis_obj <- totalvis(rf_mod, X, ice = TRUE, feature = "pctKids2Par")
plot(vis_obj)

## ICE plot (uncentered)
plot(vis_obj, center = FALSE)
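To make the centering explicit, the sketch below hand-rolls a few ICE curves and subtracts each curve's leftmost value. This mirrors the idea rather than the package's exact code; the grid and the sample of rows are arbitrary choices for illustration.

## Hand-rolled centered ICE curves (sketch; not the package's code)
set.seed(1)
idx <- sample(nrow(X), 25)
grid <- quantile(X$pctKids2Par, probs = seq(0.05, 0.95, length.out = 20))
curves <- sapply(grid, function(v) {
  X_mod <- X[idx, ]
  X_mod$pctKids2Par <- v            # pin the feature for the sampled rows
  predict(rf_mod, X_mod)
})
curves <- curves - curves[, 1]      # center: subtract each curve's first value
matplot(grid, t(curves), type = "l", lty = 1, col = "gray40",
        xlab = "pctKids2Par", ylab = "Centered prediction")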