Description Usage Arguments Details Value See Also Examples
alluvial plots are capable of displaying higher dimensional data
on a plane, thus lend themselves to plot the response of a statistical model
to changes in the input data across multiple dimensions. The practical limit
here is 4 dimensions. We need the data space (a sensible range of data
calculated based on the importance of the explanatory variables of the model
as created by get_data_space
and the predictions
returned by the model in response to the data space.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  alluvial_model_response(
pred,
dspace,
imp,
degree = 4,
bins = 5,
bin_labels = c("LL", "ML", "M", "MH", "HH"),
col_vector_flow = c("#FF0065", "#009850", "#A56F2B", "#005EAA", "#710500", "#7B5380",
"#9DD1D1"),
method = "median",
force = FALSE,
params_bin_numeric_pred = list(center = T, transform = T, scale = T),
pred_train = NULL,
stratum_label_size = 3.5,
...
)

pred 
vector, predictions, if method = 'pdp' use

dspace 
data frame, returned by

imp 
dataframe, with not more then two columns one of them numeric containing importance measures and one character or factor column containing corresponding variable names as found in training data. 
degree 
integer, number of top important variables to select. For plotting more than 4 will result in two many flows and the alluvial plot will not be very readable, Default: 4 
bins 
integer, number of bins for numeric variables, increasing this number might result in too many flows, Default: 5 
bin_labels 
labels for the bins from low to high, Default: c("LL", "ML", "M", "MH", "HH") 
col_vector_flow, 
character vector, defines flow colours, Default: c('#FF0065','#009850', '#A56F2B', '#005EAA', '#710500') 
method, 
character vector, one of c('median', 'pdp')
. Default: 'median' 
force 
logical, force plotting of over 1500 flows, Default: FALSE 
params_bin_numeric_pred 
list, additional parameters passed to

pred_train 
numeric vector, base the automated binning of the pred vector on the distribution of the training predictions. This is useful if marginal histograms are added to the plot later. Default = NULL 
stratum_label_size 
numeric, Default: 3.5 
... 
additional parameters passed to

this model visualisation approach follows the "visualising the model in the dataspace" principle as described in Wickham H, Cook D, Hofmann H (2015) Visualizing statistical models: Removing the blindfold. Statistical Analysis and Data Mining 8(4) <doi:10.1002/sam.11271>
ggplot2 object
alluvial_wide
,
get_data_space
,
alluvial_model_response_caret
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19  df = mtcars2[, ! names(mtcars2) %in% 'ids' ]
m = randomForest::randomForest( disp ~ ., df)
imp = m$importance
dspace = get_data_space(df, imp, degree = 3)
pred = predict(m, newdata = dspace)
alluvial_model_response(pred, dspace, imp, degree = 3)
# partial dependency plotting method
## Not run:
pred = get_pdp_predictions(df, imp
, .f_predict = randomForest:::predict.randomForest
, m
, degree = 3
, bins = 5)
alluvial_model_response(pred, dspace, imp, degree = 3, method = 'pdp')
## End(Not run)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.