get_data_space: calculate data space
In easyalluvial: Generate Alluvial Plots with a Single Line of Code

get_data_space

R Documentation

calculate data space

Description

Calculates a data space based on the modeling data frame and the importance of the explanatory variables. Only the most important variables, as set by the degree parameter, are considered. For each numeric variable among them, a number of representative single values (set by bins) spread over its range is selected, and all possible value combinations among the most important variables are created. The values of the remaining variables are set to the mode (factors) or the median (numerics).

Usage

get_data_space(df, imp, degree = 4, bins = 5, max_levels = 10)

Arguments

`df`	dataframe, training data
`imp`	dataframe, with no more than two columns: one numeric column containing importance measures and one character or factor column containing the corresponding variable names as found in the training data.
`degree`	integer, number of top important variables to select. For plotting, more than 4 will result in too many flows and the alluvial plot will not be very readable, Default: 4
`bins`	integer, number of bins for numeric variables, and maximum number of levels for factor variables; increasing this number might result in too many flows, Default: 5
`max_levels`	integer, maximum number of levels per factor variable, Default: 10
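
As an illustration of the shape described for imp and of what degree selects, here is a minimal base-R sketch. The importance values and the column names vars and imp are made up for the example, and the ordering logic is an assumption about what "top important variables" means, not the package's internal code.

```r
# Hypothetical importance table: one name column, one numeric column.
# The numbers are invented purely for illustration.
imp <- data.frame(
  vars = c("wt", "hp", "cyl", "mpg", "qsec", "gear"),
  imp  = c(30.1, 25.4, 22.8, 12.0, 4.2, 1.5),
  stringsAsFactors = FALSE
)

degree <- 4

# Order by decreasing importance and keep the names of the top `degree` variables
top_vars <- head(imp$vars[order(imp$imp, decreasing = TRUE)], degree)
top_vars
```

With degree = 4, only these four variables would be varied in the data space; the others would be held constant.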

Details

It selects the most important variables based on the degree parameter and bins the numeric variables using manip_bin_numerics, while leaving categoric variables unchanged. The number of bins for each numeric variable is set to bins - 2. Next, the median is picked for each of the bins, and the min and max values are added for each numeric variable, so that we get (median(bin) x (bins - 2), max, min), that is bins values in total, for each numeric variable. Then all possible combinations between those values and the categoric factor levels are created. The total number of all possible combinations defines the range of the data space. The values of the remaining variables are set to the mode (factors) or the median (numerics).
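
The steps above can be sketched in base R as follows. This is only an approximation under stated assumptions: it uses equal-frequency bins from quantile() in place of manip_bin_numerics (whose binning may differ), it picks the "top" variables by hand, and grid_values and stat_mode are hypothetical helpers that are not part of easyalluvial.

```r
# Small example data set from base R; cyl is treated as a factor
df <- mtcars[, c("disp", "wt", "hp", "cyl", "mpg")]
df$cyl <- factor(df$cyl)

bins <- 5
top_vars  <- c("wt", "cyl")               # assumed most important variables
rest_vars <- setdiff(names(df), top_vars)

# Hypothetical helper: medians of (bins - 2) equal-frequency bins, plus min and max
grid_values <- function(x, bins) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = bins - 1))
  bin    <- cut(x, breaks = unique(breaks), include.lowest = TRUE)
  meds   <- tapply(x, bin, median)
  meds   <- meds[!is.na(meds)]            # drop empty bins, if any
  sort(unique(c(min(x), as.numeric(meds), max(x))))
}

# Hypothetical helper: most frequent level of a factor
stat_mode <- function(x) names(which.max(table(x)))

# Representative values for numerics, all levels for factors
values <- lapply(df[top_vars], function(x) {
  if (is.numeric(x)) grid_values(x, bins) else levels(x)
})

# All possible combinations among the top variables
dspace <- expand.grid(values, stringsAsFactors = TRUE)

# Hold the remaining variables constant at median (numerics) or mode (factors)
for (v in rest_vars) {
  x <- df[[v]]
  dspace[[v]] <- if (is.numeric(x)) {
    median(x)
  } else {
    factor(stat_mode(x), levels = levels(x))
  }
}

head(dspace)
```

The number of rows is the product of the number of values kept per top variable, which is why increasing bins or degree quickly inflates the data space.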

This model visualisation approach follows the "visualising the model in the dataspace" principle as described in Wickham H, Cook D, Hofmann H (2015). Visualizing statistical models: Removing the blindfold. Statistical Analysis and Data Mining 8(4). <doi:10.1002/sam.11271>

Value

data frame

Examples

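library(easyalluvial)
# drop the id column from the mtcars2 data shipped with easyalluvial
df = mtcars2[, ! names(mtcars2) %in% 'ids' ]
# fit a random forest and extract its variable importance
m = randomForest::randomForest( disp ~ ., df)
imp = m$importance
# build the data space from the training data and the importance measures
dspace = get_data_space(df, imp)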

easyalluvial documentation built on May 29, 2024, 5:32 a.m.
