f_model_plot_var_dep_over_spec_var_range: plot model variable dependency over the range of a specified...

View source: R/f_model_var_dep.R

Description

Some models can capture relative dependencies, where the effect of one variable depends on the value of another. To visualise them, the dataset is split into three parts along the range variable: the 0-25, 25-75 and 75-100 percentile ranges for a numerical variable, or the three most common levels for a factor. Variable dependencies are then plotted for each of the three splits. In the mtcars example below, the model predicts an increase in disp if drat increases for cars with 8 cylinders, while the opposite is true for cars with only 6 cylinders.
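
The three-way split described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `split_in_three()` is a hypothetical helper, and the choice to turn less frequent factor levels into `NA` is an assumption made for the sketch.

```r
# Hypothetical helper, not part of oetteR: assign each value to one of three groups
split_in_three = function(x) {
  if (is.numeric(x)) {
    # numeric: cut at the 25th and 75th percentile
    # (cut() fails if the quantiles are not unique, e.g. for heavily tied data)
    breaks = stats::quantile(x, probs = c(0, 0.25, 0.75, 1), na.rm = TRUE)
    cut(x, breaks = breaks, include.lowest = TRUE,
        labels = c("0-25", "25-75", "75-100"))
  } else {
    # categorical: keep the three most frequent levels, all others become NA
    top = head(names(sort(table(x), decreasing = TRUE)), 3)
    factor(x, levels = top)
  }
}

groups_num = split_in_three(mtcars$drat)
groups_cat = split_in_three(as.character(mtcars$cyl))

table(groups_num)
table(groups_cat)
```

Each group can then be used to subset the data before plotting the dependencies of the remaining variables.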

Usage

f_model_plot_var_dep_over_spec_var_range(m, title, variables, range_variable,
  data, formula, data_ls, variable_color_code, log_y = F, limit = 12)

Arguments

m

a model

title

model title

variables

character vector with variable names, or ranked variables as returned by f_model_importance()

range_variable

character vector naming the range variable

data

dataset

formula

formula

data_ls

data_ls object generated by f_clean_data(), or a named list: list( data = <dataframe>, numericals = <vector with column names of numerical columns> ). The data_ls object provides the entire dataset.

variable_color_code

dataframe created by f_plot_color_code_variables()

log_y

boolean, use a log scale for the y axis, Default: FALSE

limit

integer limit the number of variables to be plotted, Default: 12
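
For readers who do not want to call f_clean_data(), the named-list form of data_ls mentioned above could be built by hand roughly as follows. This is a sketch under assumptions: the source only names the `data` and `numericals` elements; `categoricals` is added here because the example further down reads `data_ls$categoricals`, and whether the function needs further elements is not stated.

```r
# Assumption: a hand-built stand-in for the object returned by f_clean_data()
df = mtcars
df$cyl = factor(df$cyl)   # treat cylinder count as categorical
df$am  = factor(df$am)    # treat transmission type as categorical

# flag numerical columns
is_num = vapply(df, is.numeric, logical(1))

data_ls = list(
  data         = df,
  numericals   = names(df)[is_num],
  categoricals = names(df)[!is_num]
)

str(data_ls, max.level = 1)
```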

Value

a grid that can be printed with gridExtra::grid.arrange()

See Also

arrangeGrob

Examples

## Not run: 

 # single output example ---------------------------------------
.f                  = randomForest::randomForest
data_ls             = f_clean_data(mtcars)
data                = data_ls$data
formula             = disp~mpg+cyl+am+hp+drat+qsec+vs+gear+carb
m                   = .f(formula, data)
variables           = f_model_importance( m, data)
title               = unlist( stringr::str_split( class(m)[1], '\\.') )[1]
variable_color_code = f_plot_color_code_variables(data_ls)
limit               = 10
log_y               = FALSE

range_variable_num  = data_ls$numericals[1]
range_variable_cat  = data_ls$categoricals[1]

grid_num = f_model_plot_var_dep_over_spec_var_range(m
                                                    , title
                                                    , variables
                                                    , range_variable_num
                                                    , data
                                                    , formula
                                                    , data_ls
                                                    , variable_color_code
                                                    , log_y
                                                    , limit  )
gridExtra::grid.arrange(grid_num)

# pipe example ------------------------------------------------

data_ls = f_clean_data(mtcars)
form = as.formula('disp~cyl+mpg+hp+am+gear+drat+wt+vs+carb')
variable_color_code = f_plot_color_code_variables(data_ls)

library(magrittr) # provides the %>% pipe

grids = pipelearner::pipelearner(data_ls$data) %>%
  pipelearner::learn_models( twidlr::rpart, form ) %>%
  pipelearner::learn_models( twidlr::randomForest, form ) %>%
  pipelearner::learn_models( twidlr::svm, form ) %>%
  pipelearner::learn() %>%
  # rank variables per model, use the top-ranked one as range variable
  dplyr::mutate( imp = purrr::map2(fit, train, f_model_importance)
                 , range_var = purrr::map_chr(imp, function(x) head(x,1)$row_names )
                 , grid = purrr::pmap( list( m = fit
                                      , title = model
                                      , variables = imp
                                      , range_variable = range_var
                                      , data = test
                 )
                 , f_model_plot_var_dep_over_spec_var_range
                 , formula = form
                 , data_ls = data_ls
                 , variable_color_code = variable_color_code
                 , log_y = FALSE
                 , limit = 12
                 )
  )  %>%
  .$grid

f_plot_obj_2_html( grids,  type = "grids", output_file =  'test_me', title = 'Grids', height = 30 )

file.remove('test_me.html')

## End(Not run)

erblast/oetteR documentation built on Feb. 15, 2018, 5:12 p.m.