make_validation_plot: Create a plot representing a Hosmer-Lemeshow goodness-of-fit...

Description

Create a plot representing a Hosmer-Lemeshow goodness-of-fit test.

Usage

make_validation_plot(models, validation_data, buckets = 10,
  dep_var_name = "dep_var", id_var_name = "id", plot = TRUE, ...)

output_validation_plot(validation_object_list, scale = 1,
  output_type = grDevices::png, filename = NULL, ...)

validation_plot(models, validation_data, buckets = 10,
  dep_var_name = "dep_var", id_var_name = "id", plot = TRUE, ...)

Arguments

models

list or numeric. If a numeric vector, this will be interpreted as a vector of risk scores to compare to a dependent variable. If a tundraContainer (see the tundra package), the second argument should be a validation data set to be scored ad hoc with the model object. If a named list is passed, multiple validation graphs will be overlaid on one plot; this list can be a heterogeneous mix of numeric vectors of scores and tundraContainer model objects.

validation_data

data.frame or integer. If a data.frame, the column given by dep_var_name (by default "dep_var") will be extracted and used as the empirical signal (0 or 1). For each bucket (see the buckets parameter), the mean score in that quantile is graphed against the empirical mean of the dependent variable. An ideal classifier will fully separate the positive and negative cases and look like the Heaviside step function, jumping from 0 to 1 after some cutoff.

If an integer vector is passed, it is assumed to be the dependent variable, in the same order as the scores given by the models parameter (which must then also be a numeric vector). Note that model objects, that is tundraContainers, cannot be passed in the first parameter models if validation_data is a vector.

buckets

integer. The number of quantile buckets to cut the scores into (by default, 10).

dep_var_name

character. The name of the dependent variable. This will be used to extract a column out of the validation_data for comparison against the fitted risk scores. By default, "dep_var".

id_var_name

character. The name of the ID variable. By default, simply "id".

plot

logical. Whether or not to plot to an output device straight away, by default TRUE.

...

additional arguments to plot.

validation_object_list

validation_object_list. Internal parameter.

scale

numeric. Adjust plot size, by default 1.

output_type

function. The output type for the plot, by default png.

filename

character. Path to save output png file (for only a single plot_type).

Examples

## Not run: 
   set.seed(100) # to make it deterministic
   validation_dat <- data.frame(dep_var = sample(c(1, 0), 1000, replace = TRUE))
   good_preds     <- validation_dat[['dep_var']] + rnorm(NROW(validation_dat))
   bad_preds      <- rnorm(NROW(validation_dat))
   make_validation_plot(models = list(good_mod = good_preds, bad_mod = bad_preds),
                        validation_data = validation_dat)

## End(Not run)
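
The bucket comparison described under validation_data can be illustrated without the package. The following is only a minimal base-R sketch of the general idea (sort the scores, split them into equal-sized quantile buckets, then compare the mean score with the empirical mean of the dependent variable in each bucket). The helper bucket_means is hypothetical and is not part of validationplot; the package's own bucketing may differ in its details.

```r
# Hypothetical helper for illustration only; not a validationplot function.
bucket_means <- function(scores, dep_var, buckets = 10) {
  stopifnot(length(scores) == length(dep_var))
  # Rank the scores (ties broken by position), then map the ranks onto
  # bucket numbers 1..buckets so each bucket holds about the same count.
  ranks  <- rank(scores, ties.method = "first")
  bucket <- ceiling(ranks * buckets / length(scores))
  data.frame(
    bucket     = sort(unique(bucket)),
    mean_score = as.numeric(tapply(scores, bucket, mean)),   # fitted side
    mean_dep   = as.numeric(tapply(dep_var, bucket, mean))   # empirical side
  )
}

set.seed(100)
dep_var    <- sample(c(1, 0), 1000, replace = TRUE)
good_preds <- dep_var + rnorm(length(dep_var))
good       <- bucket_means(good_preds, dep_var)
print(good)
```

For an informative score such as good_preds, mean_dep should tend to rise from the lowest bucket to the highest; for pure noise it should stay close to the overall mean of dep_var in every bucket.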

avantcredit/validationplot documentation built on May 11, 2019, 4:07 p.m.