score_profile: Profile deciles from a fitted model.


score_profile R Documentation

Profile deciles from a fitted model.

Description

Profiles the deciles (or other quantile groups) of a fitted model. Given a vector of numeric scores (fitted values) and a set of predictors, it computes basic summary statistics for each predictor within each score quantile group.

Usage

## S3 method for class 'formula'
score_profile(formula, data, groups = 10,
  statistic = "mean", direction = "D", categorize = TRUE, nBins = 4,
  continuous = 4, digitsN = NULL, digitsF = NULL, digitsB = NULL,
  groupVar = NULL, excludeNA = FALSE, LaTex = FALSE)

## S3 method for class 'score_profile'
is(x)

## S3 method for class 'score_profile'
print(x, ...)

Arguments

formula

A formula expression of the form score ~ predictors, where the score represents the predictions from a fitted model.

data

A data frame in which to interpret the variables named in the formula.

groups

Number of groups, each containing approximately the same number of observations, into which the data set is partitioned to show results. The default value is 10 (deciles).

statistic

A function that operates on a vector and produces a single value, such as mean or sd. It may be a user-defined function. To request several statistics, join them with the + operator, for example statistic = "mean + min + max". This argument applies only to numeric variables when categorize = FALSE. Factors are always shown as percentages within each group.

direction

Possible values are "D" or "I", for group number labels which are decreasing or increasing with the model score, respectively.

categorize

Should numeric predictors be categorized at their quantiles?

nBins

The number of bins created for numeric variables. The bins are created based on quantiles, with a default value of 4 (quartiles). Only applicable when categorize=TRUE.

continuous

When categorize=TRUE, the threshold that determines whether a numeric variable is binned at its quantiles or at its unique values. When the variable has at least continuous unique values, bins are created based on quantiles. Otherwise, the variable is converted to a factor whose levels are the variable's unique values.

digitsN

Number of decimal places to show for numeric predictors.

digitsF

Number of decimal places to show for factor predictors.

digitsB

Number of digits used in formatting the breaks.

groupVar

A character string naming the variable in data that holds the grouped predictions. If this argument is not NULL, the groups are taken from the named variable instead of being created from the quantiles of the predictions.

excludeNA

Should the results exclude observations with missing values in any of the variables named in the formula?

LaTex

Should the function output LaTeX code?

x

A score_profile object.

...

Additional arguments for the S3 methods.
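
The interplay between categorize, nBins and continuous described above can be sketched in base R. This is only an illustration of the documented rule, not the package's internal code; the helper name categorize_numeric and the use of quantile() and cut() are assumptions made for the example.

```r
# Hypothetical helper illustrating the documented rule; not part of the package.
categorize_numeric <- function(x, nBins = 4, continuous = 4) {
  if (length(unique(x[!is.na(x)])) >= continuous) {
    # Enough distinct values: cut the variable at its sample quantiles
    breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nBins + 1),
                              na.rm = TRUE))
    cut(x, breaks = breaks, include.lowest = TRUE)
  } else {
    # Few distinct values: each unique value becomes a factor level
    factor(x)
  }
}

set.seed(1)
levels(categorize_numeric(rnorm(200)))         # four quantile-based intervals
levels(categorize_numeric(c(0, 1, 1, 0, 2)))   # "0" "1" "2"
```

With the defaults, a variable taking only three distinct values is treated as a factor, while a variable with many distinct values is split into quartiles.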

Details

This function ranks the variable supplied in the left-hand side of the model formula and classifies it into groups with approximately the same number of observations. It subsequently calls the function tables::tabular to compute the average of each numeric predictor, and the distribution of each factor within each group.
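
The ranking and grouping step can be sketched in base R as follows. This is a minimal approximation under stated assumptions, not the package's implementation: the simulated score, x1 and f1 are made up for the example, tapply() and table() stand in for tables::tabular, and the mapping of group labels to direction = "I" and "D" follows the argument description above.

```r
# Base-R sketch of the grouping step (not the package's internal code).
set.seed(123)
n <- 1000
score <- runif(n)                                      # stand-in for fitted values
x1 <- rnorm(n)                                         # a numeric predictor
f1 <- factor(sample(c("a", "b"), n, replace = TRUE))   # a factor predictor

groups <- 10
# Rank the scores, then split the ranks into `groups` bins of equal size.
# ties.method = "first" gives distinct ranks, so the bins are balanced.
r <- rank(score, ties.method = "first")
grp_inc <- ceiling(r * groups / n)   # label increases with score (like direction = "I")
grp_dec <- groups + 1 - grp_inc      # label decreases with score (like direction = "D")

# Mean of the numeric predictor within each group
mean_x1 <- tapply(x1, grp_dec, mean)

# Distribution of the factor within each group, as row percentages
pct_f1 <- 100 * prop.table(table(grp_dec, f1), margin = 1)

print(table(grp_dec))
print(round(mean_x1, 3))
print(round(pct_f1, 1))
```

Each of the ten groups holds 100 observations here because n is a multiple of groups; otherwise group sizes would differ by at most one.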

Value

An object of class score_profile, which is a list with the following components:

  • data: The data frame containing the data used for plotting.

  • Table: An object of class tabular. See ?tables::tabular for details.

Author(s)

Leo Guelman leo.guelman@rbc.com

See Also

ggplot.score_profile.

Examples


### Load the package that provides score_profile
library(uplift2)
### Simulate some data
set.seed(123)
x1 <- rnorm(1000)
x2 <- rnorm(1000)
f1 <- sample(c(0, 1), 1000, replace = TRUE)
z <- 1 + 2 * x1 + 3 * x2  + f1
pr <- 1 / (1 + exp( -z))
y <- rbinom(1000, 1, pr)
df <- data.frame(y = y, x1 = x1, x2 = x2, f1 = factor(f1))
### Fit model and get fitted values
Fitted <- fitted(glm(y ~ x1 + x2 + f1, data = df, family = "binomial"))
### Profile deciles
score_profile(Fitted ~ x1 + x2 + f1, data = df, direction = "I")

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.