When working with Generalized Linear Models, it is often useful to create informative and beautiful summaries of the fitted model coefficients. The goal of prettyglm is to provide a set of functions that visualize Generalized Linear Model coefficients and performance in interactive plots, which can easily be embedded in rmarkdown reports or exported separately and shared with stakeholders. This document introduces prettyglm's main sets of functions and shows you how to apply them.
Please see the prettyglm website for more detailed documentation with HTML outputs; some of the outputs have been excluded from this document for publication on CRAN.
If you don't find the function you are looking for in prettyglm, consider checking out some other great packages that help visualize the output from GLMs:
tidycat
jtools
You can install the latest CRAN release with:
install.packages('prettyglm')
To explore the functionality of prettyglm, we will use the Titanic data set to perform logistic regression. This data set was sourced from Kaggle and contains information about passengers aboard the Titanic, along with a target variable that indicates whether they survived.
library(dplyr)
library(prettyglm)
data('titanic')
head(titanic) %>%
  select(-c(PassengerId, Name, Ticket)) %>%
  knitr::kable(table.attr = "style='width:10%;'") %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
A critical step for this package to work is to convert all categorical predictors to factors.
# Convert several columns to factors at once, then replace missing Age values with the mean age.
columns_to_factor <- c('Pclass', 'Sex', 'Cabin', 'Embarked', 'Cabintype')
meanage <- base::mean(titanic$Age, na.rm = TRUE)
titanic <- titanic %>%
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>%
  dplyr::mutate(Age = base::ifelse(is.na(Age), meanage, Age))
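The same two preparation steps can be sketched in base R on a small made-up data frame (the values below are illustrative, not rows of the real titanic data): convert the categorical columns with factor(), then replace missing Age values with the column mean.

```r
# Toy data frame standing in for titanic (values are made up for illustration)
toy <- data.frame(
  Pclass = c(1, 3, 2, 3),
  Sex    = c("male", "female", "female", "male"),
  Age    = c(22, NA, 38, 26)
)

# Step 1: convert categorical predictors to factors
columns_to_factor <- c("Pclass", "Sex")
toy[columns_to_factor] <- lapply(toy[columns_to_factor], factor)

# Step 2: mean-impute missing Age values
mean_age <- mean(toy$Age, na.rm = TRUE)
toy$Age[is.na(toy$Age)] <- mean_age

str(toy)
```

Without step 1, a numeric column such as Pclass would be fitted as a single slope rather than as one coefficient per class.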
For this vignette we will use stats::glm() to build a logistic regression model. Support for parsnip and workflow model objects that use the glm model engine is currently in progress.
survival_model <- stats::glm(Survived ~ Pclass + Sex + Fare + Age + Embarked + SibSp + Parch,
                             data = titanic,
                             family = binomial(link = 'logit'))
pretty_coefficients()
The function pretty_coefficients() creates a pretty table of model coefficients, which by default includes the base levels of categorical variables.
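For comparison, the coefficient table from base R's summary() has no row for a factor's base level, because under the default treatment contrasts that level is absorbed into the intercept. The sketch below uses simulated data (purely illustrative) to show which row is missing from the base R output.

```r
set.seed(1)
# Simulated binary outcome with a three-level factor predictor
n <- 300
sim <- data.frame(
  Pclass = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  Fare   = runif(n, 0, 100)
)
sim$Survived <- rbinom(n, 1, plogis(-0.5 + 0.01 * sim$Fare))

fit <- glm(Survived ~ Pclass + Fare, data = sim, family = binomial(link = "logit"))
coef_table <- summary(fit)$coefficients

# Rows present: (Intercept), Pclass2, Pclass3, Fare; there is no row for level "1"
rownames(coef_table)

# The base level is the first factor level
base_level <- levels(sim$Pclass)[1]
paste0("Pclass", base_level) %in% rownames(coef_table)
```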
The simplest way to call this function is just with the model object.
pretty_coefficients(model_object = survival_model)
You can also perform a type III test on the coefficients by specifying the type_iii argument. Warning: Wald type III tests will fail if there are aliased coefficients in the model. You can change the significance level highlighted in the table with significance_level.
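A Wald type III test for one term asks whether all of that term's coefficients are jointly zero while every other term stays in the model. As a rough sketch of the statistic (not prettyglm's internal code), for the term's coefficient vector b and its covariance block V, the statistic is W = b' V^-1 b, compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients in the term. An aliased coefficient is estimated as NA, which is one reason such a statistic cannot be computed. The data and the helper wald_term_test() are hypothetical.

```r
set.seed(2)
n <- 400
sim <- data.frame(
  Embarked = factor(sample(c("C", "Q", "S"), n, replace = TRUE)),
  Age      = rnorm(n, 30, 10)
)
sim$Survived <- rbinom(n, 1, plogis(0.3 * (sim$Embarked == "C") - 0.02 * sim$Age))

fit <- glm(Survived ~ Embarked + Age, data = sim, family = binomial(link = "logit"))

# Hypothetical helper: Wald test that all coefficients matching a term are zero
wald_term_test <- function(model, term_pattern) {
  idx <- grep(term_pattern, names(coef(model)))   # positions of the term's coefficients
  b <- coef(model)[idx]
  V <- vcov(model)[idx, idx, drop = FALSE]
  W <- as.numeric(t(b) %*% solve(V) %*% b)        # W = b' V^-1 b
  df <- length(idx)
  c(statistic = W, df = df, p_value = pchisq(W, df, lower.tail = FALSE))
}

wald_term_test(fit, "^Embarked")
```

For a term with a single coefficient, W reduces to the square of the z value reported by summary().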
pretty_coefficients(survival_model, type_iii = 'Wald', significance_level = 0.1)
By default pretty_coefficients() shows "model" variable importance, but vimethod also accepts the "permute" and "firm" methods from vip::vi(). Any additional parameters these methods need should also be passed to pretty_coefficients().
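Permutation importance, which is what vimethod = 'permute' refers to, measures how much a performance metric drops when one predictor's values are randomly shuffled. The sketch below is a minimal base-R illustration on simulated data using AUC computed with the rank (Mann-Whitney) formula; the helpers auc() and permute_importance() are hypothetical and are not part of prettyglm or vip.

```r
# AUC via the Mann-Whitney rank formula (ties get average ranks)
auc <- function(actual, predicted) {
  r <- rank(predicted)
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Average drop in AUC after shuffling one predictor, over nsim shuffles
permute_importance <- function(model, data, target, feature, nsim = 10) {
  baseline <- auc(data[[target]], predict(model, data, type = "response"))
  drops <- replicate(nsim, {
    shuffled <- data
    shuffled[[feature]] <- sample(shuffled[[feature]])
    baseline - auc(data[[target]], predict(model, shuffled, type = "response"))
  })
  mean(drops)
}

set.seed(3)
n <- 500
sim <- data.frame(Fare = rnorm(n), Noise = rnorm(n))
sim$Survived <- rbinom(n, 1, plogis(1.5 * sim$Fare))
fit <- glm(Survived ~ Fare + Noise, data = sim, family = binomial(link = "logit"))

importance <- sapply(c("Fare", "Noise"), function(f) permute_importance(fit, sim, "Survived", f))
importance
```

The informative predictor (Fare) should show a clearly larger drop than the pure-noise column.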
pretty_coefficients(model_object = survival_model,
                    type_iii = 'Wald',
                    significance_level = 0.1,
                    vimethod = 'permute',
                    target = 'Survived',
                    metric = 'auc',
                    pred_wrapper = predict.glm,
                    reference_class = 0)
pretty_relativities()
pretty_relativities() creates a plot of the desired model variable. The type of plot depends on the class of the variable.
A model relativity is a transform of the model estimate. By default pretty_relativities() uses exp(estimate) - 1, which is useful for GLMs with a log or logit link function.
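With a logit link, exp(estimate) is an odds ratio, so exp(estimate) - 1 reads as the proportional change in the odds relative to the base level (or per unit, for a continuous variable). A one-line version of the default transform, with a few worked values:

```r
# Default relativity transform described above: exp(estimate) - 1
relativity <- function(estimate) exp(estimate) - 1

relativity(log(1.5))  # 0.5: odds 50% higher than the base level
relativity(0)         # 0: same as the base level
relativity(log(0.8))  # -0.2: odds 20% lower than the base level
```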
The relativity is sometimes referred to as the "odds ratio" or "likelihood". You can customize the label with the relativity_label input.
For categorical variables, pretty_relativities() creates an interactive dual-axis plot, which shows the fitted relativity on one y axis and the number of records in each category on the other.
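The two series in that plot can be assembled by hand from a fitted model. The sketch below, on simulated data with made-up probabilities, builds one row per factor level holding the relativity (0 for the base level, whose coefficient is implicitly 0 under treatment contrasts) and the record count. It assumes a single main-effect factor with default contrasts and is not prettyglm's internal code.

```r
set.seed(4)
n <- 300
sim <- data.frame(Embarked = factor(sample(c("C", "Q", "S"), n, replace = TRUE,
                                           prob = c(0.2, 0.1, 0.7))))
sim$Survived <- rbinom(n, 1, plogis(ifelse(sim$Embarked == "C", 0.5, -0.3)))
fit <- glm(Survived ~ Embarked, data = sim, family = binomial(link = "logit"))

lv <- levels(sim$Embarked)
# Coefficient per level; the base level has no coefficient, so it gets 0
estimate <- sapply(lv, function(l) {
  nm <- paste0("Embarked", l)
  if (nm %in% names(coef(fit))) unname(coef(fit)[nm]) else 0
})

level_summary <- data.frame(
  level      = lv,
  relativity = exp(estimate) - 1,               # first y axis
  n_records  = as.vector(table(sim$Embarked))   # second y axis
)
level_summary
```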
pretty_relativities(feature_to_plot = 'Embarked', model_object = survival_model, relativity_label = 'Likelihood of Survival')
For continuous variables, pretty_relativities() plots the relativity over the variable's range, with the density of that variable on a second axis.
If desired, you can cut off the tails of the distribution with upper_percentile_to_cut or lower_percentile_to_cut.
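For a continuous predictor with a single linear term, the curve is the same transform applied to estimate × value, i.e. exp(beta * x) - 1 measured relative to x = 0. The sketch below computes that curve and a kernel density over a range trimmed at the top. It assumes, as an interpretation rather than documented behaviour, that upper_percentile_to_cut = 0.1 drops the top 10% of values; the Fare-like data and the coefficient are simulated.

```r
set.seed(5)
fare <- rexp(800, rate = 1 / 30)   # right-skewed, Fare-like values (simulated)
beta_fare <- 0.01                  # illustrative coefficient for Fare

# Assumed trimming rule: keep values at or below the 90th percentile
upper_percentile_to_cut <- 0.1
upper <- quantile(fare, 1 - upper_percentile_to_cut)
kept <- fare[fare <= upper]

grid <- seq(min(kept), max(kept), length.out = 100)
curve_df <- data.frame(
  Fare       = grid,
  relativity = exp(beta_fare * grid) - 1   # primary y axis
)
# Density of the trimmed variable on the same range (secondary y axis)
dens <- density(kept, from = min(kept), to = max(kept), n = 100)
```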
pretty_relativities(feature_to_plot = 'Fare', model_object = survival_model, relativity_label = 'Likelihood of Survival', upper_percentile_to_cut = 0.1)
To highlight more of prettyglm's functionality, we will now build a logistic regression model with some interactions.
survival_model2 <- stats::glm(Survived ~ Pclass:Fare + Age + Embarked:Sex + SibSp + Parch, data = titanic, family = binomial(link = 'logit'))
You can also choose to facet the plots by one of the variables.
pretty_relativities(feature_to_plot = 'Embarked:Sex', model_object = survival_model2, relativity_label = 'Likelihood of Survival', iteractionplottype = 'facet', facetorcolourby = 'Sex')
You can also choose to colour the plots by one of the variables.
pretty_relativities(feature_to_plot = 'Embarked:Sex', model_object = survival_model2, relativity_label = 'Likelihood of Survival', iteractionplottype = 'colour', facetorcolourby = 'Embarked')
You can create these relativity plots as you would for a non-interaction.
pretty_relativities(feature_to_plot = 'Embarked:Sex', model_object = survival_model2, relativity_label = 'Likelihood of Survival')
By default, plots of an interaction between a continuous and a factor variable are coloured by the factor variable.
pretty_relativities(feature_to_plot = 'Pclass:Fare', model_object = survival_model2, relativity_label = 'Likelihood of Survival', upper_percentile_to_cut = 0.03)
You can also facet by the factor variable.
pretty_relativities(feature_to_plot = 'Pclass:Fare', model_object = survival_model2, relativity_label = 'Likelihood of Survival', iteractionplottype = 'facet', upper_percentile_to_cut = 0.03, height = 800)
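A term such as Pclass:Fare with no Fare main effect gives each class its own Fare slope, which is why these plots draw one curve per factor level. The small base-R sketch below, on a toy data frame, shows the design-matrix columns R creates for such a term.

```r
toy <- data.frame(
  Pclass = factor(c("1", "2", "3", "1", "2", "3")),
  Fare   = c(80, 20, 8, 60, 15, 7)
)

# One Fare column per Pclass level: the row's Fare in its own class column, 0 elsewhere
mm <- model.matrix(~ Pclass:Fare, data = toy)
colnames(mm)
mm
```

Each of the Pclass1:Fare, Pclass2:Fare and Pclass3:Fare coefficients is then a separate slope, giving one relativity curve exp(slope * Fare) - 1 per class.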
To highlight more of prettyglm's functionality, we will now build a logistic regression model with splines.
prettyglm includes a function, splineit(), to help construct splines. It can be incorporated into a dplyr workflow as follows.
For splines to work nicely in prettyglm, use the naming convention Variable#Start#End, where # represents your chosen separator (the code below uses '_').
titanic <- titanic %>%
  dplyr::mutate(Age_0_18 = prettyglm::splineit(Age, 0, 18),
                Age_18_35 = prettyglm::splineit(Age, 18, 35),
                Age_35_120 = prettyglm::splineit(Age, 35, 120)) %>%
  dplyr::mutate(Fare_0_55 = prettyglm::splineit(Fare, 0, 55),
                Fare_55_600 = prettyglm::splineit(Fare, 55, 600))
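If splineit() clamps its input to the interval [start, end] (an assumption here, not documented behaviour), the spline columns behave like the hypothetical clamp_basis() helper below. This shows why a GLM on Age_0_18, Age_18_35 and Age_35_120 fits a piecewise-linear curve with knots at 18 and 35: each column varies only inside its own interval, so each coefficient acts as the slope on that segment.

```r
# Hypothetical stand-in for splineit(): clamp x to [start, end]
clamp_basis <- function(x, start, end) pmin(pmax(x, start), end)

age <- c(5, 18, 25, 35, 60)
basis <- data.frame(
  Age_0_18   = clamp_basis(age, 0, 18),
  Age_18_35  = clamp_basis(age, 18, 35),
  Age_35_120 = clamp_basis(age, 35, 120)
)
basis

# With illustrative slopes b1, b2, b3 the linear predictor changes by b1 per year
# below 18, b2 per year between 18 and 35, and b3 per year above 35
slopes <- c(0.02, -0.01, -0.03)
eta <- as.matrix(basis) %*% slopes
```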
survival_model4 <- stats::glm(Survived ~ Pclass + Sex:Fare_0_55 + Sex:Fare_55_600 +
                                Age_0_18 + Age_18_35 + Age_35_120 +
                                Embarked + SibSp + Parch,
                              data = titanic,
                              family = binomial(link = 'logit'))
For interactions, the variables are grouped in the left pane.
pretty_coefficients(survival_model4, significance_level = 0.1, spline_seperator = '_')
You also need to provide the spline_seperator input to pretty_relativities().
pretty_relativities(feature_to_plot = 'Age', model_object = survival_model4, relativity_label = 'Likelihood of Survival', spline_seperator = '_')
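The separator matters because the spline columns have to be grouped back into one variable by name. A minimal sketch of that parsing, assuming the Variable#Start#End convention with '_' as the separator; the helper parse_spline_name() is hypothetical and not part of prettyglm.

```r
# Split a spline column name such as "Age_18_35" into variable, start and end
parse_spline_name <- function(name, sep = "_") {
  parts <- strsplit(name, sep, fixed = TRUE)[[1]]
  n <- length(parts)
  data.frame(
    # Everything before the last two parts is the variable name
    variable = paste(parts[seq_len(n - 2)], collapse = sep),
    start    = as.numeric(parts[n - 1]),
    end      = as.numeric(parts[n])
  )
}

spline_parts <- do.call(rbind, lapply(c("Age_0_18", "Age_18_35", "Age_35_120"),
                                      parse_spline_name))
spline_parts
```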
By default, pretty_relativities() colours a spline interacted with a factor by the factor variable.
pretty_relativities(feature_to_plot = 'Sex:Fare', model_object = survival_model4, relativity_label = 'Likelihood of Survival', spline_seperator = '_', upper_percentile_to_cut = 0.03)
If you prefer to facet by the factor variable, set iteractionplottype to "facet".
pretty_relativities(feature_to_plot = 'Sex:Fare', model_object = survival_model4, relativity_label = 'Likelihood of Survival', spline_seperator = '_', upper_percentile_to_cut = 0.03, iteractionplottype = 'facet')
one_way_ave()
For continuous variables, one_way_ave() buckets values into 30 buckets by default and plots the density on a dual axis.
one_way_ave(feature_to_plot = 'Age', model_object = survival_model4, target_variable = 'Survived', data_set = titanic, upper_percentile_to_cut = 0.1, lower_percentile_to_cut = 0.1)
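Conceptually, a one-way plot compares the average observed target with the average model prediction inside buckets of the feature. The sketch below reproduces that idea in base R on simulated data using 30 equal-width buckets; whether one_way_ave() uses equal-width or quantile buckets is not stated here, so treat the bucketing rule as an assumption.

```r
set.seed(6)
n <- 1000
sim <- data.frame(Age = runif(n, 0, 80))
sim$Survived <- rbinom(n, 1, plogis(1 - 0.04 * sim$Age))
fit <- glm(Survived ~ Age, data = sim, family = binomial(link = "logit"))
sim$Predicted <- predict(fit, sim, type = "response")

# 30 equal-width buckets over the observed range of Age
sim$bucket <- cut(sim$Age, breaks = 30)

# Mean actual and mean predicted value per non-empty bucket
one_way <- aggregate(sim[c("Survived", "Predicted")],
                     by = list(bucket = sim$bucket), FUN = mean)

# Record counts per bucket (the quantity drawn on the second axis)
counts <- table(sim$bucket)
one_way$n_records <- as.vector(counts[as.character(one_way$bucket)])
head(one_way)
```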
one_way_ave(feature_to_plot = 'Cabintype', model_object = survival_model4, target_variable = 'Survived', data_set = titanic)
You can facet the one_way_ave() plot by providing a variable to facet by in facetby.
one_way_ave(feature_to_plot = 'Age', model_object = survival_model4, target_variable = 'Survived', facetby = 'Sex', data_set = titanic, upper_percentile_to_cut = 0.1, lower_percentile_to_cut = 0.1)
By default one_way_ave() uses stats::predict.glm(). If you would like to use one_way_ave() with another model type (one that is not compatible with predict.glm()), or to provide modified predictions, one_way_ave() accepts a custom prediction function through predict_function.
This function must return a data.frame with two columns: "Actual_Values" and "Predicted_Values".
# Custom prediction function that caps predicted values at 0.4
a_custom_predict_function <- function(target, model_object, dataset){
  dataset <- base::as.data.frame(dataset)
  # Observed target values; convert a factor target back to numeric 0/1
  Actual_Values <- dplyr::pull(dplyr::select(dataset, tidyselect::all_of(c(target))))
  if (is.factor(Actual_Values)) {
    Actual_Values <- base::as.numeric(as.character(Actual_Values))
  }
  # Model predictions on the response (probability) scale
  Predicted_Values <- base::as.numeric(stats::predict(model_object, dataset, type = 'response'))
  to_return <- base::data.frame(Actual_Values = Actual_Values,
                                Predicted_Values = Predicted_Values)
  # Modify the predictions: anything above 0.4 is set to 0.4
  to_return <- to_return %>%
    dplyr::mutate(Predicted_Values = base::ifelse(Predicted_Values > 0.4, 0.4, Predicted_Values))
  return(to_return)
}
one_way_ave(feature_to_plot = 'Age', model_object = survival_model4, target_variable = 'Survived',
            data_set = titanic, upper_percentile_to_cut = 0.1, lower_percentile_to_cut = 0.1,
            predict_function = a_custom_predict_function)
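Before passing a custom function to predict_function, it can help to check that it honours the two-column contract. The sketch below defines a plain, uncapped function with the same (target, model_object, dataset) arguments as the example above, plus a hypothetical check_prediction_output() helper, and runs both against a glm fitted to the built-in mtcars data (chosen only because it ships with R).

```r
# Same argument names as the custom function above; no capping of predictions
plain_predict_function <- function(target, model_object, dataset) {
  dataset <- as.data.frame(dataset)
  actual <- dataset[[target]]
  if (is.factor(actual)) actual <- as.numeric(as.character(actual))
  data.frame(
    Actual_Values    = actual,
    Predicted_Values = as.numeric(predict(model_object, dataset, type = "response"))
  )
}

# Hypothetical helper: TRUE if the output has the required columns and shape
check_prediction_output <- function(out, dataset) {
  is.data.frame(out) &&
    all(c("Actual_Values", "Predicted_Values") %in% names(out)) &&
    nrow(out) == nrow(dataset) &&
    is.numeric(out$Actual_Values) && is.numeric(out$Predicted_Values)
}

model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
out <- plain_predict_function("am", model, mtcars)
check_prediction_output(out, mtcars)
```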
actual_expected_bucketed()
actual_expected_bucketed(target_variable = 'Survived', model_object = survival_model4, data_set = titanic)
actual_expected_bucketed(target_variable = 'Survived', model_object = survival_model4, data_set = titanic, facetby = 'Sex')