In easystats/see: Model Visualisation Toolbox for 'easystats' and 'ggplot2'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)

can_evaluate <- FALSE
pkgs <- c("datawizard", "ggplot2")
successfully_loaded <- vapply(pkgs, requireNamespace, FUN.VALUE = logical(1L), quietly = TRUE)
all_deps_available <- all(successfully_loaded)

# even if all dependencies are available, evaluate only if internet access
# is available (the 'curl' package is needed for this check)
if (all_deps_available && requireNamespace("curl", quietly = TRUE)) {
  can_evaluate <- curl::has_internet()
}

if (can_evaluate) {
  knitr::opts_chunk$set(eval = TRUE)
  vapply(pkgs, require, FUN.VALUE = logical(1L), quietly = TRUE, character.only = TRUE)
} else {
  knitr::opts_chunk$set(eval = FALSE)
}

This vignette can be referred to by citing the package:

citation("see")

Introduction

datawizard is a lightweight package for manipulating, cleaning, transforming, and preparing data for analysis. Most courses and tutorials about statistical modeling assume that you are working with a clean and tidy dataset. In practice, however, a major part of statistical modeling is preparing your data: cleaning up values, creating new columns, reshaping the dataset, or transforming some variables. datawizard provides easy-to-use tools for these common, critical, and sometimes tedious data preparation tasks.

For more, see: https://easystats.github.io/datawizard/
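
As a brief illustration of the kind of preparation step described above, the sketch below centers, standardizes, and rescales one column of iris. It assumes that the installed datawizard version exports center(), standardize(), and rescale() with the arguments shown; the variable names are chosen for this example only.

```r
# Minimal sketch, guarded like this vignette's setup chunk:
# the body only runs when datawizard is installed.
if (requireNamespace("datawizard", quietly = TRUE)) {
  x <- iris$Sepal.Length

  # as.numeric() drops any extra attributes so that we keep plain vectors
  x_centered <- as.numeric(datawizard::center(x)) # subtract the mean
  x_standardized <- as.numeric(datawizard::standardize(x)) # center, divide by SD
  x_rescaled <- as.numeric(datawizard::rescale(x, to = c(0, 1))) # map to [0, 1]

  # summarise the three transformed versions
  print(round(c(
    mean_centered = mean(x_centered),
    sd_standardized = sd(x_standardized),
    min_rescaled = min(x_rescaled),
    max_rescaled = max(x_rescaled)
  ), 3))
}
```

Each call returns a vector of the same length as its input, so the result can be assigned back to a column of the data frame.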

Setup

library(datawizard)
library(see)
library(ggplot2)
theme_set(theme_modern())

Description of Variable Distributions

(related function documentation: ?describe_distribution)

Histogram for Numbers with Fractional Part

data(iris)
result <- describe_distribution(iris$Sepal.Length)
result

plot(result)

Add Range of Dispersion (SD or MAD)

plot(result, dispersion = TRUE)
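
To make the two dispersion measures named in the heading concrete, the following base-R sketch computes both candidate ranges for Sepal.Length by hand: mean ± SD and median ± MAD. That the plot shades exactly one of these intervals is an assumption made here, and the variable names are illustrative.

```r
x <- iris$Sepal.Length

# range based on the mean and the standard deviation
sd_range <- c(lower = mean(x) - sd(x), upper = mean(x) + sd(x))

# robust alternative based on the median and the median absolute deviation
mad_range <- c(lower = median(x) - mad(x), upper = median(x) + mad(x))

# one row per measure, rounded for display
print(round(rbind(sd_range, mad_range), 2))
```

Comparing the two rows gives a quick impression of how much skew or outliers affect the non-robust range.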

Thin Bars for Integer Values

set.seed(333)
x <- sample(1:100, 1000, replace = TRUE)
result <- describe_distribution(x)
result

plot(result)
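
The headings distinguish numbers with a fractional part from integer values. A simple way to check which case a vector falls into is sketched below. Whether plot() applies this exact test internally is not stated in this vignette, and is_whole() is a hypothetical helper written for this example.

```r
# hypothetical helper: TRUE if every non-missing value is a whole number
is_whole <- function(x) {
  x <- x[!is.na(x)]
  all(x == round(x))
}

set.seed(333)
x <- sample(1:100, 1000, replace = TRUE)

is_whole(x) # integer-valued sample used above
is_whole(iris$Sepal.Length) # measurements with decimals
```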

Use a Normal Curve instead of Ribbon

plot(result, dispersion = TRUE, dispersion_style = "curve")
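
For comparison, a normal curve can also be drawn by hand in base R. The sketch below assumes that such a curve is a normal density parameterised by the sample mean and SD; this is an assumption for illustration, not a description of the internals of plot().

```r
set.seed(333)
x <- sample(1:100, 1000, replace = TRUE)

# evaluate a normal density with the sample mean and SD on a grid of values
grid <- seq(min(x), max(x), length.out = 200)
normal_density <- dnorm(grid, mean = mean(x), sd = sd(x))

# histogram on the density scale, with the normal curve drawn on top
hist(x, breaks = 30, freq = FALSE, main = "Sample with normal curve", xlab = "x")
lines(grid, normal_density, lwd = 2)
```

Because x is sampled uniformly, the curve will not follow the bars closely; it serves as a reference shape for the observed spread.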

Highlighting Categories

set.seed(123)
result <- describe_distribution(sample(LETTERS[1:10], 1000, TRUE))

# highlight one category
plot(result, highlight = "D")

# highlight multiple categories
plot(result, highlight = c("D", "H"), size_bar = 0.4)

# custom color scale: pass a named vector to 'scale_fill_manual()';
# the name for the non-highlighted bars is "no_highlight".
plot(result, highlight = c("D", "H", "A"), size_bar = 0.4) +
  scale_fill_manual(values = c(D = "red", H = "green", A = "gold", no_highlight = "steelblue"))
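
When many categories are highlighted, the named vector can be built programmatically instead of being typed out. The sketch below uses only base R; the object names are illustrative, and the "no_highlight" entry follows the naming described in the comment above.

```r
highlighted <- c("D", "H", "A")
highlight_colors <- c("red", "green", "gold")

# name each color after the category it belongs to, then append the
# color for all remaining bars under the name "no_highlight"
fill_values <- c(
  setNames(highlight_colors, highlighted),
  no_highlight = "steelblue"
)

fill_values
```

fill_values can then be passed as the values argument of scale_fill_manual() in place of the literal vector.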