In CWWhitney/uncertainty: Calculates functions for assessing uncertainty

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

#install and load packages
library(ggplot2)
library(rmarkdown)
# devtools::install_github("CWWhitney/uncertainty")
library(uncertainty)

#Automatically write R package citation entries to a .bib file
knitr::write_bib(c(.packages(), 
                   'ggplot2'), 'bib/packages.bib')

#resolves an issue with the scientific style of numbers, like one can read inside this answer from @Paul Hiemstra: https://stackoverflow.com/a/25947542/4061993 from the documentation of ?options:
options(scipen=999)

Create random influence variable {in_var} and outcome variable {out_var} data using sample().

in_var <- sample(x = 10:500, size = 20, replace = TRUE)
out_var <- sample(x = 1000:5000, size = 20, replace = TRUE)

Using varscatter() create a scatter plot of an influence variable of interest {in_var} (x) and an outcome variable {out_var} (y) with associated and estimated densities with loess smooth linear fits density curves. Plot made with the scatter.hist() function in the psych() package [@R-psych] in R [@R-base].

uncertainty::varscatter(in_var, out_var)

Use varkernel() for another type of graphical representation of the same relationship. varkernel() performs a two-dimensional kernel density estimation for the expected value of the outcome variable {out_var} and the influence variable of interest {in_var} using the kde2d() function in the MASS() package from base R [@R-base]. The function produces a matrix of the estimated density (z) of the expected value of the outcome variable {out_var} (y) and the influence variable of interest {in_var} (x). It looks slightly different than the scatter plot estimates above because the density function restricts the shape of the kernel to a bivariate normal kernel.

uncertainty::varkernel(in_var, out_var)

varkernel() shows a density surface plot of a given influence variable of interest {in_var} (x) and the outcome variable {out_var} (y). The legend shows the value for the estimated density (z). The plot is made with the filled.contour() function of the graphics() package.

In varkernel() the density (z) over the entire plot integrates to one, and therefore represents the relative probability of an observation (the outcome variable {out_var} along y-axis) given a specific value for a given influence variable of interest {in_var} (along x-axis).

Estimated outcome variable {out_var} given the expected variable

varkernelslice() calculates the estimated outcome variable {out_var} given the expected value of the influence variable of interest {in_var}, based on a slice of 'z' from the Kernel density calculated with varkernel(). The expectedvariable parameter is set to 30.

uncertainty::varkernelslice(in_var, out_var, expectedin_var = 300)

varkernelslice() plots the probabilities (shown along the y-axis) for the expected value of the outcome variable {out_var} (shown along the x-axis). Since this is a cut through the density kernel varkernel(), which integrates to 1, the probability values are relative, not absolute measures.

Influencing variable intervals

varviolin() determines different possible influencing variable intervals {in_var} by calculating the optimal interval width for the given variable values. This is done using the IQR() function in the stats() package, after the Freedman-Diaconis rule (IQR = interquartile range). Optimal interval width for our sample = 2 * interquartile range for our sample / (total number of observations for the interquartile range for our sample)^(1/3)

uncertainty::varviolin(in_var, out_var)

varviolin() shows violin plots of influencing variable {in_var} values (x) and outcome variable {out_var} (y) values, with six different intervals of influencing variable {in_var} values. Plot made with ggplot2() [@R-ggplot2].

Probability of outcome variable given variable

varkernelslicerange() shows the influencing variable {in_var} value intervals, optimized interquartile ranges shown in varviolin() can be used to select a range to slice from the density kernel varkernel() as was done for a single variable value in varkernelslice(). Here we set the maximum variable value to 200 and the minimum to 100.

uncertainty::varkernelslicerange(in_var = in_var, out_var = out_var, 
                                 min_in_var = 100, max_in_var = 200)

varkernelslicerange() plots the probabilities (shown along the y-axis) for the expected value of the outcome variable {out_var} (shown along the x-axis). As with varkernelslice() the probability values shown are relative, not absolute measures. They are the result of cuts through the density kernel varkernel(), which integrates to 1.

This document was generated using R Markdown [@R-rmarkdown].