View source: R/plotting_functions.R
lineplot | R Documentation |
Function to visualize how predicted probabilities change under MLE-recalibration and boldness-recalibration.
lineplot(
x = NULL,
y = NULL,
t_levels = NULL,
plot_original = TRUE,
plot_MLE = TRUE,
df = NULL,
Pmc = 0.5,
event = 1,
return_df = FALSE,
epsilon = .Machine$double.eps,
title = "Line Plot",
ylab = "Probability",
xlab = "Posterior Model Probability",
ylim = c(0, 1),
breaks = seq(0, 1, by = 0.2),
thin_to = NULL,
thin_prop = NULL,
thin_by = NULL,
thin_percent = deprecated(),
seed = 0,
optim_options = NULL,
nloptr_options = NULL,
ggpoint_options = list(alpha = 0.35, size = 1.5, show.legend = FALSE),
ggline_options = list(alpha = 0.25, linewidth = 0.5, show.legend = FALSE)
)
x | a numeric vector of predicted probabilities of an event. Must only contain values in [0,1]. |
y | a vector of outcomes corresponding to probabilities in |
t_levels | Vector of desired level(s) of calibration at which to plot contours. |
plot_original | Logical. If |
plot_MLE | Logical. If |
df | Dataframe returned by previous call to lineplot() specially formatted for use in this function. Only used for faster plotting when making minor cosmetic changes to a previous call. |
Pmc | The prior model probability for the calibrated model |
event | Value in |
return_df | Logical. If |
epsilon | Amount by which probabilities are pushed away from 0 or 1 boundary for numerical stability. If a value in |
title | Plot title. |
ylab | Label for y-axis. |
xlab | Label for x-axis. |
ylim | Vector with bounds for y-axis, must be in [0,1]. |
breaks | Locations along y-axis at which to draw horizontal guidelines, passed to |
thin_to | When non-null, the observations in (x,y) are randomly sampled without replacement to form a set of size |
thin_prop | When non-null, the observations in (x,y) are randomly sampled without replacement to form a set that is |
thin_by | When non-null, the observations in (x,y) are thinned by selecting every |
thin_percent | This argument is deprecated, use |
seed | Seed for random thinning. Set to NULL for no seed. |
optim_options | List of additional arguments to be passed to optim(). |
nloptr_options | List with options to be passed to |
ggpoint_options | List with options to be passed to |
ggline_options | List with options to be passed to |
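The epsilon argument is described as pushing probabilities away from the 0 and 1 boundaries for numerical stability. A minimal base R sketch of that idea follows; clamp_probs is a hypothetical helper written for illustration and is not part of BRcal, and the exact replacement rule used inside lineplot() is an assumption here.

```r
# Hypothetical helper (not part of BRcal): clamp probabilities to [eps, 1 - eps]
clamp_probs <- function(p, eps = .Machine$double.eps) {
  pmin(pmax(p, eps), 1 - eps)
}

x_raw <- c(0, 0.5, 1)
x_clamped <- clamp_probs(x_raw)

# Log odds are infinite at exactly 0 or 1, but finite after clamping
log_odds <- log(x_clamped / (1 - x_clamped))
is.finite(log_odds)
```

Values strictly inside (0, 1), such as 0.5 above, pass through unchanged.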
This function leverages ggplot() and related functions from the ggplot2
package (Wickham, 2016).
The goal of this function is to visualize how predicted probabilities change
under different recalibration parameters. By default, this function only shows
how the original probabilities change after MLE recalibration. The argument
t_levels can be used to specify a vector of levels of boldness-recalibration
to visualize in addition to MLE recalibration.
While the x-axis shows the posterior model probability of each set of
probabilities, note that these posterior model probabilities are not in
ascending or descending order. Instead, the sets follow the order in which one
might use the BRcal package: first looking at the original predictions, then
maximizing calibration, then examining how far the predictions can be spread
out while maintaining calibration with boldness-recalibration.
If return_df = TRUE, a list with the following attributes is returned:
plot | A |
df | Dataframe used to create |
Otherwise just the ggplot object of the plot is returned.
return_df
While this function is not typically slow at moderate sample sizes, there is
still a call to optim() under the hood for MLE recalibration and a call to
nloptr() for each level of boldness-recalibration, either of which could
become a bottleneck. With this in mind, users can specify return_df=TRUE to
return the underlying dataframe used to build the resulting lineplot. Users
can then pass this dataframe to df in subsequent calls of lineplot to
circumvent these calls to optim and nloptr and make cosmetic changes to the
plot.
When return_df=TRUE, both the plot and the dataframe are returned in a
list. The dataframe contains 6 columns:
probs: the values of each predicted probability under each set
outcome: the corresponding outcome for each predicted probability
post: the posterior model probability of the set as a whole
id: the id of each individual probability used for mapping observations between sets
set: the set to which the probability belongs
label: the label used for the x-axis in the lineplot
Essentially, each set of probabilities (original, MLE-recalibrated, and each
level of boldness-recalibration) and the outcomes are "stacked" on top of each
other. The id tells the plotting function how to connect (with a line) the
same observation as it changes from the original set to MLE- or
boldness-recalibration.
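To make the "stacked" layout concrete, here is a toy base R sketch of a dataframe with the six columns listed above. The data values, the transformation standing in for recalibration, the NA placeholder in post, and the label strings are all assumptions for illustration; the dataframe actually returned by lineplot() may differ in types and contents.

```r
# Toy data: three predicted probabilities and their outcomes
x <- c(0.2, 0.5, 0.9)
y <- c(0, 1, 1)
n <- length(x)

# Stand-in for a recalibrated set; this transformation is an assumption
# chosen only to produce a second set of values
x_recal <- x^2 / (x^2 + (1 - x)^2)

stacked <- data.frame(
  probs   = c(x, x_recal),
  outcome = rep(y, times = 2),
  post    = NA_real_,  # placeholder: posterior model probability not computed here
  id      = rep(seq_len(n), times = 2),
  set     = rep(c("Original", "MLE"), each = n),
  label   = rep(c("Original", "MLE"), each = n)  # placeholder x-axis labels
)

# Rows sharing an id are the same observation in different sets
stacked[stacked$id == 1, ]
```

Each id appears once per set, which is what lets a line be drawn from an observation's original probability to its recalibrated value.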
Another strategy to save time when plotting is to thin the amount of data
plotted. When sample sizes are large, the plot can become overcrowded and
slow to plot. We provide three options for thinning: thin_to, thin_prop, and
thin_by. By default, all three of these settings are set to NULL, meaning no
thinning is performed. Users can only specify one thinning strategy at a time.
Care should be taken in selecting a thinning approach based on the nature of
your data and problem. Note that MLE recalibration and boldness-recalibration
will be done using the full set.
Also note that if a thinning strategy is used with return_df=TRUE, the
returned data frame will only contain the reduced set (i.e. the data after
thinning).
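The three thinning strategies can be sketched in base R as index selection. This is only an illustration of the descriptions above, not the package's internal code; in particular, rounding the proportion to a whole count and starting thin_by at the first observation are assumptions.

```r
set.seed(0)  # mirrors the default seed = 0; how lineplot() seeds internally is an assumption
n <- 100
idx <- seq_len(n)

# thin_to: random subset of a fixed size, sampled without replacement
keep_to <- sort(sample(idx, size = 75))

# thin_prop: random subset that is a given proportion of the data
keep_prop <- sort(sample(idx, size = round(0.53 * n)))

# thin_by: every 3rd observation (starting at the first; an assumption)
keep_by <- idx[seq(1, n, by = 3)]

c(thin_to = length(keep_to), thin_prop = length(keep_prop), thin_by = length(keep_by))
```

Only the rows selected this way would be plotted; as noted above, the recalibration itself still uses the full set.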
geom_point() and geom_line()
To make cosmetic changes to the points and lines plotted, users can pass a
list of any desired arguments of geom_point() and geom_line() to
ggpoint_options and ggline_options, respectively. These will overwrite
everything passed to geom_point() or geom_line() except any aesthetic
arguments in aes().
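Because a supplied list overwrites what is otherwise passed to geom_point(), a user who wants to change one setting while keeping the documented defaults may prefer to build the full list explicitly. The sketch below does this with utils::modifyList(); the default values are copied from the usage section, and whether lineplot() merges user options with its defaults internally is not stated here, so treat that as an open assumption.

```r
# Defaults copied from the usage section
default_point <- list(alpha = 0.35, size = 1.5, show.legend = FALSE)

# Change size and shape, keep the remaining defaults
my_point <- utils::modifyList(default_point, list(size = 3, shape = 4))
str(my_point)

# my_point could then be supplied as the ggpoint_options argument
```

The same pattern applies to ggline_options with the defaults for geom_line().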
Guthrie, A. P., and Franck, C. T. (2024) Boldness-Recalibration for Binary Event Predictions, The American Statistician 1-17.
Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
set.seed(28)
# Simulate 100 predicted probabilities
x <- runif(100)
# Simulate 100 binary event outcomes using x
y <- rbinom(100, 1, x) # By construction, x is well calibrated.
# Line plot showing the change in probabilities from original to MLE-recalibration
# to the specified levels of boldness-recalibration given via t_levels
# With return_df=TRUE, a list is returned that includes the dataframe used to construct the plot
lp1 <- lineplot(x, y, t_levels=c(0.98, 0.95), return_df=TRUE)
lp1$plot
# Reusing the previous dataframe to save calculation time
lineplot(df=lp1$df)
# Adjust geom_point cosmetics via ggpoint_options
# Increase point size and change the point shape (shape 4 is a cross)
lineplot(df=lp1$df, ggpoint_options=list(size=3, shape=4))
# Adjust geom_line cosmetics via ggline_options
# Increase line width and change transparency
lineplot(df=lp1$df, ggline_options=list(linewidth=2, alpha=0.1))
# Thinning down to 75 randomly selected observations
lineplot(df=lp1$df, thin_to=75)
# Thinning down to 53% of the data
lineplot(df=lp1$df, thin_prop=0.53)
# Thinning down to every 3rd observation
lineplot(df=lp1$df, thin_by=3)
# Setting a different seed for thinning
lineplot(df=lp1$df, thin_prop=0.53, seed=47)
# Setting NO seed for thinning (plot will be different every time)
lineplot(df=lp1$df, thin_to=75, seed=NULL)