visualize: Visualize statistical inference
In tidymodels/infer: Tidy Statistical Inference

visualize

R Documentation

Visualize statistical inference

Description

Visualize the distribution of the simulation-based inferential statistics or the theoretical distribution (or both!).

Learn more in vignette("infer").

Usage

visualize(data, bins = 15, method = "simulation", dens_color = "black", ...)

visualise(data, bins = 15, method = "simulation", dens_color = "black", ...)

Arguments

`data`	A distribution. For simulation-based inference, a data frame containing a distribution of `calculate()`d statistics or `fit()`ted coefficient estimates. This object should have been passed to `generate()` before being supplied or `calculate()` to `fit()`. For theory-based inference, the output of `assume()`.
`bins`	The number of bins in the histogram.
`method`	A string giving the method to display. Options are `"simulation"`, `"theoretical"`, or `"both"` with `"both"` corresponding to `"simulation"` and `"theoretical"`. If `data` is the output of `assume()`, this argument will be ignored and default to `"theoretical"`.
`dens_color`	A character or hex string specifying the color of the theoretical density curve.
`...`	Additional arguments passed along to functions in ggplot2. For `method = "simulation"`, `stat_bin()`, and for `method = "theoretical"`, `geom_path()`. Some values may be overwritten by infer internally.

Details

In order to make the visualization workflow more straightforward and explicit, visualize() now only should be used to plot distributions of statistics directly. A number of arguments related to shading p-values and confidence intervals are now deprecated in visualize() and should now be passed to shade_p_value() and shade_confidence_interval(), respectively. visualize() will raise a warning if deprecated arguments are supplied.

Value

For calculate()-based workflows, a ggplot showing the simulation-based distribution as a histogram or bar graph. Can also be used to display theoretical distributions.

For assume()-based workflows, a ggplot showing the theoretical distribution.

For fit()-based workflows, a patchwork object showing the simulation-based distributions as a histogram or bar graph. The interface to adjust plot options and themes is a bit different for patchwork plots than ggplot2 plots. The examples highlight the biggest differences here, but see patchwork::plot_annotation() and patchwork::&.gg for more details.

Examples


# generate a null distribution
null_dist <- gss %>%
  # we're interested in the number of hours worked per week
  specify(response = hours) %>%
  # hypothesizing that the mean is 40
  hypothesize(null = "point", mu = 40) %>%
  # generating data points for a null distribution
  generate(reps = 1000, type = "bootstrap") %>%
  # calculating a distribution of means
  calculate(stat = "mean")

# or a bootstrap distribution, omitting the hypothesize() step,
# for use in confidence intervals
boot_dist <- gss %>%
  specify(response = hours) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# we can easily plot the null distribution by piping into visualize
null_dist %>%
  visualize()

# we can add layers to the plot as in ggplot, as well...
# find the point estimate---mean number of hours worked per week
point_estimate <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

# find a confidence interval around the point estimate
ci <- boot_dist %>%
  get_confidence_interval(point_estimate = point_estimate,
                          # at the 95% confidence level
                          level = .95,
                          # using the standard error method
                          type = "se")

# display a shading of the area beyond the p-value on the plot
null_dist %>%
  visualize() +
  shade_p_value(obs_stat = point_estimate, direction = "two-sided")

# ...or within the bounds of the confidence interval
null_dist %>%
  visualize() +
  shade_confidence_interval(ci)

# plot a theoretical sampling distribution by creating
# a theory-based distribution with `assume()`
sampling_dist <- gss %>%
  specify(response = hours) %>%
  assume(distribution = "t")

visualize(sampling_dist)

# you can shade confidence intervals on top of
# theoretical distributions, too---the theoretical
# distribution will be recentered and rescaled to
# align with the confidence interval
visualize(sampling_dist) +
  shade_confidence_interval(ci)


# to plot both a theory-based and simulation-based null distribution,
# use a theorized statistic (i.e. one of t, z, F, or Chisq)
# and supply the simulation-based null distribution
null_dist_t <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "t")

obs_stat <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")

visualize(null_dist_t, method = "both")

visualize(null_dist_t, method = "both") +
  shade_p_value(obs_stat, "both")


# to visualize distributions of coefficients for multiple
# explanatory variables, use a `fit()`-based workflow

# fit 1000 models with the `hours` variable permuted
null_fits <- gss %>%
 specify(hours ~ age + college) %>%
 hypothesize(null = "independence") %>%
 generate(reps = 1000, type = "permute") %>%
 fit()

null_fits

# visualize distributions of resulting coefficients
visualize(null_fits)

# the interface to add themes and other elements to patchwork
# plots (outputted by `visualize` when the inputted data
# is from the `fit()` function) is a bit different than adding
# them to ggplot2 plots.
library(ggplot2)

# to add a ggplot2 theme to a `calculate()`-based visualization, use `+`
null_dist %>% visualize() + theme_dark()

# to add a ggplot2 theme to a `fit()`-based visualization, use `&`
null_fits %>% visualize() & theme_dark()


# More in-depth explanation of how to use the infer package
## Not run: 
vignette("infer")

## End(Not run)

tidymodels/infer documentation built on Sept. 25, 2024, 6:08 a.m.