ggbarstats
In ggstatsplot: 'ggplot2' Based Plots with Statistical Details

#| label = "setup",
#| include = FALSE
source("../setup.R")

You can cite this package/vignette as:

#| label = "citation",
#| echo = FALSE,
#| comment = ""
citation("ggstatsplot")

Lifecycle:

Introduction to `ggbarstats`

The function ggbarstats can be used for quick data exploration and/or to prepare publication-ready pie charts to summarize the statistical relationship(s) among one or more categorical variables. We will see examples of how to use this function in this vignette.

To begin with, here are some instances where you would want to use ggbarstats-

to check if the proportion of observations matches our hypothesized proportion, this is typically known as a "Goodness of Fit" test
to see if the frequency distribution of two categorical variables are independent of each other using the contingency table analysis
to check if the proportion of observations at each level of a categorical variable is equal

Note: The following demo uses the pipe operator (%>%), if you are not familiar with this operator, here is a good explanation: http://r4ds.had.co.nz/pipes.html.

ggbarstats works only with data organized in data frames or tibbles. It will not work with other data structures like base-R tables or matrices. It can operate on data frames that are organized with one row per observation or data frames that have one column containing counts. This vignette provides examples of both (see examples below).

To help demonstrate how ggbarstats can be used with categorical (also known as nominal) data, a modified version of the original Titanic dataset (from the datasets library) has been provided in the {ggstatsplot} package with the name Titanic_full. The Titanic Passenger Survival Dataset provides information "on the fate of passengers on the fatal maiden voyage of the ocean liner Titanic, including economic status (class), sex, age, and survival."

Let's have a look at the structure of both.

#| label = "titanic1"
library(dplyr)

# looking at the original data in tabular format
dplyr::glimpse(Titanic)

# looking at the dataset as a tibble or data frame
dplyr::glimpse(Titanic_full)

Independence (or association) with `ggbarstats`

Let's next investigate whether the passenger's sex was independent of, or associated with, their survival status, i.e., we want to test whether the proportion of people who survived was different between the sexes.

#| label = "ggbarstats1",
#| fig.height = 5,
#| fig.width = 8

ggbarstats(
  data = Titanic_full,
  x = Survived,
  y = Sex
)

The plot clearly shows that survival rates were very different between males and females. The Pearson's $\chi^2$-test of independence is significant given our large sample size. Additionally, for both females and males, the survival rates were significantly different than 50% as indicated by a goodness of fit test for each gender.

Grouped analysis with `grouped_ggbarstats`

What if we want to do the same analysis of gender but also factor in the passenger's age (Age)? We have information that classifies the passengers as Child or Adult, perhaps that makes a difference to their survival rate?

{ggstatsplot} provides a special helper function for such instances: grouped_ggbarstats. It is a convenient wrapper function around combine_plots. It applies ggbarstats across all levels of a specified grouping variable and then combines the list of individual plots into a single plot. Note that the grouping variable can be anything: conditions in a given study, groups in a study sample, different studies, etc.

#| label = "ggbarstats4",
#| fig.height = 10,
#| fig.width = 8

grouped_ggbarstats(
  # arguments relevant for `ggbarstats()`
  data = Titanic_full,
  x = Survived,
  y = Sex,
  grouping.var = Age,
  digits.perc = 1,
  package = "ggsci",
  palette = "category10_d3",
  # arguments relevant for `combine_plots()`
  title.text = "Passenger survival on the Titanic by gender and age",
  caption.text = "Asterisks denote results from proportion tests; \n***: p < 0.001, ns: non-significant",
  plotgrid.args = list(nrow = 2L)
)

The resulting pie charts and statistics make the story clear. For adults gender very much matters. Women survived at much higher rates than men. For children gender is not significantly associated with survival and both male and female children have a survival rate that is not significantly different from 50/50.

Grouped analysis with `ggbarstats` + `{purrr}`

Although grouped_ggbarstats provides a quick way to explore the data, it leaves much to be desired. For example, we may want to add different captions, titles, themes, or palettes for each level of the grouping variable, etc. For cases like these, it would be better to use {purrr} package.

See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html

Working with data organized by `counts`

ggbarstats can also work with data frame containing counts (aka tabled data), i.e., when each row doesn't correspond to a unique observation. For example, consider the following notional fishing data frame containing data from two boats (A and B) about the number of different types fish they caught in the months of February and March. In this data frame, each row corresponds to a unique combination of Boat and Month.

#| label = "ggbarstats7",
#| fig.height = 8,
#| fig.width = 9


# (this is completely fictional; I don't know first thing about fishing!)

fishing <- tibble::as_tibble(data.frame(
  Boat = c(rep("B", 4), rep("A", 4), rep("A", 4), rep("B", 4)),
  Month = c(rep("February", 2), rep("March", 2), rep("February", 2), rep("March", 2)),
  Fish = c(
    "Bass",
    "Catfish",
    "Cod",
    "Haddock",
    "Cod",
    "Haddock",
    "Bass",
    "Catfish",
    "Bass",
    "Catfish",
    "Cod",
    "Haddock",
    "Cod",
    "Haddock",
    "Bass",
    "Catfish"
  ),
  SumOfCaught = c(25, 20, 35, 40, 40, 25, 30, 42, 40, 30, 33, 26, 100, 30, 20, 20)
))

fishing

When the data is organized this way, we make a slightly different call to the ggbarstats() function: we use the counts argument.

If we want to investigate the relationship of type of fish by month (a test of independence), our command would be:

#| label = "ggbarstats8",
#| fig.height = 5,
#| fig.width = 8
ggbarstats(
  data = fishing,
  x = Fish,
  y = Month,
  counts = SumOfCaught,
  label = "both",
  package = "ggsci",
  palette = "default_jama",
  title = "Type fish caught by month",
  caption = "Source: completely made up",
  legend.title = "Type fish caught: "
)

The results support our hypothesis that the type of fish caught is related to the month in which we're fishing. The $\chi^2$ independence test results at the top of the plot. In February we catch significantly more Haddock than we would hypothesize for an equal distribution. Whereas in March our results indicate there's no strong evidence that the distribution isn't equal.

Within-subjects designs

Let's imagine we're conducting clinical trials for some new imaginary wonder drug. We have 134 subjects entering the trial. Some of them enter healthy (n = 96), some of them enter the trial already being sick (n = 38). All of them receive our treatment or intervention. Then we check back in a month to see if they are healthy or sick. A classic pre/post experimental design. We're interested in seeing the change in both groupings. In the case of within-subjects designs, you can set paired = TRUE, which will display results from McNemar test in the subtitle.

(Note: If you forget to set paired = TRUE, the results will be inaccurate.)

#| label = "ggbarstats9",
#| fig.height = 5,
#| fig.width = 8

# create imaginary data
clinical_trial <- tibble::tribble(
  ~SickBefore, ~SickAfter, ~Counts,
  "No", "Yes", 4,
  "Yes", "No", 25,
  "Yes", "Yes", 13,
  "No", "No", 92
)

ggbarstats(
  data = clinical_trial,
  x = SickAfter,
  y = SickBefore,
  counts = Counts,
  paired = TRUE,
  label = "both",
  title = "Results from imaginary clinical trial",
  package = "ggsci",
  palette = "default_ucscgb"
)

The results bode well for our experimental wonder drug. Of the 96 who started out healthy only 4% were sick after a month. Ideally, we would have hoped for zero but reality is seldom perfect. On the other side of the 38 who started out sick that number has reduced to just 13 or 34% which is a marked improvement.

Summary of graphics and tests

Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggbarstats.html

Reporting

```{asis, file="../../man/md-fragments/reporting.md"}

For example, let's see the following example:

```r
#| label = "reporting",
#| fig.height = 6,
#| fig.width = 8
ggbarstats(mtcars, am, cyl)

The narrative context (assuming type = "parametric") can complement this plot either as a figure caption or in the main text-

Pearson's $\chi^2$-test of independence revealed that, across 32 automobiles, showed that there was a significant association between transmission engine and number of cylinders. The Bayes Factor for the same analysis revealed that the data were r round(exp(2.82), 2) times more probable under the alternative hypothesis as compared to the null hypothesis. This can be considered strong evidence (Jeffreys, 1961) in favor of the alternative hypothesis.

Suggestions

If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues

Any scripts or data that you put into this service are public.

ggstatsplot documentation built on Sept. 11, 2025, 5:11 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ggstatsplot
'ggplot2' Based Plots with Statistical Details

ggbarstats
In ggstatsplot: 'ggplot2' Based Plots with Statistical Details

Introduction to `ggbarstats`

Independence (or association) with `ggbarstats`

Grouped analysis with `grouped_ggbarstats`

Grouped analysis with `ggbarstats` + `{purrr}`

Working with data organized by `counts`

Within-subjects designs

Summary of graphics and tests

Reporting

Suggestions

Try the ggstatsplot package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ggstatsplot 'ggplot2' Based Plots with Statistical Details

ggbarstats In ggstatsplot: 'ggplot2' Based Plots with Statistical Details

Introduction to ggbarstats

Independence (or association) with ggbarstats

Grouped analysis with grouped_ggbarstats

Grouped analysis with ggbarstats + {purrr}

Working with data organized by counts

Within-subjects designs

Summary of graphics and tests

Reporting

Suggestions

Try the ggstatsplot package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ggstatsplot
'ggplot2' Based Plots with Statistical Details

ggbarstats
In ggstatsplot: 'ggplot2' Based Plots with Statistical Details

Introduction to `ggbarstats`

Independence (or association) with `ggbarstats`

Grouped analysis with `grouped_ggbarstats`

Grouped analysis with `ggbarstats` + `{purrr}`

Working with data organized by `counts`