#| label = "setup", #| include = FALSE source("../setup.R")
#| label = "suggested_pkgs", #| include = FALSE pkgs <- "PMCMRplus" successfully_loaded <- purrr::map_lgl(pkgs, requireNamespace, quietly = TRUE) can_evaluate <- all(successfully_loaded) if (can_evaluate) { purrr::walk(pkgs, library, character.only = TRUE) } else { knitr::opts_chunk$set(eval = FALSE) }
You can cite this package/vignette as:
#| label = "citation", #| echo = FALSE, #| comment = "" citation("ggstatsplot")
Following are a few of the common questions asked in GitHub issues and on social media platforms.
All functions in {ggstatsplot}
that display results from statistical analysis in
a subtitle have argument results.subtitle
. Setting it to FALSE
will return
only the plot.
Sometimes you may not wish include so many details in the subtitle. In that case, you can extract the expression and copy-paste only the part you wish to include. For example, here only statistic and p-values are included:
#| label = "custom_expr" library(ggplot2) library(statsExpressions) # extracting detailed expression data_results <- oneway_anova(iris, Species, Sepal.Length, var.equal = TRUE) data_results$expression[[1]] # adapting the details to your liking ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot() + labs(subtitle = ggplot2::expr(paste( italic("F"), "(", "2", ",", "147", ")=", "119.26", ", ", italic("p"), "<", "0.001" )))
Error in grid.Call
errorSometimes, if you are working in RStudio
, you might see the following error-
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : polygon edge not found
This can possibly be solved by increasing the size of RStudio viewer pane.
In order to prevent the entire plotting function from failing when statistical
analysis fails, functions in {ggstatsplot}
default to first attempting to run
the analysis and if they fail, then return empty (NULL
) subtitle/caption. In
such cases, if you wish to diagnose why the analysis is failing, you will have
to do so using the underlying function used to carry out statistical analysis.
For example, the following returns only the plot but not the statistical details in a subtitle.
#| label = "null_subtitle" df <- data.frame(x = 1, y = 2) ggscatterstats(df, x, y, type = "robust")
To see why the statistical analysis failed, you can look at the error from the underlying function:
#| error = TRUE library(statsExpressions) df <- data.frame(x = 1, y = 2) corr_test(df, x, y, type = "robust")
In case you are not sure what was the statistical test that produced the results shown in the subtitle of the plot, the best way to get that information is to either look at the documentation for the function used or check out the associated vignette.
Summary of all analysis is handily available in README
:
https://github.com/IndrajeetPatil/ggstatsplot/blob/master/README.md
{ggstatsplot}
functions in a for
loop?Given that all functions in {ggstatsplot}
use tidy evaluation, running these
functions in a for
loop requires minor adjustment to how inputs are entered:
#| label = "loop", #| eval = FALSE col.name <- colnames(mtcars) # executing the function in a `for` loop for (i in 3:length(col.name)) { ggbetweenstats( data = mtcars, x = cyl, y = !!col.name[i] ) }
That said, if repeating function execution across multiple columns in a
data frame in what you want to do, I will recommend {purrr}
-based solution:
This solution would work for x
and y
arguments, but not for grouping.var
argument, which first needs to be converted to a symbol:
#| label = "tidyeval-grouping-var", #| eval = FALSE df <- dplyr::filter(movies_long, genre == "Comedy" | genre == "Drama") grouped_ggscatterstats( data = df, x = !!colnames(df)[3], y = !!colnames(df)[5], grouping.var = !!rlang::sym(colnames(df)[8]), results.subtitle = FALSE )
grouped_
functions?Across different facets of a grouped_
plot, the axes ranges might sometimes
differ. You can use the ggplot.component
parameter (present in all functions)
to have the same scale across the individual plots:
#| label = "grouped_y_axes", #| fig.height = 6, #| fig.width = 8 # provide a list of further `{ggplot2}` modifications using `ggplot.component` grouped_ggscatterstats( mtcars, disp, hp, grouping.var = am, results.subtitle = FALSE, ggplot.component = list(ggplot2::scale_y_continuous( breaks = seq(50, 350, 50), limits = (c(50, 350)) )) )
{ggstatsplot}
work with plotly
?The plotly
R graphing library makes it easy to produce interactive web
graphics via plotly.js
.
The {ggstatsplot}
functions are compatible with plotly
.
library(plotly) # creating ggplot object with `{ggstatsplot}` p <- ggbetweenstats(mtcars, cyl, mpg) # converting to plotly object plotly::ggplotly(p, width = 480, height = 480)
grouped_
functions with more than one group?Currently, the grouped_
variants of functions only support repeating the
analysis across a single grouping variable. Often, you have to run the same
analysis across a combination of more than two grouping variables. This can be
easily achieved using {purrr}
package.
Here is an example-
#| label = "grouped_2", #| fig.width = 6, #| fig.height = 6 # creating a list by splitting data frame by combination of two different # grouping variables df_list <- mpg %>% dplyr::filter(drv %in% c("4", "f"), fl %in% c("p", "r")) %>% split(f = list(.$drv, .$fl), drop = TRUE) # checking if the length of the list is 4 length(df_list) # running correlation analyses between; this will return a *list* of plots plot_list <- purrr::pmap( .l = list( data = df_list, x = "displ", y = "hwy", results.subtitle = FALSE ), .f = ggscatterstats ) # arrange the list in a single plot grid combine_plots( plotlist = plot_list, plotgrid.args = list(nrow = 2), annotation.args = list(tag_levels = "i") )
#| label = "facet_expr", #| fig.width = 6, #| fig.height = 8 library(ggplot2) # data mtcars1 <- mtcars p <- grouped_ggbetweenstats( data = mtcars1, x = cyl, y = mpg, grouping.var = am ) expr1 <- extract_subtitle(p[[1L]]) expr2 <- extract_subtitle(p[[2L]]) mtcars1$am <- factor(mtcars1$am, levels = c(0, 1), labels = c(expr1, expr2)) mtcars1 %>% ggplot(aes(x = cyl, y = mpg)) + geom_jitter() + facet_wrap( vars(am), ncol = 1, strip.position = "top", labeller = ggplot2::label_parsed )
Currently, for ggbetweenstats
and ggwithinstats
, you can either display all
significant comparisons, all non-significant comparisons, or all
comparisons. But what if I am only interested in just one particular comparison?
Here is a workaround using {ggsignif}
:
#| label = "custom_pairwise", #| fig.width = 7, #| fig.height = 6 library(ggsignif) ggbetweenstats(mtcars, cyl, wt, pairwise.display = "none") + geom_signif(comparisons = list(c("4", "6")), test.args = list(exact = FALSE))
Behind the scenes, {ggstatsplot}
uses statsExpressions::pairwise_comparisons()
function.
You can use it to extract actual data frames used in {ggstatsplot}
functions.
library(ggplot2) pairwise_comparisons(mtcars, cyl, wt)
{ggstatsplot}
defaults to displaying exact p-values or logged Bayes Factor
values for pairwise comparisons. But what if you wish to adopt a different
annotation labels?
You will have to customize them yourself:
#| label = "comp_asterisks" library(ggplot2) library(ggsignif) # converting to factor mtcars$cyl <- as.factor(mtcars$cyl) # creating the base plot p <- ggbetweenstats(mtcars, cyl, wt, pairwise.display = "none") # using `pairwise_comparisons()` function to create a data frame with results df <- pairwise_comparisons(mtcars, cyl, wt) %>% dplyr::mutate(groups = purrr::pmap(.l = list(group1, group2), .f = c)) %>% dplyr::arrange(group1) %>% dplyr::mutate(asterisk_label = c("**", "***", "**")) df # adding pairwise comparisons using `{ggsignif}` package p + ggsignif::geom_signif( comparisons = df$groups, map_signif_level = TRUE, annotations = df$asterisk_label, y_position = c(5.5, 5.75, 6.0), test = NULL, na.rm = TRUE )
You can use the extract_stats()
helper function for this.
#| label = "onesample_df" library(ggplot2) p <- ggpiestats(mtcars, am, cyl) # data frame with results extract_stats(p)
geom
layer from the plot?Sometimes you may not want a particular geom
layer to be displayed. You can
remove them by setting transparency (alpha
) for that layer to 0.
For example, let's say I want to remove the points from ggwithintstats()
plot:
#| label = "geom_removal", #| fig.width = 7, #| fig.height = 5 # before ggwithinstats( data = bugs_long, x = condition, y = desire, results.subtitle = FALSE, pairwise.display = "none" ) # after ggwithinstats( data = bugs_long, x = condition, y = desire, point.args = list(alpha = 0), results.subtitle = FALSE, pairwise.display = "none" )
Sometimes you may not be satisfied with the available color palette values. In this case, you can also change the colors by manually specifying these values.
#| label = "ggbar_colors" library(ggplot2) ggbarstats(mtcars, am, cyl, results.subtitle = FALSE) + scale_fill_manual(values = c("#E7298A", "#66A61E"))
The same can also be done for grouped_
functions:
#| label = "ggpie_colors", #| fig.width = 12, #| fig.height = 6 grouped_ggpiestats( data = mtcars, grouping.var = am, x = cyl, ggplot.component = ggplot2::scale_fill_grey() )
grouped_
outputs using {ggplot2}
functions?All {ggstatsplot}
are ggplot
objects, which can be further modified, just like
any other ggplot
object. But exception to these are all plots returned by
grouped_
functions, but there is a way to tackle this.
#| label = "grouped_modify" library(paletteer) library(ggplot2) grouped_ggbetweenstats( mtcars, cyl, wt, grouping.var = am, results.subtitle = FALSE, pairwise.display = "none", # modify further with `{ggplot2}` functions ggplot.component = list( scale_color_manual(values = paletteer::paletteer_c("viridis::viridis", 3)), theme(axis.text.x = element_text(angle = 90)) ) )
{ggstatsplot}
?{ggstatsplot}
can return expressions in the subtitle and caption, but what if
you want to actually get back data frame containing the results?
You have two options:
ggstatsplot::extract_stats()
function{statsExpressions}
(see examples)ggbarstats
?library(gginnards) ## create a plot p <- ggbarstats(mtcars, am, cyl) ## remove layer corresponding to sample size delete_layers(p, "GeomText")
By default, since {ggstatsplot}
always allows just one type of test per
statistical approach, sometimes your favorite test might not be available. For
example, {ggstatsplot}
provides only Spearman's $\rho$, but not Kendall's
$\tau$ as a non-parametric correlation test.
In such cases, you can override the defaults and use {statsExpressions}
to
create custom expressions to display in the plot. But be forewarned that the
expression building function in {statsExpressions}
is not stable yet.
#| label = "custom_test", #| fig.width = 6, #| fig.height = 6 library(correlation) library(statsExpressions) library(ggplot2) # data with two variables of interest df <- dplyr::select(mtcars, wt, mpg) # correlation results results <- correlation(df, method = "kendall") %>% insight::standardize_names(style = "broom") # creating expression out of these results df_results <- statsExpressions::add_expression_col( data = results, no.parameters = 0L, statistic.text = list(quote(italic("T"))), effsize.text = list(quote(widehat(italic(tau))["Kendall"])), n = results$n.obs[[1]] ) # using custom expression in plot ggscatterstats(df, wt, mpg, results.subtitle = FALSE) + labs(subtitle = df_results$expression[[1]])
No, there is no way to adjust alpha if you use grouped_
functions (e.g.,
grouped_ggwithinstats
). You will have to just report in the
paper/article/report, what your adjusted alpha is.
So, for example, iif 2 tests are being carried out, the alpha is going to be
0.05/2 = 0.025
. So, when you describe the Methods section, you can mention
that only those tests should be considered significant where p < 0.025
. Or you
can even mention this in the caption.
Shiny
app using {ggstatsplot}
functions?Below is an example using ggbetweenstats
function.
library(shiny) library(rlang) ui <- fluidPage( headerPanel("Example - ggbetweenstats"), sidebarPanel( selectInput("x", "xcol", "X Variable", choices = names(iris)[5]), selectInput("y", "ycol", "Y Variable", choices = names(iris)[1:4]) ), mainPanel(plotOutput("plot")) ) server <- function(input, output) { output$plot <- renderPlot({ ggbetweenstats(iris, !!input$x, !!input$y) }) } shinyApp(ui, server)
grouped_*
functions?library(ggplot2) grouped_ggbetweenstats( data = dplyr::filter(ggplot2::mpg, drv != "4"), x = year, y = hwy, grouping.var = drv, results.subtitle = FALSE, ## arguments given to `{patchwork}` for combining plots annotation.args = list( title = "this is my title", subtitle = "this is my subtitle", theme = ggplot2::theme( plot.subtitle = element_text(size = 20), plot.title = element_text(size = 30) ) ) )
ggbetweenstats( data = iris, x = Species, y = Sepal.Length, ggplot.component = list(theme(plot.subtitle = element_text(size = 20, face = "bold"))) )
This is not possible out of the box, but see this comment.
{ggstatsplot}
carry out assumption checks?No, {ggstatsplot}
does not carry out any analysis of whether assumptions are
met or not. It will just carry out whatever test you ask it to carry out.
To check these assumptions, you can use a different package called {performance}
:
{PMCMRplus}
?Linux users may encounter some installation problems. In particular, the
{ggstatsplot}
package depends on the {PMCMRplus}
package.
ERROR: dependencies ‘gmp’, ‘Rmpfr’ are not available for package ‘PMCMRplus’
This means that your operating system lacks gmp
and Rmpfr
libraries.
If you use Ubuntu
, you can install these dependencies:
sudo apt-get install libgmp3-dev sudo apt-get install libmpfr-dev
The following README
file briefly describes the installation procedure:
https://CRAN.R-project.org/package=PMCMRplus/readme/README.html
For MacOS, have a look at this post.
ggbetweenstats( mtcars, cyl, wt, ggplot.component = list( ggplot2::scale_y_continuous(sec.axis = ggplot2::dup_axis(name = "My custom test")) ) )
set.seed(123) library(ggstatsplot) library(WRS2) ggwithinstats( WineTasting, Wine, Taste, paired = TRUE ) ggwithinstats( WineTasting, Wine, Taste, paired = TRUE, digits = 4L )
If you find any bugs or have any suggestions/remarks, please file an issue on
GitHub
: https://github.com/IndrajeetPatil/ggstatsplot/issues
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.