#| label = "setup", #| include = FALSE source("../setup.R")
You can cite this package/vignette as:
#| label = "citation", #| echo = FALSE, #| comment = "" citation("ggstatsplot")
The function ggdotplotstats
can be used for data exploration and to
provide an easy way to make publication-ready dot plots/charts with
appropriate and selected statistical details embedded in the plot itself. In
this vignette, we will explore several examples of how to use it.
This function is a sister function of gghistostats
with the difference being
it expects a labeled numeric variable.
ggdotplotstats
Let's begin with a very simple example from the {ggplot2}
package
(ggplot2::mpg
), a subset of the fuel economy data that the EPA makes available
on http://fueleconomy.gov.
#| label = "mpg" ## looking at the structure of the data using glimpse dplyr::glimpse(ggplot2::mpg)
Let's say we want to visualize the distribution of mileage by car manufacturer.
#| label = "mpg2", #| fig.height = 7, #| fig.width = 9 ## removing factor level with very few no. of observations df <- dplyr::filter(ggplot2::mpg, cyl %in% c("4", "6")) ## creating a vector of colors using `paletteer` package paletter_vector <- paletteer::paletteer_d( palette = "palettetown::venusaur", n = nlevels(as.factor(df$manufacturer)), type = "discrete" ) ggdotplotstats( data = df, x = cty, y = manufacturer, xlab = "city miles per gallon", ylab = "car manufacturer", test.value = 15.5, point.args = list( shape = 16, color = paletter_vector, size = 5 ), title = "Distribution of mileage of cars", ggtheme = ggplot2::theme_dark() )
grouped_ggdotplotstats
What if we want to do the same analysis separately for different engines with different numbers of cylinders?
{ggstatsplot}
provides a special helper function for such instances:
grouped_ggdotplotstats
. This is merely a wrapper function around
combine_plots
. It applies ggdotplotstats
across all levels of
a specified grouping variable and then combines the individual plots into a
single plot.
Let's see how we can use this function to apply ggdotplotstats
to accomplish our
task.
#| label = "grouped1", #| fig.height = 12, #| fig.width = 7 ## removing factor level with very few no. of observations df <- dplyr::filter(ggplot2::mpg, cyl %in% c("4", "6")) grouped_ggdotplotstats( ## arguments relevant for ggdotplotstats data = df, grouping.var = cyl, ## grouping variable x = cty, y = manufacturer, xlab = "city miles per gallon", ylab = "car manufacturer", type = "bayes", ## Bayesian test test.value = 15.5, ## arguments relevant for `combine_plots` annotation.args = list(title = "Fuel economy data"), plotgrid.args = list(nrow = 2) )
{purrr}
Although this is a quick and dirty way to explore a large amount of data with
minimal effort, it does come with an important limitation: reduced flexibility.
For example, if we wanted to add, let's say, a separate test.value
argument
for each gender, this is not possible with grouped_ggdotplotstats
. For cases
like these, or to run separate kinds of tests (robust for some, parametric for
other, while Bayesian for some other levels of the group) it would be better to
use {purrr}
.
See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html
Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggdotplotstats.html
```{asis, file="../../man/md-fragments/reporting.md"}
For example, let's see the following example: ```r #| label = "reporting" ggdotplotstats(morley, Speed, Expt, test.value = 800)
The narrative context (assuming type = "parametric"
) can complement this plot
either as a figure caption or in the main text-
Student's t-test revealed that, across 5 experiments, the speed of light was significantly different than posited speed. The effect size $(g = 1.22)$ was very large, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were
r round(exp(1.24), 2)
times more probable under the alternative hypothesis as compared to the null hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the alternative hypothesis.
If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.