#| label = "setup", #| include = FALSE source("../setup.R")
#| label = "suggested_pkgs", #| include = FALSE pkgs <- "psych" successfully_loaded <- purrr::map_lgl(pkgs, requireNamespace, quietly = TRUE) can_evaluate <- all(successfully_loaded) if (can_evaluate) { purrr::walk(pkgs, library, character.only = TRUE) } else { knitr::opts_chunk$set(eval = FALSE) }
You can cite this package/vignette as:
#| label = "citation", #| echo = FALSE, #| comment = "" citation("ggstatsplot")
The function gghistostats
can be used for data exploration
and to provide an easy way to make publication-ready histograms with
appropriate and selected statistical details embedded in the plot itself. In
this vignette we will explore several examples of how to use it.
Some instances where you would want to use gghistostats
-
gghistostats
Let's begin with a very simple example from the psych
package
(psych::sat.act
), a sample of 700 self-reported scores on the SAT Verbal, SAT
Quantitative and ACT tests. ACT composite scores may range from 1 - 36. National
norms have a mean of 20.
#| label = "psychact" ## loading needed libraries library(psych) library(dplyr) ## looking at the structure of the data using glimpse dplyr::glimpse(psych::sat.act)
To get a simple histogram with no statistics and no special information.
gghistostats
will by default choose a binwidth max(x) - min(x) / sqrt(N)
.
You should always check this value and explore multiple widths to find the best
to illustrate the stories in your data since histograms are sensitive to
binwidth.
Let's display the national norms (labeled as "Test") and test the hypothesis
that our sample mean is the same as our national population mean of 20 using a
parametric one sample t-test (type = "p"
).
#| label = "psychact3", #| fig.height = 6, #| fig.width = 7 gghistostats( data = psych::sat.act, ## data from which variable is to be taken x = ACT, ## numeric variable xlab = "ACT Score", ## x-axis label title = "Distribution of ACT Scores", ## title for the plot test.value = 20, ## test value caption = "Data courtesy of: SAPA project (https://sapa-project.org)" )
gghistostats
computed Bayes Factors to quantify the likelihood of the research (BF10) and
the null hypothesis (BF01). In our current example, the Bayes Factor value provides
very strong evidence (Kass and Rafferty, 1995)
in favor of the research hypothesis: these ACT scores are much higher than the
national average. The log(Bayes factor) of 492.5 means the odds are 7.54e+213:1
that this sample is different.
grouped_gghistostats
What if we want to do the same analysis separately for each gender?
{ggstatsplot}
provides a special helper function for such instances:
grouped_gghistostats
. This is merely a wrapper function around
combine_plots
. It applies gghistostats
across all levels of
a specified grouping variable and then combines the individual plots into a
single plot. Note that the grouping variable can be anything: conditions in a
given study, groups in a study sample, different studies, etc.
Let's see how we can use this function to apply gghistostats
to accomplish our
task.
#| label = "grouped1", #| fig.height = 10, #| fig.width = 7 grouped_gghistostats( ## arguments relevant for gghistostats data = psych::sat.act, x = ACT, ## same outcome variable xlab = "ACT Score", grouping.var = gender, ## grouping variable males = 1, females = 2 type = "robust", ## robust test: one-sample percentile bootstrap test.value = 20, ## test value against which sample mean is to be compared centrality.line.args = list(color = "#D55E00", linetype = "dashed"), # ggtheme = ggthemes::theme_stata(), ## changing default theme ## turn off ggstatsplot theme layer ## arguments relevant for combine_plots annotation.args = list( title = "Distribution of ACT scores across genders", caption = "Data courtesy of: SAPA project (https://sapa-project.org)" ), plotgrid.args = list(nrow = 2) )
As can be seen from these plots, the mean value is much higher than the national norm. Additionally, we see the benefits of plotting this data separately for each gender. We can see the differences in distributions.
{purrr}
Although this is a quick and dirty way to explore a large amount of data with
minimal effort, it does come with an important limitation: reduced flexibility.
For example, if we wanted to add, let's say, a separate test.value
argument
for each gender, this is not possible with grouped_gghistostats
.
For cases like these, or to run separate kinds of tests (robust for some,
parametric for other, while Bayesian for some other levels of the group) it
would be better to use {purrr}
.
See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html
Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/gghistostats.html
```{asis, file="../../man/md-fragments/reporting.md"}
For example, let's see the following example: ```r #| label = "reporting" gghistostats(trees, Height, test.value = 75)
The narrative context (assuming type = "parametric"
) can complement this plot
either as a figure caption or in the main text-
Student's t-test revealed that, across 31 felled black cherry trees, although the height was higher than expected height of 75 ft., this effect was not statistically significant. The effect size $(g = 0.15)$ was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were
r round(exp(1.3), 2)
times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis.
If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.