#| label = "setup", #| include = FALSE source("../setup.R")
#| label = "suggested_pkgs", #| include = FALSE pkgs <- "ggside" successfully_loaded <- purrr::map_lgl(pkgs, requireNamespace, quietly = TRUE) can_evaluate <- all(successfully_loaded) if (can_evaluate) { purrr::walk(pkgs, library, character.only = TRUE) } else { knitr::opts_chunk$set(eval = FALSE) }
The function ggscatterstats
is meant to provide a publication-ready
scatterplot with all statistical details included in the plot itself to show
association between two continuous variables. This function is also helpful
during the data exploration phase. We will see examples of how to use this
function in this vignette with the ggplot2movies
dataset.
To begin with, here are some instances where you would want to use
ggscatterstats
-
Note before: The following demo uses the pipe operator (%>%
), so in case
you are not familiar with this operator, here is a good explanation:
http://r4ds.had.co.nz/pipes.html
ggscatterstats
To illustrate how this function can be used, we will rely on the ggplot2movies
dataset. This dataset provides information about movies scraped from
IMDB. Specifically, we will be using cleaned version of
this dataset included in the {ggstatsplot}
package itself.
#| label = "ggplot2movies1" ## see the selected data (we have data from 1813 movies) dplyr::glimpse(movies_long)
Now that we have a clean dataset, we can start asking some interesting questions. For example, let's see if the average IMDB rating for a movie has any relationship to its budget. Additionally, let's also see which movies had a high budget but low IMDB rating by labeling those data points.
To reduce the processing time, let's only work with 30% of the dataset.
#| label = "ggscatterstats1", #| fig.height = 6, #| fig.width = 8 ggscatterstats( data = movies_long, ## data frame from which variables are taken x = budget, ## predictor/independent variable y = rating, ## dependent variable xlab = "Budget (in millions of US dollars)", ## label for the x-axis ylab = "Rating on IMDB", ## label for the y-axis label.var = title, ## variable to use for labeling data points label.expression = rating < 5 & budget > 100, ## expression for deciding which points to label point.label.args = list(alpha = 0.7, size = 4, color = "grey50"), xfill = "#CC79A7", ## fill for marginals on the x-axis yfill = "#009E73", ## fill for marginals on the y-axis title = "Relationship between movie budget and IMDB rating", caption = "Source: www.imdb.com" )
There is indeed a small, but significant, positive correlation between the amount of money studio invests in a movie and the ratings given by the audiences.
grouped_ggscatterstats
What if we want to do the same analysis do the same analysis for movies with different MPAA (Motion Picture Association of America) film ratings (NC-17, PG, PG-13, R)?
{ggstatsplot}
provides a special helper function for such instances:
grouped_ggstatsplot
. This is merely a wrapper function around
combine_plots
. It applies {ggstatsplot}
across all levels of
a specified grouping variable and then combines list of individual plots
into a single plot. Note that the grouping variable can be anything: conditions
in a given study, groups in a study sample, different studies, etc.
Let's see how we can use this function to apply ggscatterstats
for all MPAA
ratings. Also, let's run a robust test this time.
#| label = "grouped1", #| fig.height = 12, #| fig.width = 7 grouped_ggscatterstats( ## arguments relevant for ggscatterstats data = movies_long, x = budget, y = rating, grouping.var = mpaa, label.var = title, label.expression = rating < 5 & budget > 80, type = "r", # ggtheme = ggthemes::theme_tufte(), ## arguments relevant for combine_plots annotation.args = list( title = "Relationship between movie budget and IMDB rating", caption = "Source: www.imdb.com" ), plotgrid.args = list(nrow = 3, ncol = 1) )
As seen from the plot, this analysis has revealed something interesting: The relationship we found between budget and IMDB rating holds only for PG-13 and R-rated movies.
ggscatterstats
+ {purrr}
Although this is a quick and dirty way to explore large amount of data with
minimal effort, it does come with an important limitation: reduced flexibility.
For example, if we wanted to add, let's say, a separate type of marginal
distribution plot for each MPAA rating or if we wanted to use different types of
correlations across different levels of MPAA ratings (NC-17 has only 6 movies,
so a robust correlation would be a good idea), this is not possible. But this
can be easily done using {purrr}
.
See the associated vignette here: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/purrr_examples.html
Details about underlying functions used to create graphics and statistical tests carried out can be found in the function documentation: https://indrajeetpatil.github.io/ggstatsplot/reference/ggscatterstats.html
```{asis, file="../../man/md-fragments/reporting.md"}
For example, let's see the following example: ```r #| label = "reporting" ggscatterstats(mtcars, qsec, drat)
The narrative context (assuming type = "parametric"
) can complement this plot
either as a figure caption or in the main text-
Pearson's correlation test revealed that, across 32 cars, a measure of acceleration (1/4 mile time;
qsec
) was positively correlated with rear axle ratio (drat
), but this effect was not statistically significant. The effect size $(r = 0.09)$ was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data werer round(exp(1.20), 2)
times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (of absence of any correlation between these two variables).
If you find any bugs or have any suggestions/remarks, please file an issue on GitHub: https://github.com/IndrajeetPatil/ggstatsplot/issues
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.