knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ggstats)
library(dplyr)
library(ggplot2)
The purpose of gglikert()
is to generate a centered bar plot comparing the answers of several questions sharing a common Likert-type scale.
likert_levels <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)
set.seed(42)
df <-
  tibble(
    q1 = sample(likert_levels, 150, replace = TRUE),
    q2 = sample(likert_levels, 150, replace = TRUE, prob = 5:1),
    q3 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    q4 = sample(likert_levels, 150, replace = TRUE, prob = 1:5),
    q5 = sample(c(likert_levels, NA), 150, replace = TRUE),
    q6 = sample(likert_levels, 150, replace = TRUE, prob = c(1, 0, 1, 1, 0))
  ) %>%
  mutate(across(everything(), ~ factor(.x, levels = likert_levels)))

likert_levels_dk <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree",
  "Don't know"
)
df_dk <-
  tibble(
    q1 = sample(likert_levels_dk, 150, replace = TRUE),
    q2 = sample(likert_levels_dk, 150, replace = TRUE, prob = 6:1),
    q3 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
    q4 = sample(likert_levels_dk, 150, replace = TRUE, prob = 1:6),
    q5 = sample(c(likert_levels_dk, NA), 150, replace = TRUE),
    q6 = sample(
      likert_levels_dk, 150,
      replace = TRUE, prob = c(1, 0, 1, 1, 0, 1)
    )
  ) %>%
  mutate(across(everything(), ~ factor(.x, levels = likert_levels_dk)))
Simply call gglikert()
.
gglikert(df)
The list of variables to plot (all by default) can be specified with include
. This argument accepts tidy-select syntax.
gglikert(df, include = q1:q3)
The generated plot is a standard ggplot2
object. You can therefore use ggplot2
functions to customize many aspects of it.
gglikert(df) + ggtitle("A Likert-type items plot", subtitle = "generated with gglikert()") + scale_fill_brewer(palette = "RdYlBu")
You can sort the plot with sort
.
gglikert(df, sort = "ascending")
By default, the plot is sorted by the proportion of answers above the center level, i.e. in this case the proportion of answers equal to "Agree" or "Strongly agree". Alternatively, the questions can be transformed into a score and sorted according to their mean.
gglikert(df, sort = "ascending", sort_method = "mean")
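To make the two sorting criteria concrete, here is a minimal base R sketch on a small hypothetical data frame (`toy`, not the vignette's `df`). It assumes the score of an answer is simply the integer code of its factor level (1 to 5); the scoring used internally by gglikert() may differ, so treat this as an illustration of the idea only.

```r
# Hypothetical toy data (not the vignette's df): two questions, four respondents
lv <- c(
  "Strongly disagree", "Disagree", "Neither agree nor disagree",
  "Agree", "Strongly agree"
)
toy <- data.frame(
  qa = factor(c("Agree", "Strongly agree", "Disagree", "Agree"), levels = lv),
  qb = factor(
    c("Strongly disagree", "Disagree", "Agree", "Neither agree nor disagree"),
    levels = lv
  )
)

# Index of the center level for an odd number of levels
center <- (length(lv) + 1) / 2

# Criterion 1: proportion of answers strictly above the center level
prop_above <- sapply(toy, function(x) mean(as.integer(x) > center, na.rm = TRUE))

# Criterion 2: mean of the answers coded as integer scores (assumption: 1 to 5)
mean_score <- sapply(toy, function(x) mean(as.integer(x), na.rm = TRUE))

# Ascending order of the questions under each criterion
names(sort(prop_above))
names(sort(mean_score))
```

On this toy data both criteria give the same order, but they can disagree when a question has many extreme answers on one side.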
You can reverse the order of the answers with reverse_likert
.
gglikert(df, reverse_likert = TRUE)
Proportion labels can be removed with add_labels = FALSE
.
gglikert(df, add_labels = FALSE)
or customized.
gglikert( df, labels_size = 3, labels_accuracy = .1, labels_hide_below = .2, labels_color = "white" )
By default, totals are added on each side of the plot. When the number of answer levels is odd, the central level is not taken into account when computing totals. With totals_include_center = TRUE
, half of the proportion of the central level will be added on each side.
gglikert( df, totals_include_center = TRUE, sort = "descending", sort_prop_include_center = TRUE )
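The effect of totals_include_center on the totals can be checked by hand. The sketch below uses a hypothetical vector of proportions `p` for a single question and reproduces the arithmetic described above; it does not call ggstats.

```r
# Hypothetical proportions for one question (they sum to 1)
p <- c(
  "Strongly disagree" = 0.10,
  "Disagree" = 0.20,
  "Neither agree nor disagree" = 0.30,
  "Agree" = 0.25,
  "Strongly agree" = 0.15
)
k <- length(p)
center <- (k + 1) / 2 # index of the central level (odd number of levels)

# Default behaviour described above: the central level is ignored
left_total <- sum(p[seq_len(center - 1)])
right_total <- sum(p[(center + 1):k])

# With the center included: half of the central proportion goes to each side
left_total_center <- left_total + p[[center]] / 2
right_total_center <- right_total + p[[center]] / 2

c(left = left_total, right = right_total)
c(left = left_total_center, right = right_total_center)
```

Note that the two totals only sum to 1 when the central level is included.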
Totals can be customized.
gglikert( df, totals_size = 4, totals_color = "blue", totals_fontface = "italic", totals_hjust = .20 )
Or removed.
gglikert(df, add_totals = FALSE)
If you are using variable labels (see labelled::set_variable_labels()
), they will be taken automatically into account by gglikert()
.
if (require(labelled)) {
  df <- df %>%
    set_variable_labels(
      q1 = "first question",
      q2 = "second question",
      q3 = "this is the third question with a quite long variable label"
    )
}
gglikert(df)
You can also provide custom variable labels with variable_labels
.
gglikert( df, variable_labels = c( q1 = "alternative label for the first question", q6 = "another custom label" ) )
You can control how variable labels are wrapped with y_label_wrap
.
gglikert(df, y_label_wrap = 20)
gglikert(df, y_label_wrap = 200)
Sometimes the dataset contains values that should not be displayed.
gglikert(df_dk)
A first option is to convert the "Don't know" answers into NA
. In that case, proportions are computed over non-missing answers only.
df_dk %>% mutate(across(everything(), ~ factor(.x, levels = likert_levels))) %>% gglikert()
Or, you could use exclude_fill_values
to hide specific values while still counting them in the denominator when computing proportions.
df_dk %>% gglikert(exclude_fill_values = "Don't know")
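The difference between the two options is only a matter of denominator. The following base R sketch, on a hypothetical vector of five answers, shows the proportions each approach would display; it illustrates the arithmetic and is not the code used by gglikert().

```r
# Hypothetical answers to a single question
answers <- c("Agree", "Agree", "Disagree", "Don't know", "Don't know")
shown <- c("Disagree", "Agree") # levels that remain visible on the plot

# Option 1: "Don't know" converted to NA, so it leaves the denominator
kept <- answers[answers != "Don't know"]
prop_na <- sapply(shown, function(l) mean(kept == l))

# Option 2: "Don't know" hidden but still counted in the denominator
prop_excl <- sapply(shown, function(l) mean(answers == l))

prop_na   # displayed proportions sum to 1
prop_excl # displayed proportions sum to less than 1
```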
To define facets, use facet_rows
and/or facet_cols
.
df_group <- df
df_group$group1 <- sample(c("A", "B"), 150, replace = TRUE)
df_group$group2 <- sample(c("a", "b", "c"), 150, replace = TRUE)

gglikert(df_group,
  q1:q6,
  facet_cols = vars(group1),
  labels_size = 3
)

gglikert(df_group,
  q1:q2,
  facet_rows = vars(group1, group2),
  labels_size = 3
)

gglikert(df_group,
  q3:q6,
  facet_cols = vars(group1),
  facet_rows = vars(group2),
  labels_size = 3
) +
  scale_x_continuous(
    labels = label_percent_abs(),
    expand = expansion(0, .2)
  )
To compare answers by subgroup, you can alternatively map .question
to facets, and define a grouping variable for y
.
gglikert(df_group, q1:q4, y = "group1", facet_rows = vars(.question), labels_size = 3, facet_label_wrap = 15 )
For a more classical stacked bar plot, you can use gglikert_stacked()
.
gglikert_stacked(df)

gglikert_stacked(
  df,
  sort = "asc",
  add_median_line = TRUE,
  add_labels = FALSE
)

gglikert_stacked(
  df_group,
  include = q1:q4,
  y = "group2"
) +
  facet_grid(
    rows = vars(.question),
    labeller = label_wrap_gen(15)
  )
Internally, gglikert()
calls gglikert_data()
to generate a long-format dataset combining all questions into two columns, .question
and .answer
.
gglikert_data(df) %>% head()
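The reshaping itself is a wide-to-long pivot. The base R sketch below reproduces the idea on a hypothetical two-question data frame; the real gglikert_data() output may use factors and may contain additional columns, so only the .question / .answer layout is illustrated here.

```r
# Hypothetical wide data: one column per question, one row per respondent
toy <- data.frame(
  q1 = c("Agree", "Disagree"),
  q2 = c("Disagree", "Agree"),
  stringsAsFactors = FALSE
)

# Stack the columns: each (respondent, question) pair becomes one row
long <- data.frame(
  .question = rep(names(toy), each = nrow(toy)),
  .answer = unlist(toy, use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```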
Such a dataset can be useful for other types of plots, for example a classic stacked bar plot.
ggplot(gglikert_data(df)) + aes(y = .question, fill = .answer) + geom_bar(position = "fill")
gglikert()
, gglikert_stacked()
and gglikert_data()
accept a weights
argument, which lets you specify statistical weights.
df$sampling_weights <- runif(nrow(df))
gglikert(df, q1:q4, weights = sampling_weights)
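As a reminder of what statistical weights change, the sketch below computes unweighted and weighted proportions by hand for a hypothetical vector of answers and weights. It assumes the usual definition (sum of weights per answer divided by the total weight) and does not call ggstats.

```r
# Hypothetical answers and sampling weights for four respondents
answers <- c("Agree", "Agree", "Disagree", "Disagree")
w <- c(1, 1, 3, 5)

# Unweighted: every respondent counts once
unweighted <- sapply(split(rep(1, length(answers)), answers), sum) / length(answers)

# Weighted: every respondent counts in proportion to their weight
weighted <- sapply(split(w, answers), sum) / sum(w)

unweighted
weighted
```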
The function position_likert()
is used to center bars.
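One way such centering can be computed is sketched below for a single question with an odd number of levels: the stacked bar is shifted so that the middle of the central level sits at zero. The proportions are hypothetical, and the actual implementation in position_likert() may differ in its details.

```r
# Hypothetical proportions for one question, ordered from lowest to highest level
p <- c(0.10, 0.20, 0.30, 0.25, 0.15)
k <- length(p)
center <- (k + 1) / 2 # index of the central level

# Shift: everything below the center, plus half of the central level
offset <- sum(p[seq_len(center - 1)]) + p[center] / 2

# Left and right edge of each stacked segment after centering
xmax <- cumsum(p) - offset
xmin <- xmax - p

data.frame(level = seq_len(k), xmin = xmin, xmax = xmax)
```

With these values the central segment spans -0.15 to 0.15, and the whole bar runs from -0.45 to 0.55.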