knitr::opts_chunk$set(echo = TRUE) library(Durga) library(datasets)
Durga
is an R package that aims to simplify the estimation and visualisation of one type of effect size: differences in group means. The package consists of two main functions and several auxiliary functions. The DurgaDiff
function estimates raw or standardised mean differences, while DurgaPlot
will create a plot for visualising both the original data and the effect size, and provides an extensive range of display options. The function DurgaBrackets
can add confidence brackets to an existing Durga plot. Since Durga
is implemented in base R, Durga plots can be positioned using the standard R plotting layout tools: par(mfrow = c(...))
, layout()
and split.screen()
.
This vignette shows some example Durga plots. Each plot is preceded by the code used to generate it, which can be used to learn how to create the plot.
Before plotting your data, you must estimate the differences in group means by calling DurgaDiff
. Your raw data should be in a data frame (or similar) in either long or wide format. A long format data frame contains one observation per row, a column for the measurement or value being analysed, and another column for the group or treatment. In addition, paired or repeated measures data requires an ID column to match measurements for the same specimen/individual. For example, within the insulin
data set, the value being measured - blood sugar level - is recorded in the column sugar
(specified in the DurgaDiff
call by passing the argument data.col = "sugar"
); treatments are "before"
and "after"
in the treatment
column (group.col = "treatment"
); and the identity of the individual (rabbit) being tested is given in the id
column (id.col = "id"
). The blood sugar level of each rabbit was tested before and after treatment with insulin, so for each rabbit, there is a row with the before
sugar level, and another with the after
sugar level. Both long and wide format can be used to represent paired and unpaired data, however long format is most appropriate for unpaired data.
# Inspect the insulin data set, paired data in long format head(insulin[insulin$id < 4, ]) # Calculate differences in group means d <- DurgaDiff(insulin, data.col = "sugar", group.col = "treatment", id.col = "id") d
When your data set is in long format, you may alternatively identify your data with a formula. The formula interface allows a little more flexibility when specifying your model, but the end results are equivalent. The formula must be specified as <data variable> ~ <group variable>
. The data table must be in long format. For paired data, specify the id
column using the id.col
argument.
# Use a formula to define the model DurgaDiff(sugar ~ treatment, insulin, id.col = "id")
DurgaDiff also accepts data in wide format. A wide format data frame contains a measurement column for each group. For example, an experiment with a control and a treatment might comprise a column named control
and another named treatment
. To pass wide format data to DurgaDiff
, do not specify the data.col
or group.col
arguments. Instead, you must identify the groups in your data with the groups
argument. Each group must be the name of a column in the data frame. Wide format may be a natural way to represent paired data, as each row represents a specimen. Wide format is not as appropriate for unpaired data, as a single row will contain measurements from two (or more) unrelated specimens, although Durga will handle it. To use wide format with unpaired data, specify id.col = NULL
. If your groups have different sample sizes, you will also have to specify na.rm = TRUE
.
# Inspect the wide insulin data set head(insulin.wide) # Calculate differences in group means (d <- DurgaDiff(insulin.wide, groups = c("sugar.before", "sugar.after")))
Durga can estimate unstandardised or various standardised group differences, which are specified using the DurgaDiff parameter effect.type
. Chapter 11, Cohen's d in Cumming (2012) is a good reference for understanding standardised effect sizes and the importance of choosing the appropriate standardiser. The online help for DurgaDiff summarises the effect types that are built-in to Durga.
The simplest plot to create simply uses all default values. This plot shows raw data points displayed as individual data points, the distribution of data points within each group shown as half violins, and effect size to the right (for a single group comparison) or beneath the group data (for multiple comparisons).
The effect size is an estimate of the difference in means between two groups (or a similar statistic). It is shown with a symbol (a triangle by default) at the estimated value, and a thick line showing the confidence interval of the estimate. A filled half-violin shows the distribution of bootstrapped effect sizes, truncated at the top and bottom so it does not extend past the confidence interval.
If the data are paired (aka repeated measures; id.col
was specified in the DurgaDiff
call), then pair lines are drawn to connect measurements for the same individual/specimen.
DurgaPlot(d)
A DurgaPlot
data visualisation is a combination of one or more visual components: data points
; violin
plots; bar
charts; box
plots; repeated measure paired
lines; a mean or median indicator (central.tendency
); and error bars (error.bars
). Each component is controlled by a set of parameters to the DurgaPlot
function. The names of the parameters indicate the component they apply to, so to adjust the violin plot component, read the documentation about all parameters with prefix violin
. Effect size display is controlled by parameters with prefix ef.size
. Parameters have sensible default values, so for example, if you display bar charts (bar = TRUE
), violin plots and the central tendency indicator will be turned off. Obviously, you are able to override the defaults if you wish. Refer to help for the DurgaPlot
function for full details on all arguments.
# Show two plots in the same figure, and increase margins so they don't look crowded par(mfrow = c(1, 2), mar = c(5, 5, 4, 5) + 0.1) DurgaPlot(d, # Display box plots box = TRUE, # Don't display data points points = FALSE, # Don't display paired-value lines paired = FALSE, main = "Box plot") par(mar = c(5, 5, 4, 1) + 0.1) DurgaPlot(d, # Display bar charts bar = TRUE, # Position effect size below ef.size.position = "below", # No paired-value lines paired = FALSE, main = "Bar chart")
When plotted, the default display order for groups is their natural order, i.e. alphabetical order for character data, and numeric order for factors. You can, however, specify the display order with the groups
parameter to either DurgaDiff
or DurgaPlot
. Not all groups in the data need be included in the analysis or plot; missing groups will be excluded from the analysis and/or plot. To assign display labels to the groups, use a named vector to list the groups. The names will be used to label the groups on the plot.
In the damselfly data set, the groups are juvenile
and adult
. The example below demonstrates how to change the displayed group names as well as the display order of the groups.
# No title on this plot, so reduce the top margin size par(mar = c(5, 4, 1, 1) + 0.1) # Immature males are yellow, mature are red. Display juveniles first, and label # with colour rather than developmental stage d <- DurgaDiff(mass ~ maturity, damselfly, na.rm = TRUE, groups = c("Yellow" = "juvenile", "Red" = "adult")) DurgaPlot(d, # Custom y-axis label left.ylab = "Body mass (mg)", # Specify the colours to (approximately) match the data group.colour = c("orange", "red"), box = TRUE, box.fill = FALSE)
If you need greater control over the display of the group labels, for example to display in italics or bold, set the group.names
field in the DurgaDiff object. It must be a vector of characters or expressions, one for each group.
# No title on this plot, so reduce the top margin size par(mar = c(5, 4, 1, 1) + 0.1) # Immature males are yellow, mature are red. Display juveniles first, and label # with colour rather than developmental stage d <- DurgaDiff(mass ~ maturity, damselfly, na.rm = TRUE, groups = c("Juveniles" = "juvenile", "Adults" = "adult")) d$group.names <- c(expression(italic("Juveniles")), expression(bold("Adults"))) DurgaPlot(d, # Custom y-axis label left.ylab = "Body mass (mg)", # Specify the colours to (approximately) match the data group.colour = c("orange", "red"), box = TRUE, box.fill = FALSE)
Colours can be specified for each group (group.colour
; demonstrated above), or separately for individual components. The master argument for each component type controls whether the component is displayed and optionally its colour. So, to display blue violins, specify violin = "blue"
. To use the default colour, specify violin = TRUE
. Colours can be specified as the name of an RColorBrewer palette (e.g. group.colour = "Dark2"
), or as one or more R colours.
Colours for data points receive special treatment; points can be coloured individually or by group. If there are as many colours as data points, the points are coloured individually. Otherwise, colours are applied to groups (excess colours are ignored). The default point symbol (pch
) is 19.
The convenience function DurgaTransparent
adds transparency to a colour, so DurgaTransparent("blue", 0.8)
is a nearly-transparent blue colour.
par(mfrow = c(1, 2), mar = c(11, 4, 3, 6) + 0.1) # Only show 2 groups even though there are 3 in the data d <- DurgaDiff(iris, data.col = "Sepal.Length", group.col = "Species", groups = c("versicolor", "setosa")) DurgaPlot(d, # Custom colours for points - the points are partially transparent points = c(DurgaTransparent("coral", .6), DurgaTransparent("darkturquoise", .8)), # Different colurs for violin (border) and violin fill violin = c("red", "blue"), violin.fill = c("#f8a8c840", "#a8c8f840"), main = "Data subset; Two groups") par(mar = c(5, 4, 3, 1) + 0.1) d <- DurgaDiff(iris, data.col = "Sepal.Length", group.col = "Species") # Define colours for the 3 species bgCols <- c(DurgaTransparent("red", 0.5), DurgaTransparent("blue", 0.5), DurgaTransparent("green", 0.5)) DurgaPlot(d, # Make all boxes the same colour box = "grey50", box.fill = "grey90", # Black for border/outline colour of points points = "black", # Additional data point parameters: small squares filled with colour for species points.params = list(pch = 22, cex = 0.8, bg = bgCols[iris$Species]), main = "Coloured point fill")
The central tendency indicator displays either the mean or median of each group (controlled by the central.tendency.type
argument), and is shown by default when neither boxes nor bars are shown.
Error bars on each group may show: 95% confidence intervals of the mean; standard deviation; or standard error (specified by setting the error.bars.type
parameter), and may be plotted regardless of what other components are shown.
par(mfrow = c(1, 2)) d <- DurgaDiff(iris, data.col = "Sepal.Length", group.col = "Species") # Only show error bars (95% CI) and central tendency DurgaPlot(d, error.bars.type = "CI", # Custom colour for central tendency central.tendency = rainbow(3), points = FALSE, violin = FALSE, main = "Mean and 95% CI") par(mar = c(5, 2.4, 4, 2) + 0.1) DurgaPlot(d, bar = TRUE, error.bars.type = "SD", points = FALSE, left.ylab = "", ef.size.label = "", main = "Bar chart with std. deviation")
The default behaviour of DurgaDiff
is to calculate differences between all pairs of groups; each difference is a contrast and has an estimated effect size with confidence interval. When plotted, each of multiple effect sizes is positioned below the first group in the contrast. This can cause layout problems when one group participates in multiple contrasts. For example, the iris dataset (available within standard R) contains three species: setosa
, versicolor
and virginica
, hence three possible contrasts: virginica - versicolor
, virginica - setosa
and versicolor - setosa
. If all three effect sizes are plotted, two of them (virginica - versicolor
and virginica - setosa
) will be positioned at the same horizontal location, resulting in superimposed text and an illegible graph (panel a) below)!
There are several ways around this problem:
ef.size.dx
; exactly how to do this is left as an exercise for the reader); There is no single best solution to this problem. The appropriate method depends on your data and the information that you wish to communicate.
To avoid this problem in default plots, the default behaviour of DurgaPlot
is to assume that the first group is a control, and then show the differences between all subsequent groups and the control (method 1.). This behaviour prevents effect sizes from overwriting each other, but may exclude pertinent data from the plot (panel b) below).
To change the default contrast selection, specify the contrasts
argument to either DurgaDiff
or DurgaPlot
. Contrasts are specified as "<group> - <group>, <group> - <group>"
, so to apply method 2. to the iris
dataset, you could specify contrasts = c("virginica - setosa", "versicolor - setosa", "setosa - versicolor")
. This differs from the default behaviour by also plotting the contrast "setosa - versicolor"
. Note that if the contrast "versicolor - setosa"
were specified rather than "setosa - versicolor"
, the effect size would be shown at the same location as the "versicolor - setosa"
effect size, hence failing to solve the problem.
par(mfrow = c(1, 2)) di <- DurgaDiff(iris, "Sepal.Length", "Species") DurgaPlot(di, contrasts = "*", main = "Problematic effect sizes") mtext("a)", cex = 1.5, col = "brown", font = 2, adj = -0.3, line = 1.4) DurgaPlot(di, main = "Default effect sizes (subset)") mtext("b)", cex = 1.5, col = "brown", font = 2, adj = -0.3, line = 1.4)
When there are too many group comparisons for the effect sizes to be clearly displayed, you can choose to display confidence brackets above the plot by calling DurgaBrackets
after calling DurgaPlot
. Confidence brackets communicate less information than the standard effect size visualisation, but may be a reasonable compromise for many group comparisons.
You must manually ensure there is sufficient space for the brackets in the plot (see DurgaBrackets
for details). In the example below, to create space above the plot, we increase the top margin size and prevent the frame from being drawn around the plot (frame.plot = FALSE
).
The default symbology for the brackets are light grey if the confidence interval includes 0. If the confidence interval does not include zero, text is black by default, and the default bracket line colour is the colour of the group with the higher central tendency. Arguments to DurgaBrackets
can be used to override the defaults.
d <- DurgaDiff(petunia, 1, 2) # Use the top margin for brackets op <- par(mar = c(2, 4, 4, 1) + 0.1) # Custom group colours cols = rainbow(3) # Save return value from DurgaPlot p <- DurgaPlot(d, # Don't draw effect size because we will draw brackets ef.size = FALSE, # Don't draw frame because brackets will appear in the upper margin frame.plot = FALSE, # Custom group colours group.colour = cols) # Customise colours by drawing brackets with the colour of the taller group. # Get the taller group from each difference tallerIdx <- sapply(d$group.differences, function(diff) { # If the group difference > 0, the first group is taller ifelse(diff$t0 > 0, diff$groupIndices[1], diff$groupIndices[2]) }) # Draw the brackets DurgaBrackets(p, labels = "level CI", br.col = cols[tallerIdx])
Each graphical group component (points, violin plot, box plot, bar chart, central tendency indicator, error bar and axis label) is positioned or centred horizontally on x-axis locations from 1 to n, where n is the number of groups. Group horizontal locations can be shifted with the group.dx
parameter. For example, group.dx = 0.1
would shift all groups to the right by 1/10th of the distance between groups. Values are repeated as necessary, so group.dx = c(0.1, -0.1)
will move each pair of groups closer together.
group.dx
shifts all components equally, however, to provide fine-grained control, each component can be shifted independently. For example, to shift all violins left and all points right, specify violin.dx = -0.12, points.dx = 0.1
.
par(mfrow = c(1, 2), mar = c(3, 4, 3, 1) + 0.1) # Construct some data with multiple groups set.seed(1) # Ensure we always get the exact same data set multiGroupData <- data.frame(Measurement = unlist(lapply(c(10, 12, 8, 8.5, 9, 9.1), function(m) rnorm(40, m))), Group = rep(paste0("g", 1:6), each = 40), Id = rep(1:6, 40)) d <- DurgaDiff(multiGroupData, 1, 2) DurgaPlot(d, # Visually group each pair group.dx = c(0.18, -0.18), # Show box plots box = TRUE, # Make the boxes narrow box.params = list(boxwex = 0.4), # Don't display effect size or data points ef.size = FALSE, points = FALSE, xlim = c(0.8, 6.2), # Reduce white space on plot left and right main = "Grouped into pairs") DurgaPlot(d, # Shift violins left violin.dx = -0.12, # Shift points right points.dx = 0.1, # Make points small and jittered points.params = list(cex = 0.5), points.method = "jitter", # No central tendency indicator central.tendency = FALSE, ef.size = FALSE, main = "Component shifts")
group.dx
is implemented very simply as a default value for various other *.dx
arguments. That means that values such as violin.dx
are not added to group.dx
; rather they replace it. Under some circumstances, it can be more convenient if *.dx
values are added to group.dx
. You can achieve this in your own code by storing group.dx
in a variable, then adding relative offsets for other arguments.
# Example snippet, make *.dx arguments relative to group.dx group.dx <- c(0.75, 0) # Shift group 1 right, leave group 2 unchanged DurgaPlot(..., group.dx = group.dx, central.tendency.dx = group.dx + 0.1, # Shift mean & error bars right violin.dx = group.dx + c(-0.4, -0.5), # Shift violins left ef.size.dx = -0.5)
By default, effect sizes are "mean" (aka unstandardised), which means they are in the same units as the data. However, DurgaDiff
can estimate standardised effect sizes - either Cohen's d or Hedges' g - which allow comparisons of effect sizes across data in different units.
You can override the y-axis labels to aid interpretation of effect sizes. DurgaPlot
does not provide text labels for standardised effect sixes by default because effect size interpretation depends on context.
par(mar = c(5, 9, 3, 1) + 0.1) # Define effect size labels and their positions on the y-axis effectSizes <- c("No effect" = 0, "Large positive effect" = 0.8, "Huge positive effect" = 2) # Give the groups friendly names groups <- c("Self" = "self_fertilised", "Westerham" = "westerham_cross", "Inter" = "inter_cross") # Calculate bias-corrected Cohen's d, named Hedges' g, rather than unstandardised difference in means d <- DurgaDiff(petunia, 1, 2, effect.type = "hedges g", groups = groups) DurgaPlot(d, violin = FALSE, main = "Hedges' g, labelled effect size", points.method = "jitter", # Use our ticks and labels instead of default ef.size.ticks = effectSizes, # Horizontal tick labels ef.size.params = list(las = 1), # Don't label the effect size y-axis because it won't fit with the long tick labels ef.size.label = "")
Durga provides a fixed set of effect type statistics that can be used, however sometimes a different statistic is required. For example, we may wish to compare measurements of some organ or structure to know whether they have increased or decreased in size, and by how much, in which case we may decide to use log~2~ ratio as our statistic. log~2~ ratio has a straightforward interpretation: 0 means no difference, 1 means group 1 is double the size of group 2 ($2^1$), 2 means group 1 is four-times the size ($2^2$), -1 means group 2 is double the size of group 1 ($-2^1$), and so on.
```{R fig.height=4.5, fig.width=6}
log2Ratio <- function(x1, x2) log2(mean(x2) / mean(x1))
set.seed(1) data <- data.frame(species = rep(c("Sp1", "Sp2"), each = 10), volume = c(rnorm(10, mean = 5, sd = 2), rnorm(10, mean = 10, sd = 2)))
d <- DurgaDiff(data, "volume", "species", effect.type = log2Ratio)
d$group.differences[[1]]$label.plot <- expression("log"[2] ~ italic("Sp 2") * ":" * italic("Sp 1"))
DurgaPlot(d, ef.size.label = expression("log"[2]~"ratio"))
d$group.differences[[1]]$label.print <- "log2 Sp 2:Sp 1" d
Durga implements a variety of standardisers, but what if you wish to calculate an effect size that is not implemented by Durga? The following example shows one way to use the `cohens_d` function from the [effectsize](https://easystats.github.io/effectsize/index.html) R package with Durga. ```{R eval=FALSE} # Create a new function that calls effectsize::cohens_d and returns just the calculated statistic myCohensAv <- function(x1, x2) { # Note that we swap the arguments because Durga expects the calculation to be x2 - x1 cd <- effectsize::cohens_d(x2, x1, pooled_sd = FALSE) # Extract and return just the Cohen's d value cd$Cohens_d } # Pass it in to DurgaDiff as the effect.type d <- DurgaDiff(petunia, 1, 2, effect.type = myCohensAv) # Specify the effect type label DurgaPlot(d, ef.size.label = expression("Cohen's d"))
This example shows the use of custom group order and display names, contrasts between relevant pairs of groups rather than between all combinations of groups, and various custom plotting options.
par(mar = c(5, 4, 3, 1) + 0.1) # Define display order and display pretty group names groups = c("Control 1" = "g1", "Treatment 1" = "g2", "Control 2" = "g3", "Treatment 2" = "g4", "Control 3" = "g5", "Treatment 3" = "g6") d <- DurgaDiff(multiGroupData, 1, 2, 3, groups = groups, contrasts = "g2 - g1, g4 - g3, g6 - g5") # Reduce text size par(cex = 0.7) DurgaPlot(d, box = TRUE, # Group into pairs group.dx = c(0.15, -0.15), # Don't display points points = FALSE, # Adjust plot width to reduce left/right wasted space xlim = c(1, 6), # Don't display pair lines paired = FALSE, # Hide the boxplot median line (medcol = NA) and make the boxes narrow (boxwex = 0.4) box.params = list(medcol = NA, boxwex = 0.4), # Display mean and error bar central.tendency = "#8060b0a0", # Display and colour central tendency central.tendency.symbol = "segment", central.tendency.width = 0.07, error.bars = "#8060b0a0", # Same colour for error bars # Don't display the effect size violin ef.size.violin = FALSE, main = "Group offsets" )
Any unknown parameters to DurgaPlot
are passed on to the plot function, which allows some control over additional aspects of the plot. The labels underneath effect sizes can be controlled by passing a named vector for the DurgaPlot
contrasts
parameter.
# Add in some fake sex data to the insulin data set data <- cbind(insulin, Sex = sample(c("Male", "Female"), nrow(insulin), replace = TRUE)) # Thin the data so that individual symbols are visible. # Obviously these data manipulations are for plot demonstration purposes only data <- data[data$id %in% 1:15, ] d <- DurgaDiff(data, "sugar", "treatment", "id", groups = c("Before insulin" = "before", "After insulin" = "after"), na.rm = TRUE) par(mar = c(2, 4, 4, 1)) # Use different colours to show sex cols <- RColorBrewer::brewer.pal(3, "Set1") DurgaPlot(d, # Don't draw containing box frame.plot = FALSE, # Relabel the contrasts contrasts = c("Blood sugar change" = "after - before"), left.ylab = "Blood sugar level", # Customise the violins violin.shape = c("left", "right"), violin.dx = c(-0.055, 0.055), violin.width = 0.3, # Customise the points to display sex points = "black", points.params = list(bg = ifelse(data$Sex == "Female", cols[1], cols[2]), pch = ifelse(data$Sex == "Female", 21, 24)), # Customise the effect size ef.size.pch = 19, ef.size.violin = "blue", ef.size.violin.shape = "full", # Make mean lines match group colour ef.size.line.col = RColorBrewer::brewer.pal(3, "Set2")[2:1], ef.size.line.lty = 3, central.tendency = FALSE, main = "Effects of insulin on blood sugar") # Add a legend legend("topright", c("Female", "Male"), pch = c(21, 24), pt.bg = cols)
Below is an alternative representation for the insulin dataset, showing overlayed rain cloud plots. Positioning is achieved by manually adjusting the relative positions of the various components, using the appropriate *.dx
parameters.
d <- DurgaDiff(insulin, "sugar", "treatment", "id", groups = c("Before insulin" = "before", "After insulin" = "after"), na.rm = TRUE) # Change effect size tick label d$group.differences[[1]]$label.plot <- "Change in sugar" # Horizontal axis tick labels, move y-axis labels outwards to make space for the # horizontal ticks, and ensure the margins are large enough to show them par(las = 1, mgp = c(4, 1, 0), mar = c(3, 5.5, 4, 5.5) + 0.1) cols <- RColorBrewer::brewer.pal(3, "Set2") colsTr <- DurgaTransparent(cols, 0.7) DurgaPlot(d, paired = FALSE, # Move 2nd group to be at the same position as 1st group.dx = c(0.2, -0.8), # X-axis ticks are meaningless since both groups are at the same location x.axis = FALSE, # Adjust plot width and spacing xlim = c(0.7, 2.1), violin = TRUE, violin.shape = "right", violin.fill = colsTr, # No need to specify violin position because violin.dx defaults to group.dx # Bumpy violins violin.adj = 1, # Box plots are black outlines box = "black", box.dx = c(0.06, -0.895), box.fill = colsTr, # Don't show outliers box.outline = FALSE, # Thin boxes, solid thick lines box.params = list(boxwex = 0.06, lty = 1, lwd = 2, # normal line weight for median line medlwd = 2, # no end of whisker line staplecol = NA), # Points as solid colours, slightly smaller than default points = cols, points.params = list(pch = 16, cex = 0.8), points.method = "jitter", points.dx = c(-0.14, -1.14), points.spread = 0.09, # Move effect size left to near where 2nd group would have been ef.size.dx = -1.2, left.ylab = "Blood sugar", main = "Rain cloud with effect size" ) legend("topright", c("Before insulin", "After insulin"), col = cols, fill = colsTr, bty = "n")
The source code for Durga is available at github.com/KhanKawsar/EstimationPlot. We welcome suggestions for improvements or reports of problems. Let us know by logging an issue on GitHub.
The Durga GitHub page has additional information such as how to install the latest development version of Durga.
Cumming, G. (2012). Understanding the new statistics : effect sizes, confidence intervals, and meta-analysis (1st ed.). New York: Routledge.
Khan, M. K., & McLean, D. J. (2023). Durga: An R package for effect size estimation and visualisation. bioRxiv, 2023.2002.2006.526960. doi:10.1101/2023.02.06.526960
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.