library(dplyr)
library(magrittr)
library(purrr)
library(looplot)

Introduction

Nested loop plots are a data visualisation for tabular data that contains or depends on several key parameters. A basic example is data from controlled experiments, in which measurements are performed under pre-specified conditions. The results of such an experimental study are then stored along with the pre-specified conditions that gave rise to them (the key parameters). Often, such data is presented in the form of a table to make the measurements comparable across the variety of conditions. However, if there are many different experimental conditions, the resulting table easily becomes unintelligible. In such cases, a visual display can aid the interpretation of the complex experimental data.

In statistical research, the equivalent of controlled experiments are simulation studies, which will be the focus of this vignette. @NestedLoop suggested the use of nested loop plots to facilitate transparent reporting of the results from such studies. In the following we briefly motivate statistical simulation studies, explain the basic layout of nested loop plots and demonstrate the use of the looplot package to create such visualisations.

Simulation studies in statistical research

Simulation studies in statistics or other methodological research fields are controlled experiments used to assess the properties of algorithms, gauge the performance of statistical or machine learning models, and gain insights into complex phenomena which are not readily understood analytically. While simulation studies by nature simplify the "true" underlying mechanisms of interest, they are useful because they allow the complete specification of the "ground truth" to which comparisons can be made, and because the experimental conditions are fully under the control of the experimenter. Such studies typically consist of three major components: the data generating mechanism, the methods or models to be evaluated, and the evaluation criteria.

All three together make up what we will call the simulation study design.

The data generating mechanism usually simulates (i.e. creates) data according to pre-defined parameters, e.g. the number of variables (i.e. columns of the data matrix) to be generated, their correlation structure or the number of observations (i.e. rows of the data matrix). These parameters will be called design parameters in the following, and the unique combinations of their values define specific simulation scenarios. Usually, the data generation involves some uncertainty or noise, to mimic the uncertainty that arises when data is obtained through measurements in the real world or sampled from a population. Therefore, each simulation scenario is typically repeated many times to reduce the effect of sampling variability.

The methods or models to be evaluated in the study are then applied to the datasets generated by the data generating mechanism and the evaluation criteria are computed, e.g. some measure of deviation from the ground truth. These are subsequently averaged to obtain performance measures for each method for a given simulation scenario.
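The three components described above can be sketched in a few lines of base R. This is only a minimal illustration under assumed choices: the functions `generate_data` and `squared_error`, the two estimators and the scenario values are hypothetical and are not part of the looplot package.

```r
set.seed(1)

# Data generating mechanism (hypothetical): n draws from a normal
# distribution whose mean is the known "ground truth".
generate_data <- function(n, true_mean) rnorm(n, mean = true_mean, sd = 1)

# Methods to be evaluated (hypothetical): two estimators of the mean.
methods <- list(mean = mean, median = median)

# Evaluation criterion (hypothetical): squared deviation from the truth.
squared_error <- function(estimate, truth) (estimate - truth)^2

# One simulation scenario (n = 50, true_mean = 2), repeated n_rep times
# to reduce the effect of sampling variability.
n_rep <- 200
errors <- replicate(n_rep, {
  x <- generate_data(n = 50, true_mean = 2)
  sapply(methods, function(f) squared_error(f(x), truth = 2))
})

# Performance measure per method: average of the criterion over repetitions.
performance <- rowMeans(errors)
performance
```

Repeating this for every combination of design parameter values would yield one row of performance measures per scenario, which is the kind of table nested loop plots are meant to display.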

Nested loop plots

Nested loop plots are a data visualisation for tabular data which depends on several design parameters. Their visual appearance is similar to trellis plots. @NestedLoop suggested nested loop plots for displaying the results across the different scenarios of a simulation study. The key idea is to display the design parameters along with the results for each specific scenario they define, and to arrange the display in a meaningful way so that patterns within the results can be clearly distinguished. As suggested in the original publication, design parameters are drawn as step functions below the results. This is especially useful while the experimenter is conducting the simulation study and is looking for interesting ways to summarise the results. The plots also provide a good high-level overview of the study suitable for a publication.
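To see why design parameters appear as step functions, consider a small sketch in base R. The parameter names `inner` and `outer` are hypothetical; the point is only that, when scenarios are ordered as in nested loops, an outer parameter stays constant over runs of consecutive scenarios.

```r
# Hypothetical design with two parameters (names are illustrative only).
# expand.grid() varies its first argument fastest, like an innermost loop.
design_sketch <- expand.grid(inner = c(1, 2, 3),
                             outer = c("a", "b"),
                             stringsAsFactors = FALSE)
design_sketch$scenario <- seq_len(nrow(design_sketch))

# Run-length encoding of the outer parameter along the scenario index:
# each run is one "step" spanning all values of the inner parameter.
outer_runs <- rle(design_sketch$outer)
outer_runs$values
outer_runs$lengths
```

Plotting `outer` against `scenario` would therefore give a step function with one step per value of the outer parameter.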

A challenge when using nested loop plots is their complexity and information density. For these reasons, large figure sizes are often necessary to display all of the details legibly. It is not straightforward to design a good-looking nested loop plot with a clear layout and an easily readable display of the data, and manual tweaking of the display is often necessary. This package exists to streamline the creation of nested loop plots and to make them more accessible.

In short, nested loop plots work well for many possible simulation study designs, in particular if:

Workflow with the looplot package

This package is based on ggplot2 and offers several functions which aid in the process of creating nested loop plots with facetting.

The starting point for the looplot package is a dataframe comprising the results for a single performance measure from the simulation study, very similar to how the results could be presented in a table. In fact, one could think of nested loop plots as a more intuitive, visual way to present such tabular data. The dataframe should contain:

We will give an example of such a dataframe in the examples below.

The user then defines the arrangement of the data within the nested loop plot and how the data should be displayed. There are two ways to work with the looplot package: one with basic options for users who are less familiar with the underlying ggplot2 framework, and the other for users who require modular access to specific parts of the plot to fine-tune the display. In either case, the resulting plot object is a ggplot2 object and can therefore be edited using the functions from that package. Furthermore, many of the options have similar names and behaviour as comparable options in ggplot2, so some familiarity with the concepts is helpful to make the most out of this package. We will give an overview of many of the available options by demonstrating several aspects of the looplot package in the following examples.

Basic data example {#data}

This vignette makes use of the tidyverse family of packages and specifically requires the dplyr, purrr and magrittr packages to be available, in addition to the looplot package. Please refer to the R session information for details of the environment used to create this vignette.

We generate an artificial example dataframe representing the output of a hypothetical simulation study with a fully factorial design and 4 design parameters (samplesize, param1, param2, param3). For each combination of the parameter values we generate some artificial data that depends on these values. Normally, these would correspond to the performance measures of the methods which are evaluated in the study (method1, method2, method3). The resulting input dataframe for the nested loop plots is shown below. Each row encodes the simulation design parameters for one specific simulation scenario, as well as the (summarised) results for each of the methods of interest. Note that the design parameters do not need to be numeric, but can be ordinal or even categorical.

set.seed(14)
params = list(
    samplesize = c(100, 200, 500),
    param1 = c(1, 2), 
    param2 = c(1, 2, 3), 
    param3 = c(1, 2, 3, 4)
)

design = expand.grid(params)

# add some "results"
design %<>% 
    mutate(method1 = rnorm(n = n(),
                           mean = param1 * (param2 * param3 + 1000 / samplesize), 
                           sd = 2), 
           method2 = rnorm(n = n(),
                           mean = param1 * (param2 + param3 + 2000 / samplesize), 
                           sd = 2), 
           method3 = rnorm(n = n(),
                           mean = param1 * (param2 + param3 + 3000 / samplesize), 
                           sd = 2))

knitr::kable(head(design, n = 10))
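As noted above, design parameters need not be numeric. The following sketch builds a fully factorial design with an ordinal and a categorical parameter using base R only. The parameter names `noise` and `mechanism` and their values are hypothetical and serve purely as an illustration; the sketch only constructs the design grid and does not plot it.

```r
# Hypothetical design parameters: one numeric, one ordinal, one categorical.
params_mixed <- list(
  samplesize = c(100, 200, 500),
  noise = factor(c("low", "medium", "high"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE),
  mechanism = c("linear", "nonlinear")
)

# Fully factorial design: one row per combination of parameter values.
design_mixed <- expand.grid(params_mixed, stringsAsFactors = FALSE)

# The number of scenarios is the product of the number of values
# per design parameter.
n_scenarios <- prod(lengths(params_mixed))
nrow(design_mixed) == n_scenarios
```

Declaring the ordinal parameter as an ordered factor keeps its intended ordering (low, medium, high) instead of the alphabetical one.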

Layout of the nested loop plot {#layout}

The key elements of the original nested loop plot as suggested in @NestedLoop are the display of the results of each individual method and simulation scenario, and the display of the design parameters as step functions representing the values of the parameters. The looplot package places a lot of emphasis on facetting, which divides the plot into small panels and makes it visually less cluttered. Facetting also makes it easy to zoom into specific design parameter combinations by drawing only the panels for those combinations. Each panel then stays the same, so the results can still be easily compared across different panel configurations. Furthermore, this package uses the idea of "smallest plottable units" (spu), which are defined as all results for simulation scenarios for which only a single design parameter changes (drawn on the x-axis of the plot), while all other design parameters remain at a fixed value. By separating such spus from each other, meaningful subgroups of the results can be visually identified and represented.
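The notion of a smallest plottable unit can be made concrete with a short base R sketch. It rebuilds the design grid of the example data (without results) and groups the scenarios by all design parameters except the one on the x-axis. This only illustrates the concept and makes no claim about how looplot computes spus internally; the names `design_grid`, `spu_id` and `spu_sizes` are introduced here for illustration.

```r
# Design grid of the example data, without the artificial results.
params <- list(
  samplesize = c(100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
design_grid <- expand.grid(params)

# With samplesize on the x-axis, an spu contains all scenarios that share
# the same values of the remaining design parameters.
other_params <- setdiff(names(params), "samplesize")
spu_id <- interaction(design_grid[other_params], drop = TRUE)

# Number of spus and number of scenarios within each of them.
spu_sizes <- table(spu_id)
length(spu_sizes)
unique(as.vector(spu_sizes))
```

Here every spu holds one scenario per value of samplesize, and the number of spus equals the number of combinations of param1, param2 and param3.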

When designing a layout for a nested loop plot with facets it is often helpful to think of the context of the simulation study and the intended meaning of the simulation parameters. Do some of them have a natural ordering? Are some of them encoding situations with largely different results? Which of the parameters should define the facet grid and which are merely displayed in each panel as step function?

In our artificial dataset we have the parameter samplesize which has a natural order and is well suited to define an x-axis along which results can be arranged.

The design is heavily influenced by param1, which only has two categories and therefore ideally defines either the columns or the rows of the facetted plot.

The role of the other parameters is less clear and their influence on the display of the results can be easily changed in the looplot functions to interactively get a good understanding of the study results.

Basic plot examples

The basic plot interface of the package is a single function with a lot of parameters. This might be very intimidating, so this vignette goes through a lot of them step by step.

The most important settings are to define the x-axis of the plot, the facet grid layout and adapt the basic parameters for plotting the parameter step functions to ensure a proper display. This is hard to automate, so manual tweaking is required to get a nice visualisation. Usually this can be achieved with the following parameters:

Further details of the plot which often need to be taken care of:

Many of these adjustments could also be achieved by directly manipulating the resulting plot object. They are incorporated into the nested_loop_plot function mainly for convenience and for users with little familiarity with the ggplot2 framework.
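As a rough sketch of what direct manipulation of a plot object looks like, the following uses a plain ggplot2 plot as a stand-in for the object returned by nested_loop_plot. The data frame `demo_df` and the object `p_demo` are hypothetical; only standard ggplot2 functions are used.

```r
library(ggplot2)

# Hypothetical stand-in data and plot (not produced by looplot).
demo_df <- data.frame(samplesize = c(100, 200, 500),
                      error = c(3, 2, 1))
p_demo <- ggplot(demo_df, aes(x = samplesize, y = error)) +
  geom_line() +
  geom_point()

# Axis names and theme elements can be changed after the plot is created,
# by adding further ggplot2 components to the existing object.
p_demo <- p_demo +
  labs(x = "Samplesize", y = "Error") +
  theme(axis.text.x = element_text(angle = -90, vjust = 0.5, size = 8))
```

Since the plots created by this package are ggplot2 objects, the same pattern of adding components with `+` should carry over to them.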

p = nested_loop_plot(resdf = design %>% rename(`beta[LY]` = param1), 
                     x = "samplesize", steps = "param3",
                     grid_rows = "param2", grid_cols = "beta[LY]", 
                     steps_y_base = -10, steps_y_height = 5, 
                     x_name = "Samplesize", y_name = "Error",
                     spu_x_shift = 75,
                     steps_values_annotate = TRUE, steps_annotation_size = 2.5, 
                     hline_intercept = 0, 
                     y_expand_add = c(10, NULL), 
                     post_processing = list(
                         add_custom_theme = list(
                             axis.text.x = element_text(angle = -90, 
                                                        vjust = 0.5, 
                                                        size = 8) 
                         )
                     ))
print(p)

Once presented with such a plot, several patterns clearly emerge from the picture:

While this is only an artificial example, detecting such patterns is one of the goals of a well designed simulation study and could benefit from display with nested loop plots.

Changing the layout

The layout of the plots can be easily changed as shown in the following examples. Most of these require manual tweaking of the parameter step functions.

Only rows

p = nested_loop_plot(resdf = design, 
                     x = "samplesize", steps = c("param2", "param3"),
                     grid_rows = "param1", 
                     steps_y_base = -10, steps_y_height = 3, steps_y_shift = 3,
                     x_name = "Samplesize", y_name = "Error",
                     spu_x_shift = 75,
                     steps_values_annotate = TRUE, steps_annotation_size = 2.5, 
                     hline_intercept = 0, 
                     y_expand_add = c(10, NULL), 
                     post_processing = list(
                         add_custom_theme = list(
                             axis.text.x = element_text(angle = -90, 
                                                        vjust = 0.5, 
                                                        size = 8) 
                         )
                     ))
print(p)

Only columns

p = nested_loop_plot(resdf = design, 
                     x = "samplesize", steps = c("param2", "param3"),
                     grid_cols = "param1", 
                     steps_y_base = -5, steps_y_height = 1, steps_y_shift = 3,
                     x_name = "Samplesize", y_name = "Error",
                     spu_x_shift = 75,
                     steps_values_annotate = TRUE, steps_annotation_size = 2.5, 
                     hline_intercept = 0, 
                     y_expand_add = c(10, NULL), 
                     post_processing = list(
                         add_custom_theme = list(
                             axis.text.x = element_text(angle = -90, 
                                                        vjust = 0.5, 
                                                        size = 8) 
                         )
                     ))
print(p)

"Classic" nested loop plots

These do not use facetting but only draw the design parameters as step functions. Furthermore, in the original publication of @NestedLoop, the authors use step functions to draw the results as well.

The advantage of not using facetting is that the steps are drawn only once, so less screen space is spent on repeated information. A disadvantage is the lack of visual distinction between the blocks of the data, which can make it harder to spot patterns.

p = nested_loop_plot(resdf = design, 
                     x = "samplesize", steps = c("param1", "param2", "param3"),
                     steps_y_base = -5, steps_y_height = 1, steps_y_shift = 3,
                     x_name = "Samplesize", y_name = "Error",
                     spu_x_shift = 200,
                     steps_values_annotate = TRUE, steps_annotation_size = 2.5, 
                     hline_intercept = 0, 
                     y_expand_add = c(10, NULL), 
                     post_processing = list(
                         add_custom_theme = list(
                             axis.text.x = element_text(angle = -90, 
                                                        vjust = 0.5, 
                                                        size = 5) 
                         )
                     ))
print(p)

Zooming

By subsetting the data, one can easily "zoom" into specific panels of the nested loop plot. We use dplyr::filter to subset the data in the following example.

design_subset = design %>% 
    filter(param1 == 1)
p = nested_loop_plot(resdf = design_subset, 
                     x = "samplesize", steps = c("param2", "param3"),
                     grid_rows = "param1", 
                     steps_y_base = -3, steps_y_height = 1, steps_y_shift = 1,
                     x_name = "Samplesize", y_name = "Error",
                     spu_x_shift = 75,
                     steps_values_annotate = TRUE, steps_annotation_size = 2.5, 
                     hline_intercept = 0, 
                     y_expand_add = c(5, NULL), 
                     post_processing = list(
                         add_custom_theme = list(
                             axis.text.x = element_text(angle = -90, 
                                                        vjust = 0.5, 
                                                        size = 8) 
                         )
                     ))
print(p)

Gallery and advanced functionality

Further options and advanced usage of this package are demonstrated in the vignette "Annotated gallery of examples".

R session information {#rsession}

sessionInfo()

References



matherealize/looplot documentation built on Jan. 14, 2024, 2:07 a.m.