nested_loop_plot: Facetted nested loop plots

nested_loop_plot    R Documentation

Description

Basic interface for generating nested loop plots, visualisations for (factorial) controlled experiments.

Usage

nested_loop_plot(
  resdf,
  x,
  grid_rows = NULL,
  grid_cols = NULL,
  steps = NULL,
  steps_add = NULL,
  methods = NULL,
  pass_through = NULL,
  trans = identity,
  design_parameters_values = NULL,
  design_type = "full",
  parameter_decreasing = FALSE,
  spu_x_shift = 1,
  grid_scales = "fixed",
  grid_labeller = label_both_custom,
  replace_labels = NULL,
  y_name = waiver(),
  x_name = waiver(),
  steps_names = NULL,
  legend_name = "Method",
  legend_breaks = waiver(),
  legend_labels = NULL,
  connect_spus = FALSE,
  sizes = 1,
  point_shapes = 19,
  point_size = NULL,
  point_alpha = 1,
  line_linetypes = 1,
  line_size = 0.5,
  line_alpha = 1,
  colors = NULL,
  draw = c("add_points", "add_lines"),
  x_labels = waiver(),
  ylim = NULL,
  y_expand_mult = NULL,
  y_expand_add = NULL,
  y_breaks = waiver(),
  y_labels = waiver(),
  steps_draw = TRUE,
  steps_y_base = 0,
  steps_y_height = 1,
  steps_y_shift = NULL,
  steps_names_annotate = TRUE,
  steps_values_annotate = FALSE,
  steps_color = "#AAAAAA",
  steps_annotation_size = 5,
  steps_annotation_nudge = 0.2,
  steps_annotation_color = "#AAAAAA",
  na_rm = TRUE,
  base_size = 12,
  hline_intercept = NULL,
  hline_linetype = 3,
  hline_size = 0.5,
  hline_colour = "black",
  hline_alpha = 1,
  post_processing = NULL,
  return_data = FALSE
)

Arguments

resdf

Data.frame with data to be visualised in wide format with columns: param1 param2 ... paramN measurement1 measurement2 ... measurementM. param1 to paramN represent the design parameters, measurement1 to measurementM the measured / summarised results for M different models / methods for the given parameters. Design parameters are mostly treated as factors and can thus be ordered by the user by specifying factor levels. The only exception is the parameter which is used for the x-axis - it is treated as a continuous variable.

x

Name of column in resdf which defines the x-axis of the plot. Converted to numeric values for x-axis via as.numeric.

grid_rows, grid_cols

NULL or names of columns in resdf which define the facetting rows and columns of the plot. Correspond to the rows and cols arguments of facet_grid. Either or both of these can be NULL, in which case the plot is facetted by rows only, by columns only, or not at all.

steps

NULL or character vector with names of columns in resdf which define further parameter configurations and which define smallest plottable units (see Details below).

steps_add

Character vector with names of columns in resdf which should be added to the plot as steps, but do not represent parameters. These are just added for information and do not influence the data display. Example: show separation rate (reasonably rounded) for given parameter specifications.

methods

NULL or character vector with names of columns in resdf which contain results from the experimental study and should be drawn in the nested loop plot. The default NULL means that all columns not mentioned in x, grid_rows, grid_cols, steps and steps_add are used. Use this argument to draw only a subset of methods of interest.

pass_through

NULL or character vector with names of columns in resdf which will be passed to post-processing, without otherwise affecting the plot. Useful to add e.g. panel specific decorations (see corresponding section in the Gallery vignette of this package).

trans

Function name or object, to be called via do.call to transform the plotted values.

design_parameters_values

NULL or named list of vectors. Each entry in the list represents one of the loop variables (x, grid_rows, grid_cols, steps) in resdf. The values passed here override the default, observed design parameters (i.e. the unique values of the corresponding variable in resdf). This makes it possible to deal with e.g. missing data. Usually not necessary.

design_type

Either "full" or "partial". If "full", then resdf is completed to a full design, where possibly missing entries (because a specific parameter combination does not have data) are set to NA. Steps, axes etc. are then drawn as if the data were available. Useful to show explicitly which scenarios have not been run when the design is almost full. If "partial", then parameter configurations without data are dropped from the plot and not shown.

parameter_decreasing

Logical - if TRUE, design parameters are sorted to be decreasing (in terms of factor levels).

spu_x_shift

Distance between two contiguous SPUs, given in units of the x-axis.

grid_scales

Analogous to scales argument of facet_grid. For some usage hints, see Details below.

grid_labeller

Labeller function to format strip labels of the facet grid. By default uses a custom version of ggplot2::label_both, but can be customized according to labeller.

replace_labels

NULL or named list of character vectors which facilitates renaming of design parameter values. The names correspond to names of design parameters as specified by resdf. Each entry is a vector of the form c("value_name_as_character" = "replacement_value_name").

y_name

Character which is used as y-axis label.

x_name

Character which is used as x-axis label.

steps_names

NULL or character vector of the same length as steps. Specifies names of the design parameters which are used for steps.

legend_name

String which is used as legend name.

legend_breaks

A character vector of breaks for the scales. Can be used to e.g. exclude certain methods from the legend. If NULL, then no breaks are displayed in the legend. Otherwise, must be the same length as legend_labels, if one of them is changed from the default.

legend_labels

NULL or character vector which is used as keys in legend. Overrides variable columns names in resdf. Must be the same length as legend_breaks, if one of them is changed from the default.

connect_spus

Logical - if TRUE, individual SPUs are connected by lines. This is necessary to reproduce the original nested loop plots as suggested in the manuscript by Ruecker and Schwarzer (2014). The default FALSE means that individual SPUs are not connected, which often makes it easier to spot patterns in the results.

sizes

Single numeric or numeric vector, specifies custom sizes for lines and points added to the plot. Cycled as necessary. Note that this scale affects both points and lines due to the implementation in the underlying ggplot2 package. See Details for usage pointers.

point_shapes, point_size, point_alpha

Point drawing parameters. point_shapes is a vector of shape specifications of length equal to the number of measurement columns (M) in resdf (cycled to appropriate length, if necessary). The other drawing parameters are single numeric values. point_size may be set to NULL to make it scale with the methods (defined by sizes).

line_linetypes, line_alpha, line_size

Line or step drawing parameters. line_linetypes is a vector of linetype specifications of length equal to the number of measurement columns (M) of resdf (cycled to appropriate length, if necessary). The other drawing parameters are single numeric values. line_size may be set to NULL to make it scale with the methods (defined by sizes).

colors

NULL or vector of color specification of length equal to the number of measurement columns (M) in resdf. If NULL, the viridis color scale is used (see viridis).

draw

Character vector which contains a combination of "add_points", "add_lines" or "add_steps", which are all wrappers for ggplot2 geoms. Defines which geometry is used to draw connected data. The default is to represent results by drawing points and lines. Original nested loop plots use "add_steps" only.

x_labels

If set to NULL, no labels are drawn on x-axis.

ylim

Vector of length 2 with limits of the y-axis for the measurement data. Steps drawn (when steps_draw is TRUE) are not affected and will adapt to this setting automatically.

y_expand_mult

Vector of length 2. Used for adjustments to the display area similar to what expand_scale does. The lower limit of display will be expanded by a fraction of the plotting range as given by the first entry of the vector, the upper limit by a fraction according to the second entry. Useful to adjust the y-axis when steps for design parameters are drawn below the results.

y_expand_add

Vector of length 2. Used for adjustments to the display area similar to what expand_scale does. The lower limit of display will be changed by addition of the first entry, the upper limit by addition of the second entry. Specified in y-axis coordinates. Useful to adjust the y-axis when steps for design parameters are drawn below the results.

y_breaks

Vector with user specified breaks of the y-axis. Default is to use the breaks as suggested by ggplot2.

y_labels

Vector with user specified labels of the y-axis. Default is to use the labels as suggested by ggplot2.

steps_draw

Logical. Should design parameters as given in steps be drawn as step-functions? Y limits will adjust automatically but proper display may need manual tweaks using y_expand_mult and y_expand_add.

steps_y_base

Numeric. Maximum height of steps in y-axis units. That is, if steps are increasing (parameter_decreasing == FALSE), this is the y-axis value of the uppermost step; if parameter_decreasing == TRUE, this is the y-axis value of the first step.

steps_y_height

Numeric. Height of a single step in y-axis units. If a single numeric, the same height is used for all steps. If a vector, then the step heights may vary for each layer (as defined by steps argument). Values are cycled to have appropriate length.

steps_y_shift

Numeric. Distance in y-axis units between step layers, i.e. the distance between steps drawn for different design parameters (if steps comprises more than one variable). Like steps_y_height, this can be a vector to allow a varying shift between layers. If NULL, an automated attempt is made to set the value to 0.25*steps_y_height, but this may need manual tweaking.

steps_names_annotate

Logical. Should steps drawn be annotated with names of corresponding design parameters?

steps_values_annotate

Logical. Should steps drawn be annotated with values of corresponding design parameters? Only the first occurrence of a new value is annotated to avoid visual clutter.

steps_color

Color specification for steps drawn.

steps_annotation_size

Numeric. Size of annotations for steps. Likely needs tweaking.

steps_annotation_nudge

Numeric. Fine-tunes the position of step annotations in y-axis units. Often, the annotation is overlaid by the lines of the steps. This argument simply increases the distance between annotations and step lines, similar to the nudge_y argument of geom_text.

steps_annotation_color

Color specification of the step annotation text.

na_rm

Logical. Should missing values be removed before plotting? This means that lines will be connected, even if a missing value lies between two values. See Details for some usage notes.

base_size

Numeric. base_size parameter of theme_bw.

hline_intercept

Intercept of a horizontal line which can be added to the plot (e.g. to mark a target value such as an error of 0). If NULL, no line is drawn.

hline_linetype, hline_size, hline_colour, hline_alpha

Aesthetic parameters for the horizontal line, see geom_line.

post_processing

NULL or a named list of lists. Each entry should have as name a wrapper function exported from this package and as entry a named list of parameters which are passed to the wrapper function via do.call. Useful to adjust y-limits, add additional lines or points to the plot or customize the theme of the plot.

return_data

Logical. Should the data necessary for drawing a plot be returned or the plot itself? Can be useful for debugging.
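
As a rough sketch of how some of the labelling-related arguments fit together, the following snippet orders one design parameter via factor levels and renames the values of another via replace_labels. The data and all column names (samplesize, strength, noise, setting, method1) are made up for illustration, and the plotting call is only attempted if the looplot package is installed.

```r
# Hypothetical design: all column names and values are illustrative only.
design <- expand.grid(
  samplesize = c(10, 50, 100),
  strength = c("low", "high"),
  noise = c(1, 2),
  setting = c(1, 2),
  stringsAsFactors = FALSE
)

# Design parameters are treated as factors, so explicit levels fix their
# order ("low" before "high" instead of the alphabetical default).
design$strength <- factor(design$strength, levels = c("low", "high"))

# Deterministic stand-in for a measured result.
design$method1 <- as.integer(design$strength) * design$noise +
  design$setting / design$samplesize

# Named list: one entry per design parameter, each of the form
# c("value_as_character" = "replacement").
new_labels <- list(noise = c("1" = "weak", "2" = "strong"))

# Only attempt the plot if the package is available.
if (requireNamespace("looplot", quietly = TRUE)) {
  p <- looplot::nested_loop_plot(
    design,
    x = "samplesize",
    grid_rows = "strength",
    grid_cols = "noise",
    steps = "setting",
    replace_labels = new_labels
  )
}
```

Setting the factor levels before the call keeps the ordering decision in the data, while replace_labels only changes how the values are displayed.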

Details

The basic data for nested loop plots are tabular data matrices in which rows correspond to different experimental conditions, defined by a few key design parameters. The columns represent these design parameters as well as results from measurements conducted with several methods. All of the measurements for all of the design parameters are then displayed in a single nested loop plot. Their layout is defined by x, grid_rows, grid_cols and all the design parameter step variables.

A crucial definition for these plots is the SPU (smallest / single plottable unit). It is given by a subset of the measurements (rows) for which a fixed number of parameters vary (usually only one, represented on the x-axis) while all others are fixed (i.e. fixed facet row, column and step values). Such SPUs are treated as "contiguous" and connected in the plot. The most intuitive use case is an x-axis that represents sample size: an SPU then comprises all measurements for varying sample size but fixed settings for all other parameters (i.e. fixed facet row, facet column and steps). If an SPU has more than one varying parameter, the connect_spus argument may be used to define how they are plotted. If it is FALSE, only results within a single SPU are connected, but not results from different SPUs. If it is TRUE, all data within a facet are connected, effectively reproducing the original nested loop plots as suggested by Ruecker and Schwarzer (2014).
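
The SPU definition can be illustrated with base R alone. The sketch below reuses the design from the Examples section and assumes that samplesize is the only parameter on the x-axis. It is not the package's internal bookkeeping, just a way to count the units that the definition implies.

```r
# Same design as in the Examples section: 5 sample sizes x 2 x 3 x 4 settings.
design <- expand.grid(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)

# With x = "samplesize", every combination of the remaining parameters
# (facet row, facet column, step) identifies one SPU.
spu_id <- interaction(design$param1, design$param2, design$param3, drop = TRUE)

n_spus <- nlevels(spu_id)      # number of SPUs in the design
rows_per_spu <- table(spu_id)  # number of x-values within each SPU
```

Here every SPU contains one row per sample size, and these rows are the points that get connected by a line within the SPU.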

The motivation for SPUs is visual readability of the plot - drawing a line through data that "belongs together" while separating it from data at other design parameters adds clarity and makes patterns clearly discernible in the plot.

This function works best with 4 to 6 design parameters - much more and the plots are likely to be unreadable due to information density. The visualisation works best with a fractional factorial design.

Value

If return_data is TRUE, then the outputs of nested_loop_base_data and nested_loop_paramsteps_data are combined and returned. If FALSE, then a ggplot2 plot object is returned.

Axis scaling

The axis scaling is not fully free. It has the restrictions of facet_grid and thus:

  • scales = free_x allows for m scales along the bottom, and 1 common scale for all rows.

  • scales = free_y allows for n scales along the side, and 1 common scale for all columns.

  • scales = free allows for n scales along the side, and m scales along the bottom.

Completely freeing both axes is currently not possible. Thus, the arrangement of variables may face some restrictions. The implementation of facetting uses the facet_grid_sc function from the facetscales package on GitHub.

Axis transformation

Axis transformations are implemented by transforming the data using the trans argument. This is necessary because general transformations using the ggplot2 trans argument of axis scales cannot easily deal with steps drawn for parameters. For details on how to work with axis transformations see the package vignettes.
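
Since trans is described as a function name or object that is called via do.call, a minimal sketch of both forms may help. The data are deterministic and strictly positive so that log10 is defined. The plotting call is only attempted if looplot is installed, and the step placement would likely still need manual tuning on the transformed scale.

```r
# do.call accepts either the function object or its name as a string.
values <- c(1, 10, 100)
by_object <- do.call(log10, list(values))
by_name <- do.call("log10", list(values))

# Illustrative design with strictly positive results for two methods.
design <- expand.grid(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
design$method1 <- design$param1 * design$param2 * design$param3 +
  100 / design$samplesize
design$method2 <- design$param1 + design$param2 + design$param3 +
  100 / design$samplesize

# The plotted values are transformed, not the axis scale itself.
if (requireNamespace("looplot", quietly = TRUE)) {
  p_log <- looplot::nested_loop_plot(
    design,
    x = "samplesize",
    grid_rows = "param1", grid_cols = "param2", steps = "param3",
    trans = log10
  )
}
```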

Size scale

The size scale faces some restrictions due to the underlying ggplot2 package. It affects ALL elements added to the plot, i.e. points and lines cannot be scaled independently (except when set to fixed values). To control which elements are affected, point_size and line_size can both be set to single numeric values to fix their size and make them constant across all methods. If these arguments are set to NULL, then they will pick up on the overall size scale given by sizes.

Adding meta-information

The steps_add argument can be used to provide contextual information in the plot. For continuous data, this could also be realized by additional measurement columns. Examples for usage include displaying the separation rate for a given parameter specification or labeling parameter specifications as "difficult scenario" / "easy scenario", etc. See the package vignettes for usage examples.
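
A minimal sketch of the steps_add idea, assuming a made-up column sep_rate that depends only on param3 and is rounded for display. The column name and its formula are purely illustrative, and the plotting call is only attempted if looplot is installed.

```r
# Illustrative design with deterministic results for two methods.
design <- expand.grid(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
design$method1 <- design$param1 * design$param2 * design$param3
design$method2 <- design$param1 + design$param2 + design$param3

# Hypothetical meta-information: not a design parameter, shown for context only.
design$sep_rate <- round(1 / design$param3, 2)

# sep_rate is drawn as an additional step layer; it is not a method,
# so it is excluded from the default set of measurement columns.
if (requireNamespace("looplot", quietly = TRUE)) {
  p_meta <- looplot::nested_loop_plot(
    design,
    x = "samplesize",
    grid_rows = "param1", grid_cols = "param2",
    steps = "param3",
    steps_add = "sep_rate"
  )
}
```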

Missing data

In general the na_rm parameter can be used to deal with missing data. However, if a whole method is missing, then setting that parameter to TRUE will lead to unexpected results as a whole column is removed from the dataset. In such a case, the parameter should be set to FALSE or the method removed from the dataset by the user.
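
One way to follow the second suggestion is to drop methods without any results before plotting. The sketch below uses base R only for that step. The column method3 is a hypothetical method that produced no results, and the plotting call is only attempted if looplot is installed.

```r
# Illustrative design with two complete methods and one empty method.
design <- expand.grid(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
design$method1 <- design$param1 * design$param2 * design$param3
design$method2 <- design$param1 + design$param2 + design$param3
design$method3 <- NA_real_  # hypothetical method without any results

# Identify measurement columns that consist of missing values only.
method_cols <- c("method1", "method2", "method3")
all_missing <- vapply(design[method_cols],
                      function(col) all(is.na(col)), logical(1))
keep <- method_cols[!all_missing]

# Keep the design parameters and only the methods that have data.
param_cols <- c("samplesize", "param1", "param2", "param3")
design_clean <- design[, c(param_cols, keep)]

if (requireNamespace("looplot", quietly = TRUE)) {
  p_clean <- looplot::nested_loop_plot(
    design_clean,
    x = "samplesize",
    grid_rows = "param1", grid_cols = "param2", steps = "param3"
  )
}
```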

Usage example

Further details and usage examples may be found in the package vignettes.

References

Ruecker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol 2014; 14.

Examples

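## Not run: 
library(looplot)

# Design parameters of a small factorial simulation study
params = list(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
# Full factorial design: one row per parameter combination
design = expand.grid(params)
# Simulated results of two methods; the noise shrinks with the sample size
design$method1 = rnorm(n = nrow(design),
                       mean = design$param1 * design$param2 * design$param3,
                       sd = 5 / design$samplesize)
design$method2 = rnorm(n = nrow(design),
                       mean = design$param1 + design$param2 + design$param3,
                       sd = 5 / design$samplesize)
# samplesize on the x-axis, param1 / param2 as facets, param3 as steps
nested_loop_plot(design, x = "samplesize",
                 grid_rows = "param1", grid_cols = "param2", steps = "param3")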

## End(Not run)


matherealize/looplot documentation built on Jan. 14, 2024, 2:07 a.m.