nested_loop_plot | R Documentation |
Basic interface for generating nested loop plots, visualisations for (factorial) controlled experiments.
nested_loop_plot(
resdf,
x,
grid_rows = NULL,
grid_cols = NULL,
steps = NULL,
steps_add = NULL,
methods = NULL,
pass_through = NULL,
trans = identity,
design_parameters_values = NULL,
design_type = "full",
parameter_decreasing = FALSE,
spu_x_shift = 1,
grid_scales = "fixed",
grid_labeller = label_both_custom,
replace_labels = NULL,
y_name = waiver(),
x_name = waiver(),
steps_names = NULL,
legend_name = "Method",
legend_breaks = waiver(),
legend_labels = NULL,
connect_spus = FALSE,
sizes = 1,
point_shapes = 19,
point_size = NULL,
point_alpha = 1,
line_linetypes = 1,
line_size = 0.5,
line_alpha = 1,
colors = NULL,
draw = c("add_points", "add_lines"),
x_labels = waiver(),
ylim = NULL,
y_expand_mult = NULL,
y_expand_add = NULL,
y_breaks = waiver(),
y_labels = waiver(),
steps_draw = TRUE,
steps_y_base = 0,
steps_y_height = 1,
steps_y_shift = NULL,
steps_names_annotate = TRUE,
steps_values_annotate = FALSE,
steps_color = "#AAAAAA",
steps_annotation_size = 5,
steps_annotation_nudge = 0.2,
steps_annotation_color = "#AAAAAA",
na_rm = TRUE,
base_size = 12,
hline_intercept = NULL,
hline_linetype = 3,
hline_size = 0.5,
hline_colour = "black",
hline_alpha = 1,
post_processing = NULL,
return_data = FALSE
)
resdf |
Data.frame with data to be visualised in wide format with columns:
|
x |
Name of column in resdf which defines the x-axis of the plot. Converted to numeric values for x-axis via as.numeric. |
grid_rows , grid_cols |
NULL or names of columns in resdf which define the facetting rows and columns
of the plot. Correspond to rows and cols argument in
|
steps |
NULL or character vector with names of columns in resdf which define further parameter configurations and which define smallest plottable units (see Details below). |
steps_add |
Character vector with names of columns in resdf which should be added to the plot as steps, but do not represent parameters. These are just added for information and do not influence the data display. Example: show separation rate (reasonably rounded) for given parameter specifications. |
methods |
NULL or character vector with names of columns in resdf which contain
results from the experimental study and should be drawn in the nested
loop plot. Default NULL means
that all columns not mentioned in |
pass_through |
NULL or character vector with names of columns in resdf which will be passed to post-processing, without otherwise affecting the plot. Useful to add e.g. panel specific decorations (see corresponding section in the Gallery vignette of this package). |
trans |
Function name or object, to be called via |
design_parameters_values |
NULL or Named list of vectors. Each entry in the list represents one of the loop
variables ( |
design_type |
Either "full" or "partial". If "full", then resdf is completed to a full design, where possibly missing entries (because a specific parameter combination does not have data) are set to NA. Steps, axes etc. are then drawn as if the data was available. Useful to show explicitly which scenarios have not been run when the design is almost full. If "partial", then parameter configurations without data are dropped from the plot and not shown. |
parameter_decreasing |
Logical - if TRUE, design parameters are sorted to be decreasing (in terms of factor levels). |
spu_x_shift |
Distance between two contiguous data spus. Given in units of the x-axis. |
grid_scales |
Analogous to |
grid_labeller |
Labeller function to format strip labels of the facet grid. By default
uses a custom version of |
replace_labels |
NULL or named list of character vectors which facilitates renaming of design
parameter values. The names correspond to names of design parameters as
specified by resdf. Each entry is a vector of the form
|
y_name |
Character which is used as y-axis label. |
x_name |
Character which is used as x-axis label. |
steps_names |
NULL or character value of same length as |
legend_name |
String which is used as legend name. |
legend_breaks |
A character vector of breaks for the scales. Can be used to e.g. exclude
certain methods from the legend. If NULL, then no breaks are displayed in the
legend. Otherwise, must be the same length as |
legend_labels |
NULL or character vector which is used as keys in legend. Overrides variable
columns names in resdf. Must be the same length as |
connect_spus |
Logical - if TRUE, individual spus are connected by lines. This is necessary to reproduce original nested loop plots as suggested in the manuscript by Ruecker and Schwarzer (2014). The default FALSE means not to connect individual spus, which often makes it easier to spot patterns in the results. |
sizes |
Single numeric or numeric vector, specifies custom sizes for lines and points added to the plot. Cycled as necessary. Note that this scale affects both points and lines due to the implementation in the underlying ggplot2 package. See details for usage pointers. |
point_shapes , point_size , point_alpha |
Point drawing parameters. |
line_linetypes , line_alpha , line_size |
Line or step drawing parameters. |
colors |
NULL or vector of color specification of length equal to the number of
measurement columns (M) in |
draw |
Character vector, which contains a combination of "add_points", "add_lines" or "add_steps", which are all wrappers for ggplot2 geoms. Defines which geometry is used to draw connected data. The default is to represent results by drawing points and lines. Original nested loop plots use "add_steps" only. |
x_labels |
If set to NULL, no labels are drawn on x-axis. |
ylim |
Vector of length 2 with limits of y-axis for the measurement data. Steps
drawn (due to |
y_expand_mult |
Vector of length 2. Used for adjustments to the display area similar
to what |
y_expand_add |
Vector of length 2. Used for adjustments to the display area similar
to what |
y_breaks |
Vector with user specified breaks of the y-axis. Default is to use the breaks as suggested by ggplot2. |
y_labels |
Vector with user specified labels of the y-axis. Default is to use the labels as suggested by ggplot2. |
steps_draw |
Logical. Should design parameters as given in |
steps_y_base |
Numeric. Maximum height of steps in y-axis units. I.e. if steps are
increasing (due to |
steps_y_height |
Numeric. Height of a single step in y-axis units. If a single numeric,
the same height is used for all steps. If a vector, then the step heights
may vary for each layer (as defined by |
steps_y_shift |
Numeric. Distance in y-axis units between step layers, i.e. distance
between step drawn for different design parameters (if |
steps_names_annotate |
Logical. Should steps drawn be annotated with names of corresponding design parameters? |
steps_values_annotate |
Logical. Should steps drawn be annotated with values of corresponding design parameters? Only the first occurrence of a new value is annotated to avoid visual clutter. |
steps_color |
Color specification for steps drawn. |
steps_annotation_size |
Numeric. Size of annotations for steps. Likely needs tweaking. |
steps_annotation_nudge |
Numeric. Fine-tune position of steps annotations in y-axis units.
Often, the annotation is overlaid with the lines of the steps - this
argument simply increases the distance between annotations and step lines,
similar to the |
steps_annotation_color |
Color specification of the step annotation text. |
na_rm |
Logical. Should missing values be removed before plotting? This means that lines will be connected, even if a missing value is between two values. See details for some usage notes. |
base_size |
Numeric. base_size parameter of |
hline_intercept |
Intercept of a horizontal line which can be added to the plot (e.g. to mark a target value such as an error of 0). If NULL, no line is drawn. |
hline_linetype , hline_size , hline_colour , hline_alpha |
Aesthetic parameters for horizontal line, see
post_processing |
NULL or a named list of lists. Each entry should have as name a wrapper
function exported from this package and as entry a named list of parameters
which are passed to the wrapper function via |
return_data |
Logical. Should the data necessary for drawing a plot be returned or the plot itself? Can be useful for debugging. |
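The difference between design_type = "full" and "partial" can be pictured with base R. This is only a sketch of the idea under stated assumptions, not the package's internal code; the data frame observed and its values are made up for illustration.

```r
# Hypothetical results: the combination samplesize = 50, param1 = 2 was never run.
observed = data.frame(
  samplesize = c(10, 50, 10),
  param1     = c(1, 1, 2),
  method1    = c(0.4, 0.2, 0.5)
)

# All combinations of the design parameter values.
full_grid = expand.grid(samplesize = c(10, 50), param1 = c(1, 2))

# "full": keep every combination; results without data become NA.
completed = merge(full_grid, observed, all.x = TRUE)

# "partial": keep only the combinations that actually have data.
partial = merge(full_grid, observed)

# The scenario that a full design would show as missing.
completed[is.na(completed$method1), c("samplesize", "param1")]
```

In the full variant the missing scenario still occupies a position on the x-axis, which is what makes gaps in an almost complete design visible.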
The basic data for nested loop plots are tabular data matrices in which rows correspond to different experimental conditions, defined by a few key design parameters. The columns represent these design parameters as well as results from measurements conducted with several methods. All of the measurements for all of the design parameters are then displayed in a single nested loop plot. Their layout is defined by x, grid_rows, grid_cols and all the design parameter step variables.
A crucial definition for these plots is the SPU (smallest / single plottable unit). It is given by a subset of the measurements (rows) for which a fixed number of parameters vary (usually only 1, represented on the x-axis) and all others are fixed (i.e. fixed facet row, column and step values). Such SPUs are then treated as "contiguous" and connected in the plot. The most intuitive use case is an x-axis that represents samplesize: an SPU is then all measurements for varying samplesize but fixed settings for all other parameters (i.e. fixed facet row, column and steps).
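To make the SPU definition concrete, the following base R sketch groups the rows of a design like the one in the example at the end of this page. It only illustrates the grouping logic and is not how the package computes SPUs internally; the names spu_id and spus are hypothetical.

```r
# Design parameters as in the example at the end of this page.
params = list(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
design = expand.grid(params)

# An SPU keeps everything fixed except the x-axis variable (samplesize),
# so rows sharing param1, param2 and param3 belong to the same SPU.
spu_id = interaction(design$param1, design$param2, design$param3, drop = TRUE)
spus = split(design, spu_id)

length(spus)                 # number of SPUs: 2 * 3 * 4 = 24
unique(sapply(spus, nrow))   # rows per SPU: one per samplesize value, i.e. 5
```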
If an SPU has more than one varying parameter, then the connect_spus argument may be used to define how they are plotted. If the argument is FALSE, then only results within a single individual SPU are connected, but not between different SPUs. If TRUE, then all data within a facet is connected, effectively reproducing original nested loop plots as suggested by Ruecker and Schwarzer (2014).
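The two settings of connect_spus might be compared as follows. This is a hedged sketch: it assumes the function is provided by a package named looplot (the package name is not stated on this page) and skips the plotting calls if that package is not installed. The simulated data are made up for illustration.

```r
# Simulated results for one method on a small full factorial design.
design = expand.grid(
  samplesize = c(10, 50, 100, 200, 500),
  param1 = c(1, 2),
  param2 = c(1, 2, 3),
  param3 = c(1, 2, 3, 4)
)
set.seed(1)
design$method1 = rnorm(nrow(design),
                       mean = design$param1 * design$param2 * design$param3,
                       sd = 5 / design$samplesize)

# Assumption: nested_loop_plot() lives in the looplot package.
if (requireNamespace("looplot", quietly = TRUE)) {
  # Default: points and lines, connected only within each SPU.
  p_separate = looplot::nested_loop_plot(
    design, x = "samplesize",
    grid_rows = "param1", grid_cols = "param2", steps = "param3"
  )

  # Original style: all data within a facet connected, drawn as steps.
  p_connected = looplot::nested_loop_plot(
    design, x = "samplesize",
    grid_rows = "param1", grid_cols = "param2", steps = "param3",
    connect_spus = TRUE, draw = "add_steps"
  )
}
```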
The motivation for SPUs is visual readability of the plot - drawing a line through data that "belongs together" while separating it from data at other design parameters adds clarity and makes patterns clearly discernible in the plot.
This function works best with 4 to 6 design parameters - much more and the plots are likely to be unreadable due to information density. The visualisation works best with a fractional factorial design.
If return_data is TRUE, then the outputs of nested_loop_base_data and nested_loop_paramsteps_data are combined and returned. If FALSE, then a ggplot2 plot object is returned.
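A quick way to see both kinds of return value is sketched below. The package name looplot is an assumption, and since the exact structure of the returned data is not described here, the sketch only inspects its top level instead of relying on particular element names.

```r
# Small simulated design; values are made up for illustration.
design = expand.grid(
  samplesize = c(10, 50, 100),
  param1 = c(1, 2),
  param2 = c(1, 2, 3)
)
set.seed(1)
design$method1 = rnorm(nrow(design), mean = design$param1 + design$param2)

# Assumption: nested_loop_plot() lives in the looplot package.
if (requireNamespace("looplot", quietly = TRUE)) {
  # return_data = TRUE: the data used for drawing, useful for debugging.
  plot_data = looplot::nested_loop_plot(
    design, x = "samplesize",
    grid_rows = "param1", steps = "param2",
    return_data = TRUE
  )
  str(plot_data, max.level = 1)

  # Default (return_data = FALSE): a ggplot2 plot object.
  p = looplot::nested_loop_plot(
    design, x = "samplesize",
    grid_rows = "param1", steps = "param2"
  )
  print(inherits(p, "ggplot"))
}
```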
The axis scaling is not fully free. It has the restrictions of facet_grid and thus:
scales = free_x allows for m scales along the bottom, and 1 common scale for all rows.
scales = free_y allows for n scales along the side, and 1 common scale for all columns.
scales = free allows for n scales along the side, and m scales along the bottom.
Completely freeing both axes is currently not possible. Thus, the arrangement of variables may face some restrictions. The implementation of facetting uses the facet_grid_sc function from the facetscales package on GitHub.
Axis transformations are implemented by transformations of the data using the trans argument. This is necessary because general transformations using the ggplot2 trans argument of axis scales cannot easily deal with steps drawn for parameters. For details on how to work with axis transformations see the package vignettes.
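Conceptually, transforming the data instead of the axis amounts to something like the following base R sketch. It assumes that the function given as trans is applied to the measurement columns only; this is an illustration, not the package's internal code, and the data frame results is made up.

```r
# Hypothetical results for two methods.
results = data.frame(
  samplesize = c(10, 100, 1000),
  method1 = c(1, 10, 100),
  method2 = c(2, 20, 200)
)

# The transformation to apply, here a log10 scale.
trans = log10
method_cols = c("method1", "method2")

# Transform the measurements themselves; steps can then be laid out
# in the transformed y-axis units without any axis transformation.
transformed = results
transformed[method_cols] = lapply(results[method_cols], trans)

transformed
```

Because the plotted values are then already on the transformed scale, y-axis breaks and labels may need to be set by hand, for instance via y_breaks and y_labels.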
The size scale faces some restrictions due to the underlying ggplot2 package. It affects ALL elements added to the plot, i.e. points and lines cannot be scaled independently (except when set to fixed values). To control which elements are affected, point_size and line_size can both be set to single numeric values to fix their size and make them constant across all methods. If these arguments are set to NULL, then they will pick up on the overall size scale given by sizes.
The steps_add argument can be used to provide contextual information in the plot. For continuous data, this could also be realized by additional measurement columns. Examples for usage include displaying the separation rate for a given parameter specification or labeling parameter specifications as "difficult scenario" / "easy scenario", etc. See the package vignettes for usage examples.
In general the na_rm parameter can be used to deal with missing data. However, if a whole method is missing, then setting that parameter to TRUE will lead to unexpected results as a whole column is removed from the dataset. In such a case, the parameter should be set to FALSE or the method removed from the dataset by the user.
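A method that is missing entirely can be detected and dropped before plotting. The base R sketch below shows one way to do this; the data frame results and the helper names are hypothetical.

```r
# Hypothetical results: method2 has no data at all, method1 has one gap.
results = data.frame(
  samplesize = c(10, 50, 100),
  method1 = c(0.5, 0.3, NA),
  method2 = NA_real_
)
method_cols = c("method1", "method2")

# Flag methods whose column consists of missing values only.
all_missing = vapply(results[method_cols],
                     function(col) all(is.na(col)),
                     logical(1))
empty_methods = names(all_missing)[all_missing]

# Drop those methods; single missing values are left for na_rm to handle.
results_clean = results[, !(names(results) %in% empty_methods), drop = FALSE]

names(results_clean)
```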
Further details and usage examples may be found in the package vignettes.
Ruecker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol 2014; 14.
## Not run:
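# nested_loop_plot() is assumed to come from the looplot package
library(looplot)

# design parameters of a full factorial simulation study
params = list(
samplesize = c(10, 50, 100, 200, 500),
param1 = c(1, 2),
param2 = c(1, 2, 3),
param3 = c(1, 2, 3, 4)
)
# one row per parameter combination
design = expand.grid(params)
# simulated results for two methods; the noise shrinks with sample size
design$method1 = rnorm(n = nrow(design),
mean = design$param1 * design$param2 * design$param3,
sd = 5 / design$samplesize)
design$method2 = rnorm(n = nrow(design),
mean = design$param1 + design$param2 + design$param3,
sd = 5 / design$samplesize)
# samplesize on the x-axis, param1 and param2 as facets, param3 as steps
nested_loop_plot(design, x = "samplesize",
grid_rows = "param1", grid_cols = "param2", steps = "param3")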
## End(Not run)