plot_neat: Plots of Means and of Dispersion


plot_neat R Documentation

Plots of Means and of Dispersion

Description

Primarily draws line and bar plots for factorial designs. Otherwise (if no data_per_subject is given), draws descriptive dispersion plots (histogram, density, or box plots) for a continuous variable. (For the latter, only the parameters values, parts, part_colors, and binwidth are used; the rest are ignored.)

Usage

plot_neat(
  data_per_subject = NULL,
  values = NULL,
  within_ids = NULL,
  between_vars = NULL,
  factor_names = NULL,
  value_names = NULL,
  y_title = NULL,
  reverse = FALSE,
  panels = NULL,
  type = "line",
  dodge = NULL,
  bar_colors = "viridis",
  line_colors = "viridis",
  row_number = 1,
  method = mean,
  eb_method = neatStats::mean_ci,
  numerics = FALSE,
  hush = FALSE,
  parts = c("h", "d", "n", "b"),
  part_colors = NULL,
  binwidth = NULL
)

Arguments

data_per_subject

Data frame containing all values (measurements/observations for a factorial design), with a single row per subject. Otherwise, if no data frame is given (default: NULL), histogram, density, or box plots will be returned for a continuous variable (numeric vector).

values

For plots of means (factorial designs): vector of strings; column name(s) in the data_per_subject data frame. Each column should contain a single dependent variable: thus, to plot repeated (within-subject) measurements, each specified column should contain one measurement. For descriptive dispersion plots (if data_per_subject is NULL), a numeric vector is expected.

within_ids

NULL (default), string, or named list. In case of no within-subject factors, leave as NULL. In case of a single within-subject factor, a single string may be given to optionally provide a custom name for the within-subject factor (note: this is a programming variable name, so it should not contain spaces, etc.); otherwise (if left NULL) this one within-subject factor will always just be named "within_factor". In case of multiple within-subject factors, each factor must be specified as a named list element, each with a vector of strings that distinguish the levels within that factor. Each column name given as values should contain one (and only one) of these strings for each within-subject factor, so that it can be assigned the appropriate level. For example, values = c('rt_s1_neg', 'rt_s1_pos', 'rt_s2_neg', 'rt_s2_pos') could have within_ids = list(session = c('s1', 's2'), valence = c('pos', 'neg')). (Note: the strings for distinguishing must be unambiguous. E.g., for values apple_a and apple_b, do not set levels c('a','b'), because 'a' is also found in apple_b. In this case, you could choose levels c('_a','_b') to make sure the values are correctly distinguished.) See also Examples.
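
The matching rule described above can be illustrated with a short base R sketch. This is not the package's internal code: the helper assign_levels is hypothetical and only checks, for each column name, which level string of a factor it contains, stopping when the match is not unique.

```r
# Hypothetical helper (not part of neatStats): for each column name, find
# the single level string of one within-subject factor that it contains.
assign_levels <- function(col_names, levels) {
    sapply(col_names, function(col) {
        hits <- levels[sapply(levels, grepl, x = col, fixed = TRUE)]
        if (length(hits) != 1) {
            stop("'", col, "' matches ", length(hits),
                 " levels; exactly one is required")
        }
        hits
    }, USE.NAMES = TRUE)
}

cols <- c('rt_s1_neg', 'rt_s1_pos', 'rt_s2_neg', 'rt_s2_pos')
within_ids <- list(session = c('s1', 's2'), valence = c('pos', 'neg'))

# one row per column name, one column per within-subject factor
level_table <- sapply(within_ids, function(lv) assign_levels(cols, lv))
print(level_table)

# ambiguous levels: 'a' occurs in both 'apple_a' and 'apple_b'
ambiguous <- tryCatch(
    assign_levels(c('apple_a', 'apple_b'), c('a', 'b')),
    error = function(e) conditionMessage(e)
)
print(ambiguous)

# unambiguous alternative using '_a' and '_b'
unambiguous <- assign_levels(c('apple_a', 'apple_b'), c('_a', '_b'))
print(unambiguous)
```

The ambiguous call fails because 'apple_b' contains both 'a' and 'b', which is the situation the note above warns about.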

between_vars

NULL (default; in case of no between-subject factors) or vector of strings; column name(s) in the data_per_subject data frame. Each column should contain a single between-subject independent variable (representing between-subject factors).

factor_names

NULL or named vector. In a named vector, factor names (either within or between) can be given a different name for display, in a dictionary style, using original factor name as the name of a vector element, and the element's value (as string) for the new name. For example, to change a factor named "condition" to "High vs. low arousal", the vector may be given (in this case with a single element) as factor_names = c(condition = "High vs. low arousal").

value_names

NULL or named vector. Same as factor_names, but regarding the factor values. For example, to change values "high_a" and "low_a" to "High" and "Low" for display, the vector may be given as value_names = c(high_a = "High", low_a = "Low").
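
The dictionary-style lookup used by factor_names and value_names can be sketched in base R as follows. The helper relabel is hypothetical (it is not a neatStats function) and only shows how a named vector maps original names to display names while leaving unlisted names untouched.

```r
# Hypothetical helper (not part of neatStats): replace each element of x
# that appears as a name in dict with the corresponding display value.
relabel <- function(x, dict) {
    x <- as.character(x)
    found <- x %in% names(dict)
    x[found] <- unname(dict[x[found]])
    x
}

value_names <- c(high_a = "High", low_a = "Low")

# 'mid_a' has no dictionary entry, so it is kept as it is
relabelled <- relabel(c("high_a", "low_a", "mid_a"), value_names)
print(relabelled)
```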

y_title

NULL (default) or string. Optionally given title for the y axis.

reverse

Logical (default: FALSE). If TRUE, reverses the default grouping of variables within the figure, or within each panel, in case of multiple panels. (The default grouping is decided automatically by given factor order, but always starting, when applicable, with within-subject factors: first factor is split to adjacent bars, or vertically aligned dots in case of line plot.)

panels

NULL or string. Optionally gives the factor name by which the plot is to be split into different panels, in case of three factors. (By default, the third given factor is used.)

type

String: "line" (default) or "bar". The former gives a line plot, the latter gives a bar plot.

dodge

Number. Specifies the amount by which the adjacent bars or dots 'dodge' each other (i.e., are displaced compared to each other). (Default is 0.1 for line plots, and 0.9 for bar plots.)

bar_colors

Vector of strings, specifying colors from which all colors for any number of differing adjacent bars are interpolated. (If the number of given colors equals the number of different bars, the precise colors will correspond to each bar.) The default 'viridis' gives a color gradient based on viridis. (In case of a single factor, the first given color is taken.)

line_colors

Vector of strings, specifying colors from which all colors for any number of differing vertically aligned dots and corresponding lines are interpolated. The default 'viridis' gives a color gradient based on viridis. (In case of a single factor, the first given color is taken.)

row_number

Number. In case of multiple panels, the number of rows in which the panels should be arranged. For example, with the default row_number = 1, all panels will be displayed side by side in a single row.

method

A function (default: mean) for the calculation of the main statistics (bar or dot heights).

eb_method

A function (default: mean_ci for 95% CI) for the calculation of the error bar size (as a single value used for both directions of the error bar). If set to NULL, no error bar is displayed.
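
As a minimal sketch of what a user-supplied error bar function could look like: it takes a numeric vector and returns a single number. The function ci_half_width and its level argument are hypothetical names, and this t-based formula is a common choice rather than a statement of how mean_ci is implemented.

```r
# Hypothetical function (not part of neatStats): half-width of a t-based
# confidence interval around the mean, returned as a single number.
ci_half_width <- function(x, level = 0.95) {
    x <- x[!is.na(x)]
    n <- length(x)
    t_crit <- stats::qt(1 - (1 - level) / 2, df = n - 1)
    t_crit * stats::sd(x) / sqrt(n)
}

# a wider interval is expected for a higher confidence level
hw_95 <- ci_half_width(c(1, 2, 3, 4, 5))
hw_99 <- ci_half_width(c(1, 2, 3, 4, 5), level = 0.99)
print(c(hw_95, hw_99))
```

Assuming plot_neat calls the function with one numeric vector per group (as the stats::mad example further below suggests), such a function could be passed as eb_method = ci_half_width.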

numerics

If FALSE (default), returns a ggplot object. If set to TRUE, returns only the numeric aggregated data per grouping factors, as specified by the method and eb_method functions. If set to any string (e.g. "both"), returns the numeric aggregated data and at the same time draws the plot.

hush

Logical. If TRUE, prevents printing aggregated values.

parts

For dispersion plots only (if no data_per_subject is given). A vector of characters that specify which plot types to overlay: "h" for histogram, "d" for density, "n" for normally distributed density (using the mean and standard deviation of the given variable), "b" for boxplot. (All are included by default: parts = c("h", "d", "n", "b").)

part_colors

For dispersion plots only (if no data_per_subject is given). A named vector that can specify and thereby override default colors and alpha (transparency) of each plot type. Colors can be given by adding "c" to the plot type letter, e.g. c(hc = "blue") for blue histogram. Alpha can be given by adding "a" to the plot type letter, e.g. c(ha = 0) for completely transparent histogram. Any number may be given: e.g. a dark red transparent histogram with green boxplot would be part_colors = c(hc = "#cc0000", ha = 0.1, bc = "green").

binwidth

For dispersion plots only (if no data_per_subject is given). Binwidth for histograms. If NULL (default), the Freedman–Diaconis rule is used if it produces at least 10 bins; otherwise the binwidth is calculated for 10 bins.
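
The default described above can be sketched in base R. The helper fd_binwidth and its min_bins argument are hypothetical, and the package's exact computation may differ; the sketch only shows the Freedman–Diaconis width (2 * IQR / n^(1/3)) with a fallback to the width that gives 10 bins across the data range.

```r
# Hypothetical helper (not part of neatStats) sketching the default binwidth.
fd_binwidth <- function(x, min_bins = 10) {
    x <- x[!is.na(x)]
    value_range <- diff(range(x))
    # Freedman-Diaconis rule: 2 * IQR / n^(1/3)
    bw <- 2 * stats::IQR(x) / length(x)^(1 / 3)
    # fall back to a width that yields `min_bins` bins across the range
    if (bw <= 0 || value_range / bw < min_bins) {
        bw <- value_range / min_bins
    }
    bw
}

# small sample: the rule would give fewer than 10 bins, so the fallback applies
bw_small <- fd_binwidth(1:20)
print(bw_small)

# larger sample drawn for illustration only
set.seed(1)
x_large <- rnorm(1000)
bw_large <- fd_binwidth(x_large)
print(bw_large)
```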

Value

By default, a ggplot plot object. (This object may be further modified or adjusted via regular ggplot methods.) If numerics is set accordingly, the aggregated values as computed by the method and eb_method functions.

Note

More than three factors are not allowed: such a design would make little sense and would be difficult to depict clearly in a simple figure. (However, you can build an appropriate graph using ggplot directly, or you can divide the data to produce several three-factor plots and then use e.g. ggpubr's ggarrange to collate them.)

See Also

anova_neat, mean_ci, se

Examples


# assign random data in a data frame for illustration
# (note that the 'subject' is only for illustration; since each row contains the
# data of a single subject, no additional subject id is needed)
dat_1 = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
    grouping1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
    grouping2 = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1),
    value_1_a = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1,
                  31, 16.9, 40.1, 42.1, 41, 12.9),
    value_2_a = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5,
                  25.6,-37.1, 55.1,-38.5, 28.6,-34.1),
    value_1_b = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85,
                  132, 121, 151, 95),
    value_2_b = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53,
                  18.37, 0.3,-0.59, 12.53, 13.37, 2.3,-3),
    value_1_c = c(27.4, -17.6, -32.7, 0.4, 37.2, 1.7, 18.2, 8.9,
                  1.9, 0.4, 2.7, 14.2, 3.9, 4.9),
    value_2_c = c(7.7,-0.8, 2.2, 14.1, 22.1,-47.7,-4.8, 8.6,
                  6.2, 18.2,-6.8, 5.6, 7.2, 13.2)
)
head(dat_1) # see what we have

# plot for factors 'grouping1', 'grouping2'
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2')
)

# same as above, but with bars and renamed factors
plot_neat(
    data_per_subject = dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender')
)

# same, but with different (lighter) gray scale bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('#555555', '#BBBBBB')
)

# same, but with red and blue bars
plot_neat(
    dat_1,
    values = 'value_1_a',
    between_vars = c('grouping1', 'grouping2'),
    type = 'bar',
    factor_names = c(grouping1 = 'experimental condition', grouping2 = 'gender'),
    bar_colors = c('red', 'blue') # equals c('#FF0000', '#0000FF')
)

# within-subject factor for 'value_1_a' vs. 'value_1_b' vs. 'value_1_c'
# (automatically named 'within_factor'), between-subject factors 'grouping1'
# and 'grouping2'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2')
)

# same, but panelled by 'within_factor'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor'
)

# same, but SE for error bars instead of (default) 95% CI
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = se
)

# same, but 95% CI for error bars instead of SE
# (arguably more meaningful than SEs)
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    eb_method = mean_ci
)

# same, but using medians and Median Absolute Deviations
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_1_b', 'value_1_c'),
    between_vars = c('grouping1', 'grouping2'),
    panels = 'within_factor',
    method = stats::median,
    eb_method = stats::mad
)

# within-subject factor 'numbers' for variables with number '1' vs. number '2'
# ('value_1_a' and 'value_1_b' vs. 'value_2_a' and 'value_2_b'), factor 'letters'
# for variables with final letter 'a' vs. final letter 'b' ('value_1_a' and
# 'value_2_a' vs. 'value_1_b' and 'value_2_b')
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    )
)

# same as above, but now including between-subject factor 'grouping2'
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2'
)

# same as above, but renaming factors and values for display
plot_neat(
    dat_1,
    values = c('value_1_a', 'value_2_a', 'value_1_b', 'value_2_b'),
    within_ids = list(
        letters = c('_a', '_b'),
        numbers =  c('_1', '_2')
    ),
    between_vars = 'grouping2',
    factor_names = c(numbers = 'session (first vs. second)'),
    value_names = c(
        '_1' = 'first',
        '_2' = 'second',
        '1' = 'group 1',
        '2' = 'group 2'
    )
)

# In real datasets, these could of course be more meaningful. For example, let's
# say participants rated the attractiveness of pictures with low or high levels
# of frightening and low or high levels of disgusting qualities. So there are
# four types of ratings:
# 'low disgusting, low frightening' pictures
# 'low disgusting, high frightening' pictures
# 'high disgusting, low frightening' pictures
# 'high disgusting, high frightening' pictures

# this could be meaningfully assigned e.g. as below
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)
head(pic_ratings) # see what we have

# the same logic applies as for the examples above, but now the
# within-subject differences can be more meaningfully specified, e.g.
# 'disgust_low' vs. 'disgust_high' for levels of disgustingness, while
# 'fright_low' vs. 'fright_high' for levels of frighteningness
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    )
)

# now let's say the ratings were done in two separate groups
pic_ratings = data.frame(
    subject = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    group_id = c(1, 2, 1, 2, 2, 1, 1, 1, 2, 1),
    rating_fright_low_disgust_low = c(36.2, 45.2, 41, 24.6, 30.5, 28.2, 40.9, 45.1, 31, 16.9),
    rating_fright_high_disgust_low = c(-14.1, 58.5,-25.5, 42.2,-13, 4.4, 55.5,-28.5, 25.6,-37.1),
    rating_fright_low_disgust_high = c(83, 71, 111, 70, 92, 75, 110, 111, 110, 85),
    rating_fright_high_disgust_high = c(8.024,-14.162, 3.1,-2.1,-1.5, 0.91, 11.53, 18.37, 0.3,-0.59)
)

# now include the 'group_id' factor in the plot
plot_neat(
    pic_ratings,
    values = c(
        'rating_fright_low_disgust_low',
        'rating_fright_high_disgust_low',
        'rating_fright_low_disgust_high',
        'rating_fright_high_disgust_high'
    ),
    within_ids = list(
        disgustingness = c('disgust_low', 'disgust_high'),
        frighteningness =  c('fright_low', 'fright_high')
    ),
    between_vars = 'group_id'
)


## DISPERSION PLOTS

plot_neat(values = rnorm(100))

# with smaller binwidth (hence more bins)
plot_neat(values = rnorm(100), binwidth = 0.2)

# without normal distribution line
plot_neat(values = rnorm(100), parts = c('h', 'd', 'b'))

# without histogram
plot_neat(values = rnorm(100), parts = c('d', 'n', 'b'))

# blue density, fully opaque histogram
plot_neat(values = rnorm(100),
         part_colors = c(dc = 'blue', ha = 1))


neatStats documentation built on Dec. 8, 2022, 1:13 a.m.