Chart: Charts for One or Two Categorical Variables

Chart R Documentation

Description

lessR introduces the concept of a data view visualization function, in which the choice of visualization function directly reflects the structure of the data and the analyst's goal for understanding the data. The function Chart() visualizes the distribution of a categorical variable by pairing each of its levels with a numerical value: either a count or proportion of the level, or a statistic such as the mean of a separate numerical variable aggregated within the level. Choose the type of visualization with the parameter type.

  • Bar chart with type = "bar", the default value

  • Radar chart with type = "radar"

  • Bubble chart with type = "bubble"

  • Pie chart with type = "pie"

  • Sunburst chart with type = "pie" and one or more additional categorical variables specified with by

  • Treemap chart with type = "treemap"

  • Icicle chart with type = "icicle"

Stratify the distribution, that is, divide it into groups, with the parameter by, which plots the groups on the same panel, or facet, which plots each group on a separate panel (not applicable to bubble charts). Under this conceptualization, a sunburst chart is a pie chart with nested layers. For the hierarchical charts (pie and sunburst, treemap, and icicle), the by stratification parameter can be a vector that defines multiple levels.

When using RStudio, the static bar chart is directed to the Plots window and its interactive Plotly version to the Viewer window. All other chart types are Plotly visualizations only.

Unless by is a vector of at least length two, the chart is constructed from the one- or two-dimensional table that pairs each level or joint level of the categorical variables with the corresponding numerical value of y. Usually, this table is a summary (pivot) table calculated as a data aggregation from the original data table of measurements. A one-dimensional example is the average salary of the employees in each department. A corresponding two-dimensional example is the average salary of men and women in each department. Enter the original, raw data from which Chart calculates the summary table, or enter the summary table directly as the input data.
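The two summary tables described above can be sketched in base R. This is only an illustration of the aggregation, not the lessR internals; the data frame d and its values are invented for the example, and the commented Chart() calls simply follow the Usage section.

```r
# Hypothetical raw data: one row per employee (values invented for illustration)
d <- data.frame(
  Dept   = c("ACCT", "ACCT", "SALE", "SALE", "SALE", "MKTG"),
  Gender = c("M", "W", "M", "W", "W", "M"),
  Salary = c(60000, 64000, 70000, 74000, 78000, 90000)
)

# One-dimensional summary table: mean salary for each department
one_way <- tapply(d$Salary, d$Dept, mean)

# Two-dimensional summary table: mean salary for each Gender x Dept cell
# (cells with no observations are NA)
two_way <- tapply(d$Salary, list(d$Gender, d$Dept), mean)

print(one_way)
print(two_way)

## Not run: corresponding calls, assuming lessR is loaded
## Chart(Dept, y = Salary, stat = "mean", data = d)
## Chart(Dept, by = Gender, y = Salary, stat = "mean", data = d)
```

Either the raw data frame d or a table equivalent to one_way could serve as input, per the description above.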

Chart also displays the underlying summary table, such as a frequency table for one or two variables. For a frequency table, Chart also reports Cramér's V measure of association and the corresponding chi-square inferential analysis. For two variables, the frequencies include the joint and marginal frequencies.

To activate Trellis graphics or facets, a multi-panel display, specify a facet variable in place of by for the second categorical variable.

For bar charts, if the provided object to analyze is a set of multiple variables, including the name of an entire data frame, then a bar chart is calculated for each non-numeric variable in the data frame. For the default bar chart, a standard bar chart is presented simultaneously with the interactive version because there are some features in the standard chart not yet available in the interactive version.

Usage

Chart(

        # ------------------------------------------
        # Data from which to construct the bar chart
        x=NULL, by=NULL, y=NULL, data=d, filter=NULL,

        # -----------------------------------
        # Chart type, defaults to a bar chart
        type=c("bar", "radar", "bubble", "pie", "icicle", "treemap"),
        hole=0.65,  # pie chart
        radius=0.35, power=0.5,  # bubble chart

        # --------------------------
        # Chart from aggregated data
        stat=c("mean", "sum", "sd", "deviation", "min", "median", "max"),
        stat_x=c("count", "proportion"),

        # --------------------------------------------------------------
        # Bar chart parameters: Facet plot, stratify on different panels
        facet=NULL, n_row=NULL, n_col=NULL, aspect="fill",
        
        # -----------------------------------------------------
        # Bar chart parameters: Layout and ordering of the bars
        horiz=FALSE, sort=c("0", "-", "+"),
        beside=FALSE, stack100=FALSE,
        gap=NULL, scale_y=NULL, one_plot=NULL,

        # ----------------------------------------------------------------
        # Analogy of physical Marks on paper to create the bars and labels
        theme=getOption("theme"),
        fill=NULL,
        color=getOption("bar_color_discrete"),
        transparency=getOption("trans_bar_fill"),
        fill_split=NULL, fill_scaled=FALSE, fill_chroma=75,

        labels=c("%", "input", "prop", "off"),
        labels_position=c("in", "out"),
        labels_color="white",
        labels_size=0.75,
        labels_decimals=NULL,
        labels_cut=NULL,

        # ------------------------------------------------------------------
        # Labels for axes, values, and legend if x and by variables, margins
        xlab=NULL, ylab=NULL, main=NULL, sub=NULL,
        lab_adjust=c(0,0), margin_adjust=c(0,0,0,0),
        pad_y_min=0, pad_y_max=0,
    
        rotate_x=getOption("rotate_x"), rotate_y=getOption("rotate_y"),
        break_x=NULL, offset=getOption("offset"),
        axis_fmt=c("K", ",", ".", ""), axis_x_pre="", axis_y_pre="",
        label_max=100,

        legend_title=NULL, legend_position="right_margin",
        legend_labels=NULL, legend_horiz=FALSE,
        legend_size=NULL, legend_abbrev=10, legend_adjust=0,

        # ----------------------------------------------------
        # Draw one or more objects, text, or geometric figures
        # Only applies to the standard bar chart
        add=NULL, x1=NULL, y1=NULL, x2=NULL, y2=NULL,

        # --------------------------------------------------------------------
        # Output: text or chart turned off, to PDF file, number decimal digits
        quiet=getOption("quiet"), do_plot=TRUE, 
        use_plotly=getOption("lessR.use_plotly"),
        pdf_file=NULL, width=6.5, height=6, 
        digits_d=NULL, out_size=80, 

        # --------------------------------------
        # Deprecated, removed in future versions
        n_cat=getOption("n_cat"), value_labels=NULL,
        rows=NULL, facet1=NULL,

        # -------------
        # Miscellaneous
        eval_df=NULL, fun_call=NULL, ...)

Arguments

x

Primary categorical variable to analyze. For bar charts, x can be a single variable (in a data frame or as a vector in the user's workspace), a vector of variables specified with c, or an entire data frame. If not specified, defaults to all non-numeric variables in the data frame given by data (or d by default). To improve label legibility, category labels are automatically wrapped: unless break_x = FALSE, spaces in labels are replaced with line breaks. To keep two short words on the same line, replace the intervening space with a tilde; the tilde is displayed as a blank in the axis label.

by

Optional second categorical variable for stratification. Creates a two-way display (e.g., stacked or grouped bars, nested pies, multi-variable bubbles), with subgroups shown within each level of x. The same stratification applies within panels when facet is used.

y

Numeric variable whose values determine bar heights, bubble sizes, or other aggregated measures across categories. If y is supplied for raw data, a summary statistic must be specified via stat. If y is omitted, counts (or proportions) are computed from the data and used as the default response.

data

Optional data frame that contains the variables of interest. May be raw data, from which summaries are computed, or a pre-aggregated summary table with one categorical column and one numeric column giving the heights/sizes of the plotted objects.

filter

Logical expression or vector of row indices that defines a subset of rows in data to analyze. Use logical operators such as &, |, and ! and relational operators such as ==, !=, and >.

type

Chart family to produce. Default is "bar". Alternatives are "radar", "bubble", "pie", "icicle", and "treemap". A hierarchical pie chart (a pie with a by variable) is rendered as a sunburst chart.

hole

For pie and sunburst charts, proportion of the radius occupied by the inner hole (a doughnut chart). Set to 0 or FALSE for a full pie.

radius

For bubble charts, scaling factor for bubble radius (in pixels) controlling the size of the largest displayed bubble.

power

For bubble charts, controls the relative scaling of bubbles. The default 0.5 scales radii so that bubble areas are proportional to the underlying values. A value of 1 scales radii directly to the values, increasing visual differences in size.

stat

Summary statistic applied to y within groups defined by x and optional by. One of "mean", "sum", "sd", "deviation" (mean deviations), "min", "median", and "max". The resulting summary table (pivot table) defines the plotted heights or sizes.

stat_x

When y is not supplied, specifies whether to plot the "count" (default) of each group or its "proportion".

facet

Optional categorical variable that activates Trellis graphics (facets) using the lattice framework. A separate chart is drawn for each level of facet, in contrast to by, which overlays subgroups on the same panel.

n_row

Number of rows in the facet layout. If specified, n_col is determined automatically and should not be set simultaneously.

n_col

Number of columns in the facet layout. If specified, n_row is determined automatically and should not be set simultaneously. When n_col = 1, facet strips are placed to the left of panels instead of above.

aspect

Lattice aspect ratio for facet panels, defined as height/width. Default "fill" expands panels to occupy available space. Set to 1 for square panels or "xy" to bank lines to an effective slope of 45 degrees.

horiz

Orientation of bars in a bar chart. Defaults to FALSE (vertical bars) unless one_plot = TRUE, in which case horizontal bars are often more readable.

sort

Sorting strategy for bar categories. Default "0" retains the original order. Use "-" for descending and "+" for ascending order of frequencies (for one-way charts) or column sums (with by). Not applicable to facet plots. When one_plot = TRUE, the default is "+".

beside

For a two-way bar chart, if TRUE plots the levels of the second variable as adjacent bars (grouped bars) rather than stacked segments.

stack100

Produces a 100% stacked bar chart when a by variable is present, equivalent to setting stat_x = "proportion" with by.

gap

Controls the spacing between bars; passed to the space argument of barplot. Default is 0.2, except for two-variable plots with beside = TRUE, where the default is c(0.1, 1).

scale_y

Optional numeric vector of length three defining the y-axis (numeric axis) scale: minimum, maximum, and number of intervals. Applies to bar and similar charts.

one_plot

For multiple x variables, selects whether to draw a separate bar chart for each variable or combine all variables into a single multi-item chart. By default, if variables share a common response scale (e.g., Likert items), one_plot is set to TRUE; otherwise it defaults to FALSE.

theme

Color theme for this analysis. Use style to set persistent defaults across analyses.

fill

Fill color(s) for bars, pie slices, tiles, or bubbles. Default is the qualitative "hues" palette under the "colors" theme, or an ordered sequential palette (e.g., "blues") for ordinal categories. For other themes, default fill is taken from the corresponding gradient (e.g., "reds" for "darkred"). May also be any vector of colors (e.g., from getColors) or predefined palettes including color-blind–safe options such as "viridis". When fill is set to the name of y (or (count) for tabulated counts), values of y are mapped to a color scale. Not used when fill_split is active.

color

Border color of plotted objects (bars, slices, bubbles, tiles). May be a vector to vary borders by category. Default is bar_color_discrete from style.

transparency

Transparency of filled areas, from 0 (opaque) to 1 (fully transparent). Default is trans_bar_fill from style.

fill_split

For bar charts, splits bars into two fill colors relative to a numeric threshold. Bars with y <= fill_split are drawn in the first fill color; larger values use the second. Alternatively, supply a length-2 vector of colors.

fill_scaled

For bar charts without a by variable, scales the lightness of the fill color according to height (the value of y). Larger values yield darker bars. When fill is a single color, a sequential scale is generated; when fill is two colors, a diverging scale is used.

fill_chroma

Chroma (saturation) for fill_scaled bars. Full saturation is 100; lower values approach grayscale. Has no effect for the "gray" theme, which is already achromatic.

labels

Adds numeric labels to bars or pie slices. Default "%" displays percentages, "prop" shows proportions, "input" shows the underlying numeric values (counts or supplied y), and "off" suppresses the labels. If y is omitted, the input values are the tabulated counts.

labels_position

Position of labels for pies/sunbursts. Default is "in" (inside slices); use "out" to place labels outside.

labels_color

Color(s) of the plotted labels. May be a vector; if fewer colors are given than categories, colors are recycled.

labels_size

Character expansion factor for label text. Default is 0.75, or 0.9 of that value when beside = TRUE and labels_position = "in" (to account for narrower bars).

labels_decimals

Number of decimal places displayed in labels. Defaults to 0 for integer-valued y and 2 for "prop".

labels_cut

Minimum relative size required to show a label. When labels_position = "out", the default is 0.028 for simple charts, and 0.040 when a by variable is present or multiple x variables are combined.

xlab

Axis label for the x-axis. If omitted, the label is taken from the variable label (if present) or the variable name. If xy_ticks = FALSE, no x-axis label is drawn. When no y is specified, xlab defaults to "Index" unless explicitly set.

ylab

Axis label for the y-axis. If omitted, the label is taken from the variable label (if present) or the variable name. If xy_ticks = FALSE, no y-axis label is drawn.

main

Title of the chart. Size and color may be controlled via main_cex and main_color in style.

sub

Subtitle placed below xlab. Not yet implemented.

lab_adjust

Two-element numeric vector (x-label, y-label) giving approximate inch offsets for axis labels. Positive values move labels away from the plotting region. Not applicable to facet (Trellis) plots.

margin_adjust

Four-element numeric vector (top, right, bottom, left) that adjusts plot margins in inches. Positive values expand the corresponding margin. Not applicable to facet plots.

pad_y_min

Proportion of padding added at the lower end of the y-axis (0–1).

pad_y_max

Proportion of padding added at the upper end of the y-axis (0–1).

rotate_x

For bar charts, rotation (in degrees) of category labels on the x-axis, typically used to accommodate long labels in combination with offset. When rotate_x = 90, labels are vertical and an alternative placement algorithm is used, so offset is usually unnecessary.

rotate_y

Applies to BPFM (bubble plot frequency matrix), a sequence of stacked bubble charts. Controls rotation of labels along the vertical axis.

break_x

For bar charts, controls automatic line-breaking of category labels. When TRUE, spaces are converted to new lines and tildes to blanks (keeping words joined by a tilde on the same line). Defaults to TRUE for vertical bars with rotate_x = 0, and FALSE otherwise.

offset

For bar charts, controls the spacing between axis labels and the axis itself. Default is 0.5. Larger values (e.g., 1.0) create additional room for rotated or long labels.

axis_fmt

Numeric format for axis labels. Default "K" shows thousands as "K" (e.g., 100000 as 100K). Alternatives include "," (comma separators with decimal point), "." (period separators), or "" to disable formatting.

axis_x_pre

Prefix for labels on the x-axis, such as "$".

axis_y_pre

Prefix for labels on the y-axis, such as "$".

label_max

For bar charts, improves console readability of text output by setting a target maximum label length. Longer labels are abbreviated in the printed frequency distribution. The limit is not strict when necessary to preserve uniqueness.

legend_title

Title of the legend. Usually set automatically from variable names, but must be supplied explicitly when plotting raw count matrices without variable metadata.

legend_position

Legend placement when plotting two variables. Default is in the right margin. Standard positions such as "topleft", "top", and "topright" are also available; see legend.

legend_labels

Legend labels when plotting two variables. Defaults to the levels of the second (or by) variable.

legend_horiz

If TRUE, draws the legend horizontally; default is vertical.

legend_size

Character expansion factor for legend text.

legend_abbrev

If specified, truncates legend title and labels to at most the given number of characters (subject to preserving uniqueness).

legend_adjust

Horizontal shift of the legend in two-way bar charts. Positive values move the legend to the right from its default position.

add

For bar charts, overlays additional objects (text or geometric figures) on the plot. The first argument "text" writes arbitrary text; geometric options include "rect", "line", "arrow", "v_line" (vertical line), and "h_line" (horizontal line). The value "means" is shorthand for vertical and horizontal lines at the respective means. Does not apply to facet plots. Use style parameters such as add_fill and add_color to control appearance.

x1

First x-coordinate (in standardized -1 to 1 units) for each added object.

y1

First y-coordinate for each added object.

x2

Second x-coordinate for each added object. Used for "rect", "line", and "arrow".

y2

Second y-coordinate for each added object. Used for "rect", "line", and "arrow".

quiet

If TRUE, suppresses text output to the console. The default can be changed via style.

do_plot

If TRUE (default), produces the chart. Set to FALSE to compute and return results without plotting.

use_plotly

If TRUE (default), produces a Plotly-based interactive chart in the RStudio Viewer window in addition to the static plot in the Plots window. Some advanced options apply only to the static chart.

pdf_file

If specified, directs graphics output to a PDF file with this name.

width

Width of the plot window (or PDF device) in inches. Default is 6.5.

height

Height of the plot window (or PDF device) in inches. Default is 6.

digits_d

Number of decimal digits used for displayed numeric summaries. Defaults to the larger of 2 and one more than the maximum number of decimal digits in the response variable.

out_size

Target maximum line width (in characters) for console frequency tables of a single variable. Longer lines trigger a vertical layout for improved readability.

n_cat

For analyses of all variables in a data frame, sets the maximum number of unique values for a numeric variable to be treated as categorical rather than continuous. Default is 0. Deprecated: It is preferable to convert such variables explicitly to factors.

value_labels

For factors, defaults to factor levels; for character variables, defaults to the character values. May be used to override axis labels on the x-axis. If the variable is a factor and value_labels is NULL, levels are used with embedded spaces replaced by line breaks. If x and y share the same scale, labels may also be used on the y-axis. Label size is controlled via axis_cex and axis_x_cex in style.

rows

Deprecated. Old name for filter.

facet1

Deprecated. Old parameter name, replaced by facet.

eval_df

Controls whether the function checks for existence of data and referenced variables. Defaults to TRUE, except when shiny is loaded, in which case it is set to FALSE so that Shiny applications run without conflict. Set to FALSE when using the pipe operator %>%.

fun_call

Function call object used internally (e.g., by knitr) to reconstruct the original call.

...

Additional graphical parameters passed to base barplot, legend, and par. Common options include cex.main (title size), col.main (title color), line types such as "dotted" or "dotdash", and subtitle options sub and col.sub. Axis label orientation can be adjusted with las = 3, and bar spacing with space in one-variable bar charts.

Details

OVERVIEW

Chart() visualizes numerical values associated with one or two categorical variables, each with a relatively small number of levels. By default, colors for bars, background, and grid lines are taken from the active style theme, but all can be customized. Base computations use standard R functions such as barplot, chisq.test, and, for two variables, legend. For horizontal bar charts (horiz = TRUE), category labels are drawn horizontally and the left margin is automatically extended to accommodate both the labels and the axis title.

DATA

Conceptually, the chart is built from a summary table in which each row consists of a level of the categorical variable x paired with a numerical value y, with as many rows as there are levels of x. You may:

  • supply x and y directly as a pre-aggregated summary table, or

  • supply x (and optionally y) at the observation level and let Chart() aggregate over the levels of x (and by) using stat.

A second categorical variable by can be used to form a two-way table.

The filter parameter subsets rows (cases) of the input data frame according to a logical expression or a set of integers that specify the row numbers to retain. Use the standard R logical operators described in Logic, such as & (and), | (or), and ! (not), and the standard relational operators described in Comparison, such as == (equality), != (not equal), and > (greater than). Alternatively, specify a vector of integers that correspond to row numbers. See the Examples.
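As a rough guide to what the two forms of filter select, the following base R sketch performs the equivalent row subsetting by hand. The data frame and its values are hypothetical, and the commented Chart() calls assume the filter syntax described above.

```r
# Hypothetical data, invented for illustration
d <- data.frame(
  Dept   = c("ACCT", "SALE", "SALE", "MKTG", "ACCT"),
  Gender = c("M", "W", "M", "W", "W"),
  Salary = c(60000, 74000, 70000, 90000, 64000)
)

# Logical expression: women earning more than 65,000
keep <- d$Gender == "W" & d$Salary > 65000
d_logical <- d[keep, ]

# Row numbers: retain rows 1, 3, and 5
d_rows <- d[c(1, 3, 5), ]

print(d_logical)
print(d_rows)

## Not run: corresponding calls, assuming lessR is loaded
## Chart(Dept, data = d, filter = (Gender == "W" & Salary > 65000))
## Chart(Dept, data = d, filter = c(1, 3, 5))
```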

The input can be factors, numeric values, characters, or a matrix. You can:

  • enter raw data and let Chart() compute frequencies or summaries, or

  • enter a pre-tabulated summary table of counts or statistics.

When y is not supplied, the numerical values are simply the counts of each level of x (and of each combination of x and by).

TWO DATA MODES FOR PLOTLY OUTPUTS

Chart() supports two conceptual modes for aggregated values used in the plots and tables:

  • Count mode (default): if y is NULL, the chart uses counts of x (and by, when supplied).

  • Summary mode: if y is numeric, the chart aggregates y over the categories of x (and by) using stat.

From full data with repeated values of x (and by), you can reduce to a summary table using one of the following transformations:

Transformation   Meaning
--------------   ------------------
"sum"            sum
"mean"           mean
"sd"             standard deviation
"deviation"      mean deviation
"min"            minimum
"median"         median
"max"            maximum
--------------   ------------------

All numeric values (both in console tables and Plotly hovers) are formatted according to digits_d.

Before plotting, Chart() constructs a 1-D table (x) or 2-D table (by × x) of either counts (when y = NULL) or aggregated y (when y is supplied). For count mode, Chart() prints the frequency table and a chi-square test (of equal probabilities for a 1-D table, of independence for a 2-D table). For summary mode, it prints the aggregated table (no chi-square test is computed for numeric summaries). These tables serve as a concise audit of the data supplied to the visualization.
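The count-mode audit for a single variable can be approximated in base R with table and chisq.test. This is a sketch under the assumption that the printed test corresponds to the default goodness-of-fit test of equal probabilities; the data are invented.

```r
# Hypothetical categorical variable: 10, 15, and 20 employees per department
Dept <- factor(rep(c("ACCT", "MKTG", "SALE"), times = c(10, 15, 20)))

# 1-D frequency table, with factor level order preserved
counts <- table(Dept)

# Chi-square test of equal probabilities across the levels of Dept
test <- chisq.test(counts)

print(counts)
print(test)
```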

NON-HIERARCHICAL PLOTLY CHARTS

For non-hierarchical charts (type = "bar", "radar", or "bubble"):

  • with y = NULL, the geometry encodes counts;

  • with y supplied, the geometry encodes the chosen stat of y per category (and per group when by is supplied).

In bubble charts, bubble size is proportional to the aggregated value. Use radius (in pixels) and power to control size mapping (area proportional to the value when power = 0.5).
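The effect of power can be sketched in base R. This is a minimal illustration, not the lessR implementation: the helper scale_radius(), the normalization by the largest value, and the numbers are assumptions made for the example.

```r
values     <- c(10, 40, 90)  # hypothetical aggregated values
max_radius <- 30             # hypothetical radius of the largest bubble

# Hypothetical helper: radius grows as a power of the relative value
scale_radius <- function(v, power, r_max) r_max * (v / max(v))^power

r_area   <- scale_radius(values, power = 0.5, r_max = max_radius)
r_linear <- scale_radius(values, power = 1,   r_max = max_radius)

# With power = 0.5, bubble area (pi * r^2) per unit of value is constant,
# so area is proportional to the value
area_per_unit <- (pi * r_area^2) / values

print(r_area)
print(r_linear)
print(area_per_unit)
```

With power = 1 the radii themselves follow the values, so the smallest bubble shrinks much faster relative to the largest.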

HIERARCHICAL PLOTLY CHARTS

Hierarchical charts include pies with a by variable (sunburst charts) and type = "treemap" or "icicle". For these charts, Chart() constructs a path table from x and by, where by may be:

  • a single factor (one additional level), or

  • a data frame of multiple factors, where each column represents a deeper level in the hierarchy.

It then aggregates y (or counts, if y = NULL) along the path:

  • Node values are computed by applying stat within each node to its child records.

  • For additive statistics ("sum" and counts), parent node values equal the sum of their children. Children are sized proportionally to their parent (branchvalues = "total"), so hover percentages of parent/root are well defined.

  • For non-additive statistics ("mean", "median", "min", "max", "sd"), parent values are computed at the parent level using the same stat and are not sums of children. In this case, “% of parent/root” is not shown in hovers because these proportions are not meaningful for non-additive summaries.

All numeric values shown in hovers are formatted using digits_d. For hierarchical charts:

  • additive modes show the aggregated value and, when appropriate, the % of parent and % of root;

  • non-additive modes show the aggregated value only.
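The distinction between additive and non-additive statistics can be checked with a small base R example. The data are invented, and the code only illustrates the arithmetic, not how Chart() builds its path table.

```r
# Hypothetical two-level hierarchy: Dept (parent) and Gender (child)
d <- data.frame(
  Dept   = c("ACCT", "ACCT", "ACCT", "SALE", "SALE"),
  Gender = c("M", "M", "W", "M", "W"),
  Salary = c(50, 70, 90, 60, 100)
)

# Additive statistic: child sums add up to the parent sum
child_sum  <- tapply(d$Salary, list(d$Dept, d$Gender), sum)
parent_sum <- tapply(d$Salary, d$Dept, sum)
share_of_parent <- child_sum / as.vector(parent_sum)  # meaningful proportions

# Non-additive statistic: the parent mean is computed from the parent's own
# records and generally differs from the mean of the child means
child_mean  <- tapply(d$Salary, list(d$Dept, d$Gender), mean)
parent_mean <- tapply(d$Salary, d$Dept, mean)

print(rowSums(child_sum))   # equals parent_sum
print(parent_sum)
print(rowMeans(child_mean)) # ACCT differs from parent_mean
print(parent_mean)
```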

Titles for console output and interactive plots reflect the mode:

  • Count mode: e.g., “Count of x” (optionally “by by”).

  • Summary mode: e.g., “stat of y by x” (and “by by” when grouped).

  • Hierarchical: analogous titles using the same stat and variable names.

In all cases, Chart() preserves factor level order when building 1-D and 2-D tables and prints tables with informative dimnames. The console chi-square test is computed only for count mode (1-D or 2-D).

VECTOR OF x-VALUES

A vector of categorical x-variables (character or factor) generalizes to a matrix of one-dimensional plots, depending on the value of type:

  • for type = "bar", a stacked bar chart (stack of one-dimensional bar plots),

  • for type = "bubble", a stacked bubble chart, referred to as a bubble plot frequency matrix (BPFM).

COLORS

For a one-variable plot, the default bar colors are taken from the current theme via the bar_fill_discrete argument of style, which by default uses the qualitative HCL palette "hues". Alternatively, set the bar colors explicitly with the fill parameter, using:

  • a single color,

  • a palette name, or

  • a vector of colors, e.g., from getColors.

Pre-defined sequential and divergent HCL ranges are available through getColors. The qualitative sequence "hues" provides equally spaced HCL colors (same chroma and luminance). Sequential and divergent ranges are available at 30-degree increments around the HCL color wheel, including "reds", "rusts", "browns", "olives", "greens", "emeralds", "turquoises", "aquas", "blues", "purples", "violets", "magentas", and "grays".

Define a divergent color scale by providing a vector of two such ranges to fill, e.g., c("purples", "rusts"). These are especially useful for multiple bar charts with a common response scale (e.g., Likert items). Alternatively, specify colors manually, such as c("coral3", "seagreen3") for a two-level by variable.

For finer control, call getColors explicitly and pass its result to fill, adjusting chroma (c) and luminance (l), or defining a custom hue (h). See getColors for details.

The values of another variable can be mapped to bar fill by setting fill equal to that variable’s name, typically y when supplied. When y is tabulated, refer to it as (count). Larger values produce darker bars.

Additional pre-specified palettes include "rainbow", "terrain", and "heat". Distinct palettes include "distinct" (maximally separated hues), the viridis family ("viridis", "cividis", "magma", "inferno", "plasma"), and color-blind friendly options such as "Okabe-Ito". Wes Anderson–inspired palettes such as "Moonrise1", "Royal1", "GrandBudapest1", "Darjeeling1", and "BottleRocket1" are also available (with variants using 2 or 3 in the name where defined).

LEGEND

When two variables are plotted, a legend is produced with entries for each level of the second or by variable. By default, the legend is placed in the right margin. This position can be changed with legend_position, which accepts "right_margin" and any valid position accepted by the standard R legend function.

The legend title can be abbreviated with legend_abbrev, which specifies the maximum number of characters. The legend is vertical by default, but can be drawn horizontally with legend_horiz.

LONG CATEGORY NAMES

Category labels are often long. Adjust their display with rotate_x and rotate_y, in conjunction with offset, which moves labels away from the axis to compensate for rotation. These settings can be made persistent with style. To reset to defaults, call style() again.

Spacing codes for category names:

  1. Any space in a category name is converted to a new line in the plotted label.

  2. To keep words on the same line, replace the space with a tilde ~; the tilde is rendered as a space without a line break.
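The two spacing codes amount to a pair of string substitutions. The helper format_label() below is hypothetical, written only to show the effect on a label; it is not a lessR function.

```r
# Hypothetical helper: apply the two spacing codes to category labels
format_label <- function(lbl) {
  lbl <- gsub(" ", "\n", lbl, fixed = TRUE)  # rule 1: space becomes a line break
  gsub("~", " ", lbl, fixed = TRUE)          # rule 2: tilde becomes a plain space
}

labels <- format_label(c("Strongly Agree", "New~York City"))
cat(labels, sep = "\n---\n")
```

Here "New~York City" keeps "New York" together on the first line and places "City" on the second.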

For console output, you can limit label length with label_max. Longer names are abbreviated to the specified number of characters, with a mapping table provided to show the correspondence between abbreviated and full names. For one-variable frequency distributions, out_size controls the maximum line width before the distribution is printed vertically instead of horizontally.

MULTIPLE BAR CHARTS ON THE SAME PANEL (PLOT)

For multiple x-variables, set one_plot = TRUE to overlay individual bar charts on a single panel. This is especially useful when all items share a common response scale (e.g., Likert items). By default, Chart() produces a single-panel display when a common response scale is detected.

The algorithm for detecting a common response scale identifies the variable with the largest set of responses, then checks that all other variables’ responses are contained within that set. Some items may not exhibit all possible responses (e.g., no one chooses “Strongly Disagree”), but as long as at least one variable contains the full response set, the scales are treated as common.
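The detection rule described above can be sketched as follows. The function has_common_scale() is a hypothetical reconstruction from the description, not the lessR source, and the items are invented.

```r
# Hypothetical Likert items; m02 and m03 do not use every response
items <- data.frame(
  m01 = c("Agree", "Disagree", "Strongly Agree", "Strongly Disagree"),
  m02 = c("Agree", "Agree", "Disagree", "Strongly Agree"),
  m03 = c("Disagree", "Agree", "Agree", "Agree"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if every variable's responses fall within the
# response set of the variable with the most distinct responses
has_common_scale <- function(df) {
  responses <- lapply(df, function(v) unique(v[!is.na(v)]))
  largest   <- responses[[which.max(lengths(responses))]]
  all(vapply(responses, function(r) all(r %in% largest), logical(1)))
}

common_items <- has_common_scale(items)

# Adding a variable with unrelated categories breaks the common scale
items$Dept   <- c("ACCT", "SALE", "ACCT", "MKTG")
common_mixed <- has_common_scale(items)

print(common_items)
print(common_mixed)
```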

Regardless of this automatic detection, you can explicitly set one_plot to either TRUE or FALSE. Explicitly setting one_plot bypasses the commonality check and saves computation.

ENTER NUMERIC VARIABLE DIRECTLY

Instead of computing counts from raw data, you can enter a numeric variable directly as y, together with a categorical x (and possibly a categorical by). In this case, the chart uses the supplied numeric values as-is (or aggregates them according to stat). Alternatively, you can read a pre-tabulated table of counts into R as a matrix or data frame and pass it to Chart().

STATISTICS

In addition to the Plotly and static charts, descriptive and optional inferential statistics are reported. For count mode, a frequency table (one variable) or joint frequency table (two variables) is displayed, followed by Cramér’s V and the chi-square test of independence (or equal probabilities) by default. For summary mode, the aggregated table is printed without a chi-square test, as the test is not appropriate for numeric summaries.
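For a two-way frequency table, Cramér's V follows from the chi-square statistic by its standard definition, V = sqrt(chi-square / (n * (min(rows, columns) - 1))). The base R sketch below uses an invented table; whether Chart() computes V in exactly this way is an assumption.

```r
# Hypothetical joint frequency table: Gender (rows) by Dept (columns)
tab <- matrix(c(20, 10, 10,
                10, 20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("M", "W"),
                              Dept   = c("ACCT", "MKTG", "SALE")))

# Chi-square test of independence
test <- chisq.test(tab, correct = FALSE)

# Cramer's V from the chi-square statistic
n <- sum(tab)
k <- min(dim(tab)) - 1
V <- sqrt(unname(test$statistic) / (n * k))

print(addmargins(tab))  # joint and marginal frequencies
print(test)
print(V)
```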

VARIABLE LABELS

If variable labels are stored in the data frame (e.g., via Read or VariableLabels), they are used by default as axis labels and in text output. For a single variable, the x-axis label defaults to the variable label unless xlab is explicitly supplied. For two variables, the plot title is derived from both variable labels unless overridden by main. Variable labels are also shown in the printed tables.

PDF OUTPUT

To write graphics to a PDF file, use pdf_file, optionally with width and height. Files are written to the current working directory, which you can explicitly set with setwd.

ONLY VARIABLES ARE REFERENCED

Arguments that denote variables in Chart() (and other lessR functions) must be names of existing variables, either in the referenced data frame (e.g., the default d) or in the user’s workspace (global environment). Expressions are not evaluated directly. For example:

    > Chart(cut(rnorm(50), breaks=5))   # does NOT work

Instead, assign the expression to a variable and reference that variable:

    > Y <- cut(rnorm(50), breaks=5)   # create vector Y in user workspace
    > Chart(Y)                        # directly reference Y

Value

For interactive visualizations, Chart() returns a Plotly htmlwidget object (class plotly) that can be printed for interactive viewing or saved as a self-contained HTML file.

For standard (non-interactive) charts, the output can optionally be saved as an R object. Otherwise, it appears only in the console (unless quiet = TRUE). Two types of components are provided: readable text output and numerical statistics.

The readable output consists of character strings such as frequency or summary tables suitable for display. The numerical components are statistics amenable to further analysis. This design supports reproducible reporting in R Markdown documents by referencing the name of each output component directly, using the syntax object$component.

Each component appears only when relevant to the current analysis. For example, cell proportions (out_prop) are included only for two-way tables.

Example: save the output of a standard chart to an object with any valid R name, such as b <- Chart(Dept). View the available output elements with names(b), and access a specific component by prefixing with the object name, such as b$out_chi to display the chi-square test results. These objects can be displayed directly in the console or within R Markdown for integrated text and analysis.

Bar charts only: tabulated numerical variable y

When Chart() is used as a bar chart with a tabulated numerical variable (counts or proportions), the object may contain:

Readable output

out_title

Title of the analysis.

out_lbl

Variable label.

out_counts

Frequency or two-way frequency distribution.

out_chi

Chi-square test of equal probabilities (one variable) or independence (two variables).

out_miss (one variable)

Number of missing values.

out_prop (two variables)

Cell proportions.

out_row (two variables)

Row-wise cell proportions.

out_col (two variables)

Column-wise cell proportions.

Statistics

n_dim

Number of dimensions, 1 or 2.

p_value

p-value for the null hypothesis of equal proportions (one variable) or independence (two variables).

freq

Data frame of the frequency distribution.

values (one variable)

y-values read directly.

prop (one variable)

Frequency distribution of proportions.

n_miss (one variable)

Number of missing values.

Numerical variable y read from data

When Chart() reads a numeric variable y directly from the data and summarizes it across one or two categorical variables, the returned object can include:

out_y

Values of y used in the analysis.

n_dim

Number of dimensions, 1 or 2.

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

References

Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapter 4, NY: Routledge.

Gerbing, D. W. (2020). R Visualizations: Derive Meaning from Data, Chapter 3, NY: CRC Press.

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.

Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. URL: https://plotly.com/r/

See Also

X, XY, getColors, barplot, table, legend, savePlotly.

Examples


# get the data
d <- Read("Employee")

# --------------------------------------------------------
# bar chart from tabulating the data for a single variable
# --------------------------------------------------------

# for each level of Dept, display the frequencies
# -----------------------------------------------
# bar chart, standard and plotly
  Chart(Dept)  # bar chart by default

  # radar chart, plotly only
  Chart(Dept, type="radar") 

  # bubble chart, plotly only
  Chart(Dept, type="bubble") 

  # pie chart, plotly only
  Chart(Dept, type="pie") 

  # treemap chart, plotly only
  Chart(Dept, type="treemap") 

# save the values output by Chart into the myOutput list
myOutput <- Chart(Dept)
# display the saved output
myOutput

# just males with salaries larger than 85,000 USD
Chart(Dept, filter=(Gender=="M" & Salary > 85000))

# rotate and offset the axis labels, sort categories by frequencies
Chart(Dept, rotate_x=45, offset=1, sort="-")

# set bars to a single color of blue with some transparency
Chart(Dept, fill="blue", transparency=0.3)
# progressive (sequential) color scale of blues
Chart(Dept, fill="blues")

# viridis palette
Chart(Dept, fill="viridis")

# change the theme just for this analysis, as opposed to style()
Chart(Dept, theme="darkgreen")

# set bar color to hcl custom hues with chroma and luminance
#   at the values provided by the default hcl colors from
#   the getColors function, which defaults to h=240 and h=60
#   for the first two colors on the qualitative scale
Chart(Gender, fill=c(hcl(h=180,c=100,l=55), hcl(h=0,c=100,l=55)))

# or set to unique colors via color names
Chart(Gender, fill=c("palegreen3","tan"))

# darken the colors with an explicit call to getColors,
#   using a lower value of luminance, here l=25
Chart(Dept, fill=getColors(l=25), transparency=0.4)

# column proportions instead of frequencies
Chart(Gender, stat_x="proportion")

# map value of tabulated count to bar fill
Chart(Dept, fill=(count))

# data with many values of categorical variable Make and large labels
myd <- Read("Cars93")
# perpendicular labels
Chart(Make, rotate_x=90, data=myd)
# manage size of horizontal value labels
Chart(Make, horiz=TRUE, label_max=4, data=myd)

# read y variable, Salary
# display bars for deviations <= 0 in a different color
#  than deviations above 0
Chart(Dept, y=Salary, stat="dev", sort="+", fill_split=0)

# scale the luminosity of the bars with the sequential scale
Chart(Dept, y=Salary, stat="deviation", sort="+",
      fill_scaled=TRUE, fill="green")

# scale the luminosity of the bars with a divergent scale
Chart(Dept, y=Salary, stat="deviation", sort="+", fill_scaled=TRUE,
         fill=c("red", "blue"))

# ----------------------------------------------------
# bar chart from tabulating the data for two variables
# ----------------------------------------------------

# at each level of Dept, show the frequencies of the Gender levels
  # bar chart, standard and plotly
  Chart(Dept, by=Gender)  # bar chart by default

  # radar chart, plotly only
  Chart(Dept, by=Gender, type="radar") 

  # bubble chart, plotly only
  Chart(Dept, by=Gender, type="bubble") 

  # pie chart, plotly only
  Chart(Dept, by=Gender, type="pie") 

  # treemap chart, plotly only
  Chart(Dept, by=Gender, type="treemap") 
# --------------------------------------

# Trellis (facet) plot, bar chart only
Chart(Dept, facet=Gender)

# at each level of Dept, show the row proportions of the Gender levels
#   i.e., 100% stacked bar graph
Chart(Dept, by=Gender, stack100=TRUE)

# at each level of Gender, show the frequencies of the JobSat levels
# do not display percentages directly on the bars
Chart(Gender, by=JobSat, fill="reds", labels="off")

# specify two fill colors for Gender
Chart(Dept, by=Gender, fill=c("deepskyblue", "black"))

# display bars beside each other instead of stacked, Female and Male
# the levels of Dept are included within each respective bar
# plot horizontally, display the value for each bar at the
#   top of each bar
Chart(Gender, by=Dept, beside=TRUE, horiz=TRUE, labels_position="out")

# horizontal bar chart of two variables, put legend on the top
Chart(Gender, by=Dept, horiz=TRUE, legend_position="top")

# for more info on base R graphic options, enter:  help(par)
# for lessR options, enter:  style(show=TRUE)
# here fill is set in the style function instead of Chart
#   along with the others
style(fill=c("coral3","seagreen3"), lab_color="wheat4", lab_cex=1.2,
      panel_fill="wheat1", main_color="wheat4")
Chart(Dept, by=Gender,
         legend_position="topleft", legend_labels=c("Girls", "Boys"),
         xlab="Dept Level", main="Gender for Different Dept Levels",
         value_labels=c("None", "Some", "Much", "Ouch!"))
style()


# -------------------------------------------------------------------------
# bar chart from a statistic aggregated across 1 or 2 categorical variables
# -------------------------------------------------------------------------
Chart(Dept, y=Salary, stat="mean")

Chart(Dept, by=Gender, y=Salary, stat="mean")

# -----------------------------------------------------------------
# multiple bar charts tabulated from data across multiple variables
# -----------------------------------------------------------------

# bar charts for all non-numeric variables in the data frame called d
#   and all numeric variables with a small number of values, < n_cat
# Chart(one_plot=FALSE)

d <- rd("Mach4", quiet=TRUE)

# stacked bar charts for 20 6-pt Likert scale items
# default scale is divergent from "browns" to "blues"
Chart(m01:m20, horiz=TRUE, labels="off", sort="+")

# stacked bubble charts for 20 6-pt Likert scale items
Chart(m01:m20, type="bubble")




# custom scale with explicit call to getColors, HCL chroma at 50
clrs <- getColors("greens", "purples", c=50)
Chart(m01:m20, horiz=TRUE, labels="off", sort="+", fill=clrs)

# custom divergent scale with pre-defined color palettes
#  with implicit call to getColors
Chart(m01:m20, horiz=TRUE, labels="off", fill=c("aquas", "rusts"))


# ----------------------------
# can enter many types of data
# ----------------------------

# generate and enter integer data
X1 <- sample(1:4, size=100, replace=TRUE)
X2 <- sample(1:4, size=100, replace=TRUE)
Chart(X1)
Chart(X1, by=X2)

# generate and enter type double data
X1 <- sample(c(1,2,3,4), size=100, replace=TRUE)
X2 <- sample(c(1,2,3,4), size=100, replace=TRUE)
Chart(X1)
Chart(X1, by=X2)

# generate and enter character string data
# that is, without first converting to a factor
Travel <- sample(c("Bike", "Bus", "Car", "Motorcycle"), size=25, replace=TRUE)
Chart(Travel, horiz=TRUE)


# ----------------------------
# bar chart directly from data
# ----------------------------

# include a y-variable, here Salary, in the data table to read directly
d <- read.csv(text="
Dept, Salary
ACCT,51792.78
ADMN,71277.12
FINC,59010.68
MKTG,60257.13
SALE,68830.06", header=TRUE)
Chart(Dept, y=Salary)

# specify two variables for a two variable bar chart
# also specify a y-variable to provide the plotted values directly
# when reading y values directly, must be a summary table,
#   one row of data for each combination of levels with
#   a numerical value of y
# use lessR pivot function to get summary table, cannot process missing data
#   so set na_group_show to FALSE
d <- Read("Employee")
a <- pivot(d, mean, Salary, c(Dept,Gender), na_group_show=FALSE)
Chart(Dept, y=Salary_mean, by=Gender, data=a)
# do so just with Chart, display bars in grayscale
# How does average salary vary by gender across the various departments?
Chart(Dept, y=Salary, by=Gender, stat="mean", data=d, fill="grays")


# -----------
# annotations
# -----------

d <- rd("Employee")

# Place a message in the center of the plot
# \n indicates a new line
Chart(Dept, add="Employees by\nDepartment", x1=3, y1=10)

# Use style to change some parameter values
style(add_trans=.8, add_fill="gold", add_color="gold4", add_lwd=0.5)
# Add a rectangle around the message centered at <3,10>
Chart(Dept, add=c("rect", "Employees by\nDepartment"),
                     x1=c(2,3), y1=c(11, 10), x2=4, y2=9)


lessR documentation built on Dec. 11, 2025, 5:07 p.m.