XY: Scatterplots including Time Series

View source: R/XY.R

XYR Documentation

Scatterplots including Time Series

Description

lessR introduces the concept of a data view visualization function, in which the choice of visualization function directly reflects the structure of the data and the analyst's goal for understanding the data. The function XY() visualizes the joint distribution of two continuous variables, or more generally the relationship among multiple variables, along with associated summary statistics. The specific representation is selected with the type argument:

  • Scatterplot of points with type = "scatter" (default)

  • Smoothed scatterplot with type = "smooth"

  • Contour plot (2-dimensional density display) with type = "contour"

For the standard scatterplot, stratification divides the data into groups via the arguments by and facet. The by argument overlays groups in a single coordinate system for direct comparison, while facet produces coordinated small multiples on separate panels. Depending on the combination of variable types supplied to x and y, XY() can also create time series, bubble plots, and Cleveland-style dot or lollipop plots.

When running in RStudio, static graphics are drawn in the Plots window. When the default use_plotly=TRUE, an additional interactive plotly version of the scatterplot (including time series) display is also drawn in the Viewer window.

A scatterplot displays the joint values of one or more variables as points in an \(n\)-dimensional coordinate system, in which the coordinates of each point are the values of \(n\) variables for a single observation (row of data). With a common syntax, XY(x, y, ...) generates a family of related 1- or 2-dimensional relationship plots, possibly enhanced with fitted lines, ellipses, and outlier labeling. Categorical variables should typically be defined as factors, and date variables should be stored as Date objects. When x is of class Date, the display is treated as a time-indexed series.

XY() therefore serves as the relationship-oriented counterpart to Chart() (aggregated values) and X() (univariate distributions) within the three-function visualization framework implemented in lessR.

Usage

XY(

    # -------------------------------------------------------
    # Data from which to construct the plot for x- and y-axis
    x, y=NULL, data=d, filter=NULL,

    # --------------------------------------------------
    # Stratification: Same panel or Trellis (facet) plot [x, or x and y]
    by=NULL, facet=NULL,
    n_row=NULL, n_col=NULL, aspect="fill",

    # ----------------------------------------------------------
    # Types of plots
    type=c("scatter", "smooth", "contour"), 


    # -------------------------------
    # Enhancements and customizations
    # -------------------------------

    # --------------------------------------------------------------------
    # Analogy of physical Marks on paper that create the points and labels
    # See  ?style  for more options with the  style()  function
    theme=getOption("theme"),
    fill=NULL, color=NULL,
    transparency=getOption("trans_pt_fill"),

    enhance=FALSE, means=TRUE,
    size=NULL, size_cut=NULL, shape="circle", line_width=1.5,
    segments=FALSE, segments_y=FALSE, segments_x=FALSE,

    # ----------------------
    # Sort and jitter points
    sort=c("0", "-", "+"),
    jitter_x=NULL, jitter_y=NULL,

    # ----------------
    # Outlier analysis
    ID="row.name", ID_size=0.60,
    MD_cut=0, out_cut=0, out_shape="circle", out_size=1,

    # -------------------------------------------------
    # Fit line, confidence interval, confidence ellipse
    fit=c("off", "loess", "lm", "ls", "null", "exp", "quad",
          "power", "log"),
    fit_power=1, fit_se=0.95,
    fit_color=getOption("fit_color"), fit_lwd=getOption("fit_lwd"),
    fit_new=NULL, plot_errors=FALSE, ellipse=0,

    # -----------------------------------------------------------------------
    # Time series and forecasting, plot x values sequentially [xDate, y or Y]
    ts_unit=NULL, ts_ahead=0, ts_method=c("es", "lm"),
    ts_source=c("fable", "classic"), ts_agg=c("sum", "mean"),
    ts_NA=NULL, ts_format=NULL, ts_fitted=FALSE, ts_PI=.95, 
    ts_trend=NULL, ts_seasons=NULL, ts_error=NULL, 
    ts_alpha=NULL, ts_beta=NULL, ts_gamma=NULL, ts_new_x=NULL,
    ts_stack=FALSE, ts_area_fill="transparent", ts_area_split=0,
    ts_n_x_tics=NULL,
    # Run chart (indicate with .Index for the name of the x-variable)
    show_runs=FALSE, center_line=c("off", "mean", "median", "zero"),

    # -----------------------------------
    # Lollipop chart from aggregated data [xCategorical and y]
    stat=c("mean", "sum", "sd", "deviation", "min", "median", "max"),
    stat_x=c("count", "proportion", "%"),

    # ----------------------------------
    # Integrated violin/box/scatter plot  [x only]
    vbs_plot="vbs", vbs_ratio=0.9, bw=NULL, bw_iter=10,
    violin_fill=getOption("violin_fill"),
    box_fill=getOption("box_fill"),
    vbs_pt_fill="black",
    vbs_mean=FALSE, fences=FALSE, n_min_pivot=1,
    k=1.5, box_adj=FALSE, a=-4, b=3,

    # -----------
    # Bubble plot [xCategorical, or xCategorical and yCategorical]
    radius=NULL, power=0.5, low_fill=NULL, hi_fill=NULL,

    # -----------------------------------------------------------
    # Large data sets, smoothing, contours and binning  [x and y]
    smooth_points=100, smooth_size=1,
    smooth_power=0.25, smooth_bins=128, n_bins=1,
    contour_n=10, contour_nbins=50, contour_points=FALSE,

    # ------------------------------------------------------
    # Bins for frequency polygon or text output of VBS plots
    bin=FALSE, bin_start=NULL, bin_width=NULL, bin_end=NULL,
    breaks="Sturges", cumulate=FALSE,


    # ----------------------
    # Axes labels and values
    # ----------------------

    # -----------------------
    # Axis labels and spacing
    xlab=NULL, ylab=NULL, main=NULL, sub=NULL,
    label_adjust=c(0,0), margin_adjust=c(0,0,0,0),  # top, right, bottom, left
    pad_x=c(0,0), pad_y=c(0,0),

    # ---------------------
    # Axis values specified
    scale_x=NULL, scale_y=NULL, origin_x=NULL, origin_y=NULL,

    # ---------------------
    # Axis values formatted
    rotate_x=getOption("rotate_x"), rotate_y=getOption("rotate_y"),
    offset=getOption("offset"),
    axis_fmt=c("K", ",", ".", ""), axis_x_prefix="", axis_y_prefix="",
    xy_ticks=TRUE, n_axis_x_skip=0, n_axis_y_skip=0, 

    # ------
    # legend
    legend_title=NULL,


    # -------------
    # Miscellaneous
    # -------------

    # ----------------------------------------------------
    # Add one or more objects, text, or geometric figures
    add=NULL, x1=NULL, y1=NULL, x2=NULL, y2=NULL,

    # ---------------------------------------------
    # Output: turn off, to PDF file, decimal digits
    quiet=getOption("quiet"), do_plot=TRUE,
    use_plotly=getOption("lessR.use_plotly"),
    pdf_file=NULL, width=6.5, height=6,
    digits_d=NULL,

    # -------------------------------------------------------------
    # Deprecated, to be removed in future versions
    n_cat=getOption("n_cat"), value_labels=NULL,  # use R factors instead
    rows=NULL, facet1=NULL, facet2=NULL, smooth=FALSE,

    # ---------------------------------
    # Other
    eval_df=NULL, fun_call=NULL, ...)

Arguments

x

The variable plotted on the x-axis. The data values can be continuous or categorical, cross-sectional or a time series. If x is sorted with equal intervals separating the values, or is a time series, then by default the points are plotted sequentially and connected with line segments. If named .Index, a run chart is generated from the corresponding y variable. Can specify multiple x-variables or multiple y-variables as vectors, but not both.

y

The variable plotted on the vertical y-axis. Can be continuous or categorical.

data

Optional data frame that contains one or both of x and y. Default data frame is d.

filter

A logical expression or a vector of integers that specify the row numbers to retain, defining a subset of rows of the data frame to analyze.


by

A grouping variable, a categorical variable that provides separate group profiles of the primary numeric variables x and, optionally, y on the same plot. For two-variable plots, the same grouping applies to the panels of a Trellis graphic if facet1 is specified.

facet

One categorical variable, or a vector of two categorical variables, that activates facet graphics (Trellis plots) via the lattice package, providing a separate plot of the primary variable(s) x and y for each level of the variable or combination of two levels.

n_row

Optional specification for the number of rows in the layout of a multi-panel display with Trellis graphics. Specify n_col or n_row, but not both.

n_col

Optional specification for the number of columns in the layout of a multi-panel display with Trellis (facet) graphics. Specify n_col or n_row, but not both. The default is 1.

aspect

Lattice parameter for the aspect ratio of the Trellis panels (facets), defined as height divided by width. The default value is "fill" to have the panels expand to occupy as much space as possible. Set to 1 for square panels. Set to "xy" to specify a ratio that “banks” to 45 degrees, that is, with the typical line slope approximately 45 degrees.


type

Display type. The default is "scatter" for a scatter plot. Set to "smooth" for a smoothed density plot for two numeric variables, or "contour" for a contour plot summarizing the bivariate density.


theme

Color theme for this analysis. Make persistent across analyses with style.

fill

Fill color for the points or for the area under a line chart. Can also be set via the lessR function getColors to select from a variety of color palettes. For points, the default is pt_fill; for the area under a line chart, violin_fill. For a line chart, set to "on" to use the default color.

color

Border color of the points or line color for a line plot. Can be a vector to customize the color for each point or a color range such as "blues" (see getColors). Default is pt_color from the lessR style function.

transparency

Transparency factor of the fill color of each point. Default is trans_pt_fill from the lessR style function, which is 0.1 for points. Bubbles have a default transparency of 0.3.


enhance

For a two-variable scatterplot, if TRUE, automatically add the 0.95 data ellipse, labels for outliers beyond a Mahalanobis distance of 6 from the ellipse center, the least squares line for all data, the least squares line for the data excluding outliers, and horizontal and vertical lines representing the means of the two variables.

means

If one variable is categorical (factor) and the other is continuous, then if TRUE the group means are plotted in the scatterplot by default. Also applies to a 1-D scatterplot.

size

When set to a constant, the scaling factor for standard points (not bubbles) or for a line, with default of 1.0 for points and 2.0 for a line. Set to 0 to suppress plotting of points or lines. If ts_area_fill is used for a line chart, then the default is 0. When set to a variable, activates a bubble plot, with the size of each bubble further determined by the value of radius. Applies to the standard two-variable scatterplot.

size_cut

For a bubble plot in which bubble sizes are defined by a size variable, controls whether and how many bubble labels are shown. If 1 (or TRUE), show the value of the sizing variable for selected bubbles in the center of the bubbles, except when the bubble is too small. If 0 (or FALSE), no values are displayed. If a number greater than 1, display labels only for the indicated number of values (e.g., the maximum and minimum when set to 2). The color of the displayed text is set by bubble_text from the style function.

shape

Plot character(s). The default value is "circle", with both a default exterior color and filled interior, explicitly specified by "color" and "fill". Other possible values with fillable interiors are "circle", "square", "diamond", "triup" (triangle up), and "tridown" (triangle down), all uppercase and lowercase letters, all digits, and most punctuation characters. The integers 0–25 as defined by the base R points function also apply. If plotting levels for aesthetic different groups according to by, specify a vector of shapes with one shape per group, or set to "vary" to have shapes selected automatically across by groups.

line_width

Width of the line segments that connect adjacent points, such as in time series plots. Set to 0 to remove line segments.

segments

Designed for interaction plots of means: connects each pair of successive points with a line segment. Pass a data frame of means, such as from pivot. To turn off connecting line segments for sorted, equal-interval data, set to FALSE. Currently, does not apply to Trellis plots.

segments_y

For one x-variable, draw a line segment from the y-axis to each plotted point, such as for a Cleveland dot plot. For two x-variables, the line segments connect the two points.

segments_x

Draw a line segment from the x-axis for each plotted point.


sort

Sort the values of y by the values of x, such as for a Cleveland dot plot (numeric x paired with a categorical y with unique values). If x is a vector of two variables, sort by their difference.

jitter_x

Randomly perturb the plotted points of a scatterplot horizontally within the limits of the specified value, or set to NULL to rely upon the computed default value.

jitter_y

Vertical jitter. Defaults to 0. Same interpretation as jitter_x.


ID

Name of a variable that provides the labels for selected plotted points for outlier identification, row names of the data frame by default. To label all points, use the add parameter (see below).

ID_size

Size of the plotted labels. Modify label text color with the style function parameter ID_color.

MD_cut

Mahalanobis distance cutoff to define an outlier in a two-variable scatterplot.

out_cut

Count or proportion of plotted points to label, in order of their distance from the center (means) of the univariate distribution or scatterplot, sorted from most to least extreme. For two-variable plots, distance from the center is based on Mahalanobis distance.

out_shape

Shape of outlier points in a two-variable scatterplot or VBS plot. Modify fill color from the current theme with out_fill and out2_fill in style.

out_size

Size of outlier points in a two-variable scatterplot or VBS plot.


fit

Type of best fit line. Default is "off". Options include "loess" for a non-linear smooth, "lm" for a linear least squares model, "null" for the null (intercept-only) model, "exp" for exponential growth or decay, "power" for a general power model in conjunction with fit_power, and "quad" for a quadratic (power 2) model. If potential outliers are identified according to out_cut, a second (dashed) fit line is displayed based on the data excluding those outliers.

fit_power

Power that describes the response y as a power function of the predictor x, required when fit = "power". Optionally, and experimentally, may apply to fit = "exp".

fit_se

Confidence level for the error band displayed around the line of best fit. Default is 0.95 when a fit line is specified, but is turned off when plot_errors = TRUE. Can be a vector to display multiple confidence bands. Set to 0 to suppress error bands.

fit_color

Color of the fit line.

fit_lwd

Line width of the fit line.

fit_new

When fit is set to a fitted curve such as "lm" or "quad", predicted values are computed at the x-values supplied here.

plot_errors

If TRUE, plot line segments joining each point to the regression line ("loess" or "lm"), illustrating the size of the residuals.

ellipse

Confidence level of a data ellipse for a scatterplot of one x-variable and one y-variable according to the contours of the corresponding bivariate normal density. Specify the confidence level(s) as a single number or a vector of values between 0 and 1 to plot one or more ellipses. Ellipses are not available in the interactive plotly version. For Trellis graphics, only the maximum level is used, with one ellipse per panel. Modify fill and border colors with the style parameters ellipse_fill and ellipse_color.


ts_unit

Time unit for plotting time series data when x is of type Date. Default is the time unit that describes the regular time intervals as they occur in the data. Aggregate to a coarser time unit as specified, such as a daily series aggregated to "months". Dates are stored as variables of type Date, which represent calendar dates without times-of-day. Valid values include "days", "weeks", "months", "quarters", and "years", as well as "days7" to model weekly seasonality for daily data. Otherwise the time unit for detecting seasonality is typically "months" or "quarters".

ts_ahead

Number of ts_unit steps ahead of the last time period to forecast.

ts_method

Forecasting method. Default is "es" for exponential smoothing models. Alternatively, choose "lm" for a least squares linear regression model that accounts for seasonality.

ts_source

Source of forecasting functions. Default is "fable" for access to ETS() and TSLM() from the fable package and related functions from the fpp3 ecosystem, or "classic" for base R forecasting functions.

ts_agg

Function used to aggregate over time according to ts_unit. Default is "sum" with an option for "mean".

ts_NA

By default, missing y values (NA) are not plotted, leaving gaps. Alternatively, specify a value, usually 0, to replace NA and plot that value (e.g., 0) at the corresponding date. Forecasting with missing data is not supported.

ts_format

Optional format string for as.Date() describing the values of the date variable on the x-axis, needed if the function cannot infer the date format. For example, the character string "09/01/2024" can be described by "%m/%d/%Y". See details for more information.

ts_fitted

If TRUE and ts_ahead is used for forecasting, display for each data point the fitted value, level, trend, and seasonal component.

ts_PI

Prediction interval level about the forecasted values, with default of 0.95.

ts_trend

Trend parameter. Default value is NULL, allowing the procedure to choose the specification that yields the best-fitting model for the data. Values "A" and "M" indicate additive and multiplicative models, respectively. Specify "N" to omit the trend component. See the forecasting documentation for valid combinations of trend, season, and error.

ts_seasons

Seasonality parameter. See ts_trend.

ts_error

Error parameter. See ts_trend.

ts_alpha

Exponential smoothing level parameter. Sets the value for HoltWinters(), and acts as a suggestion for ETS() in the "fable" framework.

ts_beta

Exponential smoothing trend parameter. See ts_alpha.

ts_gamma

Exponential smoothing seasonal parameter. See ts_alpha.

ts_new_x

A data frame of predictor variable names and new values for exogenous regressors used for model fitting and forecasting. The number of rows of new data values determines the number of future time periods for which forecasts are generated. Currently applies only to the default "fable" linear model for ts_method = "lm".

ts_stack

If TRUE, multiple time plots are stacked on each other, with area set to TRUE by default.

ts_area_fill

Fill under line segments, if present. If ts_stack is TRUE, the default is a gradient from the default color range (e.g., "blues"). If not specified, and fill is set with no plotted points and ts_area_fill is not specified, then fill generally controls the area under the line segments.

ts_area_split

Applies only to a Trellis plot activated with facet1. Value of y that defines a reference line splitting the filled area under the time series line. Values of y below this threshold lie below the reference line; values above lie above the line.

ts_n_x_tics

Suggested number of ticks for dates on the x-axis, overriding the default of approximately seven ticks.

show_runs

If TRUE, display the individual runs in the run analysis. Also sets run to TRUE. Customize the color of the line segments with segments_color via style.

center_line

Plots a dashed line through the middle of a run chart. Default center line is the "median", suitable when values randomly vary about the median. Alternatively, "mean" and "zero" specify that the center line goes through the mean or zero, respectively. Currently not supported for Trellis plots.


stat

Apply a specified aggregation such as "mean" to the numerical y variable at each level of a categorical x variable. The resulting dot plot (Cleveland plot) is analogous to a bar chart but emphasizes position along a common scale.

stat_x

If no y variable is specified, constructs a frequency polygon for x with access to bin_width. Either use the default "count" for each bin or "proportion", also indicated by %.


vbs_plot

A character string that specifies the components of the integrated Violin–Box–Scatterplot (VBS) of a continuous variable. A "v" in the string indicates a violin plot, a "b" indicates a box plot with flagged outliers, and a "s" indicates a 1-variable scatterplot. Default value is "vbs". The characters can be in any order and in upper- or lower-case. Generalize to Trellis plots with the facet1 and facet2 parameters, but currently only for horizontal displays. Modify fill and border colors from the current theme with the style parameters violin_fill, violin_color, box_fill and box_color.

vbs_ratio

Height of the violin plot relative to the plot area. Make the violin (and the accompanying box plot) larger or smaller by adjusting either the plot area or this value.

bw

Bandwidth for the smoothness of the violin plot. Higher values give smoother plots. Default is to calculate a bandwidth that provides a relatively smooth density estimate.

bw_iter

Number of iterations used to adjust the default R bandwidth for further smoothing of the density estimate. When set, the iterations and corresponding results are displayed.

violin_fill

Fill color for a violin plot.

box_fill

Fill color for a box plot.

vbs_pt_fill

Points in a VBS scatterplot are black by default because the background violin is based on the current theme color. To use the pt_fill and pt_color values specified by style, set vbs_pt_fill = "default". Otherwise, set to any desired color.

vbs_mean

Show the mean on the box plot with a strip in the color of out_fill, which can be changed with the style function.

fences

If TRUE, draw the inner upper and lower fences as dotted line segments.

n_min_pivot

For the pivot table underlying a VBS plot over at least a by or facet1 categorical variable, sets the minimum sample size for a group for the corresponding row to be displayed. Default is 1 for all non-empty groups. Set to 0 to view all groups.

k

IQR multiplier that determines the distance of the whiskers of the box plot from the box. Default is Tukey's setting of 1.5.

box_adj

Adjust the box and whiskers, and thus outlier detection, for skewness using the medcouple statistic as a robust measure of skewness according to Hubert and Vandervieren (2008).

a, b

Scaling factors for the adjusted box plot to set the lengths of the whiskers. If explicitly set, box_adj is activated.


radius

Scaling factor of the bubbles in a bubble plot, which sets the radius of the largest displayed bubble in inches. To activate bubble scaling, set size to a third variable, or for categorical variables (factors), the bubble sizes represent frequencies.

power

Relative scaling of bubble sizes. The default value of 0.5 scales bubbles so that the area of each bubble is proportional to the corresponding sizing variable. A value of 1 scales bubble radii directly by the sizing variable, increasing the size differences between values.

low_fill

For a categorical variable and the resulting bubble plot, or a matrix of such plots, sets the low end of a color gradient for bubble fill.

hi_fill

For a categorical variable and the resulting bubble plot, or a matrix of such plots, sets the high end of a color gradient for bubble fill.


smooth_points

Number of points superimposed on the density plot in the areas of lowest density to help identify outliers, thereby controlling how dark the smoothed display becomes.

smooth_size

Size of points superimposed on the density plot. Default value is 1, resulting in very small points.

smooth_power

Exponent of the function that maps the density scale to the color scale. Smaller values than the default of 0.25 yield darker plots.

smooth_bins

Number of bins in each direction for the density estimation.

n_bins

Number of bins for a single numeric x-variable when visualizing the mean or median of a numeric y-variable for each bin. Points are plotted as bubbles, with bubble size proportional to the bin sample size, unless size is specified as a constant. Default is 1 (no binning).

contour_n

Number of contour levels in a contour plot, with default of 10.

contour_nbins

Number of bins constructed for each of x and y, forming the 2D grid from which densities are estimated.

contour_points

If TRUE, plot the original scatterplot points in a light gray with white borders, representing the data from which the contour curves are estimated. Color is not affected by other color settings, but point size can be changed with size. Default size is 0.72.


bin

If values = "count", display the frequency distribution for a frequency polygon.

bin_start

Optional starting value of the bins for a frequency polygon.

bin_width

Optional specified bin width. Also sets bin = TRUE.

bin_end

Optional value that lies within the last bin, so the actual endpoint of the last bin may be larger than the specified value.

breaks

Method for calculating bins, or an explicit specification of the bins, such as via seq or other options provided by hist. Also sets bin = TRUE.

cumulate

If TRUE, display a cumulative frequency polygon.


xlab, ylab

Axis labels for the x- and y-axes. If not specified, labels default to the corresponding variable labels (if present), otherwise to the variable names. If xy_ticks is FALSE, no ylab is displayed. Customize these and related parameters with options such as lab_color in style.

main

Label for the plot title. If corresponding variable labels exist, the title is set by default from those labels.

sub

Subtitle for the graph, below xlab. Not yet implemented.

label_adjust

Two-element vector (x-axis label, y-axis label) that adjusts the positions of axis labels in approximate inches. Positive values move labels away from the plot edge. Not applicable to Trellis graphics.

margin_adjust

Four-element vector (top, right, bottom, left) that adjusts plot margins in approximate inches. Positive values move the corresponding margin away from the plot edge. Can be used with offset to move axis values into the expanded margin space. Not applicable to Trellis graphics.

pad_x

Proportion of padding added to the left and right sides of the x-axis. Values range from 0 to 1 for each of the two elements. If only one element is specified, it is applied to both sides.

pad_y

Proportion of padding added to the bottom and top sides of the y-axis. Values range from 0 to 1 for each of the two elements. If only one element is specified, it is applied to both sides.


scale_x

If specified, a vector of three values that define the numeric x-axis: starting value, ending value, and number of intervals.

scale_y

If specified, a vector of three values that define the numeric y-axis: starting value, ending value, and number of intervals.

origin_x

Origin of the x-axis. Default is the minimum value of x, except for time series plots and when stat is set to "count" or a related option, in which case the origin defaults to 0.

origin_y

Origin of the y-axis. Default is the minimum value of y, except for time series plots and when stat is set to "count" or a related option, in which case the origin defaults to 0.


rotate_x

Rotation in degrees of the axis value labels on the x-axis, usually to accommodate longer labels, typically used with offset. If set to 90, labels are perpendicular to the axis and a different placement algorithm is used so that offset is not needed.

rotate_y

Rotation in degrees of the axis value labels on the y-axis, usually to accommodate longer labels, typically used with offset.

offset

Spacing between axis value labels and the axis. Default is 0.5. Larger values (e.g., 1.0) create additional space for labels, especially when rotated. Can be used with margin_adjust to create additional margin space for axis labels and values.


axis_fmt

Numeric format of axis labels for both axes. Default rounds thousands to "K" (e.g., 100000 becomes 100K). Alternatives include "," to insert commas in large numbers with a decimal point, "." to insert periods, or "" to turn off formatting.

axis_x_prefix

Prefix for axis labels on the x-axis, such as "$".

axis_y_prefix

Prefix for axis labels on the y-axis, such as "$".

xy_ticks

Flag indicating whether tick marks and associated value labels on the axes are displayed. To rotate axis values, use rotate_x, rotate_y, and offset (see style).

n_axis_x_skip

Particularly useful for Trellis or facet plots with many labels on the x-axis. Specifies the spacing for skipping labels. A value of 0 includes all labels (default). A value of 1 skips every other label, 2 includes every third label, etc. Also consider rotate_x.

n_axis_y_skip

Same as n_axis_x_skip but applies to the y-axis.


legend_title

Title of the legend for a multiple-variable x or y plot.


add

Overlay one or more objects, text or geometric figures, on the plot. Possible values include any text (first argument) or "text"; "labels" to label each point with the row name; and geometric objects "rect" (rectangle), "line", "arrow", "v_line" (vertical line), and "h_line" (horizontal line). The value "means" draws vertical and horizontal lines at the respective means. Does not apply to Trellis graphics. Customize with parameters such as add_fill and add_color via style.

x1

First x-coordinate for each overlaid object; may be "mean_x". Not used for "h_line".

y1

First y-coordinate for each overlaid object; may be "mean_y". Not used for "v_line".

x2

Second x-coordinate for each object; may be "mean_x". Used only for "rect", "line", and "arrow".

y2

Second y-coordinate for each object; may be "mean_y". Used only for "rect", "line", and "arrow".


quiet

If TRUE, suppress text output. Can change the system default with style.

do_plot

If TRUE (default), generate the plot.

use_plotly

If TRUE (default), draw a plotly interactive plot in the RStudio Viewer window in addition to the static plot in the Plots window. Not all options are available in the interactive version, but key options are supported.

pdf_file

If specified, direct PDF graphics to the given file name.

width

Width of the plot window in inches, default 5 except within RStudio, where the default maintains an approximately square plotting area.

height

Height of the plot window in inches, default 4.5 except for 1-D scatterplots and when running in RStudio.

digits_d

Number of significant digits for each displayed summary statistic.


n_cat

Maximum number of unique, equally spaced integer values of a variable for which it will be analyzed as categorical rather than continuous. Default is 0. Use to treat such variables as informal factors. Deprecated. Best to convert integer-coded categoricals to factors explicitly.

value_labels

For factors, default labels are the factor levels; for character variables, default labels are the character values. Optionally provide labels for the x-axis to override defaults. If the variable is a factor and value_labels is NULL, then labels are set to the factor levels with spaces replaced by line breaks. If x and y share the same scale, labels also apply to the y-axis. Control label size with axis_cex and axis_x_cex from the lessR style function.

rows

Deprecated. Old parameter name; use filter.

facet1

Deprecated. Old parameter name; use facet.

facet2

Deprecated. Old parameter name; use a two-element vector in facet.

smooth

Deprecated. Use parameter type.


eval_df

Controls whether to check for an existing data frame and specified variables. Default is TRUE, unless the shiny package is loaded, in which case it is set to FALSE so that Shiny can run. Must be set to FALSE when using the pipe operator %\>%.

fun_call

Function call. Used with knitr to pass the function call when obtained from the abbreviated helper sp.


...

Other parameters for non-Trellis graphics as recognized by base R functions plot and par, including cex.main for the size of the title, col.main for the color of the title, and sub and col.sub for a subtitle and its color.

Details

VARIABLES and TRELLIS PLOTS

There is at least one primary variable, x, which defines the horizontal x-axis. A second primary variable, y, defines the vertical y-axis. Either x or y (but not both simultaneously) may be a vector of variables. The simplest usage—single x and y—produces a single scatterplot on one panel. With type = "scatter" this is a standard point plot; alternative values such as "smooth" and "contour" invoke 2-D kernel density and contour displays, respectively.

For numeric primary variables, multiple plots can appear on a single panel in two ways. First, by defining groups with the by argument: by identifies a categorical grouping variable, and a separate scatterplot layer is drawn for each of its levels. Group levels are distinguished by color and/or shape. By default, colors vary across groups; for two groups, a common pattern is a filled symbol for one group and a point with transparent interior for the other.

Second, multiple numeric x-variables or multiple y-variables can be supplied as a vector, which produces multiple series on the same panel. This is commonly used for time series overlays and multi-series line plots.

Trellis graphics (facets), from Deepayan Sarkar's (2009) lattice package, may also be used. A variable specified with facet is a conditioning variable. A single facet variable produces one panel for each of its levels; a length-two vector of facet variables produces one panel for each cross-classified combination of their levels. Within each panel, x and y are plotted as usual, and an additional grouping variable specified with by may be applied to all panels. When x has at most 1000 unique values, XY() can provide a brief diagnostic of the maximum number of repeated values for each level of facet.

Control the panel dimensions and the overall size of the Trellis display with width and height for the graphics device, n_row and n_col for the number of panels in each direction, and aspect for the panel height-to-width ratio. The plot window is the active graphics device (e.g., the standard R plotting window, RStudio Plots pane, or a pdf file when pdf_file is specified).

CATEGORICAL VARIABLES

Conceptually, lessR distinguishes continuous and categorical variables. Categorical variables have relatively few unique values and are often coded as factors. However, categorical variables may also be coded numerically, such as Likert responses from 1 to 5. The structuring arguments by and facet apply when at least one of x or y is numeric, but that numeric variable may represent either a truly continuous scale or a discrete Likert-type scale.

Scatterplots of Likert-type data can be challenging because the number of possible joint values is small. For two five-point scales there are only 25 possible combinations, so many points overlap at the same coordinates. For such situations, jittering, or dot/mean plots (using stat) may provide clearer displays.

DATA

The default input data frame is d. Another data frame can be specified with the data argument. Regardless of the name of the data frame, variables can be referenced directly by name—no need to attach the data frame or use d$name. Referenced variables may live in the data frame, the global environment, or both.

The plotted values can be the raw data themselves, or summaries derived from them. With a single numeric variable, counts or proportions can be plotted on the y-axis, optionally after binning x. For a categorical x-variable paired with a continuous y-variable, summary statistics such as means can be plotted at each level of x. When x is continuous and binning is desired, XY() uses the same binning parameters as Histogram, such as bin_width, to override defaults. The stat parameter controls what is plotted (e.g., "data", "mean", "median"). By default, connecting line segments are drawn, yielding a frequency polygon. Turn off line segments by setting line_width = 0.

The filter parameter subsets rows (cases) of the input data frame according to a logical expression or a vector of integers specifying row numbers to retain. Use standard R logical operators as described in Logic—for example, & (and), | (or), and ! (not)—and relational operators as described in Comparison such as == (equality), != (not equal), and > (greater than). To specify rows directly, create an integer vector using standard R syntax. See Examples.

VALUE LABELS
Deprecated. Use factor() instead.

The value_labels option can override the axis tick labels with user-supplied values. This is particularly useful for Likert-style data coded as integers: for example, a data value 0 can be displayed as "Strongly Disagree". These labels apply to integer categorical variables and to factor variables. Any spaces in a label are translated into line breaks to improve readability.

In current workflows, the recommended approach is to define factors with the desired labels directly, typically via factors, which allows convenient creation of labeled factors for one or more variables in a single statement.

VARIABLE LABELS

Base R does not natively support variable labels, but lessR stores labels in the data frame alongside the variables, typically created by Read or VariableLabels. When variable labels exist, XY() uses them by default for axis labels and in text output. Otherwise, the variable names are used.

TWO VARIABLE PLOT

With two variables specified, XY() behaves as follows. If the values of x are unsorted, have unequal intervals, or there is missing data in either variable, a standard scatterplot is produced (for type = "scatter"). When x is sorted with equal intervals and there are no missing values, the default display connects adjacent points with line segments, yielding a function or time-series style plot.

Supplying multiple continuous x-variables against a single y, or vice versa, produces multiple series on the same graph. The points for the second series reuse the first series' color but with transparent fill; for more series, additional colors from the current theme are used.

SCATTERPLOT MATRIX

If x is a vector of continuous variables and y is omitted, XY() generates a scatterplot matrix. The matrix adopts the current color theme. Individual colors (e.g., fill, color) can be overridden. The lower triangle shows the pairwise scatterplots, and the upper triangle shows the corresponding correlation coefficients. By default a non-linear loess fit line is added to each scatterplot; the fit parameter can be used to request a linear least squares line instead, along with fit_color to set its color.

SIZE VARIABLE
Specifying a numeric size variable activates a bubble plot in which the area of each bubble is determined by the corresponding value of size (modified by radius and power).

To vary shapes explicitly across groups, use shape and provide a vector of values (e.g., created with c). One shape is used for each level of the grouping variable in by. To vary colors, use fill; if fill is specified without shape, colors vary but shapes do not. To vary both, specify both shape and fill with values for each by level.

Beyond the named shapes such as "circle", any single character (letters, digits, "+", "*", "#") may be used as a plotting symbol. Within a single specification, either use standard named shapes or single characters, but not both.

SCATTERPLOT ELLIPSE

For a scatterplot of two numeric variables, the ellipse argument requests one or more data ellipses, based on the contours of the corresponding bivariate normal density. For a single x- and y-variable pair, setting ellipse to a numeric value between 0 and 1 (e.g., 0.95) draws the corresponding probability contour. A vector of values produces multiple ellipses. XY() expands the axes as needed so that ellipses extending beyond the range of the data remain fully visible. For Trellis graphics, only the largest ellipse level is used (one ellipse per panel). Ellipse fill and border colors are controlled via ellipse_fill and ellipse_color in style.

TIME CHARTS

See https://web.pdx.edu/~gerbing/lessR/examples/Time.html for additional examples and explanation.

When x is the special variable .Index, XY() produces a run chart of y. The y-values are plotted against their index positions (1, 2, …), and run-chart diagnostics such as center lines and runs analysis can be displayed via the corresponding arguments.

If x is of type Date or an R time-series object, XY() produces a time series plot for each specified variable. Time-series data can be supplied in “long” format (a single column of values plus a date column) or in “wide” format (multiple time-series columns sharing a date index). XY() will attempt to convert character string date values (e.g., "08/18/1952") to Date via as.Date(), using default date formats when possible.

XY() makes a reasonable attempt to decode common date formats, but some formats (e.g., those with month names rather than numbers) may require an explicit format string via ts_format. If the default conversion fails or is ambiguous, specify the correct format using examples such as those in the table below.

Example Date Format
--------------------------- -----------------
"2022-09-01" "%Y-%m-%d"
"2022/9/1" "%Y/%m/%d"
"2022.09.01" "%Y.%m.%d"
"09/01/2022" "%m/%d/%Y"
"9/1/22" "%m/%d/%y"
"September 1, 2022" "%B %d, %Y"
"Sep 1, 2022" "%b %d, %Y"
"20220901" "%Y%m%d"
--------------------------- -----------------

XY() also converts character-string dates such as "2024 Aug" and "2024 Q1", interpreting three-letter month abbreviations and quarter codes Q1Q4.

The ts_unit argument optionally aggregates the date variable to a higher time unit using endpoints() and period.apply() from the xts package. For example, daily data can be aggregated and plotted as monthly or quarterly series.

The aggregation function is specified with ts_agg, default "sum", with "mean" as a common alternative.

For missing data, if a date is present but the corresponding y-value is NA, the line is broken at that point (no segment is drawn). If both the date and value are absent (the entire row is missing), the line connects the nearest observed dates, spanning the gap in calendar time but skipping the missing tick label. For example, if "2021-01-07" and "2021-01-09" are present but "2021-01-08" is absent, the plot includes points for the 7th and 9th connected by a line; the 8th does not appear on the axis.

Forecasting is activated by setting ts_ahead to a positive integer. The trend, seasonal, and error components for exponential smoothing and regression models follow the fable conventions and are controlled by ts_trend, ts_seasons, and ts_error. XY() supports four forecasting engines: ETS() and TSLM() from fable via ts_source = "fable", and classic HoltWinters() and regression decomposition (stl() + lm()) via ts_source = "classic". Multiplicative components require positive data. Additive trend is linear, while multiplicative trend corresponds to exponential growth or decay; additive and multiplicative seasonality scale the seasonal pattern differently as the series level changes.

Exponential smoothing via ETS() (the default when ts_source = "fable") generally provides flexible model specification and often superior predictive performance compared to classic Holt–Winters. For regression with seasonality, TSLM() and the stl()+lm() approach both rely on least squares but handle seasonal structure differently. Setting ts_fitted = TRUE provides fitted values and decomposed trend and seasonal components for inspection.

2-D KERNEL DENSITY

Set type = "smooth" to invoke smoothScatter and display a 2-D kernel density estimate for large data sets. The display respects the current theme. The smooth_points argument controls how many points from low-density regions are superimposed, smooth_bins sets the number of bins in each direction for density estimation, and smooth_power controls the transformation from density scale to color scale. Larger smooth_power values reduce saturation in low-density regions. These arguments map directly to nrpoints, nbin, and transformation in smoothScatter. Grid lines are disabled by default for smooth density plots, but can be re-enabled via the appropriate styling options (e.g., grid_color in style).

Alternatively, set type = "contour" to plot contour lines of the estimated bivariate density, with the level resolution controlled by contour_n and contour_nbins.

COLORS

A global color theme for XY() and other lessR graphics can be set with style (e.g., style(theme = "lightbronze")). The default theme is "lightbronze". A grayscale theme is available via "gray", and other themes (e.g., "sienna", "darkred") are described in style. The sub_theme = "black" option yields a black background with partially transparent plotted colors.

Individual graphical elements (points, lines, panels, grid lines, etc.) can be customized with additional style arguments, such as panel_fill. For a subtle warm background, try panel_fill = "snow"; very light grays such as "gray99" or "gray97" provide a neutral tone.

For many color options, the value "off" is equivalent to "transparent", removing the corresponding fill or borde

See showColors for a display of all named R colors and their RGB values.

ANNOTATIONS

The add argument and its related coordinates (x1, y1, x2, y2) annotate the plot with text and geometric objects. Each object type requires a specific set of coordinates, summarized below. x-coordinates may take the special value "mean_x" and y-coordinates may take "mean_y".

Value Object Required Coordinates
----------- ------------------- -----------------------
"text" text x1, y1
"point" point x1, y1
"rect" rectangle x1, y1, x2, y2
"line" line segment x1, y1, x2, y2
"arrow" arrow x1, y1, x2, y2
"v_line" vertical line x1
"h_line" horizontal line y1
"means" horizontal and vertical lines at the means
----------- ------------------- -----------------------

The value of add specifies the object type. For a single object, provide one value and its required coordinates. For multiple placements of the same object, supply coordinate vectors. For multiple different objects, supply a vector of add values and, for each coordinate argument (x1, y1, x2, y2), a vector whose elements correspond to the sequence of objects in add.

Styling options such as add_color, add_fill, and transparency may also be given as vectors, allowing different objects to have different colors or other properties.

PDF OUTPUT

To obtain pdf output, set pdf_file to a file name, optionally with width and height to control device size. The pdf is written to the current working directory, which can be explicitly set with setwd.

ADDITIONAL OPTIONS

Many commonly used graphical parameters from the base R function plot are passed through by XY(), including:

cex.main, col.lab, font.sub, etc.

Settings for main/sub-titles and axis annotation; see title and par.

main

Main title of the graph; see title.

xlim

Limits of the x-axis, expressed as c(x1, x2). Note that x1 > x2 is allowed and produces a reversed axis.

ylim

Limits of the y-axis.

ONLY VARIABLES ARE REFERENCED

A referenced variable in a lessR plotting function such as XY must be a variable name (or vector of variable names), not an arbitrary expression. The variable must exist either in the referenced data frame (e.g., the default d) or in the user's workspace (global environment). Expressions are not evaluated in place. For example:

> XY(rnorm(50), rnorm(50)) # does NOT work

Instead, create named objects and reference them directly:

    > X <- rnorm(50)   # create vector X in user workspace
    > Y <- rnorm(50)   # create vector Y in user workspace
    > XY(X, Y)         # directly reference X and Y

Value

The output may be assigned to an R object; otherwise it is printed directly to the console. Each component appears only when the corresponding analytic option is activated. For example, outlier identification must be enabled (e.g., via MD_cut) for out_outliers to be included in the output.

To save the results, assign the output to an object, such as p <- XY(Years, Salary). Use names(p) to view the available components, and access any one by prefixing with p$, for example p$out_stats. Output can be viewed interactively at the console or inserted into R~Markdown documents for reproducible reporting.

READABLE OUTPUT

out_stats: Correlational analysis.
out_outliers: Mahalanobis Distance values for detected outliers.
out_frcst: Forecasted values.
out_fitted: Fitted values for the observed data.
out_coefs: Linear and seasonal coefficients from forecasting models.
out_smooth: Smoothing parameters from exponential smoothing models.
out_bubble: Bubble-plot settings, including radius and power.
out_reg: Regression statistics produced when fit is specified.
out_parm: Parameter settings for a VBS plot of a continuous variable.
out_pivot: Pivot table for VBS plots based on any combination of by and facet variables.
out_by: Pivot table for VBS plots aggregated by the by variable.
out_facet1: Pivot table for VBS plots aggregated by the first facet variable.
out_facet2: Pivot table for VBS plots aggregated by the second facet variable.

STATISTICS

outliers: Row numbers corresponding to detected outliers.

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

References

Brys, G., Hubert, M., & Struyf, A. (2004). A robust measure of skewness. Journal of Computational and Graphical Statistics, 13(4), 996-1017.

Murdoch, D, and Chow, E. D. (2013). ellipse function from the ellipse package.

Gerbing, D. W. (2023). R Data Analysis without Programming, 2nd edition, Chapter 10, NY: Routledge.

Gerbing, D. W. (2020). R Visualizations: Derive Meaning from Data, Chapter 5, NY: CRC Press.

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). Melbourne, Australia: OTexts. Retrieved from https://otexts.com/fpp3/

Sarkar, Deepayan (2008) Lattice: Multivariate Data Visualization with R, Springer. http://lmdvr.r-forge.r-project.org/

Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. URL: https://plotly.com/r/

See Also

X, Chart, style.

Examples

# read the data
d <- rd("Employee", quiet=TRUE)
d <- d[.(random(0.6)),]  # less computationally intensive
dd=d

#---------------------------------------------------
# traditional scatterplot with two numeric variables
#---------------------------------------------------

# scatterplot with all defaults
XY(Years, Salary)
# or use Plot in place of XY, the older method

XY(Years, Salary, by=Gender, size=2, fit="lm",
     fill=c(M="olivedrab3", W="gold1"),
     color=c(M="darkgreen", W="gold4"))

# maximum information, minimum input: scatterplot +
#  means, outliers, ellipse, least-squares lines with and w/o outliers
XY(Years, Salary, enhance=TRUE)

# extend x and y axes
XY(Years, Salary, scale_x=c(-10, 35, 10), scale_y=c(0,200000,10))

XY(Years, Salary, add="Hi", x1=c(12, 16, 18), y1=c(80000, 100000, 60000))

d <- factors(Gender, levels=c("M", "W"))
XY(Years, Salary, facet=Gender)
d <- dd



# just males employed more than 5 years
XY(Years, Salary, filter=(Gender=="M" & Years > 5))

# plot 0.95 data ellipse with the points identified that represent
#   outliers defined by a Mahalanobis Distance larger than 6
# save outliers into R object out
d[1, "Salary"] <- 200000
out <- XY(Years, Salary, ellipse=0.95, MD_cut=6)

# new shape and point size, no grid or background color
# then put style back to default
style(panel_fill="powderblue", grid_color="powderblue")
XY(Years, Salary, size=2, shape="diamond")
style()

# translucent data ellipses without points or edges
#  show the idealized joint distribution for bivariate normality
style(ellipse_color="off")
XY(Years, Salary, size=0, ellipse=seq(.1,.9,.10))
style()

# bubble plot with size determined by the value of Pre
# display the value for the bubbles with values of  min, median and max
XY(Years, Salary, size=Pre, size_cut=3)

# variables in a data frame not the default d
# plot 0.6 and 0.9 data ellipses with partially transparent points
# change color theme to gold with black background
style("gold", sub_theme="black")
XY(eruptions, waiting, transparency=.5, ellipse=seq(.6,.9), data=faithful)

# scatterplot with two x-variables, plotted against Salary
# define a new style, then back to default
style(window_fill=rgb(247,242,230, maxColorValue=255),
  panel_fill="off", panel_color="off", pt_fill="black", transparency=0,
  lab_color="black", axis_text_color="black",
  axis_y_color="off", grid_x_color="off", grid_y_color="black",
  grid_lty="dotted", grid_lwd=1)
XY(c(Pre, Post), Salary)
style()

# increase span (smoothing) from default of .7 to 1.25
# span is a loess parameter, which generates a caution that can be
#   ignored that it is not a graphical parameter -- we know that
# display confidence intervals about best-fit line at
#   0.95 confidence level
XY(Years, Salary, fit="loess", span=1.25)

# 2-D kernel density (more useful for larger sample sizes)
XY(Years, Salary, type="smooth")


#------------------------------------------------------
# scatterplot matrix from a vector of numeric variables
#------------------------------------------------------

# with least squares fit line
XY(c(Salary, Years, Pre), c(Salary, Years, Pre), fit="lm")


#--------------------------------------------------------------
# Trellis graphics and by for groups with two numeric variables
#--------------------------------------------------------------

# Trellis plot with condition on 1-variable
# optionally re-order default alphabetical R ordering by converting
#   to a factor with lessR factors (which also does multiple variables)
# always save to the full data frame with factors
d <- factors(Gender, levels=c("M", "W"))
XY(Years, Salary, facet=Gender)
d <- Read("Employee", quiet=TRUE)



# all three by (categorical) variables
XY(Years, Salary, facet=c(Dept, Gender), by=Plan, n_axis_y_skip=1)

# vary both shape and color with a least-squares fit line for each group
style(color=c("darkgreen", "brown"))
XY(Years, Salary, facet=Gender, fit="lm", shape=c("W","M"), size=.8)
style("gray")

# compare the men and women Salary according to Years worked
#   with an ellipse for each group
XY(Years, Salary, by=Gender, ellipse=.50)




# time charts
#------------
# run chart, with default fill area
XY(.Index, Salary, ts_area_fill="on")

# two run charts in same panel
# or could do a multivariate time series
XY(.Index, c(Pre, Post))

# Trellis graphics run chart with custom line width, no points
XY(.Index, Salary, facet=Gender, line_width=3, size=0)

# daily time series plot
# create the daily time series from R built-in data set airquality
oz.ts <- ts(airquality$Ozone, start=c(1973, 121), frequency=365)
XY(oz.ts)

# multiple time series plotted from dates and stacked
# black background with translucent areas, then reset theme to default
style(sub_theme="black", color="steelblue2", transparency=.55,
  window_fill="gray10", grid_color="gray25")
date <- seq(as.Date("2013/1/1"), as.Date("2016/1/1"), by="quarter")
x1 <- rnorm(13, 100, 15)
x2 <- rnorm(13, 100, 15)
x3 <- rnorm(13, 100, 15)
df <- data.frame(date, x1, x2, x3)
rm(date); rm(x1); rm(x2); rm(x3)
XY(date, x1:x3, data=df)
style()

# aggregate monthly data to plot by quarter
n.q <- 42
month <- seq(as.Date("2013/1/1"), length=n.q, by="months")
x <- rnorm(n.q, 100, 15)
XY(month, x, ts_unit="quarters")


# trigger a time series with a Date variable specified first
# stock prices for three companies by month:  Apple, IBM, Intel
d <- rd("StockPrice")
# only plot Apple
XY(Month, Price, filter=(Company=="Apple"))
# Trellis plots, one for each company
XY(Month, Price, facet=Company, n_col=1)
# all three plots on the same panel, three shades of blue
XY(Month, Price, by=Company, color="blues")
# exponential smoothing forecast for next 12 months, 
#   aggregate monthly data by mean over quarters
XY(Month, Price, ts_ahead=12, ts_unit="quarters")

#------------------------------------------
# analysis of a single categorical variable
#------------------------------------------
d <- rd("Employee")

# default 1-D bubble plot
# frequency plot, replaces bar chart
XY(Dept)



# plot of frequencies for each category (level), replaces bar chart
XY(Dept, stat_x="count")


#----------------------------------------------------
# scatterplot of numeric against categorical variable
#----------------------------------------------------

# generate a chart with the plotted mean of each level
# rotate x-axis labels and then offset from the axis
style(rotate_x=45, offset=1)
XY(Dept, Salary)
style()


#-------------------
# Cleveland dot plot
#-------------------

# standard scatterplot
XY(Salary, row_names, segments_y=FALSE)

# Cleveland dot plot with two x-variables
XY(c(Pre, Post), row_names)


#------------
# annotations
#------------

# add text at the one location specified by x1 and x2
XY(Years, Salary, add="Hi There", x1=12, y1=80000)
# add text at three different specified locations
XY(Years, Salary, add="Hi", x1=c(12, 16, 18), y1=c(80000, 100000, 60000))

# add three different text blocks at three different specified locations
XY(Years, Salary, add=c("Hi", "Bye", "Wow"), x1=c(12, 16, 18),
  y1=c(80000, 100000, 60000))

# add an 0.95 data ellipse and horizontal and vertical lines through the
#  respective means
XY(Years, Salary, ellipse=0.95, add=c("v_line", "h_line"),
  x1="mean_x", y1="mean_y")
# can be done also with the following short-hand
XY(Years, Salary, ellipse=0.95, add="means")

# a rectangle requires two points, four coordinates, <x1,y1> and <x2,y2>
style(add_trans=.8, add_fill="gold", add_color="gold4", add_lwd=0.5)
XY(Years, Salary, add="rect", x1=12, y1=80000, x2=16, y2=115000)

# the first object, a rectangle, requires all four coordinates
# the vertical line at x=2 requires only an x1 coordinate, listed 2nd
XY(Years, Salary, add=c("rect", "v_line"), x1=c(10, 2),
  y1=80000, x2=12, y2=115000)

# two different rectangles with different locations, fill colors and translucence
style(add_fill=c("gold3", "green"), add_trans=c(.8,.4))
XY(Years, Salary, add=c("rect", "rect"),
  x1=c(10, 2), y1=c(60000, 45000), x2=c(12, 75000), y2=c(80000, 55000))


#----------------------------------------------------
# analysis of two categorical variables (Likert data)
#----------------------------------------------------

d <- Read("Mach4", quiet=TRUE)  # Likert data, 0 to 5
XY(m06, m07)




#---------------
# function curve
#---------------

x <- seq(10,50,by=2)
y1 <- sqrt(x)
y2 <- x**.33
# x is sorted with equal intervals so run chart by default
XY(x, y1)

# multiple plots from variable vectors need to have the variables
#  in a data frame
d <- data.frame(x, y1, y2)
# if variables are in the user workspace and in a data frame
#   with the same names, the user workspace versions are used,
#   which do not work with vectors of variables, so remove
rm(x); rm(y1); rm(y2)
XY(x, c(y1, y2))


lessR documentation built on Dec. 11, 2025, 5:07 p.m.