distr.plot.xy: Analysis of a bivariate distribution using plots

View source: R/UBStats_Main_Visible_ALL_202406.R

distr.plot.xy    R Documentation

Analysis of a bivariate distribution using plots

Description

distr.plot.xy() generates plots of a bivariate distribution.

Usage

distr.plot.xy(
  x,
  y,
  plot.type,
  bar.type = "stacked",
  freq = "counts",
  freq.type = "joint",
  breaks.x,
  breaks.y,
  interval.x = FALSE,
  interval.y = FALSE,
  bw = FALSE,
  color = NULL,
  var.c,
  breaks.c,
  interval.c = FALSE,
  adj.breaks = TRUE,
  fitline = FALSE,
  legend = TRUE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x, y

Unquoted names identifying the variables whose joint distribution is to be plotted. Each of x and y can be the name of a vector or factor in the workspace, or the name of a column in the data frame specified in the data argument. In the plot, x is shown on the horizontal axis and y on the vertical axis.

plot.type

A single character string specifying the type of plot to build. Allowed options are "bars", "scatter", and "boxplot". If both x and y are character vectors or factors and plot.type = "scatter", a bubble plot is built, with dots whose size is proportional to the joint frequency of each pair of observed values. If plot.type = "boxplot", at least one input variable must be numeric; when both variables are numeric, the conditional distributions of y|x are displayed unless freq.type = "x|y" is specified.

bar.type

A single character string indicating whether a bar plot should display stacked bars (bar.type = "stacked", default) or side-by-side bars (bar.type = "beside").

freq

A single character specifying the frequencies to be displayed when a bar plot is requested (plot.type="bars"). Allowed options (possibly abbreviated) are "counts", "percentages" and "proportions".

freq.type

A single character string specifying the type of frequencies to be displayed when a bar plot is requested (plot.type = "bars"). Allowed options are "joint" (default) for joint frequencies, "x|y" for the distributions of x conditional on y, and "y|x" for the distributions of y conditional on x. The option "x|y" can also be used when plot.type = "boxplot".

breaks.x, breaks.y

Allow classifying the variables x and/or y, if numerical, into intervals. Each can be an integer giving the number of equal-width intervals to use, or a vector of increasing numeric values defining the endpoints of the intervals (closed on the left and open on the right; the last interval is also closed on the right). To cover the entire range of values taken by a variable, the first and last breaks must enclose its minimum and maximum values. A set of breaks covering only a portion of the variable's range can also be specified.

interval.x, interval.y

Logical values indicating whether x and/or y are variables measured in classes (TRUE). If the detected intervals are not consistent (e.g. overlapping intervals, or intervals whose lower endpoint is higher than the upper one), the variable is analyzed as it is, even though the results are not necessarily consistent; defaults to FALSE.

bw

Logical value indicating whether plots should be drawn in greyscale (TRUE) rather than with a standard color palette (FALSE, default).

color

Optional character vector specifying the colors to use in the plot instead of a standard palette (NULL, default).

var.c

An optional unquoted name identifying a variable used to color points in a scatter plot (plot.type = "scatter"); it can be specified in the same way as x. This is allowed only when at least one of the input variables x and y is numeric.

breaks.c

Allows classifying the variable var.c, if numerical, into intervals. It can be specified in the same way as breaks.x.

interval.c

Logical value indicating whether var.c is a variable measured in intervals (TRUE) or not, as described for interval.x; defaults to FALSE.

adj.breaks

Logical value indicating whether the endpoints of the intervals of a numerical variable (x, y, or var.c), when it is classified into intervals, should be displayed without scientific notation; defaults to TRUE.

fitline

Logical value indicating whether the line of best fit (also called trend line or regression line) should be added to a scatter plot (fitline = TRUE) or not (fitline = FALSE; default).

legend

Logical value indicating whether a legend should be displayed in the plot (legend = TRUE; default) or not (legend = FALSE).

use.scientific

Logical value indicating whether numbers on the axes should be displayed using scientific notation (TRUE); defaults to FALSE.

data

An optional data frame containing x and/or y and/or var.c (the variable used to color points in scatter plots). If not found in data, the variables are taken from the environment from which distr.plot.xy() is called.

...

Additional arguments to be passed to low-level functions.
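
The interval and frequency conventions described for breaks.x, breaks.y and freq.type can be illustrated with base R alone. The sketch below is not the package's internal code: it assumes that cut() with right = FALSE and include.lowest = TRUE reproduces the stated interval convention, and that the margins of prop.table() correspond to the "joint", "x|y" and "y|x" options. The vectors x and y are made-up data.

```r
# Base-R sketch (an assumption, not the UBStats implementation).
x <- c(0, 3, 5, 9, 10, 14, 17, 20, 35)                 # hypothetical numeric variable
y <- c("a", "b", "a", "b", "a", "a", "b", "b", "a")    # hypothetical grouping variable

# Intervals closed on the left and open on the right;
# include.lowest = TRUE also closes the last interval on the right.
x.cl <- cut(x, breaks = c(0, 5, 10, 15, 20, 35),
            right = FALSE, include.lowest = TRUE)

joint     <- table(x.cl, y)                 # joint counts
joint.p   <- prop.table(joint)              # joint proportions (sum to 1)
x.given.y <- prop.table(joint, margin = 2)  # x | y: each column sums to 1
y.given.x <- prop.table(joint, margin = 1)  # y | x: each row sums to 1

print(levels(x.cl))
print(round(100 * x.given.y, 1))            # conditional percentages of x | y
```

A value equal to an inner break (here 5) falls in the interval that starts at that break, while the maximum (35) is kept in the last interval.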

Value

No return value, called for side effects.
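
Because the function is called only for its side effect, one way to keep a plot is to open a graphics device before the call and close it afterwards. The sketch below assumes that distr.plot.xy() draws on the active device as base graphics functions do (the documentation does not state this explicitly); a base plot() call on the built-in cars data stands in for it so that the snippet runs without UBStats installed.

```r
# Sketch: saving a plot produced for its side effect.
out.file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out.file, width = 7, height = 5)     # open a file-based graphics device

# Stand-in for a call such as
#   distr.plot.xy(Baseline, TotVal, plot.type = "scatter", data = MktDATA)
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")

invisible(dev.off())                     # close the device and write the file
saved <- file.exists(out.file)
print(saved)
```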

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

distr.table.xy() for tabulating a bivariate distribution.

distr.table.x() for tabulating a univariate distribution.

distr.plot.x() for plotting a univariate distribution.

Examples

data(MktDATA, package = "UBStats")

# Bivariate bar plots
# - Two discrete variables (factor or vector with few levels)
#   Joint counts
distr.plot.xy(CustClass, Children, plot.type = "bars", 
              freq = "counts", freq.type = "joint",
              data = MktDATA)
# - Two discrete variables (factor or vector with few levels)
#   Joint percentages, side-by-side bars
#   User-defined colors
distr.plot.xy(Children, CustClass, plot.type = "bars", 
              bar.type = "beside",
              freq = "percent", freq.type = "joint",
              color = c("red","gold","green","forestgreen"),
              data = MktDATA)
# - One numeric variable classified into intervals
#   and one variable measured in classes
#   Conditional percentages of x|y 
distr.plot.xy(TotPurch, Income, plot.type = "bars", 
              freq = "percent", freq.type = "x|y",
              breaks.x = c(0,5,10,15,20,35),
              interval.y = TRUE, data = MktDATA)
#   Conditional percentages of y|x 
distr.plot.xy(TotPurch, Income, plot.type = "bars", 
              freq = "percent", freq.type = "y|x",
              breaks.x = c(0,5,10,15,20,35),
              interval.y = TRUE, data = MktDATA)

# Side-by-side boxplots
# - A continuous variable conditioned to a factor, 
#   a character, or a classified variable
#   The distributions of the numeric variable conditioned
#   to the factor (or character) are displayed
distr.plot.xy(x = AOV, y = Education, plot.type = "boxplot",
              data = MktDATA)
distr.plot.xy(x = Income.S, y = AOV, plot.type = "boxplot",
              interval.x = TRUE, data = MktDATA)
distr.plot.xy(x = Baseline, y = TotPurch, plot.type = "boxplot",
              breaks.y = c(0,5,10,15,20,35),
              data = MktDATA)
# - Two numerical variables. By default distributions 
#   of y|x are displayed unless differently 
#   specified in freq.type
distr.plot.xy(x = NPickUp_Purch, y = NWeb_Purch,
              plot.type = "boxplot", data = MktDATA)
distr.plot.xy(x = NPickUp_Purch, y = NWeb_Purch,
              plot.type = "boxplot", freq.type = "x|y",
              data = MktDATA)

# Scatter plots
# - Two numerical variables: default options
distr.plot.xy(Baseline, TotVal, plot.type = "scatter", 
              fitline = TRUE, data = MktDATA)
# - Two numerical variables: colors based on a discrete
#   or classified variable
distr.plot.xy(Baseline, TotVal, plot.type = "scatter", 
              var.c = Marital_Status,  
              fitline = TRUE, data = MktDATA)
distr.plot.xy(Baseline, TotVal, plot.type = "scatter", 
              var.c = Income, interval.c = TRUE, 
              fitline = TRUE, data = MktDATA)
distr.plot.xy(Baseline, TotVal, plot.type = "scatter", 
              var.c = TotPurch, breaks.c = 10, 
              fitline = TRUE, data = MktDATA)
# - Two numerical variables: colors based 
#   on a continuous numerical variable
distr.plot.xy(Baseline, TotVal, plot.type = "scatter", 
              var.c = AOV, fitline = TRUE, data = MktDATA)

# - One numerical variable and one factor or character 
distr.plot.xy(Baseline, Marital_Status, plot.type = "scatter", 
              fitline = TRUE, data = MktDATA)
distr.plot.xy(Income.S, Baseline, plot.type = "scatter", 
              interval.x = TRUE,
              fitline = TRUE, data = MktDATA)
#   color based on a third variable
distr.plot.xy(TotPurch, TotVal, plot.type = "scatter", 
              breaks.x = c(0,5,10,15,20,35),
              var.c = AOV,
              fitline = TRUE, data = MktDATA)

# - Two factors or character vectors: bubble plots
distr.plot.xy(Education, LikeMost, plot.type = "scatter", 
              data = MktDATA)
# - Two classified variables (i.e. not properly numerical): 
#   bubble plots, changed color
distr.plot.xy(Income.S, TotPurch, plot.type = "scatter",
              interval.x = TRUE,
              breaks.y = c(0,5,10,15,20,35),
              color = "orchid", data = MktDATA)

# Arguments adj.breaks and use.scientific 
#  Variables with very wide ranges
LargeC <- MktDATA$AOV * 5000000
LargeX <- MktDATA$Baseline * 1000000
LargeY <- MktDATA$TotVal * 1000000
#  - Default: no scientific notation
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              var.c = LargeC, data = MktDATA)
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              breaks.x = 10, var.c = LargeC, 
              data = MktDATA)
#  - Scientific notation for axes 
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              breaks.x = 10, var.c = LargeC, 
              use.scientific = TRUE,
              data = MktDATA)
#  - Scientific notation for intervals' endpoints
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              breaks.x = 10, var.c = LargeC, 
              adj.breaks = FALSE,
              data = MktDATA)
#  - Scientific notation for intervals' endpoints and axes
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              var.c = LargeC, fitline = TRUE, 
              adj.breaks = FALSE, use.scientific = TRUE,
              data = MktDATA)
distr.plot.xy(LargeX, LargeY, plot.type = "scatter", 
              breaks.x = 10, var.c = LargeC, 
              adj.breaks = FALSE, use.scientific = TRUE,
              data = MktDATA)
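
As a rough illustration of the two notations toggled by adj.breaks and use.scientific, base R's format() can render the same large value either way. This is only a sketch of the display difference, not the code UBStats uses to build labels; the value is arbitrary.

```r
# Sketch: fixed versus scientific rendering of one large number.
endpoint <- 2500000                               # arbitrary large value

fixed <- format(endpoint, scientific = FALSE)     # plain digits
sci   <- format(endpoint, scientific = TRUE)      # mantissa and exponent

print(c(fixed = fixed, scientific = sci))
```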


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.