distr.plot.x: Analysis of a univariate distribution using plots

View source: R/UBStats_Main_Visible_ALL_202406.R

distr.plot.x    R Documentation

Analysis of a univariate distribution using plots

Description

distr.plot.x() generates plots of a univariate distribution.

Usage

distr.plot.x(
  x,
  freq = "counts",
  plot.type,
  ord.freq = "none",
  breaks,
  adj.breaks = TRUE,
  interval = FALSE,
  bw = FALSE,
  color = NULL,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x

An unquoted string identifying the variable whose distribution is to be analysed. x can be the name of a vector or a factor in the workspace, or the name of one of the columns in the data frame specified in the data argument.

freq

A single character string specifying the frequencies to be displayed. Allowed options (possibly abbreviated) are "counts", "percentages", "proportions", and "densities" (the last only for histograms and density plots).
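As a rough illustration of how the four frequency types relate, the base R sketch below computes them for a small made-up vector. It does not call distr.plot.x(); the names x_demo and brk are hypothetical, and densities are assumed to follow the usual definition (proportion divided by interval width).

```r
# Hypothetical data and breaks, for illustration only
x_demo <- c(1, 2, 2, 3, 7, 8, 9, 9, 12, 18)
brk    <- c(0, 5, 10, 20)

# Classify the values into [0,5), [5,10), [10,20]
cls <- cut(x_demo, breaks = brk, right = FALSE, include.lowest = TRUE)

counts      <- table(cls)               # absolute frequencies
proportions <- prop.table(counts)       # counts divided by n
percentages <- 100 * proportions        # proportions on a 0-100 scale
densities   <- proportions / diff(brk)  # assumed: proportion per unit of width

print(cbind(counts, proportions, percentages, densities))
```

Under this definition the densities multiplied by the interval widths sum to 1, which is why they are only meaningful for plots drawn on a numeric scale.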

plot.type

A single character string specifying the type of plot to build. Allowed options are "pie", "bars", "spike", "histogram", "density", "boxplot", and "cumulative".

ord.freq

A single character string that can be specified when plot.type = "pie" or plot.type = "bars". It indicates whether the levels of x should be displayed in their standard order (ord.freq = "none", the default) or in order of increasing or decreasing frequency (ord.freq = "increasing" or ord.freq = "decreasing").

breaks

Allows a numerical variable x to be classified into intervals. It can be an integer indicating the number of intervals of equal width used to classify x, or a vector of increasing numeric values defining the endpoints of the intervals (closed on the left and open on the right; the last interval is also closed on the right). To cover the entire range of values, the minimum and the maximum of x should lie between the first and the last break. It is also possible to specify a set of breaks covering only a portion of the range of x.
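The interval convention described for breaks matches what base R's cut() produces with right = FALSE and include.lowest = TRUE. The sketch below uses made-up values (x_demo, brk, and k are hypothetical) and is not taken from the UBStats source; in particular, the way equal-width endpoints are derived from a single integer is an assumption.

```r
# Hypothetical values, for illustration only
x_demo <- c(0, 5, 19.9, 20, 40, 100, 150)

# Explicit endpoints: [0,20), [20,40), [40,100]
brk <- c(0, 20, 40, 100)
cls <- cut(x_demo, breaks = brk, right = FALSE, include.lowest = TRUE)
print(table(cls, useNA = "ifany"))
# 20 falls in [20,40), 100 in the closed last interval,
# and 150 lies outside the breaks, so it is left unclassified (NA)

# A single integer (assumed behaviour): k intervals of equal width
# spanning the observed range of the variable
k      <- 3
brk_eq <- seq(min(x_demo), max(x_demo), length.out = k + 1)
print(brk_eq)
```

The NA for 150 shows what "breaks covering only a portion of the range" means in practice: values outside the first and last break do not belong to any interval.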

adj.breaks

Logical value indicating whether the endpoints of the intervals of a numerical variable x, when classified into intervals, should be displayed without scientific notation; defaults to TRUE.

interval

Logical value indicating whether x is a variable measured in intervals (TRUE). If the detected intervals are not consistent (e.g. overlapping intervals, or intervals whose upper endpoint is lower than the lower one), the variable is analyzed as it is, even though the results are not necessarily consistent; defaults to FALSE.

bw

Logical value indicating whether plots should be drawn in greyscale (TRUE) rather than with a standard palette (FALSE, default).

color

Optional character vector specifying the colors to use in the plot instead of a standard palette (NULL, default).

use.scientific

Logical value indicating whether numbers on axes should be displayed using scientific notation (TRUE); defaults to FALSE.
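The difference between fixed and scientific notation, which both adj.breaks and use.scientific control, can be seen with base R's format(). This is only an illustration of the two display styles; it is an assumption that the labels drawn by distr.plot.x() look similar, and big_value is a made-up number.

```r
# Hypothetical interval endpoint, for illustration only
big_value <- 5e8

fixed <- format(big_value, scientific = FALSE)  # plain digits
sci   <- format(big_value, scientific = TRUE)   # mantissa and exponent

print(c(fixed = fixed, scientific = sci))
```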

data

An optional data frame containing x. If not found in data, x is taken from the environment from which distr.plot.x() is called.
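The lookup rule for x (first in data, then in the calling environment) can be sketched with a small helper. lookup_var(), df_demo, and b are hypothetical names introduced for this illustration; this is not the UBStats implementation.

```r
# Hypothetical helper, not part of UBStats: resolve an unquoted name
# first in `data`, then in the environment of the caller
lookup_var <- function(x, data = NULL) {
  nm <- deparse(substitute(x))
  if (!is.null(data) && nm %in% names(data)) {
    return(data[[nm]])
  }
  eval(substitute(x), envir = parent.frame())
}

df_demo <- data.frame(a = 1:3)  # hypothetical data frame
b <- c(10, 20)                  # hypothetical workspace vector

print(lookup_var(a, data = df_demo))  # found in df_demo
print(lookup_var(b, data = df_demo))  # not in df_demo: taken from the workspace
```

This is why the last examples on this page can pass LargeX without a data argument: the vector is simply found in the workspace.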

...

Additional arguments to be passed to low-level functions.

Value

No return value, called for side effects.

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

distr.table.x() for tabulating a univariate distribution.

distr.table.xy() for tabulating a bivariate distribution.

distr.plot.xy() for plotting a bivariate distribution.

Examples

data(MktDATA, package = "UBStats")

# Pie charts 
# - A character variable: grey scale
distr.plot.x(x = LikeMost, plot.type = "pie", bw = TRUE, data = MktDATA)
# - A discrete numeric variable: user-defined palette
distr.plot.x(x = Children, plot.type = "pie", 
             color = c("red", "gold", "green", "forestgreen"),
             data = MktDATA)

# Bar charts 
# - A factor: standard order of levels 
distr.plot.x(x = Education, plot.type = "bars", 
             freq = "percentage", data = MktDATA)
# - A factor: levels arranged by decreasing percentage 
distr.plot.x(x = Education, plot.type = "bars", 
             freq = "perc", ord.freq = "dec", data = MktDATA)
# - A discrete variable (note: distance between values
#   not taken into account)
distr.plot.x(x = NPickUp_Purch, plot.type = "bars",
             freq = "percentage", data = MktDATA)

# Spike plots 
# - A discrete variable
distr.plot.x(x = NPickUp_Purch, plot.type = "spike", 
             freq = "percent", data = MktDATA)
# - A factor (levels placed at the same distance)
distr.plot.x(x = Education, plot.type = "spike", 
             freq = "prop", data = MktDATA)
# - A variable measured in classes (levels placed at the 
#   same distance)
distr.plot.x(x = Income.S, interval = TRUE,
             plot.type = "spike", 
             freq = "prop", data = MktDATA)
# - A numeric variable classified into intervals
#   (levels placed at the same distance)
distr.plot.x(x = AOV, breaks = 5, plot.type = "spike", 
             data = MktDATA)

# Cumulative distribution plots
# - A discrete variable
distr.plot.x(x = Children, plot.type = "cum", data = MktDATA)
# - A continuous numerical variable 
distr.plot.x(x = AOV, plot.type = "cum", 
             freq = "perc", data = MktDATA)
# - A numeric variable classified into intervals
distr.plot.x(AOV, plot.type = "cum", 
             breaks = c(0,20,40,60,80,100,180), data = MktDATA)
# - A variable measured in classes
distr.plot.x(Income, plot.type = "cum", interval = TRUE, 
             freq = "percent", data = MktDATA)
# - A factor
distr.plot.x(x = Education, plot.type = "cum", 
             freq = "prop", data = MktDATA)

# Histograms 
# - A continuous numerical variable: no breaks provided,
#   default classes built by R
distr.plot.x(x = AOV, plot.type = "histogram", data = MktDATA)
# - A continuous numerical variable: equal width intervals
distr.plot.x(x = AOV, plot.type = "histogram", 
             breaks = 10, data = MktDATA)
# - A continuous numerical variable: specified breaks
distr.plot.x(AOV, plot.type = "histogram", 
             breaks = c(0,20,40,60,80,100,180), 
             data = MktDATA)
# - A variable measured in classes
distr.plot.x(Income, plot.type = "histogram", 
             interval = TRUE, data = MktDATA)

# Density plots 
# - A numerical variable
distr.plot.x(x = AOV, plot.type = "density", data = MktDATA)
# - A numerical variable: breaks are ignored
distr.plot.x(AOV, plot.type = "density", 
             breaks = c(0,20,40,60,80,100,180), 
             data = MktDATA)
# - A variable measured in classes
distr.plot.x(Income, plot.type = "density", 
             interval = TRUE, data = MktDATA)

# Boxplots (only for numerical unclassified variables)
# - A numerical variable
distr.plot.x(x = TotVal, plot.type = "boxplot", data = MktDATA)
# - A numerical variable: with specified breaks
#   the plot is not built
# distr.plot.x(AOV, plot.type = "boxplot", 
#              breaks = c(0,20,40,60,80,100,180), 
#              data = MktDATA)

# Arguments adj.breaks, use.scientific
# - A variable with a very wide range (very small densities)
LargeX <- MktDATA$AOV * 5000000
# - Default formatting for intervals' endpoints
distr.plot.x(LargeX, breaks = 5, plot.type = "spike")
# - Scientific notation for intervals' endpoints
distr.plot.x(LargeX, breaks = 5, plot.type = "spike",
             adj.breaks = FALSE)
# - Default formatting for axes
distr.plot.x(LargeX, breaks = 5, plot.type = "histogram",
             freq = "densities")
# - Scientific notation for axes
distr.plot.x(LargeX, breaks = 5, plot.type = "histogram",
             freq = "densities", use.scientific = TRUE)


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.