distr.table.x: Analysis of a univariate distribution using frequency tables


distr.table.x    R Documentation

Analysis of a univariate distribution using frequency tables

Description

distr.table.x() computes the frequency table of a vector or a factor.

Usage

distr.table.x(
  x,
  freq = c("counts", "proportions"),
  total = TRUE,
  breaks,
  adj.breaks = TRUE,
  interval = FALSE,
  f.digits = 2,
  p.digits = 0,
  d.digits = 5,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x

The unquoted name of the variable whose distribution is to be analysed. x can be the name of a vector or a factor in the workspace, or the name of one of the columns in the data frame specified in the data argument.

freq

A character vector specifying the set of frequencies to be displayed (more than one option can be specified). Allowed options (possibly abbreviated) are "counts", "percentages", "proportions", "densities" (only for variables classified into intervals or measured in classes), and "cumulative". If no frequency is specified, "counts" and "proportions" are displayed by default. If only "cumulative" is specified, counts and proportions are displayed too, together with their respective cumulative frequencies.
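To make the frequency types concrete, the following is a minimal base-R sketch of what counts, proportions, percentages, and cumulative frequencies mean for a small vector. The data and the column names (value, count, prop, perc, cum_count, cum_prop) are invented for illustration, and this is not the code distr.table.x() uses internally.

```r
# Base-R illustration of the frequency types; NOT the UBStats implementation.
x <- c("a", "b", "a", "c", "b", "a", "a", "c")   # hypothetical data

counts      <- table(x)              # absolute frequencies
proportions <- prop.table(counts)    # relative frequencies (sum to 1)
percentages <- 100 * proportions     # relative frequencies (sum to 100)

# Hypothetical column names, chosen only for this sketch
freq_table <- data.frame(
  value     = names(counts),
  count     = as.vector(counts),
  prop      = as.vector(proportions),
  perc      = as.vector(percentages),
  cum_count = cumsum(as.vector(counts)),
  cum_prop  = cumsum(as.vector(proportions))
)
print(freq_table)
```

The cumulative columns are running sums of the corresponding frequencies, so their last entries equal the number of observations and 1, respectively.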

total

Logical value indicating whether the sum of the requested frequencies should be added to the table; defaults to TRUE.

breaks

Allows a numerical variable x to be classified into intervals. It can be an integer indicating the number of intervals of equal width used to classify x, or a vector of increasing numeric values defining the endpoints of the intervals (closed on the left and open on the right; the last interval is closed on the right too). To cover the entire range of values, the first break should not exceed the minimum of x and the last break should not be lower than the maximum of x. It is also possible to specify a set of breaks covering only a portion of the range of x.
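The interval convention described here (closed on the left, open on the right, last interval closed on both sides) can be reproduced in base R with cut(). The sketch below uses invented data and is only an analogue of the classification step, not the package's own code. The density column assumes the usual histogram definition, proportion divided by interval width; whether distr.table.x() scales densities in exactly this way is an assumption.

```r
# Base-R analogue of classifying x into intervals; NOT the UBStats code.
x      <- c(3, 12, 20, 25, 30, 47, 50, 99, 180)   # hypothetical data
breaks <- c(0, 20, 30, 50, 100, 180)

# right = FALSE        -> intervals are [a, b)
# include.lowest = TRUE -> the last interval becomes [a, b]
classes <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)

counts <- table(classes)
props  <- as.vector(prop.table(counts))
widths <- diff(breaks)

# Assumed definition: density = proportion / interval width
dens <- props / widths

data.frame(class = names(counts), count = as.vector(counts),
           prop = props, density = dens)
```

With this convention the value 20 falls in [20,30) rather than [0,20), while the maximum 180 is still included in the last class [100,180].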

adj.breaks

Logical value indicating whether the endpoints of the intervals, when a numerical variable x is classified into intervals, should be displayed without scientific notation; defaults to TRUE.

interval

Logical value indicating whether x is a variable measured in intervals (TRUE). If the detected intervals are not consistent (e.g. overlapping intervals, or intervals whose upper endpoint is lower than the lower one), the variable is tabulated as it is, even if the results are not necessarily consistent; defaults to FALSE.
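As an illustration of what a consistency check on interval labels might look like, the sketch below parses labels written as "lower;upper" (the format used in the Examples section) and flags reversed or overlapping intervals. The helper check_intervals() is hypothetical and is not part of UBStats; the checks the package actually performs, and the label formats it accepts, may differ.

```r
# Hypothetical helper (not part of UBStats): checks "lower;upper" labels.
check_intervals <- function(labels) {
  parts <- strsplit(unique(labels), ";", fixed = TRUE)
  lower <- as.numeric(vapply(parts, `[`, character(1), 1))
  upper <- as.numeric(vapply(parts, `[`, character(1), 2))

  # Sort intervals by lower endpoint before comparing neighbours
  ord   <- order(lower, upper)
  lower <- lower[ord]
  upper <- upper[ord]

  # Reversed: the upper endpoint is below the lower endpoint
  reversed <- any(upper < lower)
  # Overlapping: an interval starts before the previous one ends
  overlapping <- length(lower) > 1 &&
    any(lower[-1] < upper[-length(upper)])

  list(reversed = reversed, overlapping = overlapping,
       consistent = !reversed && !overlapping)
}

ok  <- check_intervals(c("0;10", "10;20", "20;45"))
bad <- check_intervals(c("0;10", "10;20", "25;8", "15;31"))
ok$consistent    # TRUE
bad$consistent   # FALSE
```

In the second call, "25;8" is reversed and "15;31" overlaps "10;20", so both flags are raised.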

f.digits, p.digits, d.digits

Integer values specifying the number of decimals used to round respectively proportions (default: f.digits=2), percentages (default: p.digits=0), and densities (default: d.digits=5). If the chosen rounding formats some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument force.digits is set to TRUE.
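The adaptive rounding described above can be sketched as a loop that adds decimals until no non-zero value is displayed as zero. The helper min_digits() and its max_digits cap are hypothetical and serve only to illustrate the idea; the rule actually implemented in the package may differ in its details.

```r
# Hypothetical helper (not part of UBStats): smallest number of decimals,
# starting from `digits`, at which no non-zero value is rounded to zero.
min_digits <- function(values, digits, max_digits = 15) {
  nonzero <- values[!is.na(values) & values != 0]
  while (digits < max_digits && any(round(nonzero, digits) == 0)) {
    digits <- digits + 1
  }
  digits
}

dens <- c(0.012, 0.00004, 0)   # hypothetical densities

d <- min_digits(dens, digits = 2)
d                 # 5
round(dens, d)    # adaptive rounding keeps 0.00004 visible
round(dens, 2)    # forced rounding: 0.00004 is shown as 0
```

The last line corresponds to what setting force.digits = TRUE asks for: the requested number of decimals is kept even when small non-zero values collapse to zero.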

force.digits

Logical value indicating whether frequencies and densities should be rounded to exactly the number of decimals specified in f.digits, p.digits, and d.digits, even if this rounds some non-zero values to zero; defaults to FALSE.

use.scientific

Logical value indicating whether numbers in tables (typically densities) should be displayed using scientific notation (TRUE); defaults to FALSE.

data

An optional data frame containing x. If not found in data, x is taken from the environment from which distr.table.x() is called.

...

Additional arguments to be passed to low level functions.

Value

A table (converted to a data frame) listing the values taken by the variable, arranged in standard order (logical, alphabetical or numerical order for vectors, order of levels for factors, ordered intervals for classified variables or for variables measured in classes), and the requested set of frequencies.

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

distr.plot.x() for plotting a univariate distribution.

distr.table.xy() for tabulating a bivariate distribution.

distr.plot.xy() for plotting a bivariate distribution.

Examples

data(MktDATA, package = "UBStats")

# Character vectors, factors, and discrete numeric vectors
distr.table.x(Education, data = MktDATA)

distr.table.x(Children, freq = c("count","prop","cum"),
              data = MktDATA)

# Numerical variable classified into intervals
# - Classes of equal width
distr.table.x(AOV, breaks = 6, freq = c("Count","Prop","Perc","Cum"),
              p.digits = 2, data = MktDATA)
# - Classes with specified endpoints
distr.table.x(AOV, breaks = c(0,20,30,50,100,180),
              freq = c("Count","Perc","Cum","Densities"), 
              p.digits = 2, data = MktDATA)
# Numerical variable measured in classes
# - Variable measured in classes
distr.table.x(Income, freq = c("count","prop","cum","dens"),
              interval = TRUE, data = MktDATA)
# - An example of non-consistent intervals. 
#   Densities are not calculated
x.inconsistent <- c(rep("0;10",30),rep("10;20",25),rep("25;8",25),
                    rep("15;31",15),rep("20;45",16),rep("30;40",18))
distr.table.x(x.inconsistent, freq = c("count","prop","cum","dens"),
              interval = TRUE)

# Arguments adj.breaks, use.scientific, and force.digits
#  A variable with a very wide range (very small densities)
LargeX <- MktDATA$AOV*5000000 
# - Default: manages possible excess of rounding
distr.table.x(LargeX, breaks = 5, 
              freq = c("count","percent","densities"))
# - Forcing digits to the default values 
distr.table.x(LargeX, breaks = 5,
              freq = c("count","percent","dens"),
              force.digits = TRUE)
#  - Scientific notation for frequencies/densities 
distr.table.x(LargeX, breaks = 5,
              freq = c("count","percent","dens"),
              use.scientific = TRUE)
#  - Scientific notation both for intervals’ endpoints 
#    and for frequencies/densities
distr.table.x(LargeX, breaks = 5, adj.breaks = FALSE,
              freq = c("count","percent","dens"),
              use.scientific = TRUE)

# Store the resulting table as a data frame
table.AOV <- distr.table.x(AOV, breaks = c(0,20,30,50,100,180),
                           freq = c("Count","Perc","Cum","Dens"),
                           data = MktDATA)

UBStats documentation built on Sept. 11, 2024, 6:52 p.m.