distr.summary.x: Summary statistics for a single variable
In UBStats: Basic Statistics

View source: R/UBStats_Main_Visible_ALL_202406.R

Description

distr.summary.x() computes summary statistics of a vector or a factor, optionally conditioned on one or two additional variables (`by1`, `by2`).

Usage

distr.summary.x(
  x,
  stats = c("summary"),
  by1,
  by2,
  breaks.by1,
  interval.by1 = FALSE,
  breaks.by2,
  interval.by2 = FALSE,
  adj.breaks = TRUE,
  digits = 2,
  f.digits = 4,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

`x`	An unquoted string identifying the variable whose distribution is to be summarized. `x` can be the name of a vector or a factor in the workspace, or the name of one of the columns in the data frame specified in the `data` argument.
`stats`	A character vector specifying the summary statistics to compute (several summaries can be requested at once). Groups of summaries can be requested with the following options: `"summary"`: min, q1, median, mean, q3, max, sd, var; `"central"`: central tendency measures; `"dispersion"`: measures of dispersion; `"fivenumbers"`: five-number summary; `"quartiles"`, `"quintiles"`, `"deciles"`, `"percentiles"`: sets of quantiles. Individual statistics can also be requested: `"q1"`, `"q2"`, `"q3"`, `"mean"`, `"median"`, `"mode"` (which returns the mode, the number of modes, and the proportion of cases taking the modal value), `"min"`, `"max"`, `"sd"`, `"var"`, `"cv"` (coefficient of variation), `"range"`, `"IQrange"` (interquartile range), and `"p1"`, `"p2"`, ..., `"p100"` (specific percentiles).
`by1`, `by2`	Unquoted strings identifying optional variables (typically taking few values/levels) used to build conditional summaries; they can be specified in the same way as `x`.
`breaks.by1`, `breaks.by2`	Allow the variables `by1` and/or `by2`, if numerical, to be classified into intervals. They can be integers giving the number of equal-width intervals used to classify `by1` and/or `by2`, or vectors of increasing numeric values defining the interval endpoints (intervals are closed on the left and open on the right; the last interval is also closed on the right). To cover the entire range of values, the minimum and the maximum should lie between the first and the last break. A set of breaks covering only a portion of the range of `by1` and/or `by2` can also be specified.
`interval.by1`, `interval.by2`	Logical values indicating whether `by1` and/or `by2` are variables measured in classes (`TRUE`). If the intervals for a variable are not consistent (e.g. overlapping intervals, or intervals whose upper endpoint is lower than the lower one), the variable is analysed as it is, even though results may not be consistent; defaults to `FALSE`.
`adj.breaks`	Logical value indicating whether the endpoints of intervals of the numerical variables `by1` or `by2`, when classified into intervals, should be displayed without scientific notation; defaults to `TRUE`.
`digits`, `f.digits`	Integer values specifying the number of decimals used to round, respectively, summary statistics (default: `digits = 2`) and proportions/percentages (default: `f.digits = 4`). If the chosen rounding displays some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument `force.digits` is set to `TRUE`.
`force.digits`	Logical value indicating whether the requested summaries should be rounded to the number of decimals specified in `digits` and `f.digits` even if non-zero values are thereby rounded to zero; defaults to `FALSE`.
`use.scientific`	Logical value indicating whether numbers in tables should be displayed using scientific notation (`TRUE`); defaults to `FALSE`.
`data`	An optional data frame containing `x` and/or the variables specifying the layers, `by1` and `by2`. If not found in `data`, the variables are taken from the environment from which `distr.summary.x()` is called.
`...`	Additional arguments to be passed to low level functions.
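The following base-R sketch shows how several of the statistics listed under `stats` can be computed by hand for a small made-up vector. It is not the UBStats implementation: it assumes R's default `quantile()` method (type 7), which may differ from the one used by `distr.summary.x()`, and all object names are illustrative only.

```r
# Made-up data; quantiles use R's default quantile() type 7 (an assumption,
# UBStats may use a different quantile definition)
x <- c(2, 4, 4, 5, 7, 9, 10)

cv      <- sd(x) / mean(x)                             # "cv": sd / mean
range_x <- diff(range(x))                              # "range": max - min
iqr_x   <- unname(diff(quantile(x, c(0.25, 0.75))))    # "IQrange": q3 - q1
p90     <- unname(quantile(x, 0.90))                   # "p90": 90th percentile

# "mode": modal value(s), number of modes, proportion of cases with the mode
counts   <- table(x)
modes    <- names(counts)[counts == max(counts)]
n_modes  <- length(modes)
prop_mod <- max(counts) / length(x)

print(c(cv = cv, range = range_x, IQrange = iqr_x, p90 = p90))
print(list(mode = modes, n.modes = n_modes, prop = prop_mod))
```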
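To make the interval convention of `breaks.by1`/`breaks.by2` concrete, the sketch below classifies a made-up numerical variable with base R's `cut()`, using `right = FALSE` and `include.lowest = TRUE` to obtain left-closed intervals with the last one also closed on the right, and then computes a conditional mean with `tapply()`, in the spirit of `by1`. Building equal-width breaks with `seq()` over the observed range is an assumption about how an integer `breaks.by1` is handled, not a description of UBStats internals.

```r
# Made-up data: `y` plays the role of x, `z` the role of a numerical by1
y <- c(100, 150, 200, 260, 300, 420)
z <- c(5, 12, 18, 25, 31, 40)

# Integer breaks (here 3): equal-width intervals spanning the range of z
k <- 3
eq_breaks  <- seq(min(z), max(z), length.out = k + 1)
eq_classes <- cut(z, breaks = eq_breaks, right = FALSE, include.lowest = TRUE)

# Conditional mean of y within each interval of z
cond_means <- tapply(y, eq_classes, mean)
print(cond_means)

# Explicit breaks covering only part of the range: values outside become NA
part_classes <- cut(z, breaks = c(10, 20, 30), right = FALSE,
                    include.lowest = TRUE)
print(table(part_classes, useNA = "ifany"))
```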
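For variables measured in classes (`interval.by1 = TRUE` or `interval.by2 = TRUE`), the argument descriptions mention two kinds of inconsistency: overlapping intervals, and intervals whose upper endpoint is below the lower one. The sketch below checks both in base R. It assumes class labels written as `[lower,upper)` or `[lower,upper]`; this label format and the helper `intervals_consistent()` are hypothetical choices for illustration, not documented UBStats features.

```r
# Hypothetical helper (not part of UBStats): parse "[a,b)"-style labels
# and check that intervals are well formed and do not overlap.
intervals_consistent <- function(labels) {
  inner <- substr(labels, 2, nchar(labels) - 1)            # drop brackets
  ends  <- do.call(rbind, lapply(strsplit(inner, ","), as.numeric))
  ord   <- order(ends[, 1])                                 # sort by lower end
  lower <- ends[ord, 1]
  upper <- ends[ord, 2]
  well_formed <- all(upper > lower)                         # upper above lower
  no_overlap  <- all(head(upper, -1) <= tail(lower, -1))    # no overlaps
  well_formed && no_overlap
}

ok_classes  <- c("[50,75)", "[0,25)", "[25,50)", "[75,100]")
bad_classes <- c("[0,30)", "[25,50)", "[60,55)")   # overlap and reversed ends

print(intervals_consistent(ok_classes))   # TRUE
print(intervals_consistent(bad_classes))  # FALSE
```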
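The rounding rule for `digits`/`f.digits` can be sketched as follows, assuming that decimals are added one at a time, and applied to all values together, until every non-zero value shows at least one significant digit. `round_keep_nonzero()` is a hypothetical helper written for this illustration; it is not exported by UBStats, and the cap of 15 decimals is an arbitrary safeguard.

```r
# Hypothetical helper (not a UBStats function) mimicking the described rule
round_keep_nonzero <- function(v, digits, force.digits = FALSE) {
  if (force.digits) return(round(v, digits))   # round regardless of zeros
  d <- digits
  # Add decimals while some non-zero value would be displayed as zero
  while (any(v != 0 & round(v, d) == 0) && d < 15) d <- d + 1
  round(v, d)
}

vals <- c(12.3456, 0.00042, 0)
print(round_keep_nonzero(vals, digits = 2))
print(round_keep_nonzero(vals, digits = 2, force.digits = TRUE))
```

With the default behaviour the small value keeps a significant digit (four decimals are used), whereas `force.digits = TRUE` rounds it to zero.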

Value

A list whose elements are tables (converted to data frames) with the requested summaries, possibly conditioned on `by1` and/or `by2`. The values taken by the conditioning variables are arranged in standard order (logical, alphabetical or numerical order for vectors, order of levels for factors, ordered intervals for classified variables or for variables measured in classes).
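The "standard order" of conditioning values can be illustrated with base R alone. This sketch only shows the ordering conventions named above (alphabetical for character vectors, level order for factors, `FALSE` before `TRUE` for logicals, interval order for classified variables); it makes no claim about how UBStats builds its tables internally.

```r
# Illustrative conditioning variables (made-up values)
g_chr <- c("M", "F", "M")
g_fac <- factor(c("M", "F", "M"), levels = c("M", "F"))
g_lgl <- c(TRUE, FALSE, TRUE)

print(sort(unique(g_chr)))   # character vector: alphabetical order
print(levels(g_fac))         # factor: order of levels, not alphabetical
print(sort(unique(g_lgl)))   # logical: FALSE before TRUE

# Classified numerical variable: cut() returns intervals as ordered levels
cls <- cut(c(3, 1, 2), breaks = c(0, 1.5, 3), right = FALSE,
           include.lowest = TRUE)
print(levels(cls))
```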

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

Examples

data(MktDATA, package = "UBStats")

# Marginal summaries
# - Numerical variable: Default summaries
distr.summary.x(x = AOV, data = MktDATA)
# - Numerical variable: More summaries
distr.summary.x(x = AOV, 
                stats = c("central","dispersion","fivenum"),
                data = MktDATA)
distr.summary.x(x = AOV, stats = c("mode","mean","sd","cv","fivenum"),
                data = MktDATA)
# - Character or factor (only applicable statistics are computed)
distr.summary.x(x = LikeMost, stats = c("mode","mean","sd","cv","fivenum"),
                data = MktDATA)
distr.summary.x(x = Education, stats = c("mode","mean","sd","cv","fivenum"),
                data = MktDATA)

# Measures conditioned on a single variable
# - Numerical variable by a character vector
distr.summary.x(x = TotVal, 
                stats = c("p5","p10","p25","p50","p75","p90","p95"),
                by1 = Gender, digits = 1, data = MktDATA)
# - Numerical variable by a numerical variable
#   classified into intervals
distr.summary.x(x = TotVal, 
                stats = c("central","dispersion"),
                by1 = AOV, breaks.by1 = 5,
                digits = 1, data = MktDATA)
# - Numerical variable by a variable measured in classes
distr.summary.x(x = TotVal, 
                stats = c("central","dispersion"),
                by1 = Income.S, 
                interval.by1 = TRUE,
                digits = 1, data = MktDATA)

# Measures conditioned on two variables
distr.summary.x(x = TotVal, stats = "fivenumbers", 
                by1 = Gender, by2 = Kids, data = MktDATA)
distr.summary.x(x = TotVal, stats = "fivenumbers", 
                by1 = Income.S, by2 = Gender,
                interval.by1 = TRUE, data = MktDATA)
distr.summary.x(x = TotVal, stats = "fivenumbers",
                by1 = Gender, by2 = AOV,
                breaks.by2 = 5, data = MktDATA)

# Arguments adj.breaks and use.scientific
#  Variables with a very wide range
LargeX <- MktDATA$TotVal * 1000000
LargeBY <- MktDATA$AOV * 5000000
#  - Default: no scientific notation
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                data = MktDATA)
#  - Scientific notation for summaries 
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                use.scientific = TRUE, data = MktDATA)
#  - Scientific notation for interval endpoints
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                adj.breaks = FALSE, data = MktDATA)
#  - Scientific notation for interval endpoints and summaries
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                adj.breaks = FALSE, use.scientific = TRUE,
                data = MktDATA)

# Store the list of requested summaries in an object
Out_TotVal<-distr.summary.x(x = TotVal, 
                            by1 = Income.S, by2 = Gender,
                            interval.by1 = TRUE,
                            stats = c("central","fivenum","dispersion"),
                            data = MktDATA)

UBStats documentation built on Sept. 11, 2024, 6:52 p.m.