distr.summary.x: Summary statistics for a single variable

distr.summary.x    R Documentation

Summary statistics for a single variable

Description

distr.summary.x() computes summary statistics of a vector or a factor.

Usage

distr.summary.x(
  x,
  stats = c("summary"),
  by1,
  by2,
  breaks.by1,
  interval.by1 = FALSE,
  breaks.by2,
  interval.by2 = FALSE,
  adj.breaks = TRUE,
  digits = 2,
  f.digits = 4,
  force.digits = FALSE,
  use.scientific = FALSE,
  data,
  ...
)

Arguments

x

An unquoted string identifying the variable whose distribution is to be summarized. x can be the name of a vector or a factor in the workspace, or the name of one of the columns in the data frame specified in the data argument.

stats

A character vector specifying the summary statistics to compute (several can be requested at once). Predefined sets of summaries can be requested with the following options:

  • "summary": min, q1, median, mean, q3, max, sd, var;

  • "central": central tendency measures;

  • "dispersion": measures of dispersion;

  • "fivenumbers": five-number summary;

  • "quartiles", "quintiles", "deciles", "percentiles": set of quantiles.

Individual statistics can also be requested: "q1", "q2", "q3", "mean", "median", "mode" (which returns, in this order, the mode, the number of modes, and the proportion of cases taking the modal value), "min", "max", "sd", "var", "cv" (coefficient of variation), "range", "IQrange" (interquartile range), and "p1", "p2", ..., "p100" (i.e. specific percentiles).
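As a rough illustration of what the "mode" and "cv" options report, the base R sketch below computes the same quantities by hand. The helper mode_summary() is hypothetical and not part of UBStats; the package may treat ties, missing values, or the denominator of the standard deviation differently.

```r
# Hypothetical helper (NOT a UBStats function): mode, number of modes,
# and proportion of cases taking the modal value.
mode_summary <- function(x) {
  tab <- table(x)                                  # frequencies (NAs are dropped)
  modes <- names(tab)[as.vector(tab) == max(tab)]  # every value with the top frequency
  list(mode = modes,
       n.modes = length(modes),
       prop = max(tab) / sum(tab))
}

answers <- c("a", "b", "b", "c", "b", "a")
ms <- mode_summary(answers)
print(ms)

# Coefficient of variation, assuming the usual definition sd / mean
# with the sample (n - 1) standard deviation returned by sd().
spend <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv <- sd(spend) / mean(spend)
print(cv)
```

Because "mode" needs only frequencies, it is the one statistic in this pair that also makes sense for character vectors and factors.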

by1, by2

Unquoted strings identifying optional variables (typically taking few values/levels) used to build conditional summaries. They can be specified in the same way as x.

breaks.by1, breaks.by2

Allow classifying the variables by1 and/or by2, if numerical, into intervals. Each can be an integer indicating the number of equal-width intervals used to classify by1 and/or by2, or a vector of increasing numeric values defining the endpoints of the intervals (closed on the left and open on the right; the last interval is also closed on the right). To cover the entire range of values, the minimum and the maximum must lie between the first and the last break. It is also possible to specify a set of breaks covering only a portion of the range of by1 and/or by2.
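The interval convention described above (closed on the left, open on the right, last interval closed on both sides) can be reproduced in base R with cut(). This is only a sketch of the convention, not the package's internal code; in particular, how distr.summary.x() reports values falling outside the breaks, and how it places equal-width breaks, are not shown here and may differ from cut().

```r
x <- c(0, 5, 10, 19.9, 20, 30, 35)

# Explicit endpoints: [0,10), [10,20), [20,30]
classes <- cut(x, breaks = c(0, 10, 20, 30),
               right = FALSE,          # intervals closed on the left
               include.lowest = TRUE)  # last interval also closed on the right
print(levels(classes))

# 35 lies beyond the last break, so cut() leaves it unclassified (NA):
# the breaks cover only a portion of the range of x.
print(table(classes, useNA = "ifany"))

# An integer requests that many equal-width intervals
# (cut() slightly widens the range so that the extremes are included).
equal_width <- cut(x, breaks = 5)
print(nlevels(equal_width))
```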

interval.by1, interval.by2

Logical values indicating whether by1 and/or by2 are variables measured in classes (TRUE). If the intervals for one variable are not consistent (e.g. overlapping intervals, or intervals whose upper endpoint is lower than the lower one), the variable is analysed as it is, even though the results are not necessarily consistent; defaults to FALSE.

adj.breaks

Logical value indicating whether the endpoints of the intervals of the numerical variables by1 or by2, when classified into intervals, should be displayed without scientific notation; defaults to TRUE.

digits, f.digits

Integer values specifying the number of decimals used to round, respectively, summary statistics (default: digits=2) and proportions/percentages (default: f.digits=4). If the chosen rounding formats some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument force.digits is set to TRUE.

force.digits

Logical value indicating whether the requested summaries should be rounded to exactly the number of decimals specified in digits and f.digits, even if some non-zero values are then rounded to zero (defaults to FALSE).
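The interplay between digits and force.digits can be pictured with the base R sketch below. The function round_keep_nonzero() and its max.digits cap are hypothetical, introduced only for illustration; the rule actually implemented in UBStats may differ in its details.

```r
# Hypothetical helper (NOT a UBStats function): round to `digits` decimals,
# adding decimals while some non-zero value would be displayed as zero.
round_keep_nonzero <- function(x, digits = 2, force = FALSE, max.digits = 15) {
  if (force) return(round(x, digits))   # forced: accept zeros
  lost <- function(d) any(x != 0 & round(x, d) == 0, na.rm = TRUE)
  d <- digits
  while (d < max.digits && lost(d)) d <- d + 1
  round(x, d)
}

vals <- c(12.3456, 0.00042)

adaptive <- round_keep_nonzero(vals, digits = 2)               # uses 4 decimals
forced   <- round_keep_nonzero(vals, digits = 2, force = TRUE) # uses 2 decimals
print(adaptive)
print(forced)
```

With two decimals the second value would be shown as 0, so the adaptive version keeps adding decimals until it becomes visible, while the forced version accepts the zero.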

use.scientific

Logical value indicating whether numbers in tables should be displayed using scientific notation (TRUE); defaults to FALSE.

data

An optional data frame containing x and/or the variables specifying the layers, by1 and by2. If not found in data, the variables are taken from the environment from which distr.summary.x() is called.
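The lookup order described above (first the data frame, then the calling environment) matches a common base R idiom for unquoted arguments. The sketch below uses a hypothetical helper, get_var(), to show that idiom; it is an assumption about the general mechanism, not the package's actual code.

```r
# Hypothetical helper (NOT a UBStats function): resolve an unquoted name
# first among the columns of `data`, then in the caller's environment.
get_var <- function(x, data = NULL) {
  eval(substitute(x), data, parent.frame())
}

sales <- data.frame(a = 1:3)   # illustrative data frame
b <- c(10, 20)                 # a vector living in the workspace

from_data <- get_var(a, data = sales)  # found as a column of `sales`
from_env  <- get_var(b, data = sales)  # not a column: taken from the workspace
no_data   <- get_var(b)                # no data frame supplied at all
```

This is why the examples can mix columns of MktDATA with vectors such as LargeX that exist only in the workspace.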

...

Additional arguments to be passed to low level functions.

Value

A list whose elements are tables (converted to data frames) with the requested summaries, possibly conditioned on by1 and/or by2. The values taken by the conditioning variables are arranged in standard order (logical, alphabetical or numerical order for vectors, order of levels for factors, ordered intervals for classified variables or for variables measured in classes).

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

summaries.plot.x() to graphically display conditioned tendency summaries of a univariate distribution.

distr.table.x() for tabulating a univariate distribution.

distr.plot.x() for plotting a univariate distribution.

Examples

data(MktDATA, package = "UBStats")

# Marginal summaries
# - Numerical variable: Default summaries
distr.summary.x(x = AOV, data = MktDATA)
# - Numerical variable: More summaries
distr.summary.x(x = AOV,
                stats = c("central","dispersion","fivenumbers"),
                data = MktDATA)
distr.summary.x(x = AOV, stats = c("mode","mean","sd","cv","fivenumbers"),
                data = MktDATA)
# - Character or factor (only proper statistics calculated)
distr.summary.x(x = LikeMost, stats = c("mode","mean","sd","cv","fivenumbers"),
                data = MktDATA)
distr.summary.x(x = Education, stats = c("mode","mean","sd","cv","fivenumbers"),
                data = MktDATA)

# Measures conditioned to a single variable
# - Numerical variable by a character vector
distr.summary.x(x = TotVal, 
                stats = c("p5","p10","p25","p50","p75","p90","p95"),
                by1 = Gender, digits = 1, data = MktDATA)
# - Numerical variable by a numerical variable
#   classified into intervals
distr.summary.x(x = TotVal, 
                stats = c("central","dispersion"),
                by1 = AOV, breaks.by1 = 5,
                digits = 1, data = MktDATA)
# - Numerical variable by a variable measured in classes
distr.summary.x(x = TotVal, 
                stats = c("central","dispersion"),
                by1 = Income.S, 
                interval.by1 = TRUE,
                digits = 1, data = MktDATA)

# Measures conditioned to two variables
distr.summary.x(x = TotVal, stats = "fivenumbers", 
                by1 = Gender, by2 = Kids, data = MktDATA)
distr.summary.x(x = TotVal, stats = "fivenumbers", 
                by1 = Income.S, by2 = Gender,
                interval.by1 = TRUE, data = MktDATA)
distr.summary.x(x = TotVal, stats = "fivenumbers",
                by1 = Gender, by2 = AOV,
                breaks.by2 = 5, data = MktDATA)

# Arguments adj.breaks and use.scientific
#  Variables with a very wide range
LargeX <- MktDATA$TotVal * 1000000
LargeBY <- MktDATA$AOV * 5000000
#  - Default: no scientific notation
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                data = MktDATA)
#  - Scientific notation for summaries 
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                use.scientific = TRUE, data = MktDATA)
#  - Scientific notation for interval endpoints
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                adj.breaks = FALSE, data = MktDATA)
#  - Scientific notation for interval endpoints and summaries
distr.summary.x(LargeX, by1=LargeBY, breaks.by1 = 5, 
                adj.breaks = FALSE, use.scientific = TRUE,
                data = MktDATA)

# Output the list with the requested summaries
Out_TotVal <- distr.summary.x(x = TotVal,
                              by1 = Income.S, by2 = Gender,
                              interval.by1 = TRUE,
                              stats = c("central","fivenumbers","dispersion"),
                              data = MktDATA)


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.