descript: Compute univariate descriptive statistics

View source: R/descript.R

descript R Documentation

Compute univariate descriptive statistics

Description

Returns univariate data summaries for each variable supplied, with discrete and continuous variables treated separately. The structure provides a pipe-friendly API for selecting and subsetting variables using dplyr syntax, while conditional statistics are evaluated internally using the by function. Information for quantitative/continuous variables is returned by default; summaries of discrete variables (e.g., factors and character vectors) are returned instead by setting the discrete argument.
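To illustrate the discrete/continuous distinction described above, the following base R sketch flags factor and character columns and tabulates counts and proportions for one of them. It is only an illustration under stated assumptions, not the package's implementation; the names dat, is_discrete, counts, and props are hypothetical and introduced here.

```r
# Example data with two columns converted to factors.
dat <- within(mtcars, {
  cyl <- factor(cyl)
  am  <- factor(am, labels = c("automatic", "manual"))
})

# Flag discrete columns (factor or character); the rest are continuous.
is_discrete <- vapply(dat,
                      function(x) is.factor(x) || is.character(x),
                      logical(1))
names(dat)[is_discrete]

# Counts and proportions for one discrete variable.
counts <- table(dat$cyl, useNA = "ifany")
props  <- prop.table(counts)
counts
props
```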

Usage

descript(df, funs = get_descriptFuns(), discrete = FALSE)

get_descriptFuns()

Arguments

df

a data.frame or tibble-like structure containing the variables of interest. Note that factor and character vectors will be treated as discrete observations, and by default are omitted from the computation of the descriptive statistics specified in funs

funs

functions to apply when discrete = FALSE. Can be modified by the user to include or exclude further functions; however, each supplied function must return a scalar. Use get_descriptFuns() to return the full list of functions, which may then be augmented or subsetted based on the user's requirements. The default descriptive statistics returned are:

n

number of non-missing observations

miss

number of missing observations

mean

mean

trimmed

trimmed mean (10%)

sd

standard deviation

mad

mean absolute deviation

skewness

skewness (from e1071)

kurtosis

kurtosis (from e1071)

min

minimum

Q_25

25% quantile

Q_50

50% quantile (a.k.a., the median)

Q_75

75% quantile

max

maximum

discrete

logical; include summary statistics for discrete variables only? If TRUE then only count and proportion information will be returned
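The requirement that each function in funs return a scalar can be illustrated with base R alone. The sketch below is not how descript is implemented; my_funs and apply_funs are hypothetical names introduced here, and the only assumption is that each function maps a numeric vector to a single value.

```r
# Hypothetical named list of summary functions; each returns one value.
my_funs <- list(
  n    = function(x) sum(!is.na(x)),
  miss = function(x) sum(is.na(x)),
  mean = function(x) mean(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE)
)

# Hypothetical helper: apply every function to every numeric column,
# stopping if any function returns something other than a scalar.
apply_funs <- function(df, funs) {
  num <- df[vapply(df, is.numeric, logical(1))]
  cols <- lapply(funs, function(f) {
    vapply(num, function(col) {
      val <- f(col)
      stopifnot(length(val) == 1L)
      as.numeric(val)
    }, numeric(1))
  })
  # One row per variable, one column per statistic.
  as.data.frame(do.call(cbind, cols))
}

res <- apply_funs(mtcars[c("mpg", "wt")], my_funs)
res
```

A function such as range(), which returns two values, would trip the length check in this sketch; splitting it into separate min and max entries keeps each statistic scalar.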

Details

Conditioning: As the function is intended to support pipe-friendly code, conditioning/group subset specifications are declared using group_by and subsequently passed to descript. The same applies to the other dplyr verbs (e.g., select and filter), which are applied to the data before it is passed to descript.
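As a rough illustration of conditional evaluation with the by function, the base R sketch below computes the mean and standard deviation of mpg within each level of cyl. It is a minimal example of the general idea, not the package's internal code; grp and cond are hypothetical names introduced here.

```r
# Conditional mean and sd of mpg within each level of cyl,
# computed with base R's by().
grp <- by(mtcars["mpg"], mtcars$cyl, function(d) {
  c(mean = mean(d$mpg, na.rm = TRUE),
    sd   = sd(d$mpg, na.rm = TRUE))
})

# Collect the per-group results into one matrix (rows = cyl levels).
cond <- do.call(rbind, grp)
cond
```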

See Also

summarise, group_by

Examples


library(dplyr)

data(mtcars)

if(FALSE){
  # run the following to see behavior with NA values in dataset
  mtcars[sample(1:nrow(mtcars), 3), 'cyl'] <- NA
  mtcars[sample(1:nrow(mtcars), 5), 'mpg'] <- NA
}

fmtcars <- within(mtcars, {
	cyl <- factor(cyl)
	am <- factor(am, labels=c('automatic', 'manual'))
	vs <- factor(vs)
})

# with and without factor variables
mtcars |> descript()
fmtcars |> descript()               # factors/discrete vars omitted
fmtcars |> descript(discrete=TRUE)  # discrete variables only

# usual pipe chaining
fmtcars |> select(mpg, wt) |> descript()
fmtcars |> filter(mpg > 20) |> select(mpg, wt) |> descript()

# conditioning with group_by()
fmtcars |> group_by(cyl) |> descript()
fmtcars |> group_by(cyl, am) |> descript()

# conditioning with group_by() also works for discrete variables
fmtcars |> group_by(cyl) |> descript(discrete=TRUE)
fmtcars |> group_by(am) |> descript(discrete=TRUE)
fmtcars |> group_by(cyl, am) |> descript(discrete=TRUE)

# only return a subset of summary statistics
funs <- get_descriptFuns()
sfuns <- funs[c('mean', 'sd')] # subset
fmtcars |> descript(funs=sfuns) # only mean/sd

# add new functions
funs2 <- c(sfuns,
           Q_5 = \(x) quantile(x, .05, na.rm=TRUE),
           median= \(x) median(x, na.rm=TRUE),
           Q_95 = \(x) quantile(x, .95, na.rm=TRUE))
fmtcars |> descript(funs=funs2)


SimDesign documentation built on Feb. 10, 2026, 9:07 a.m.