Compute Summary Statistics on a Vector

View source: R/summary.formula.s

smean.sd    R Documentation


Description

A number of statistical summary functions are provided for use with summary.formula and summarize (as well as tapply and by themselves). smean.cl.normal computes 3 summary variables: the sample mean and the lower and upper Gaussian confidence limits based on the t-distribution. smean.sd computes the mean and standard deviation. smean.sdl computes the mean plus or minus a constant times the standard deviation. smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality. smedian.hilow computes the sample median and a selected pair of outer quantiles having equal tail areas. These functions all delete NAs automatically.
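As a rough illustration of the arithmetic described above, the following base R sketch computes the three non-bootstrap summaries by hand. The helper names (t_limits, sd_limits, median_hilow) are hypothetical stand-ins written for this example and are not part of Hmisc; the package's own functions may differ in detail, for example in how quantiles are interpolated.

```r
# Hypothetical helpers (not Hmisc functions) sketching the computations.

# Mean with t-based limits: mean +/- qt((1 + conf.int)/2, n - 1) * SE
t_limits <- function(x, conf.int = 0.95) {
  x <- x[!is.na(x)]                      # drop NAs before summarizing
  n <- length(x)
  se <- sd(x) / sqrt(n)                  # standard error of the mean
  mult <- qt((1 + conf.int) / 2, n - 1)  # t quantile used as the multiplier
  c(Mean = mean(x), Lower = mean(x) - mult * se, Upper = mean(x) + mult * se)
}

# Mean plus or minus a constant times the standard deviation
sd_limits <- function(x, mult = 2) {
  x <- x[!is.na(x)]
  c(Mean = mean(x), Lower = mean(x) - mult * sd(x), Upper = mean(x) + mult * sd(x))
}

# Median with a pair of outer quantiles having equal tail areas
median_hilow <- function(x, conf.int = 0.95) {
  x <- x[!is.na(x)]
  probs <- c(0.5, (1 - conf.int) / 2, (1 + conf.int) / 2)
  q <- quantile(x, probs, names = FALSE)
  c(Median = q[1], Lower = q[2], Upper = q[3])
}

x <- c(2, 4, 4, 5, 7, 9, NA)
t_limits(x)
sd_limits(x)
median_hilow(x, conf.int = 0.5)   # median with 25th and 75th percentiles
```

Each helper returns a named numeric vector, which is the shape that tapply and similar callers can bind into a table.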

Usage

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)



Arguments

x
for the summary functions smean.* and smedian.hilow, a numeric vector from which NAs will be removed automatically


na.rm
defaults to TRUE, unlike built-in functions, so that NAs are removed automatically by default


mult
for smean.cl.normal, the multiplier of the standard error of the mean to use in obtaining confidence limits of the population mean (the default is the appropriate quantile of the t distribution). For smean.sdl, mult is the multiplier of the standard deviation used in obtaining a coverage interval about the sample mean. The default is mult=2, to use plus or minus 2 standard deviations.

conf.int
for smean.cl.normal and smean.cl.boot, specifies the confidence level (0-1) for interval estimation of the population mean. For smedian.hilow, conf.int is the coverage probability the outer quantiles should target. When the default, 0.95, is used, the lower and upper quantiles computed are 0.025 and 0.975.


B
number of bootstrap resamples for smean.cl.boot


reps
set to TRUE to have smean.cl.boot return the vector of bootstrapped means as the reps attribute of the returned object
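To make the roles of the confidence level, the number of resamples, and the reps attribute concrete, here is a minimal base R sketch of a percentile bootstrap for the mean. The function boot_mean_limits is a hypothetical helper written for this illustration; it is not the Hmisc implementation, and it assumes the limits are taken as equal-tail quantiles of the resampled means.

```r
# Hypothetical helper (not the Hmisc implementation): percentile bootstrap
# confidence limits for the mean, optionally keeping the resampled means.
boot_mean_limits <- function(x, conf.int = 0.95, B = 1000, reps = FALSE) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Each replicate is the mean of n values drawn with replacement from x
  means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  lims <- quantile(means, c((1 - conf.int) / 2, (1 + conf.int) / 2),
                   names = FALSE)
  res <- c(Mean = mean(x), Lower = lims[1], Upper = lims[2])
  if (reps) attr(res, "reps") <- means   # attach the B bootstrapped means
  res
}

set.seed(1)
x <- rexp(50)
out <- boot_mean_limits(x, B = 500, reps = TRUE)
out[c("Mean", "Lower", "Upper")]   # subsetting drops the reps attribute
length(attr(out, "reps"))          # one bootstrapped mean per resample
```

Keeping the resampled means lets a caller combine them across groups, for instance to form an interval for a difference in means.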


Value

a vector of summary statistics
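Because the result is a vector of named statistics, per-group results from tapply can be bound into a small table. The sketch below uses a hypothetical stand-in, mean_sd, that only mimics the shape of such a return value so the example runs without Hmisc; the statistic names it uses are an assumption for illustration.

```r
# Hypothetical stand-in returning a named vector of summary statistics
mean_sd <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  c(Mean = mean(x), SD = sd(x))
}

y <- c(1, 2, 3, 10, 20, 30, NA)
g <- c("a", "a", "a", "b", "b", "b", "b")

by_group <- tapply(y, g, mean_sd)   # one named vector per group
tab <- do.call(rbind, by_group)     # rows = groups, columns = statistics
tab
```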


Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University

See Also

summarize, summary.formula


Examples

library(Hmisc)

x <- rnorm(100)
smedian.hilow(x, conf.int=.5)  # 25th and 75th percentiles

# Function to compute 0.95 confidence interval for the difference in two means
# g is grouping variable
bootdif <- function(y, g) {
 g <- as.factor(g)
 # Bootstrapped means for each of the two groups
 a <- attr(smean.cl.boot(y[g==levels(g)[1]], B=2000, reps=TRUE),'reps')
 b <- attr(smean.cl.boot(y[g==levels(g)[2]], B=2000, reps=TRUE),'reps')
 meandif <- diff(tapply(y, g, mean, na.rm=TRUE))
 # Percentile limits of the difference in bootstrapped means
 a.b <- quantile(b-a, c(.025,.975))
 res <- c(meandif, a.b)
 names(res) <- c('Mean Difference','.025','.975')
 res
}

harrelfe/Hmisc documentation built on May 19, 2024, 4:13 a.m.