smean.sd: Compute Summary Statistics on a Vector
In harrelfe/Hmisc: Harrell Miscellaneous

smean.sd

R Documentation

Compute Summary Statistics on a Vector

Description

Several statistical summary functions are provided for use with summary.formula and summarize (as well as with tapply, or on their own). smean.cl.normal computes three summary values: the sample mean and the lower and upper confidence limits for the population mean, assuming normality and using the t distribution. smean.sd computes the mean and standard deviation. smean.sdl computes the mean and the mean plus or minus a constant times the standard deviation. smean.cl.boot is a very fast implementation of the basic nonparametric bootstrap for obtaining confidence limits for the population mean without assuming normality. smedian.hilow computes the sample median and a selected pair of outer quantiles having equal tail areas. By default, all of these functions remove NAs before computing.
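As an illustration of the quantities described above, the following base R sketch computes them by hand. It is not the Hmisc source; the data vector and the variable names (`xbar`, `s`, `tcrit`, `se`, `cl`) are made up for the example, and the formulas are the textbook ones implied by the description and by the default `mult` shown under Usage.

```r
# Base R sketch of the summaries described above (not the Hmisc code).
x <- c(2.1, 3.4, NA, 1.9, 4.2, 3.3)   # made-up data with one missing value
x <- x[!is.na(x)]                     # mimic the default na.rm=TRUE
n    <- length(x)
xbar <- mean(x)
s    <- sd(x)

# Mean and standard deviation, as described for smean.sd
c(Mean = xbar, SD = s)

# Mean plus or minus mult standard deviations, as described for smean.sdl
mult <- 2
c(Mean = xbar, Lower = xbar - mult * s, Upper = xbar + mult * s)

# t-based confidence limits for the population mean, as described for
# smean.cl.normal: the multiplier applies to the standard error of the mean
conf.int <- 0.95
tcrit <- qt((1 + conf.int) / 2, n - 1)
se    <- s / sqrt(n)
cl    <- c(Mean = xbar, Lower = xbar - tcrit * se, Upper = xbar + tcrit * se)
cl
```

The t-based limits computed this way should agree with the interval reported by `t.test(x)` for the same confidence level.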

Usage

smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)

smean.sd(x, na.rm=TRUE)

smean.sdl(x, mult=2, na.rm=TRUE)

smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)

smedian.hilow(x, conf.int=.95, na.rm=TRUE)

Arguments

`x`	for the summary functions `smean.*` and `smedian.hilow`, a numeric vector; `NA`s are removed by default
`na.rm`	defaults to `TRUE`, unlike built-in functions, so that `NA`s are removed automatically unless you request otherwise
`mult`	for `smean.cl.normal`, the multiplier of the standard error of the mean used in obtaining confidence limits for the population mean (the default is the appropriate quantile of the t distribution with n-1 degrees of freedom). For `smean.sdl`, `mult` is the multiplier of the standard deviation used in obtaining a coverage interval about the sample mean. The default is `mult=2`, giving plus or minus 2 standard deviations.
`conf.int`	for `smean.cl.normal` and `smean.cl.boot`, the confidence level (between 0 and 1) for interval estimation of the population mean. For `smedian.hilow`, `conf.int` is the coverage probability the outer quantiles should target. With the default of 0.95, the lower and upper quantiles computed are 0.025 and 0.975.
`B`	number of bootstrap resamples for `smean.cl.boot`
`reps`	set to `TRUE` to have `smean.cl.boot` return the vector of bootstrapped means as the `reps` attribute of the returned object
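
To make the roles of `conf.int`, `B`, and `reps` concrete, here is a minimal base R sketch of a nonparametric bootstrap for the mean, followed by the median with equal-tailed outer quantiles. It is an assumption-laden illustration rather than the Hmisc implementation: it uses a percentile interval of the resampled means, and the data and the names `boot_means`, `probs`, `boot_cl`, and `hilow` are hypothetical.

```r
# Base R sketch of a bootstrap interval for the mean (not the Hmisc code).
set.seed(42)
x <- rexp(60)        # made-up skewed sample
conf.int <- 0.95
B <- 1000            # number of bootstrap resamples

# Resample x with replacement B times and keep the mean of each resample
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Equal tail areas: conf.int = 0.95 maps to the 0.025 and 0.975 quantiles
probs <- c((1 - conf.int) / 2, (1 + conf.int) / 2)

# Percentile interval (an assumption of this sketch)
boot_cl <- c(mean(x), quantile(boot_means, probs, names = FALSE))
names(boot_cl) <- c("Mean", "Lower", "Upper")

# Analogous to reps=TRUE: keep the resampled means as an attribute
attr(boot_cl, "reps") <- boot_means

# Median with outer quantiles of the data itself, as described for smedian.hilow
hilow <- c(median(x), quantile(x, probs, names = FALSE))
names(hilow) <- c("Median", "Lower", "Upper")

boot_cl[c("Mean", "Lower", "Upper")]
hilow
```

Keeping the resampled means available is what makes derived intervals possible, such as the difference between two group means built from two independent sets of resamples.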

Value

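A numeric vector of summary statistics. When `reps=TRUE`, the vector returned by `smean.cl.boot` also carries the bootstrapped means as its `reps` attribute.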

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Examples

library(Hmisc)
set.seed(1)
x <- rnorm(100)
smean.sd(x)
smean.sdl(x)
smean.cl.normal(x)
smean.cl.boot(x)
smedian.hilow(x, conf.int=.5)  # 25th and 75th percentiles

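# Function to compute a 0.95 bootstrap confidence interval for the
# difference in two means (second group minus first group)
# y is the response vector; g is the grouping variable
bootdif <- function(y, g) {
  g <- as.factor(g)
  # Bootstrapped means for each of the first two levels of g
  a <- attr(smean.cl.boot(y[g==levels(g)[1]], B=2000, reps=TRUE),'reps')
  b <- attr(smean.cl.boot(y[g==levels(g)[2]], B=2000, reps=TRUE),'reps')
  # Observed difference in group means
  meandif <- diff(tapply(y, g, mean, na.rm=TRUE))
  # Quantiles of the bootstrapped differences give the interval
  a.b <- quantile(b-a, c(.025,.975))
  res <- c(meandif, a.b)
  names(res) <- c('Mean Difference','.025','.975')
  res
}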
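
The Description mentions using these functions with tapply. The following sketch shows one way that might look, assuming the Hmisc package is installed; the data and the names `y`, `g`, `by_group`, and `tab` are made up for illustration.

```r
library(Hmisc)   # assumes the Hmisc package is installed

set.seed(3)
y <- c(rnorm(40, mean = 0), rnorm(40, mean = 1))   # made-up outcome
g <- rep(c("a", "b"), each = 40)                   # made-up grouping variable

# smean.sd returns a vector for each group, so tapply returns a list
by_group <- tapply(y, g, smean.sd)

# Stack the per-group vectors into a matrix with one row per group
tab <- do.call(rbind, by_group)
tab
```

Because each summary function returns several values per group, binding the per-group results into a matrix is usually easier to read than the raw list.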

harrelfe/Hmisc documentation built on June 13, 2025, 7:22 a.m.