descstats: Descriptive Statistics
In groupcompare: Comparing Two Groups Using Various Descriptive Statistics

Description

This function calculates various descriptive statistics for a given numeric vector. These statistics include measures of central tendency, dispersion, skewness, kurtosis, and some robust estimators.

Usage

descstats(x, trim = 0.1, k = 1.5)

Arguments

`x`	A numeric vector.
`trim`	The fraction (0 to 0.5) of observations to trim from each end of the vector when calculating the trimmed mean and the Winsorized mean. The default is 0.1.
`k`	The robustness parameter for the Huber M-estimator. The default is 1.5.
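To illustrate how `trim` affects the two estimators, here is a minimal base R sketch. The helper `winsorized_mean` is hypothetical and not part of groupcompare; in particular, replacing `floor(trim * n)` values in each tail is an assumption about how the fraction is turned into a count, and `descstats` may differ in that detail.

```r
# Hypothetical helper (not part of groupcompare): Winsorized mean in base R.
# Assumption: floor(trim * n) values are replaced in each tail.
winsorized_mean <- function(x, trim = 0.1) {
  x <- sort(x)
  n <- length(x)
  g <- floor(trim * n)                 # number of values replaced per tail
  if (g > 0) {
    x[seq_len(g)] <- x[g + 1]          # pull the g smallest values up
    x[(n - g + 1):n] <- x[n - g]       # pull the g largest values down
  }
  mean(x)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 100)

plain_mean   <- mean(x)                     # 14.6, pulled upward by 100
trimmed_mean <- mean(x, trim = 0.1)         # 5.625, drops 1 and 100
wins_mean    <- winsorized_mean(x, 0.1)     # 5.7, 1 becomes 2 and 100 becomes 10
```

The trimmed mean discards the extreme observations, whereas the Winsorized mean keeps the sample size and replaces them with the nearest retained values.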

Details

Choosing an appropriate k for the Huber M-estimator may require some experimentation. In the literature, commonly used k values typically range from 1.5 to 2, and any value in this range is a reasonable starting point. To choose k more systematically, compute the Huber M-estimate for each value in a grid of candidate k values, as shown in the Examples section, and plot the estimates against k. Look for the region where the curve flattens into a plateau: if the estimates stabilize beyond a certain k, that value may be appropriate. Smaller k values downweight extreme observations more strongly, so use a smaller k if outliers are present and you want to reduce their impact.
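The role of k can be made concrete with a small sketch of a Huber location estimate. This is not the implementation used by `descstats`; the helper `huber_location`, its `tol` and `max_iter` arguments, and the choice to fix the scale at the MAD are all assumptions made for illustration.

```r
# Hypothetical helper (not the package's code): Huber M-estimate of location
# by iteratively reweighted averaging, with the scale fixed at the MAD.
huber_location <- function(x, k = 1.5, tol = 1e-8, max_iter = 100) {
  mu <- median(x)                    # robust starting value
  s  <- mad(x)                       # scaled MAD (constant 1.4826 by default)
  if (s == 0) return(mu)             # degenerate case: no spread to scale by
  for (i in seq_len(max_iter)) {
    r <- abs(x - mu) / s             # standardized absolute residuals
    w <- pmin(1, k / r)              # weight 1 if r <= k, otherwise k / r
    mu_new <- sum(w * x) / sum(w)    # weighted mean with the current weights
    converged <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (converged) break
  }
  mu
}

x <- c(48, 49, 50, 51, 52, 90)       # one clear outlier

est_default <- huber_location(x, k = 1.5)   # lies between median(x) and mean(x)
est_large_k <- huber_location(x, k = 1e6)   # no observation is downweighted
```

With a very large k every weight equals 1 and the estimate coincides with the ordinary mean; as k shrinks, observations far from the center receive less weight and the estimate moves toward the median.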

Value

A list containing the computed descriptive statistics, including:

`n`	The number of observations
`min`	The minimum value
`max`	The maximum value
`mean`	The mean
`se`	The standard error of the mean
`sd`	The standard deviation
`trmean`	The trimmed mean
`med`	The median
`mad`	The median absolute deviation (MAD), a robust statistic for measuring variability in data.
`skew`	The skewness
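`kurt`	The excess kurtosis, that is, the kurtosis minus 3. It describes how heavy the tails of the distribution are (often described as how peaked or flat it is) compared to a normal distribution. Because the kurtosis of a normal distribution is always 3, subtracting 3 gives a normal distribution an excess kurtosis of 0.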
`winsmean`	The Winsorized mean
`hubermean`	Huber's M-estimator of location
`range`	The range
`iqr`	The interquartile range
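For orientation, several of these quantities can be reproduced in base R as sketched below. The helper `moment_stats` is hypothetical, and the simple moment-based formulas for skewness and excess kurtosis are an assumption: `descstats` may apply bias corrections or other variants, so small numerical differences are possible.

```r
# Hypothetical helper (not part of groupcompare): a few descriptive
# statistics computed with base R and uncorrected sample moments.
moment_stats <- function(x) {
  n  <- length(x)
  m  <- mean(x)
  m2 <- mean((x - m)^2)              # second central moment
  m3 <- mean((x - m)^3)              # third central moment
  m4 <- mean((x - m)^4)              # fourth central moment
  list(
    n     = n,
    se    = sd(x) / sqrt(n),         # standard error of the mean
    mad   = mad(x),                  # median absolute deviation (scaled)
    skew  = m3 / m2^1.5,             # moment-based skewness
    kurt  = m4 / m2^2 - 3,           # excess kurtosis: 0 for a normal distribution
    range = max(x) - min(x),
    iqr   = IQR(x)
  )
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
stats <- moment_stats(x)
stats$skew    # positive: the right tail is longer
stats$kurt    # negative: lighter tails than a normal distribution
```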

Author(s)

Zeynel Cebeci, A. Firat Ozdemir, Engin Yildiztepe

Examples

set.seed(123)
x <- rnorm(100, mean=50, sd=5)
descriptives <- descstats(x)
as.data.frame(descriptives)

descriptives$mean
descriptives$se

# Choosing k for the Huber M-estimator of location:
# compute the estimate over a grid of candidate k values (0.1, 0.2, ..., 5)
k <- seq(0.1, 5, by = 0.1)
result <- sapply(k, function(y) descstats(x, k = y)$hubermean)
names(result) <- paste0("k=", k)
result

plot(k, result, type = "b", col = "blue", pch = 19, ylab = "Huber's mean")

descstats(x, k=2, trim=0.05)$hubermean

groupcompare documentation built on June 26, 2025, 1:08 a.m.