descstats: Descriptive Statistics

descstats    R Documentation

Descriptive Statistics

Description

This function calculates various descriptive statistics for a given numeric vector. These statistics include measures of central tendency, dispersion, skewness, kurtosis, and some robust estimators.

Usage

descstats(x, trim = 0.1, k = 1.5)

Arguments

x

A numeric vector.

trim

The fraction (0 to 0.5) of observations to be trimmed from each end of the vector when calculating the trimmed mean and the Winsorized mean. The default is 0.1.

k

The robustness parameter for the Huber M-estimator. The default is 1.5.
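
To illustrate how trim acts on the two estimators it controls, here is a minimal base R sketch. The helper winsorized_mean() is hypothetical and is not part of the package; it assumes that floor(trim * n) observations at each end are replaced by the nearest retained value, which may differ from the convention descstats() uses internally. The trimmed mean uses base R's mean(x, trim = ...).

```r
# Hypothetical helper (not from the package): Winsorized mean in base R.
# Assumption: floor(trim * n) observations at each end are replaced by
# the nearest retained order statistic.
winsorized_mean <- function(x, trim = 0.1) {
  stopifnot(is.numeric(x), trim >= 0, trim < 0.5)
  x <- sort(x)
  n <- length(x)
  g <- floor(trim * n)                 # observations replaced at each end
  if (g > 0) {
    x[seq_len(g)] <- x[g + 1]          # pull the g smallest values up
    x[(n - g + 1):n] <- x[n - g]       # pull the g largest values down
  }
  mean(x)
}

x <- c(1, 4, 5, 6, 7, 8, 9, 10, 12, 100)
mean(x)                          # ordinary mean, inflated by the value 100
mean(x, trim = 0.1)              # trimmed mean: 1 and 100 are dropped
winsorized_mean(x, trim = 0.1)   # Winsorized mean: 1 becomes 4, 100 becomes 12
```

Trimming discards the extreme observations, while Winsorizing keeps the sample size and only pulls the extremes inward, so the two estimates usually differ slightly.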

Details

Choosing an appropriate k value for the Huber M-estimator may require some experimentation. In the literature, commonly used k values typically range from 1.5 to 2, and users can start with any value within this range. To choose among candidate values, compute the Huber M-estimator for each k on a grid, as shown in the example below, and plot the estimates against k. Look for a k beyond which the estimates follow a smooth trend or reach a plateau: if the estimates stabilize after a certain k value, that value may be appropriate. Finally, if there are outliers and you want to reduce their impact, use smaller k values.
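
To make the role of k concrete, the following is a rough base R sketch of one common way to compute a Huber-type location estimate. The function huber_location() is hypothetical and is not claimed to be the algorithm behind descstats(). It assumes the scale is held fixed at the MAD as returned by mad(), and that the estimate is found by repeatedly clipping the data to within k scale units of the current estimate and averaging.

```r
# Hypothetical sketch (not the package's implementation): Huber-type
# location estimate with the scale fixed at the MAD.
huber_location <- function(x, k = 1.5, tol = 1e-6, max_iter = 100) {
  stopifnot(is.numeric(x), k > 0)
  x <- x[!is.na(x)]
  mu <- median(x)                  # robust starting value
  s <- mad(x)                      # scale estimate, held fixed
  if (s == 0) return(mu)           # degenerate case: no spread to scale by
  for (i in seq_len(max_iter)) {
    # Clip observations to [mu - k*s, mu + k*s], then average
    clipped <- pmin(pmax(x, mu - k * s), mu + k * s)
    mu_new <- mean(clipped)
    if (abs(mu_new - mu) < tol * s) break
    mu <- mu_new
  }
  mu_new
}

x <- c(48, 49, 50, 51, 52, 47, 53, 50, 49, 51, 90)  # one gross outlier
median(x)
huber_location(x, k = 1.5)   # stays close to the bulk of the data
huber_location(x, k = 1e6)   # nothing is clipped, so this equals mean(x)
mean(x)
```

Under these assumptions, a small k clips more observations and moves the estimate toward the median, while a large k clips fewer and moves it toward the ordinary mean. This is why smaller k values reduce the influence of outliers.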

Value

A list containing the computed descriptive statistics, including:

n

The number of observations

min

The minimum value

max

The maximum value

mean

The mean

se

The standard error of the mean

sd

The standard deviation

trmean

The trimmed mean

med

The median

mad

The median absolute deviation (MAD), a robust statistic for measuring variability in data.

skew

The skewness

kurt

The excess kurtosis, which measures how peaked or flat a distribution is compared to a normal distribution. Subtracting 3 centers the measure relative to the kurtosis of a normal distribution, which is always 3.

winsmean

The Winsorized mean

hubermean

Huber's M-estimator of location

range

The range

iqr

The interquartile range
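
Several of the values listed above have simple base R counterparts, sketched here for orientation. The helper moment_stats() is hypothetical. It assumes the plain moment-based definitions of skewness and excess kurtosis (central moments divided by n), whereas descstats() may use bias-corrected variants, so small numerical differences from the package output are possible.

```r
# Hypothetical helper (not from the package): moment-based statistics.
# Assumption: central moments are divided by n, with no bias correction.
moment_stats <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)                   # second central moment
  m3 <- mean(d^3)                   # third central moment
  m4 <- mean(d^4)                   # fourth central moment
  list(
    n    = n,
    se   = sd(x) / sqrt(n),         # standard error of the mean
    skew = m3 / m2^1.5,             # moment skewness
    kurt = m4 / m2^2 - 3,           # excess kurtosis (normal distribution = 0)
    range = max(x) - min(x),
    iqr  = IQR(x)
  )
}

stats <- moment_stats(c(1, 2, 3, 4, 5))
stats$skew   # symmetric data, so the skewness is 0
stats$kurt   # negative: flatter than a normal distribution
```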

Author(s)

Zeynel Cebeci, A. Firat Ozdemir, Engin Yildiztepe

Examples

set.seed(123)
x <- rnorm(100, mean=50, sd=5)
descriptives <- descstats(x)
as.data.frame(descriptives)

descriptives$mean
descriptives$se

# Choosing k for the Huber M-estimator of location:
# compute the estimate over a grid of candidate k values (k > 0)
k <- seq(0.1, 5, by = 0.1)
result <- sapply(k, function(y) descstats(x, k = y)$hubermean)
names(result) <- paste0("k=", k)
result

plot(k, result, type = "b", col = "blue", pch = 19, ylab = "Huber's mean")

descstats(x, k=2, trim=0.05)$hubermean

groupcompare documentation built on June 26, 2025, 1:08 a.m.