descstats | R Documentation |
This function calculates various descriptive statistics for a given numeric vector. These statistics include measures of central tendency, dispersion, skewness, kurtosis, and some robust estimators.
descstats(x, trim = 0.1, k = 1.5)
x |
A numeric vector. |
trim |
The fraction (0 to 0.5) of observations trimmed from each end of the vector when computing the trimmed mean and the Winsorized mean. The default is 0.1. |
k |
The robustness parameter for the Huber M-estimator. The default is 1.5. |
Choosing an appropriate k for the Huber M-estimator may require some experimentation. In the literature, commonly used k values range from 1.5 to 2, and any value in this range is a reasonable starting point. To choose k more carefully, compute the Huber estimate for each value in a grid of candidate k values and plot the estimates against k, as shown in the example below. Look for a region where the estimates level off into a smooth trend or plateau: if the estimates stabilize beyond a certain k, that k may be appropriate. Finally, if the data contain outliers and you want to reduce their influence, use smaller k values.
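To make the role of k concrete, the following is a minimal sketch of a Huber-type location estimate computed by iteratively reweighted means. It is not the implementation used by descstats: the helper name huber_location, the choice of the MAD as a fixed scale, and the convergence settings are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of the package: Huber-type M-estimate of
# location via iteratively reweighted means, with the scale fixed at the MAD.
huber_location <- function(x, k = 1.5, tol = 1e-8, max_iter = 100) {
  mu <- median(x)          # robust starting value
  s  <- mad(x)             # scale estimate, held fixed during the iteration
  if (s == 0) return(mu)   # degenerate case: no spread around the median
  for (i in seq_len(max_iter)) {
    r <- abs(x - mu) / s               # standardized absolute residuals
    w <- ifelse(r <= k, 1, k / r)      # full weight within k, downweighted beyond
    mu_new <- sum(w * x) / sum(w)      # weighted mean with the current weights
    converged <- abs(mu_new - mu) < tol * s
    mu <- mu_new
    if (converged) break
  }
  mu
}

set.seed(123)
x_out <- c(rnorm(95, mean = 50, sd = 5), rep(120, 5))  # 5 gross outliers
c(mean     = mean(x_out),
  median   = median(x_out),
  huber_k1 = huber_location(x_out, k = 1),
  huber_k2 = huber_location(x_out, k = 2))
```

In this sketch, observations whose standardized residual exceeds k receive a weight below 1, so a smaller k downweights more of the data and moves the estimate toward the median, while a very large k leaves every weight at 1 and reproduces the ordinary mean.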
A list containing the computed descriptive statistics, including:
n |
The number of observations |
min |
The minimum value |
max |
The maximum value |
mean |
The mean |
se |
The standard error of the mean |
sd |
The standard deviation |
trmean |
The trimmed mean |
med |
The median |
mad |
The median absolute deviation (MAD), a robust statistic for measuring variability in data. |
skew |
The skewness |
kurt |
The excess kurtosis, which measures how peaked or flat a distribution is compared to a normal distribution. Subtracting 3 centers the measure on the kurtosis of a normal distribution, which is always 3. |
winsmean |
The Winsorized mean |
hubermean |
Huber's M-estimator of location |
range |
The range |
iqr |
The interquartile range |
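As a rough illustration of how trmean, winsmean and kurt differ from the plain mean, the sketch below uses base R plus two hypothetical helpers, winsorized_mean and excess_kurtosis. These helpers are assumptions for illustration only; the package may use different conventions (for example, a bias-corrected kurtosis formula or a different rule for how many values are replaced).

```r
# Hypothetical helpers for illustration; NOT the package's own code.
winsorized_mean <- function(x, trim = 0.1) {
  n  <- length(x)
  g  <- floor(trim * n)           # number of values replaced at each end
  xs <- sort(x)
  if (g > 0) {
    xs[seq_len(g)]    <- xs[g + 1]   # pull the g lowest values up
    xs[(n - g + 1):n] <- xs[n - g]   # pull the g highest values down
  }
  mean(xs)
}

excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3     # moment-based kurtosis minus 3
}

y <- c(1, 4, 5, 6, 7, 8, 9, 10, 12, 100)
mean(y)                         # 16.2, pulled upward by the value 100
mean(y, trim = 0.1)             # 7.625, base R trimmed mean drops 1 and 100
winsorized_mean(y, trim = 0.1)  # 7.7, replaces 1 with 4 and 100 with 12
excess_kurtosis(y)              # large and positive because of the outlier
```

Trimming discards the extreme observations, whereas Winsorizing keeps the sample size and replaces them with the nearest retained values, which is why the two estimates are close but not identical here.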
Zeynel Cebeci, A. Firat Ozdemir, Engin Yildiztepe
set.seed(123)
x <- rnorm(100, mean=50, sd=5)
descriptives <- descstats(x)
as.data.frame(descriptives)
descriptives$mean
descriptives$se
# Choosing an appropriate k from a grid of candidate values.
# k is the robustness parameter of the Huber M-estimator of location.
# Grid of k values to test
k <- seq(0, 5, by = 0.1)
# Drop k = 0 so that only positive values are tested
k <- k[k > 0]
result <- sapply(k, function(y) descstats(x, k = y)$hubermean)
names(result) <- paste0("k=", k)
result
plot(k, result, type = "b", col = "blue", pch = 19, ylab = "Huber's mean")
descstats(x, k=2, trim=0.05)$hubermean