utils_stats: Summary statistics utilities

Description

Vectorized summary statistics, including geometric mean, harmonic mean, sample standard error (SE), coefficient of variation (CV), root mean square error (RMSE), mean absolute error (MAE), sensitivity, and robust z-scores.

Usage

geom_mean(x, na.rm = TRUE, zero.rm = FALSE, ...)

harm_mean(x, na.rm = TRUE, zero.rm = FALSE, ...)

sem(x, na.rm = TRUE, ...)

cv(x, na.rm = TRUE, ...)

rmse(x, y, na.rm = TRUE, ...)

mae(x, y, stdz = FALSE, na.rm = TRUE, ...)

zscr(x, robust = TRUE, na.rm = TRUE, ...)

Arguments

x

vector of values to evaluate

na.rm

logical, should NA values in x be removed before calculating?

zero.rm

logical, should zeros in x be removed before calculating harmonic or geometric means?

...

further arguments passed to other methods

y

vector of 'predicted' values to compare against x

stdz

logical, standardize output by range of x?

robust

logical, should robust z-scores be calculated?

Details

For vectors including at least one zero, results of geom_mean and harm_mean are always 0 by definition, unless zero.rm=TRUE.
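
As a rough illustration of why a single zero forces both means to 0, here is a minimal sketch. The names geom_mean_sketch and harm_mean_sketch are hypothetical and this is not the package's own code; it only assumes the textbook definitions exp(mean(log(x))) and n / sum(1/x).

```r
# Hypothetical helpers for illustration only (not the ecole implementation).
geom_mean_sketch <- function(x, na.rm = TRUE, zero.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]          # drop missing values
  if (zero.rm) x <- x[x != 0]           # optionally drop zeros
  if (any(x < 0, na.rm = TRUE)) {
    stop("geometric mean is undefined for negative values")
  }
  exp(mean(log(x)))                     # log(0) is -Inf, and exp(-Inf) is 0
}

harm_mean_sketch <- function(x, na.rm = TRUE, zero.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]          # drop missing values
  if (zero.rm) x <- x[x != 0]           # optionally drop zeros
  length(x) / sum(1 / x)                # 1/0 is Inf, and n/Inf is 0
}

geom_mean_sketch(c(0, 1, 4, 77, NA))                      # 0
geom_mean_sketch(c(0, 1, 4, 77, NA), zero.rm = TRUE)      # cube root of 308
harm_mean_sketch(c(-1, 0, 1, 4, 77, NA))                  # 0
harm_mean_sketch(c(-1, 0, 1, 4, 77, NA), zero.rm = TRUE)  # about 15.21
```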

Like sd, sem uses n-1 in the denominator to correct for small-sample bias.
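
The sketch below shows one reading of this that is consistent with the value reported in the Examples (sem of 0, 1, 4, 77 is 21.76899): the standard deviation divided by sqrt(n-1) rather than the more common sqrt(n). The name sem_sketch is hypothetical, and the formula is inferred from that example value, not taken from the package source.

```r
# Hypothetical helper for illustration only; formula inferred from the
# documented example output, not copied from the package.
sem_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values
  n <- length(x)
  sd(x) / sqrt(n - 1)            # n-1 under the root, instead of n
}

x <- c(0, 1, 4, 77, NA)
sem_sketch(x)                    # about 21.769
sd(x, na.rm = TRUE) / sqrt(4)    # about 18.853, the conventional sd / sqrt(n)
```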

rmse is the square root of the mean squared difference between x and y, and is one way to assess prediction accuracy.

When stdz=TRUE, mae is divided by the range of x, which gives a measure of sensitivity.
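
A minimal sketch of both error measures follows, assuming the textbook definitions and assuming that standardization divides by max(x) - min(x) after NA removal (this agrees with the Examples, where 10.69236 / 78 is about 0.1370815). The names rmse_sketch and mae_sketch are hypothetical, not the package's functions.

```r
# Hypothetical helpers for illustration only (not the ecole implementation).
rmse_sketch <- function(x, y, na.rm = TRUE) {
  sqrt(mean((x - y)^2, na.rm = na.rm))          # root of mean squared error
}

mae_sketch <- function(x, y, stdz = FALSE, na.rm = TRUE) {
  out <- mean(abs(x - y), na.rm = na.rm)        # mean absolute error
  if (stdz) {
    out <- out / diff(range(x, na.rm = na.rm))  # scale by the range of x only
  }
  out
}

x <- c(-1, 0, 1, 4, 77, NA)
y <- x + c(9, 11, 10, 12, 8, 10)   # fixed offsets, so no random seed is needed

rmse_sketch(x, y)                  # sqrt(102), about 10.1
mae_sketch(x, y)                   # 10
mae_sketch(x, y, stdz = TRUE)      # 10 / 78, since range(x) spans -1 to 77
mae_sketch(y, x, stdz = TRUE)      # 10 / 77, since range(y) spans 8 to 85
```

Because only the first argument's range is used as the divisor, the standardized form is not symmetric in x and y, while rmse and the unstandardized mae are.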

With robust=TRUE, zscr gives robust z-scores based on the median (not the mean) and the median absolute deviation (not the standard deviation).
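
The contrast between the two scalings can be sketched as below. The name zscr_sketch is hypothetical, and using base R's mad() with its default consistency constant (1.4826) is an assumption; the package may scale the median absolute deviation differently.

```r
# Hypothetical helper for illustration only; the MAD scaling constant is an
# assumption (base R default), not confirmed against the package source.
zscr_sketch <- function(x, robust = TRUE, na.rm = TRUE) {
  if (robust) {
    # center on the median, scale by the median absolute deviation
    (x - median(x, na.rm = na.rm)) / mad(x, na.rm = na.rm)
  } else {
    # classical z-score: center on the mean, scale by the standard deviation
    (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
  }
}

x <- c(-99, -9, 0, 9, 99)
zscr_sketch(x, robust = FALSE)   # extremes land near -1.41 and 1.41
zscr_sketch(x, robust = TRUE)    # extremes land near -7.42 and 7.42
```

In this example the two extreme values inflate the standard deviation and so shrink every classical z-score, whereas the median-based scale is set by the central values, which leaves the extremes clearly flagged.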

These functions return NA when NAs are present and na.rm=FALSE.

Value

Numeric values.

See Also

sd

Examples

# test data
xx <- c(-1, 0, 1, 4, 77, NA)

# harmonic mean
harm_mean(xx, na.rm=TRUE, zero.rm=FALSE)     # 0 by definition
harm_mean(xx, na.rm=TRUE, zero.rm=TRUE)      # 15.20988

# geometric mean
### NOT RUN:
# geom_mean(xx, na.rm=TRUE, zero.rm=FALSE)   # fails for neg vals
### END NOT RUN
xx <- xx[-1]                                 # remove negative values
geom_mean(xx, na.rm=TRUE, zero.rm=FALSE)     # 0 by definition
geom_mean(xx, na.rm=TRUE, zero.rm=TRUE)      # 6.753313

# standard error of the mean
sem(xx)                       # 21.76899

# coefficient of variation
cv(xx)                        # 183.9268

# root mean squared error
set.seed(23)
xx <- c(-1, 0, 1, 4, 77, NA)
yy <- xx+rnorm(length(xx), 10)
rmse(xx, yy)                  # 10.71919
rmse(yy, xx)                  # same, order invariant

# mean absolute error
mae(xx, yy, stdz=FALSE)       # 10.69236

# range-standardized mean absolute error (aka sensitivity)
mae(xx, yy, stdz=TRUE)        # 0.1370815
mae(yy, xx, stdz=TRUE)        # 0.135684 -- order matters!

# robust z-scores not so influenced by extreme values
x <- c(-99, -9, 0, 9, 99)
plot(zscr(x, robust=FALSE), zscr(x, robust=TRUE), asp=1)
abline(0,1)

phytomosaic/ecole documentation built on Jan. 2, 2022, 11:24 p.m.