generalized_mean: Generalized mean


Description

Calculate a generalized mean.

Usage

mean_generalized(r)

mean_arithmetic(x, w = rep(1, length(x)), na.rm = FALSE, scale = TRUE)

mean_geometric(x, w = rep(1, length(x)), na.rm = FALSE, scale = TRUE)

mean_harmonic(x, w = rep(1, length(x)), na.rm = FALSE, scale = TRUE)

Arguments

r

A finite number giving the order of the generalized mean.

x

A strictly positive numeric vector.

w

A strictly positive numeric vector of weights, the same length as x. The default is to equally weight each element of x.

na.rm

Should missing values in x and w be removed?

scale

Should the weights be scaled to sum to 1?

Details

The function mean_generalized() returns a function to compute the generalized mean of x with weights w and exponent r (i.e., ∏ x^w when r = 0 and (∑ w x^r)^(1/r) otherwise, with the weights scaled to sum to 1). This is also called the power mean, Hölder mean, or l_p mean. See Bullen (2003, p. 175) for a definition, or https://en.wikipedia.org/wiki/Power_mean. The generalized mean is the solution to the optimal prediction problem: choose m to minimize ∑ w [log(x) - log(m)]^2 when r = 0, and ∑ w [x^r - m^r]^2 otherwise.
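The definition can be written out directly in base R. This is only a minimal sketch for illustration: gen_mean_sketch() is a hypothetical helper, not part of the package, and it makes no attempt to reproduce the package's argument checking or missing-value handling. The last lines check numerically that the result minimizes the weighted squared error described above.

```r
# Hypothetical helper (not part of gpindex): a direct transcription of the
# definition, with the weights scaled to sum to 1
gen_mean_sketch <- function(x, w = rep(1, length(x)), r = 1) {
  w <- w / sum(w)
  if (r == 0) {
    exp(sum(w * log(x)))  # same as prod(x^w), computed on the log scale
  } else {
    sum(w * x^r)^(1 / r)
  }
}

x <- 1:3
w <- c(0.25, 0.25, 0.5)

gen_mean_sketch(x, w, r = 1)   # weighted arithmetic mean
gen_mean_sketch(x, w, r = 0)   # weighted geometric mean
gen_mean_sketch(x, w, r = -1)  # weighted harmonic mean

# Optimal prediction problem for r = 2: the minimizer of the weighted
# squared error in x^r should be close to the generalized mean of order 2
loss <- function(m) sum(w * (x^2 - m^2)^2)
m_opt <- optimize(loss, interval = range(x))$minimum
c(m_opt, gen_mean_sketch(x, w, r = 2))
```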

The functions mean_arithmetic(), mean_geometric(), and mean_harmonic() compute the arithmetic, geometric, and harmonic (or subcontrary) means, also known as the Pythagorean means. These are the most useful means for making price indexes, and correspond to setting r = 1, r = 0, and r = -1 in mean_generalized().

Both x and w should be strictly positive (and finite), especially for the purpose of making a price index. This is not enforced, but the results may not make sense if the generalized mean is not defined. There are two exceptions to this.

  1. The convention in Hardy et al. (1952, p. 13) is used in cases where x has zeros: the generalized mean is 0 whenever w is strictly positive and r < 0. (The analogous convention holds whenever at least one element of x is Inf: the generalized mean is Inf whenever w is strictly positive and r > 0.)

  2. Some authors let w be non-negative and sum to 1 (e.g., Sydsaeter et al., 2005, p. 47). If w has zeros, then the corresponding element of x has no impact on the mean whenever x is strictly positive. Unlike weighted.mean(), however, zeros in w are not strong zeros, so infinite values in x will propagate even if the corresponding elements of w are zero.
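The arithmetic behind both exceptions can be seen with base R alone. The sketch below does not call the package's functions. It only contrasts stats::weighted.mean() with a direct calculation of ∑ wx / ∑ w, on the assumption that the direct calculation is the relevant comparison.

```r
x <- c(1, Inf)
w <- c(1, 0)

# stats::weighted.mean() omits values of x with a zero weight, so the
# infinite value is ignored
wm <- stats::weighted.mean(x, w)
wm

# A direct calculation does not, because 0 * Inf is NaN in floating point
direct <- sum(w * x) / sum(w)
direct

# Zeros in x with r < 0: 0^(-1) is Inf, so a direct calculation of the
# harmonic mean gives 1 / Inf = 0, in line with the Hardy et al. convention
x0 <- c(0, 1, 2)
w0 <- rep(1 / 3, 3)
hm <- sum(w0 * x0^(-1))^(-1)
hm
```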

The weights should almost always be scaled to sum to 1 to satisfy the definition of a generalized mean, although there are certain types of price indexes where the weights should not be scaled (e.g., the Vartia-I index).

The underlying calculation returned by mean_generalized() is mostly identical to weighted.mean(), with one important exception: missing values in the weights are not treated differently than missing values in x. Setting na.rm = TRUE drops missing values in both x and w, not just x. This ensures that certain useful identities are satisfied with missing values in x. In most cases mean_arithmetic() is a drop-in replacement for weighted.mean().
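A rough base-R illustration of the difference follows. It shows the idea of dropping an observation when either x or w is missing, and is not the package's actual implementation.

```r
x <- c(1, 2, 3)
w <- c(0.25, NA, 0.5)

# weighted.mean() only drops missing values in x, so the NA weight remains
# and the result is NA
wm_na <- stats::weighted.mean(x, w, na.rm = TRUE)
wm_na

# Dropping observations where either x or w is missing
keep <- !is.na(x) & !is.na(w)
wm_both <- stats::weighted.mean(x[keep], w[keep])
wm_both
```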

Value

mean_generalized() returns a function:

function(x, w = rep(1, length(x)), na.rm = FALSE, scale = TRUE).

mean_arithmetic(), mean_geometric(), and mean_harmonic() each return a numeric value.

Warning

Passing very small values for r can give misleading results, and a warning is given whenever abs(r) is sufficiently small. In general, r should not be a computed value.
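To see why, consider a direct evaluation of the formula in base R with a tiny exponent. This is an illustrative sketch of the floating-point problem only. It does not show the threshold at which the package warns.

```r
x <- 1:3
w <- c(0.25, 0.25, 0.5)
r <- 1e-20

# x^r rounds to exactly 1 for every element, so the result collapses to 1
naive <- sum(w * x^r)^(1 / r)
naive

# The limit as r goes to 0 is the geometric mean, which is quite different
geo <- exp(sum(w * log(x)))
geo
```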

Note

mean_generalized() can be defined on the extended real line, so that r = -Inf/Inf returns min()/max(), to agree with the definition in, e.g., Bullen (2003). This is not implemented, and r must be finite.
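The limiting behavior can be approximated with large finite orders. The sketch below uses a hypothetical helper, power_mean(), that transcribes the definition for r != 0. It is not a package function and assumes the weights already sum to 1.

```r
x <- 1:3
w <- c(0.25, 0.25, 0.5)

# Hypothetical helper: direct transcription of the definition for r != 0
power_mean <- function(x, w, r) sum(w * x^r)^(1 / r)

power_mean(x, w, 100)   # close to max(x)
power_mean(x, w, -100)  # close to min(x)
```

Much larger values of abs(r) overflow or underflow in a direct calculation like this one, so the limits are better handled with max() and min() directly.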

There are a number of existing functions for calculating unweighted geometric and harmonic means, namely the geometric.mean() and harmonic.mean() functions in the 'psych' package, the geomean() function in the 'FSA' package, the GMean() and HMean() functions in the 'DescTools' package, and the geoMean() function in the 'EnvStats' package. Similarly, the ci_generalized_mean() function in the 'Compind' package calculates an unweighted generalized mean.

References

Bullen, P. S. (2003). Handbook of Means and Their Inequalities. Springer Science+Business Media.

Fisher, I. (1922). The Making of Index Numbers. Houghton Mifflin Company.

Hardy, G., Littlewood, J. E., and Polya, G. (1952). Inequalities (2nd edition). Cambridge University Press.

ILO, IMF, OECD, Eurostat, UN, and World Bank. (2004). Producer Price Index Manual: Theory and Practice. International Monetary Fund.

Lord, N. (2002). Does Smaller Spread Always Mean Larger Product? The Mathematical Gazette, 86(506): 273-274.

Sydsaeter, K., Strom, A., and Berck, P. (2005). Economists' Mathematical Manual (4th edition). Springer.

See Also

logmean_generalized for the generalized logarithmic mean.

mean_lehmer for the Lehmer mean, an alternative to the generalized mean.

weights_transmute transforms the weights to turn an r-generalized mean into an s-generalized mean.

weights_factor calculates the weights to factor a mean of products into a product of means.

price_index and quantity_index for simple wrappers that use mean_generalized() to calculate common indexes.

back_price/base_price for a simple utility function to turn prices in a table into price relatives.

Examples

# Make some data

x <- 1:3
w <- c(0.25, 0.25, 0.5)

# Arithmetic mean

mean_arithmetic(x, w) # same as stats::weighted.mean(x, w)

# Geometric mean

mean_geometric(x, w) # same as prod(x^w)

# Using prod() to manually calculate the geometric mean can give misleading
# results

z <- 1:1000
prod(z)^(1 / length(z)) # overflow
mean_geometric(z)

z <- seq(0.0001, by = 0.0005, length.out = 1000)
prod(z)^(1 / length(z)) # underflow
mean_geometric(z)

# Harmonic mean

mean_harmonic(x, w) # same as 1 / stats::weighted.mean(1 / x, w)

# Quadratic mean / root mean square

mean_generalized(2)(x, w)

# Cubic mean
# Notice that this is larger than the other means so far because the 
# generalized mean is increasing in r

mean_generalized(3)(x, w)

#--------------------

# The dispersion between the arithmetic, geometric, and harmonic mean usually
# increases as the variance of 'x' increases

x <- c(1, 3, 5)
y <- c(2, 3, 4)

var(x) > var(y)

mean_arithmetic(x) - mean_geometric(x)
mean_arithmetic(y) - mean_geometric(y)

mean_geometric(x) - mean_harmonic(x)
mean_geometric(y) - mean_harmonic(y)

# But the dispersion between these means is only bounded by
# the variance (Bullen, 2003, p. 156)

mean_arithmetic(x) - mean_geometric(x) >=  2 / 3 * var(x) / (2 * max(x))
mean_arithmetic(x) - mean_geometric(x) <=  2 / 3 * var(x) / (2 * min(x))

# Example from Lord (2002) where the dispersion decreases as the variance increases,
# counter to the claims in Fisher (1922, p. 108) and the PPI manual (p. 28)

x <- (5 + c(sqrt(5), -sqrt(5), -3)) / 4
y <- (16 + c(7 * sqrt(2), -7 * sqrt(2), 0)) / 16

var(x) > var(y)

mean_arithmetic(x) - mean_geometric(x)
mean_arithmetic(y) - mean_geometric(y)

mean_geometric(x) - mean_harmonic(x)
mean_geometric(y) - mean_harmonic(y)

# The "bias" in the arithmetic and harmonic indexes is also smaller in this case, 
# counter to the claim in Fisher (1922, p. 108)

mean_arithmetic(x) * mean_arithmetic(1 / x) - 1
mean_arithmetic(y) * mean_arithmetic(1 / y) - 1

mean_harmonic(x) * mean_harmonic(1 / x) - 1
mean_harmonic(y) * mean_harmonic(1 / y) - 1

#--------------------

# Example of how missing values are handled

w <- replace(w, 2, NA)

mean_arithmetic(x, w)
mean_arithmetic(x, w, na.rm = TRUE) # drops the second observation
stats::weighted.mean(x, w, na.rm = TRUE) # still returns NA

#--------------------

# Sometimes it makes sense to calculate a generalized mean with
# negative inputs, so the warning can be ignored

mean_arithmetic(c(1, 2, -3))

# Other times it's less obvious

mean_harmonic(c(1, 2, -3))

#--------------------

# A function to make the superlative quadratic mean price index in chapter 17, 
# section B.5.1, of the PPI manual as a product of generalized means

quadratic_index <- function(x, w0, w1, r) {
  x <- sqrt(x) 
  mean_generalized(r)(x, w0) * mean_generalized(-r)(x, w1)
}

quadratic_index(1:3, 4:6, 7:9, 2)

# Same as the geometric mean of two generalized means (with the order halved)

quadratic_index2 <- function(x, w0, w1, r) {
  res <- c(mean_generalized(r)(x, w0), mean_generalized(-r)(x, w1))
  mean_geometric(res)
}

quadratic_index2(1:3, 4:6, 7:9, 1)

gpindex documentation built on Feb. 3, 2021, 1:06 a.m.