normalize: Indicator Normalization

normalize R Documentation

Indicator Normalization

Description

Normalizes indicators according to their polarity.

Usage

normalize(inds, method = c("min-max", "goalpost"), ind.pol,
          gp.range = c(70, 130), time = NULL, ref.time = NULL,
          ref.value = NULL)

Arguments

inds

a numeric vector, matrix, or data frame which provides indicators to be normalized.

method

normalization method to be used. See ‘Details’.

ind.pol

a character vector whose elements can be "pos" (positive) or "neg" (negative), indicating the polarity of indicators. An indicator's polarity is the sign of the relation between the indicator and the phenomenon to be measured.

gp.range

a vector of the form c(a,b) giving the normalization range for method "goalpost". The default value is c(70,130).

time

a vector of temporal factors for input indicators. The length of time must equal the number of rows in inds. If NULL, the input data will be treated as cross-sectional.

ref.time

a value denoting the reference time for normalization. See ‘Details’.

ref.value

a vector containing reference values for indicators to facilitate the interpretation of results, required by method "goalpost". When an indicator is normalized, its reference value is mapped to the midpoint of gp.range. See ‘Details’.

Details

By default, each indicator x is normalized by method "min-max" with the formulas

\displaystyle\tilde{x}^{+}_i = \frac{x_i - \mathrm{inf}_x}{\mathrm{sup}_x - \mathrm{inf}_x},

or

\displaystyle \tilde{x}^{-}_i = 1 - \tilde{x}^{+}_i,

where \mathrm{sup}_x and \mathrm{inf}_x are respectively the superior and inferior values of the indicator. The former formula is applied to indicators with positive polarity while the latter one is used for those with negative polarity.
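The two formulas can be illustrated with a minimal base-R sketch. The helper `minmax_sketch` is hypothetical and is not part of giniCI; it assumes the cross-sectional case, in which the superior and inferior values are the maximum and minimum of x.

```r
# Hypothetical helper (not part of giniCI): min-max rescaling of one indicator.
# Assumption: sup_x and inf_x are the maximum and minimum of x.
minmax_sketch <- function(x, pol = c("pos", "neg")) {
  pol <- match.arg(pol)
  inf_x <- min(x)
  sup_x <- max(x)
  pos <- (x - inf_x) / (sup_x - inf_x)  # formula for positive polarity
  if (pol == "pos") pos else 1 - pos    # negative polarity mirrors the scale
}

x <- c(2, 4, 6, 10)
minmax_sketch(x, "pos")  # 0.00 0.25 0.50 1.00
minmax_sketch(x, "neg")  # 1.00 0.75 0.50 0.00
```

With negative polarity the largest raw value receives the lowest score, so a higher normalized value always points in the same direction for every indicator.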

If either time or ref.time is NULL, the superior and inferior values are respectively the maximum and minimum values of x. If neither time nor ref.time is NULL, the superior and inferior values are respectively the maximum and minimum values of x observed at the reference time. In other words, if time is not provided, or is provided without a value for ref.time, the input data are treated as cross-sectional.
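The effect of the reference time can be sketched in base R as follows. The helper `minmax_ref_sketch` is hypothetical, handles positive polarity only, and is meant as an illustration of the rule described above rather than the package's implementation.

```r
# Hypothetical helper (not part of giniCI): min-max rescaling where the
# superior and inferior values come from the reference time, if one is given.
minmax_ref_sketch <- function(x, time = NULL, ref.time = NULL) {
  if (is.null(time) || is.null(ref.time)) {
    base <- x                     # cross-sectional: use all observations
  } else {
    base <- x[time == ref.time]   # longitudinal: reference-time observations only
  }
  (x - min(base)) / (max(base) - min(base))
}

x    <- c(1, 3, 5, 2, 6, 9)
time <- c(2020, 2020, 2020, 2021, 2021, 2021)

minmax_ref_sketch(x)                         # 0.000 0.250 0.500 0.125 0.625 1.000
minmax_ref_sketch(x, time, ref.time = 2020)  # 0.00 0.50 1.00 0.25 1.25 2.00
```

In the second call the 2021 observations are scaled against the 2020 minimum and maximum, so normalized values can fall outside [0, 1] when later observations exceed the reference-time range.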

For method "goalpost", a vector of reference values for indicators is required. If not specified by users (ref.value = NULL), these values are automatically set to the indicator means for cross-sectional data or to the indicator means at the reference time for longitudinal data.

Method "goalpost" computes two goalposts for normalization as \mathrm{gp\_min}_x = \mathrm{ref}_x - \Delta and \mathrm{gp\_max}_x = \mathrm{ref}_x + \Delta, where \mathrm{ref}_x is the reference value of x and \Delta = (\mathrm{sup}_x - \mathrm{inf}_x)/2. Indicators with positive polarity are rescaled using the formula

\displaystyle \tilde{x}^{+}_i = \frac{x_i - \mathrm{gp\_min}_x}{\mathrm{gp\_max}_x - \mathrm{gp\_min}_x} (b - a) + a,

while indicators with negative polarity are rescaled using the formula

\displaystyle \tilde{x}^{-}_i = a + b - \tilde{x}^{+}_i.
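The goalpost formulas can be traced with a minimal base-R sketch. The helper `goalpost_sketch` is hypothetical and not part of giniCI; it assumes cross-sectional data, so the superior and inferior values are the maximum and minimum of x and the reference value defaults to the mean.

```r
# Hypothetical helper (not part of giniCI): goalpost rescaling of one indicator.
# Assumptions: cross-sectional data; reference value defaults to mean(x).
goalpost_sketch <- function(x, pol = c("pos", "neg"),
                            gp.range = c(70, 130), ref = mean(x)) {
  pol <- match.arg(pol)
  a <- gp.range[1]
  b <- gp.range[2]
  delta  <- (max(x) - min(x)) / 2   # half the observed range
  gp_min <- ref - delta             # lower goalpost
  gp_max <- ref + delta             # upper goalpost
  pos <- (x - gp_min) / (gp_max - gp_min) * (b - a) + a
  if (pol == "pos") pos else a + b - pos
}

x <- c(2, 4, 6, 8)                  # mean is 5, delta is 3
goalpost_sketch(x, "pos")           # 70 90 110 130
goalpost_sketch(x, "neg")           # 130 110 90 70
goalpost_sketch(c(x, 5), "pos")[5]  # 100, the midpoint of gp.range
```

The last call shows that an observation equal to the reference value is mapped to (a + b)/2, which is 100 under the default gp.range.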

If an indicator follows a symmetric probability distribution and its reference value is set to the mean, the normalized values will theoretically remain in the range [a,b]. In other cases, the normalized values may extend beyond gp.range.
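A small base-R check makes the contrast concrete. The helper `gp_pos_sketch` is hypothetical (positive polarity only, cross-sectional data, reference value set to the mean) and only restates the goalpost formula above; the two toy vectors are made up for illustration.

```r
# Hypothetical helper (not part of giniCI): goalpost rescaling, positive polarity,
# with the reference value fixed at mean(x).
gp_pos_sketch <- function(x, gp.range = c(70, 130)) {
  a <- gp.range[1]
  b <- gp.range[2]
  delta  <- (max(x) - min(x)) / 2
  gp_min <- mean(x) - delta
  gp_max <- mean(x) + delta
  (x - gp_min) / (gp_max - gp_min) * (b - a) + a
}

symmetric <- c(2, 4, 6, 8)    # mean equals the mid-range, 5
skewed    <- c(1, 2, 3, 10)   # mean 4 lies below the mid-range, 5.5

range(gp_pos_sketch(symmetric))  # 70 130
range(gp_pos_sketch(skewed))     # 80 140
```

When the mean coincides with the mid-range, the goalposts equal the minimum and maximum, so the output spans exactly [a, b]. For the skewed vector the goalposts shift toward the bulk of the data and the largest value is mapped above b.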

Value

An object of class "data.frame" containing normalized indicators.

Author(s)

Viet Duong Nguyen, Chiara Gigliarano, Mariateresa Ciommi

References

Mazziotta, M. and Pareto, A. (2016). On a Generalized Non-compensatory Composite Index for Measuring Socio-economic Phenomena. Social Indicators Research, 127, 983–1003.

See Also

giniCI.

Examples

# Generate data samples
set.seed(1)
df1 <- data.frame(X1 = rnorm(100, 0, 5),
                  X2 = runif(100, 1, 10),
                  X3 = rpois(100, 10))
set.seed(1)
df2 <- data.frame(X1 = rnorm(300, 0, 5),
                  X2 = runif(300, 1, 10),
                  X3 = rpois(300, 10),
                  time = rep(2020:2022, each = 100))

# Min-max normalization
df1.mm <- normalize(inds = df1,
                    ind.pol = c("pos", "neg", "pos"))
summary(df1.mm)
df2.mm <- normalize(inds = df2[, 1:3],
                    ind.pol = c("pos", "neg", "pos"),
                    time = df2[, 4], ref.time = 2020)
summary(df2.mm)

# Goalpost normalization
df1.gp <- normalize(inds = df1, method = "goalpost",
                    ind.pol = c("pos", "neg", "pos"))
summary(df1.gp)
df2.gp <- normalize(inds = df2[, 1:3], method = "goalpost",
                    ind.pol = c("pos", "neg", "pos"),
                    time = df2[, 4], ref.time = 2020)
summary(df2.gp)

giniCI documentation built on April 3, 2025, 7:35 p.m.