mded: Measuring the difference between two empirical distributions

The function measures the difference between two independent or non-independent empirical distributions and returns a significance level of the difference.

mded(distr1, distr2, detail = FALSE, independent = TRUE)

## S3 method for class 'mded'
print(x, digits = max(3, getOption("digits") - 3), ...)

`distr1`	A vector representing an empirical distribution. `distr1` is assumed to be greater than `distr2` (the alternative hypothesis is `distr1` > `distr2`).
`distr2`	A vector representing an empirical distribution.
`detail`	If `TRUE`, the vector of differences between `distr1` and `distr2` is also returned.
`independent`	Set as `FALSE` when `distr1` and `distr2` are not independent of each other.
`x`	An object of S3 class 'mded'.
`digits`	The number of significant digits.
`...`	Arguments passed to the function `print`.

The function measures the difference between two independent or non-independent empirical distributions and returns a significance level of the difference on the basis of the methods proposed by Poe et al. (1997, 2005). Such calculations are frequently needed in empirical econometric studies in which (marginal) willingness-to-pay distributions, estimated using contingent valuation methods or discrete choice experiments, have to be compared with each other.

Let us assume that X and Y are empirical distributions represented by the vectors $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_n)$. The null hypothesis (H0) is $X - Y = 0$, and the alternative hypothesis (H1) is $X - Y > 0$.

When X and Y are independent of each other, the complete combinatorial method (Poe et al. 2005) gives the one-sided significance level of H0 as $\#\{x_i - y_j \le 0\} / (m n)$, where $\#\{\text{cond}\}$ is the number of pairs $(i, j)$, with $i = 1, \ldots, m$ and $j = 1, \ldots, n$, for which cond is true.

When X and Y are not independent of each other, the paired difference method (Poe et al. 1997) gives the one-sided significance level of H0 as $\#\{x_i - y_i \le 0\} / m$, counted over $i = 1, \ldots, m$, where $m = n$.
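Both significance levels can be evaluated directly from their definitions in base R. The sketch below is a minimal illustration, assuming numeric vectors with no missing values; the helper names `ccm_sig` and `paired_sig` are hypothetical and are not part of the mded package, whose internal implementation may differ.

```r
# Hypothetical helper (not part of mded): complete combinatorial method
# for independent distributions x (length m) and y (length n).
ccm_sig <- function(x, y) {
  d <- outer(x, y, "-")                 # m x n matrix of all differences x_i - y_j
  sum(d <= 0) / (length(x) * length(y)) # share of pairs with x_i - y_j <= 0
}

# Hypothetical helper (not part of mded): paired difference method
# for non-independent distributions of equal length (m = n).
paired_sig <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x - y <= 0) / length(x)           # share of pairs with x_i - y_i <= 0
}

set.seed(123)
x <- rnorm(100, 3)
y <- rnorm(100, 1)
ccm_sig(x, y)     # one-sided significance level of H0 under independence
paired_sig(x, y)  # the same draws treated as paired observations
```

If mded computes the complete combinatorial statistic exactly as described above, `ccm_sig(x, y)` should agree with the significance level that `mded()` reports for the same data.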

Note that, when `detail = TRUE`, calculating the difference between two independent distributions may take a long time and require a large amount of memory, because all pairwise differences are stored as a vector. For example, when `distr1` and `distr2` each contain 10,000 elements (observations), the vector of differences contains 100,000,000 elements. If there is not enough memory, R stops running the function and shows an error message related to the memory limitation.
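As a rough guide, R stores each numeric (double) value in 8 bytes, so 100,000,000 differences need at least 800 MB. If only the significance level is needed, the count of pairs with $x_i - y_j \le 0$ can be obtained without storing the differences, because for finite values $x_i - y_j \le 0$ holds exactly when $x_i \le y_j$. The sketch below uses sorting and base R's `findInterval()`; it is an illustrative alternative under these assumptions, not how mded is implemented, and the helper name `ccm_sig_sorted` is hypothetical.

```r
# Approximate size of the difference vector kept when detail = TRUE:
# m * n doubles at 8 bytes each, expressed in (decimal) megabytes.
m <- 10000
n <- 10000
m * n * 8 / 1e6  # 800

# Hypothetical helper (not part of mded): complete combinatorial
# significance level without materialising all m * n differences.
# For each y_j, findInterval(y_j, sort(x)) returns the number of x_i <= y_j,
# i.e. the number of pairs with x_i - y_j <= 0.
ccm_sig_sorted <- function(x, y) {
  hits <- findInterval(y, sort(x))
  sum(as.numeric(hits)) / (length(x) * length(y))  # as.numeric avoids integer overflow
}

set.seed(123)
x <- rnorm(m, 3)
y <- rnorm(n, 1)
ccm_sig_sorted(x, y)  # memory grows with m + n rather than m * n
```

The trade-off is that this approach returns only the count, not the individual differences, so it cannot replace `detail = TRUE` when the difference vector itself is needed.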

`stat`	One-sided significance level of the difference between `distr1` and `distr2`.
`means`	A vector of the mean values of `distr1` and `distr2`.
`cases`	A vector of integers giving the number of cases in which cond is true, the number in which it is false, and their total.
`distr1`	A vector assigned to `distr1`.
`distr2`	A vector assigned to `distr2`.
`distr.names`	A vector of the names of objects assigned to `distr1` and `distr2`.
`diff`	The vector of differences; returned only if `detail = TRUE`.

Hideo Aizaki

Poe GL, Giraud KL, Loomis JB (2005). Computational methods for measuring the difference of empirical distributions. American Journal of Agricultural Economics, 87, 353–365.

Poe GL, Severance-Lossin EK, Welsh MP (1994). Measuring the difference (X - Y) of simulated distributions: A convolutions approach. American Journal of Agricultural Economics, 76, 904–915.

Poe GL, Welsh MP, Champ PA (1997). Measuring the difference in mean willingness to pay when dichotomous choice contingent valuation responses are not independent. Land Economics, 73, 255–267.

library(mded)
set.seed(123)
x <- rnorm(100, 3)
y <- rnorm(100, 1)

out <- mded(distr1 = x, distr2 = y, detail = TRUE)
out

Test:
H0  x = y 
H1  x > y 
significance level = 0.054 

Data:
distr1 = x 
distr2 = y 

Means:
    means    n
x  3.0904  100
y  0.8925  100

Cases in the difference:
           n
true     540
false   9460
total  10000