lmomben | R Documentation |
Experimental—This function returns previously numerical estimations of the L-moments of the Benford distribution (Benford's Law) given parameters defining the number of first M-significant digits and the numeric base.
For the first significant digits (d \in 1, \cdots, 9
) (base 10) (designate as m = 1
), the L-moments were estimated through very large sample-size simulation and sample L-moments computed ( lmoms
), direct numerical integration (theoLmoms
), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom
) as
\lambda_1 = 3.43908699617500524\mbox{,}
\lambda_2 = 1.34518434179517077\mbox{,}
\tau_3 = 0.24794090889493661\mbox{, and}
\tau_4 = 0.01614509742647182\mbox{.}
For the first two-significant digits (d \in 10, \cdots, 99
) (base 10) (designate as m = 2
), the L-moments were estimated through very large sample-size simulation, direct numerical integration (theoLmoms
), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom
) as
\lambda_1 = 38.59062918136093145\mbox{,}
\lambda_2 = 13.81767809210059283\mbox{,}
\tau_3 = 0.22237541787527126\mbox{, and}
\tau_4 = 0.03541037418894027\mbox{.}
For the first three-significant digits (d \in 100, \cdots, 999
) (base 10) (designate as m = 3
), the L-moments were estimated through very large sample-size simulation, direct numerical integration (theoLmoms
), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom
) as
\lambda_1 = 390.36783537821605705\mbox{,}
\lambda_2 = 138.21917489739223583\mbox{,}
\tau_3 = 0.22192482374529940\mbox{, and}
\tau_4 = 0.03571514686148788\mbox{.}
Source of the L-moments—The script inst/doc/benford/compLmomsBenford.R
in the lmomco package sources is the authoritative source of the computation of the L-moments shown. Three methods are used, and the arithmetic average of the three provides the L-moments: (1) Probability-weighted simulation of the probability mass function (PMF) is used in very large sample size and sample L-moments computed by lmoms
, (2) direct numerical integration for the theoretical L-moments of the quantile function (quaben
) of the distribution that itself is from the cumulative distribution function (cdfben
) that itself is from the PMF (pmfben
), and (3) direct numerical integration of the probability-weighted moments of the quantile function (quaben
) and subsequent linear system of equations to compute the L-moments. Each of the aforementioned methods result in numerical differences say at about the fourth decimal. (No previous description of the L-moments of the Benford distribution appear extant in the literature in July 2024.)
lmomben(para=list(para=c(1, 10)), ...)
para |
The number of first M-significant digits followed by the numerical base (only base10 supported) and the list structure mimics similar uses of the lmomco list structure. Default are the first significant digits and hence the digits 1 through 9. |
... |
Additional arguments to pass (not likely to be needed but changes in base handling might need this). |
An R list
is returned.
lambdas |
Vector of the L-moments. First element is
|
ratios |
Vector of the L-moment ratios. Second element is
|
trim |
Level of symmetrical trimming used in the computation, which is |
leftrim |
Level of left-tail trimming used in the computation, which is |
rightrim |
Level of right-tail trimming used in the computation, which is |
source |
An attribute identifying the computational source of the L-moments: “lmomben”. |
Hypothesis Testing—Let the squared Euclidean distance of the L-moments (not the L-moment ratios) between the first four sample L-moments (\hat{\lambda}_r
) and the theoretical versions (\lambda
) provided by this function be defined as
D^2 = (\hat{\lambda}_1 - \lambda_1)^2 + (\hat{\lambda}_2 - \lambda_2)^2 + (\hat{\lambda}_3 - \lambda_3)^2 + (\hat{\lambda}_4 - \lambda_4)^2\mbox{.}
Let \alpha \in 0.10, 0.05, 0.01, 0.005, 0.001
be upper tail probability levels (statistical significance thresholds). Let m
denote the number of significant digits (m \in 1, 2, 3
) in base 10 and n
denote sample size. Let \gamma = -\mathrm{log}(-\mathrm{log}(\alpha))
be a transformation (in the style of a Gumbel reduced variate) (prob2grv
). Using extensive simulation for many sample sizes, the \alpha
values, and computing D^2(\alpha, m; n)
, it can be shown that the critical values for the D^2
distances are
D^2(\alpha) = \frac{1}{n}\,\mathrm{exp}\bigl[(-2.6607150 + 4.6154937m) - 1.217283\gamma\bigr]\mbox{,}
wherein linear regression was used to estimate relation between each D^2
and n \ge 5
and the coefficients subsequently subjected to linear regression as functions of \alpha
. The Examples shows an implementation of the critical values.
W.H. Asquith
cdfben
, pmfben
, quaben
lmomben(para=list(para=c(3, 10)))
## Not run:
# Code suitable for study of performance of Cho and Gaines D
# against using the first for L-moments with controls for having
# the Benford distribution as the true parent or alternative
# distributions fit to the the L-moments of the Benford for the
# first significant digit.
# https://en.wikipedia.org/wiki/Benford
ChoGainesD <- function(x) {
n <- length(x)
d <- sapply(1:9, function(d) (length(x[x == d])/n - log10(1+1/d))^2)
return(sqrt(n * sum(d)))
}
CritChoGainesD <- function(alpha=c("0.1", "0.05", "0.01")) {
alpha <- as.character(as.numeric( alpha ))
alpha <- as.numeric(match.arg( alpha ))
if(alpha == 0.10) return(1.212)
if(alpha == 0.05) return(1.330)
if(alpha == 0.01) return(1.569)
return(NULL)
}
D2lmom <- function(x, theolmr=NULL) {
lmr <- lmoms(x)
sum((lmr$lambdas[1:4] - theolmr$lambdas[1:4])^2)
}
CritD2lmom <-
function(m, n, alpha=c("0.1", "0.05", "0.01", "0.005", "0.001")) {
alpha <- as.character(as.numeric( alpha ))
alpha <- as.numeric(match.arg( alpha ))
exp((-2.6607150 + 4.6154937*m) - 1.217283*(-log(-log(alpha))))/n
}
nsim <- 2E4; n <- 100; alpha <- 0.05
is_Benford_parent <- FALSE
CritCGD <- CritChoGainesD( alpha=alpha )
CritLMR <- CritD2lmom(1, n, alpha=alpha )
bens <- 1:9; pmf <- log10(1 + 1/bens) # for the Benford being true
benlmr <- lmomben(list(para=c(1, 10))); dtype <- "nor" # Normal (say)
parent <- lmom2par(benlmr, type=dtype)
DF <- NULL
ix <- seq(1, n, by=2)
for(i in 1:nsim) {
if(is_Benford_parent) {
x <- sample(bens, n, replace=TRUE, prob=pmf)
} else {
x <- rlmomco(n, parent) # simulate from the parent
x <- unlist(strsplit(sprintf("
x <- as.integer(x) # complete extraction of the first digit
}
CGD <- ChoGainesD(x)
LMR <- D2lmom(x, theolmr=benlmr)
rejCGD <- ifelse(CGD > CritCGD, TRUE, FALSE)
rejLMR <- ifelse(LMR > CritLMR, TRUE, FALSE)
DF <- rbind(DF, data.frame(CGD=rejCGD, LMR=rejLMR))
}
print(summary(DF))
if(is_Benford_parent) { # H0 is True
CGDpct <- 100*(sum(as.numeric(DF$CGD)) / nsim - alpha) / alpha
LMRpct <- 100*(sum(as.numeric(DF$LMR)) / nsim - alpha) / alpha
message("The ChoGainesD rejection rate for alpha=", alpha,
" is ", sum(as.numeric(DF$CGD)) / nsim,
" (", round(CGDpct, digits=2), " percent difference).")
message("The D2lmom rejection rate for alpha=", alpha,
" is ", sum(as.numeric(DF$LMR)) / nsim,
" (", round(LMRpct, digits=2), " percent difference).")
} else { # H0 is False
acceptH0_H0false_CDG <- sum(as.numeric(! DF$CGD)) / nsim
acceptH0_H0false_LMR <- sum(as.numeric(! DF$LMR)) / nsim
betaCDG <- round(1 - acceptH0_H0false_CDG, digits=2)
betaLMR <- round(1 - acceptH0_H0false_LMR, digits=2)
message("Power of ChoGainesD = ", betaCDG, ".")
message("Power of D2lmom = ", betaLMR, ".")
} #
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.