lmomben: L-moments of the Benford Distribution

lmombenR Documentation

L-moments of the Benford Distribution

Description

Experimental—This function returns previously numerical estimations of the L-moments of the Benford distribution (Benford's Law) given parameters defining the number of first M-significant digits and the numeric base.

For the first significant digits (d \in 1, \cdots, 9) (base 10) (designate as m = 1), the L-moments were estimated through very large sample-size simulation and sample L-moments computed ( lmoms), direct numerical integration (theoLmoms), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom) as

\lambda_1 = 3.43908699617500524\mbox{,}

\lambda_2 = 1.34518434179517077\mbox{,}

\tau_3 = 0.24794090889493661\mbox{, and}

\tau_4 = 0.01614509742647182\mbox{.}

For the first two-significant digits (d \in 10, \cdots, 99) (base 10) (designate as m = 2), the L-moments were estimated through very large sample-size simulation, direct numerical integration (theoLmoms), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom) as

\lambda_1 = 38.59062918136093145\mbox{,}

\lambda_2 = 13.81767809210059283\mbox{,}

\tau_3 = 0.22237541787527126\mbox{, and}

\tau_4 = 0.03541037418894027\mbox{.}

For the first three-significant digits (d \in 100, \cdots, 999) (base 10) (designate as m = 3), the L-moments were estimated through very large sample-size simulation, direct numerical integration (theoLmoms), and through numerical integration of the probability weighted moments and conversion to L-moments (pwm2lmom) as

\lambda_1 = 390.36783537821605705\mbox{,}

\lambda_2 = 138.21917489739223583\mbox{,}

\tau_3 = 0.22192482374529940\mbox{, and}

\tau_4 = 0.03571514686148788\mbox{.}

Source of the L-moments—The script inst/doc/benford/compLmomsBenford.R in the lmomco package sources is the authoritative source of the computation of the L-moments shown. Three methods are used, and the arithmetic average of the three provides the L-moments: (1) Probability-weighted simulation of the probability mass function (PMF) is used in very large sample size and sample L-moments computed by lmoms, (2) direct numerical integration for the theoretical L-moments of the quantile function (quaben) of the distribution that itself is from the cumulative distribution function (cdfben) that itself is from the PMF (pmfben), and (3) direct numerical integration of the probability-weighted moments of the quantile function (quaben) and subsequent linear system of equations to compute the L-moments. Each of the aforementioned methods result in numerical differences say at about the fourth decimal. (No previous description of the L-moments of the Benford distribution appear extant in the literature in July 2024.)

Usage

lmomben(para=list(para=c(1, 10)), ...)

Arguments

para

The number of first M-significant digits followed by the numerical base (only base10 supported) and the list structure mimics similar uses of the lmomco list structure. Default are the first significant digits and hence the digits 1 through 9.

...

Additional arguments to pass (not likely to be needed but changes in base handling might need this).

Value

An R list is returned.

lambdas

Vector of the L-moments. First element is \lambda_1, second element is \lambda_2, and so on.

ratios

Vector of the L-moment ratios. Second element is \tau, third element is \tau_3 and so on.

trim

Level of symmetrical trimming used in the computation, which is 0.

leftrim

Level of left-tail trimming used in the computation, which is NULL.

rightrim

Level of right-tail trimming used in the computation, which is NULL.

source

An attribute identifying the computational source of the L-moments: “lmomben”.

Note

Hypothesis Testing—Let the squared Euclidean distance of the L-moments (not the L-moment ratios) between the first four sample L-moments (\hat{\lambda}_r) and the theoretical versions (\lambda) provided by this function be defined as

D^2 = (\hat{\lambda}_1 - \lambda_1)^2 + (\hat{\lambda}_2 - \lambda_2)^2 + (\hat{\lambda}_3 - \lambda_3)^2 + (\hat{\lambda}_4 - \lambda_4)^2\mbox{.}

Let \alpha \in 0.10, 0.05, 0.01, 0.005, 0.001 be upper tail probability levels (statistical significance thresholds). Let m denote the number of significant digits (m \in 1, 2, 3) in base 10 and n denote sample size. Let \gamma = -\mathrm{log}(-\mathrm{log}(\alpha)) be a transformation (in the style of a Gumbel reduced variate) (prob2grv). Using extensive simulation for many sample sizes, the \alpha values, and computing D^2(\alpha, m; n), it can be shown that the critical values for the D^2 distances are

D^2(\alpha) = \frac{1}{n}\,\mathrm{exp}\bigl[(-2.6607150 + 4.6154937m) - 1.217283\gamma\bigr]\mbox{,}

wherein linear regression was used to estimate relation between each D^2 and n \ge 5 and the coefficients subsequently subjected to linear regression as functions of \alpha. The Examples shows an implementation of the critical values.

Author(s)

W.H. Asquith

See Also

cdfben, pmfben, quaben

Examples

lmomben(para=list(para=c(3, 10)))

## Not run: 
  # Code suitable for study of performance of Cho and Gaines D
  # against using the first for L-moments with controls for having
  # the Benford distribution as the true parent or alternative
  # distributions fit to the the L-moments of the Benford for the
  # first significant digit.
  # https://en.wikipedia.org/wiki/Benford
  ChoGainesD <- function(x) {
    n <- length(x)
    d <- sapply(1:9, function(d) (length(x[x == d])/n - log10(1+1/d))^2)
    return(sqrt(n * sum(d)))
  }
  CritChoGainesD <- function(alpha=c("0.1", "0.05", "0.01")) {
    alpha <- as.character(as.numeric( alpha ))
    alpha <-   as.numeric(match.arg(  alpha ))
    if(alpha == 0.10) return(1.212)
    if(alpha == 0.05) return(1.330)
    if(alpha == 0.01) return(1.569)
    return(NULL)
  }
  D2lmom <- function(x, theolmr=NULL) {
    lmr <- lmoms(x)
    sum((lmr$lambdas[1:4] - theolmr$lambdas[1:4])^2)
  }
  CritD2lmom <-
    function(m, n, alpha=c("0.1", "0.05", "0.01", "0.005", "0.001")) {
      alpha <- as.character(as.numeric( alpha ))
      alpha <-   as.numeric(match.arg(  alpha ))
      exp((-2.6607150 + 4.6154937*m) - 1.217283*(-log(-log(alpha))))/n
  }

  nsim <- 2E4; n <- 100; alpha <- 0.05
  is_Benford_parent <- FALSE

  CritCGD <- CritChoGainesD(  alpha=alpha )
  CritLMR <- CritD2lmom(1, n, alpha=alpha )
  bens <- 1:9; pmf <- log10(1 + 1/bens) # for the Benford being true
  benlmr <- lmomben(list(para=c(1, 10))); dtype <- "nor" # Normal (say)
  parent <- lmom2par(benlmr, type=dtype)

  DF <- NULL
  ix <- seq(1, n, by=2)
  for(i in 1:nsim) {
    if(is_Benford_parent) {
      x <- sample(bens, n, replace=TRUE, prob=pmf)
    } else {
      x <- rlmomco(n, parent) # simulate from the parent
      x <- unlist(strsplit(sprintf("
      x <- as.integer(x) # complete extraction of the first digit
    }
    CGD    <- ChoGainesD(x)
    LMR    <- D2lmom(x, theolmr=benlmr)
    rejCGD <- ifelse(CGD > CritCGD, TRUE, FALSE)
    rejLMR <- ifelse(LMR > CritLMR, TRUE, FALSE)
    DF <- rbind(DF, data.frame(CGD=rejCGD, LMR=rejLMR))
  }
  print(summary(DF))
  if(is_Benford_parent) { # H0 is True
    CGDpct <- 100*(sum(as.numeric(DF$CGD)) / nsim - alpha) / alpha
    LMRpct <- 100*(sum(as.numeric(DF$LMR)) / nsim - alpha) / alpha
    message("The ChoGainesD rejection rate for alpha=", alpha,
            " is ", sum(as.numeric(DF$CGD)) / nsim,
            " (", round(CGDpct, digits=2), " percent difference).")
	   message("The   D2lmom   rejection rate for alpha=", alpha,
            " is ", sum(as.numeric(DF$LMR)) / nsim,
            " (", round(LMRpct, digits=2), " percent difference).")
  } else { # H0 is False
    acceptH0_H0false_CDG <- sum(as.numeric(! DF$CGD)) / nsim
    acceptH0_H0false_LMR <- sum(as.numeric(! DF$LMR)) / nsim
    betaCDG <- round(1 - acceptH0_H0false_CDG, digits=2)
    betaLMR <- round(1 - acceptH0_H0false_LMR, digits=2)
    message("Power of ChoGainesD = ", betaCDG, ".")
    message("Power of   D2lmom   = ", betaLMR, ".")
  } #
## End(Not run)

wasquith/lmomco documentation built on Nov. 13, 2024, 4:53 p.m.