pfactor.bernstein: Estimation of Optimal p-factor of Distributional Support...

pfactor.bernsteinR Documentation

Estimation of Optimal p-factor of Distributional Support Estimation for Smoothed Quantiles from the Bernstein or Kantorovich Polynomials

Description

Compute the optimal p-factor through numerical integration of the smoothed empirical quantile function to estimate the L-moments of the distribution. This function attempts to report an optimal “p-factor” (author's term) for the given parent distribution in para based on estimating the crossing of the origin of an error between the given L-moment ratio \tau_r for 3, 4, and 5 that will come from either the distribution parameter object or given as an argument in lmr.dist. The estimated support of the distribution is that shown by Turnbull and Ghosh (2014) and is computed as follows

\biggl(x_{0:n},\: x_{n+1:n}\biggr) = \biggl(x_{1:n} - \frac{(x_{2:n} - x_{1:n})}{(1 - p)^{-2} - 1},\: x_{n:n} + \frac{(x_{n:n} - x_{n-1:n})}{(1 - p)^{-2} - 1}\biggr)\mbox{,}

where p is the p-factor. The support will honor natural bounds if given by either fix.lower or fix.upper. The polynomial type for smooth is provided in poly.type. These three arguments are the same as those for dat2bernqua and lmoms.bernstein. The statistic type used to measure central tendency of the errors for the nsim simulations per p. The function has its own hardwired p-factors to compute but these can be superseded by the pfactors argument. The p.lo and p.hi are the lower and upper bounds to truncate on immediately after the p-factors to use are assembled. These are made for three purposes: (1) protection against numerical problems for mathematical upper limits (unity), (2) to potentially provide for much faster execution if the user already knows the approximate optimal value for the p-factor, and (3) to potentially use this function in a direct optimization framework using the R functions optim or uniroot. It is strongly suggested to keep plot.em set so the user can inspect the computations.

Usage

pfactor.bernstein(para, x=NULL, n=NULL,
                        bern.control=NULL,
                        poly.type=c("Bernstein", "Kantorovich"),
                        stat.type=c("Mean", "Median"),
                        fix.lower=NULL, fix.upper=NULL,
                        lmr.dist=NULL, lmr.n=c("3", "4", "5"),
                        nsim=500, plot.em=TRUE, pfactors=NULL,
                        p.lo=.Machine$double.eps, p.hi=1)

Arguments

para

A mandatory “parent” distribution defined by a usual lmomco distribution parameter object for a distribution. The simulations are based on this distribution, although optimization for p can be set to a different L-moment value by lmr.dist.

x

An optional vector of data values.

n

An optional sample size to run the simulations on. This value is computed by length(x) if x is provided. If set by argument, then that size supersedes the length of the optional observed sample.

bern.control

A list that holds poly.type, stat.type, fix.lower, and fix.upper. And this list will supersede the respective values provided as separate arguments. There is an implicit bound.type of "Carv".

poly.type

Same argument as for dat2bernqua.

stat.type

The central estimation statistic for each p-factor evaluated.

fix.lower

Same argument as for dat2bernqua.

fix.upper

Same argument as for dat2bernqua.

lmr.dist

This is the value for the lmr.n of the distribution in para unless explicitly set through lmr.dist.

lmr.n

The L-moment ratio number for p-factor optimization.

nsim

The number of simulations to run. Experiments suggest the default is adequate for reasonably small sample sizes—the simulation count can be reduced as n becomes large.

plot.em

A logical to trigger the diagnostic plot of the simulated errors and a smooth line through these errors.

pfactors

An optional vector of p-factors to loop through for the simulations. The vector computing internall is this is set to NULL seems to be more than adequate.

p.lo

An computational lower boundary for which the pfactors by argument or default are truncated to. The default for lo is to be quite small and does no truncate the default pfactors.

p.hi

An computational upper boundary for which the pfactors by argument or default are truncated to. The default for hi is unity, which is the true upper limit that results in a 0 slope between the x_{0:n} to x_{1:n} or x_{n:n} to x_{n+1:n} order statistics.

Value

An R list or real is returned. If pfactors is a single value, then the single value for the error statistic is returned, otherwise the list described will be. If the returned pfactor is NA, then likely the smooth line did not cross zero and the reason the user should keep plot.em=TRUE and inspect the plot. Perhaps revisions to the arguments will become evident. The contents of the list are

pfactor

The estimated value of p smoothed by lowess that has an error of zero, see err.stat as a function of ps.

bounds.type

Carv, which is the same bound type as needed by dat2bernqua and
lmoms.bernstein.

poly.type

The given poly.type.

stat.type

The given stat.type. The “Mean” seems to be preferable.

lmom.type

A string of the L-moment type: “Tau3”, “Tau4”, “Tau5”.

fix.lower

The given fixed lower boundary, which could stay NULL.

fix.upper

The given fixed upper boundary, which could stay NULL.

source

An attribute identifying the computational source of the L-moments: “pfactor.bernstein”.

ps

The p-factors actually evaluated.

err.stat

The error statistic computed by stat.type of the simulated \hat{\tau_r} by integration provided by lmoms.bernstein minus the “true” value \tau_r provided by either para or given by lmr.dist where r is lmr.n.

err.smooth

The lowess-smoothed values for err.stat and the pfactor comes from a linear interpolation of this smooth for the error being zero.

Note

Repeated application of this function for various n would result in the analyst having a vector of n and p (pfactor). The analyst could then fit a regression equation and refine the estimated p(n). For example, a dual-logarithmic regression is suggested lm(log(p)~log(n)).

Also, symmetrical data likely see little benefit from optimizing on the symmetry-measuring L-moments Tau3 and Tau5; the analyst might prefer to optimize on peakedness measured by Tau4.

Note

This function is highly experimental and subject to extreme overhaul. Please contact the author if you are an interested party in Bernstein and Kantorovich polynomials.

Author(s)

W.H. Asquith

References

Turnbull, B.C., and Ghosh, S.K., 2014, Unimodal density estimation using Bernstein polynomials. Computational Statistics and Data Analysis, v. 72, pp. 13–29.

See Also

lmoms.bernstein, dat2bernqua, lmoms

Examples

## Not run: 
pdf("pfactor_exampleB.pdf")
X <- exp(rnorm(200)); para <- parexp(lmoms(X))
# nsim is too small, but makes the following three not take too long
pfactor.bernstein(para, n=20, lmr.n="3", nsim=100, p.lo=.06, p.hi=.3)
pfactor.bernstein(para, n=20, lmr.n="4", nsim=100, p.lo=.06, p.hi=.3)
pfactor.bernstein(para, n=20, lmr.n="5", nsim=100, p.lo=.06, p.hi=.3)
dev.off()

## End(Not run)
## Not run: 
# Try intra-sample p-factor optimization from two perspectives. The 3-parameter
# GEV "over fits" the data and provides the parent.  Then use Tau3 of the fitted
# GEV for peakedness restraint and then use Tau3 of the data. Then repeat but use
# the apparent "exact" value of Tau3 for the true exponential parent.
pdf("pfactor_exampleB.pdf")
lmr <- vec2lmom(c(60,20)); paraA <- parexp(lmr); n <- 40
tr <- lmorph(par2lmom(paraA))$ratios[3]
X <- rlmomco(n, paraA); para <- pargev(lmoms(X))
F <- seq(0.001,0.999, by=0.001)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
      xlab="Standard normal variate", ylab="Quantile",
      xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))

# Make sure to fill in the p-factor when needed!
bc <- list(poly.type = "Bernstein", bound.type="Carv",
           stat.type="Mean", fix.lower=0, fix.upper=NULL, p=NULL)
kc <- list(poly.type = "Kantorovich", bound.type="Carv",
           stat.type="Mean", fix.lower=0, fix.upper=NULL, p=NULL)

# Bernstein
A <- pfactor.bernstein(para,      n=n, nsim=100,              bern.control=bc)
B <- pfactor.bernstein(para, x=X, n=n, nsim=100,              bern.control=bc)
C <- pfactor.bernstein(para,      n=n, nsim=100, lmr.dist=tr, bern.control=bc)
D <- pfactor.bernstein(para, x=X, n=n, nsim=100, lmr.dist=tr, bern.control=bc)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
      xlab="Standard normal variate", ylab="Quantile",
      xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))
      bc$p <- A$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=2)
      bc$p <- B$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=3)
      bc$p <- C$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=2, lty=2)
      bc$p <- D$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=3, lty=2)
# Kantorovich
A <- pfactor.bernstein(para,      n=n, nsim=100,              bern.control=kc)
B <- pfactor.bernstein(para, x=X, n=n, nsim=100,              bern.control=kc)
C <- pfactor.bernstein(para,      n=n, nsim=100, lmr.dist=tr, bern.control=kc)
D <- pfactor.bernstein(para, x=X, n=n, nsim=100, lmr.dist=tr, bern.control=kc)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
      xlab="Standard normal variate", ylab="Quantile",
      xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))
      kc$p <- A$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=2)
      kc$p <- B$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=3)
      kc$p <- C$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=2, lty=2)
      kc$p <- D$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=3, lty=2)
dev.off()

## End(Not run)
## Not run: 
X <- exp(rnorm(200)); para <- parexp(lmoms(X))
"pfactor.root" <- function(para, p.lo, p.hi, ...) {
    afunc <- function(p, para=NULL, x=NULL, ...) {
      return(pfactor.bernstein(para=para, x=x, pfactors=p, ...)) }
    rt <- uniroot(afunc, c(p.lo, p.hi),
                  tol=0.001, maxiter=30, nsim=500, para=para, ...)
    return(rt)
}
pfactor.root(para, 0.05, 0.15, n=10, lmr.n="4")
pfactor.bernstein(para, n=10, lmr.n="4", nsim=200, p.lo=.05, p.hi=.15)

## End(Not run)

lmomco documentation built on May 29, 2024, 10:06 a.m.