dat2bernqua | R Documentation |
The empirical quantile function can be “smoothed” (Hernández-Maldonado and others, 2012, p. 114) through the Kantorovich polynomial (Muñoz-Pérez and Fernández-Palacín, 1987) for the sample order statistics x_{k:n}
for a sample of size n
by
\tilde{X}_n(F) = \frac{1}{2}\sum_{k=0}^n (x_{k:n} + x_{(k+1):n}) {n \choose k} F^k (1-F)^{n-k}\mbox{,}
where F
is nonexceedance probability, and (n\:k)
are the binomial coefficients from the R function choose()
, and the special situations for k=0
and k=n
are described within the Note section. The form for the Bernstein polynomial is
\tilde{X}_n(F) = \sum_{k=0}^{n+1} (x_{k:n}) {n+1 \choose k} F^k (1-F)^{n+1-k}\mbox{.}
There are subtle differences between the two and dat2bernqua
function supports each. Readers are also directed to the Special Attention section.
Turnbull and Ghosh (2014) consider through the direction of a referee and recommendation of p=0.05
by that referee (and credit to ideas by de Carvalho [2012]) that the support of the probability density function for the Turnbull and Ghosh (2014) study of Bernstein polynomials can be computed letting \alpha = (1 - p)^{-2} - 1
by
\biggl(x_{1:n} - (x_{2:n} - x_{1:n})/\alpha,\: x_{n:n} + (x_{n:n} - x_{n-1:n})/\alpha\biggr)\mbox{,}
for the minimum and maximum, respectively. Evidently, the original support considered by Turnbull and Ghosh (2014) was
\biggl(x_{1:n} - \lambda_2\sqrt{\pi/n},\: x_{n:n} + \lambda_2\sqrt{\pi/n}\biggr)\mbox{,}
for the minimum and maximum, respectively and where the standard deviation is estimated in the function using the 2nd L-moment as s = \lambda\sqrt{\pi}
.
The p
is referred to by this author as the “p-factor” this value has great influence in the estimated support of the distribution and therefore distal-tail estimation or performance is sensitive to the value for p
. General exploratory analysis suggests that the p
can be optimized based on information external or internal to the data for shape restrained smoothing. For example, an analyst might have external information as to the expected L-skew of the phenomenon being studied or could use the sample L-skew of the data (internal information) for shape restraint (see pfactor.bernstein
).
An alternative formula for smoothing is by Cheng (1995) and is
\tilde{X}^{\mathrm{Cheng}}_n(F) = \sum_{k=1}^n x_{k:n}\:{n - 1 \choose k-1}\: F^{k-1}(1-F)^{n-k}\mbox{.}
dat2bernqua(f, x, bern.control=NULL,
poly.type=c("Bernstein", "Kantorovich", "Cheng", "Parzen",
"bernstein", "kantorovich", "cheng", "parzen"),
bound.type=c("none", "sd", "Carv", "either", "carv"),
fix.lower=NULL, fix.upper=NULL, p=0.05, listem=FALSE)
f |
A vector of nonexceedance probabilities |
x |
A vector of data values. |
bern.control |
A |
poly.type |
The Bernstein or Kantorovich polynomial will be used. The two are quite closely related. Or the formula by Cheng (1995) will be used and |
bound.type |
Triggers to the not involve alternative supports ( |
fix.lower |
For |
fix.upper |
For |
p |
A small probability value to serve as the |
listem |
A logical controlling whether (1) a vector of |
Yet another alternative formula for smoothing if by Parzen (1979) and known as the “Parzen weighting method” is
\tilde{X}^{\mathrm{Parzen}}_n(F) = n\left(\frac{r}{n} - F\right)x_{r-1:n} + n\left(F - \frac{r-1}{n}\right)x_{r:n}\mbox{,}
where (r-1)/n \le F \le (r/n)
for r = 1, 2, \ldots, n
and x_{0:n}
is taken as either the minimum of the data (\mathrm{min}(x)
) or the lower bounds fix.lower
as externally set by the user. For protection, the minimum of (\mathrm{min}(x),
fix.lower
)
is formally used. If the Parzen method is used, the only arguments considered are poly.type
and fix.lower
; all others are ignored including the f
(see Value section). The user does not actually have to provide f
in the arguments but a place holder such as f=NULL
is required; internally the Parzen method takes over full control. The Parzen method in general is not smooth and not recommended like the others that rely on a polynomial basis function. Further the Parzen method has implicit asymmetry in the estimated F
. The method has F=0
and F < 1
on output, but if the data are reversed, then the method has F > 0
and F=1
. Data reversal is made in -X
as this example illustrates:
X <- sort(rexp(30)) P <- dat2bernqua(f=NULL, X, poly.type="Parzen") R <- dat2bernqua(f=NULL, -X, poly.type="Parzen") plot(pp(X, a=0.5), X, xlim=c(0, 1)) # Hazen plotting position to lines( P$f, P$x, col="red" ) # basically split the horizontal lines(1-R$f, -R$x, col="blue") # differences between blue and red.
An R vector
is returned unless the Parzen weighting method is used and in that case an R list
is returned with elements f
and x
, which respectively are the F
values as shown in the formula and the \tilde{X}^{\mathrm{Parzen}}_n(F)
.
The limiting properties of the Bernstein and Kantorovich polynomials differ. The Kantorovich polynomial uses the average of the largest (smallest) value and the respective outer order statistics (x_{n+1:n}
or x_{0:n}
) unlike the Bernstein polynomial whose F = 0
or F = 1
are purely a function of the outer order statistics. Thus, the Bernstein polynomial can attain the fix.lower
and(or) fix.upper
whereas the Kantorovich fundamentally can not. For a final comment, the function dat2bernquaf
is an inverse of dat2bernqua
.
The function makes use of R functions lchoose
and exp
and logarithmic expressions, such as (1-F)^{(n-k)} \rightarrow (n-k)\log(1-F)
, for numerical stability for large sample sizes.
Muñoz-Pérez and Fernández-Palacín (1987, p. 391) describe what to do with the condition of k = 0
but seemingly do not comment on the condition of k = n
. There is no 0th-order statistic nor is there a k > n
order statistic. Muñoz-Pérez and Fernández-Palacín (1987) bring up the notion of a natural minimum for the data (for example, data that must be positive, fix.lower = 0
could be set). Logic dictates that a similar argument must be made for the maximum to keep a critical error from occurring if one tries to access the not plausible x[n+1]
-order statistic. Lastly, the argument names bound.type
, fix.lower
, and fix.upper
mimic, as revisions were made to this function in December 2013, the nomenclature of software for probability density function smoothing by Turnbull and Ghosh (2014). The dat2bernqua
function was originally added to lmomco in May 2013 prior to the author learning about Turnbull and Ghosh (2014).
Lastly, there can be many practical situations in which transformation is desired. Because of the logic structure related to how fix.lower
and fix.upper
are determined or provided by the user, it is highly recommended that this function not internally handle transformation and detransformation. See the second example for use of logarithms.
W.H. Asquith
Cheng, C., 1995, The Bernstein polynomial estimator of a smooth quantile function: Statistics and Probability Letters, v. 24, pp. 321–330.
de Carvalho, M., 2012, A generalization of the Solis-Wets method: Journal of Statistical Planning and Inference, v. 142, no. 3, pp. 633–644.
Hernández-Maldonado, V., Díaz-Viera, M., and Erdely, A., 2012, A joint stochastic simulation method using the Bernstein copula as a flexible tool for modeling nonlinear dependence structures between petrophysical properties: Journal of Petroleum Science and Engineering, v. 90–91, pp. 112–123.
Muñoz-Pérez, J., and Fernández-Palacín, A., 1987, Estimating the quantile function by Bernstein polynomials: Computational Statistics and Data Analysis, v. 5, pp. 391–397.
Nair, N.U., Sankaran, P.G., and Balakrishnan, N., 2013, Quantile-based reliability analysis: Springer, New York.
Turnbull, B.C., and Ghosh, S.K., 2014, Unimodal density estimation using Bernstein polynomials: Computational Statistics and Data Analysis, v. 72, pp. 13–29.
Parzen, E., 1979, Nonparametric statistical data modeling: Journal American Statistical Association, v. 75, pp. 105–122.
lmoms.bernstein
, pfactor.bernstein
, dat2bernquaf
# Compute smoothed extremes, quartiles, and median
# The smoothing seems to extend to F=0 and F=1.
set.seed(1); X <- exp(rnorm(20)); F <- c(0, .25, .50, .75, 1)
dat2bernqua(F, X, bound.type="none", listem=TRUE)$x
dat2bernqua(F, X, bound.type="Carv", listem=TRUE)$x
dat2bernqua(F, X, bound.type="sd", listem=TRUE)$x
dat2bernqua(F, X, bound.type="either", listem=TRUE)$x
dat2bernqua(F, X, bound.type="sd", listem=TRUE, fix.lower=0)$x
## Not run:
X <- sort(10^rnorm(20)); F <- nonexceeds(f01=TRUE)
plot(qnorm(pp(X)), X, xaxt="n", xlab="", ylab="QUANTILE", log="y")
add.lmomco.axis(las=2, tcl=0.5, side.type="NPP", twoside=TRUE)
lines(qnorm(F), dat2bernqua(F, X, bound.type="sd"), col="red", lwd=2)
lines(qnorm(F), exp(dat2bernqua(F,log(X), bound.type="sd"))) #
## End(Not run)
## Not run:
X <- exp(rnorm(20)); F <- seq(0.001, 0.999, by=.001)
dat2bernqua(0.9, X, poly.type="Bernstein", listem=TRUE)$x
dat2bernqua(0.9, X, poly.type="Kantorovich", listem=TRUE)$x
dat2bernqua(0.9, X, poly.type="Cheng", listem=TRUE)$x
plot(pp(X), sort(X), log="y", xlim=range(F))
lines(F, dat2bernqua(F, X, poly.type="Bernstein" ), col="red" )
lines(F, dat2bernqua(F, X, poly.type="Kantorovich"), col="green")
lines(F, dat2bernqua(F, X, poly.type="Cheng" ), col="blue" ) #
## End(Not run)
## Not run:
X <- exp(rnorm(20)); F <- nonexceeds()
plot(pp(X), sort(X))
lines(F, dat2bernqua(F,X, bound.type="sd", poly.type="Bernstein"))
lines(F, dat2bernqua(F,X, bound.type="sd", poly.type="Kantorovich"), col=2) #
## End(Not run)
## Not run:
X <- rnorm(25); F <- nonexceeds()
Q <- dat2bernqua(F, X) # the Bernstein estimates
plot( F, dat2bernqua(F, X, bound.type="Carv"), type="l" )
lines(F, dat2bernqua(F, X, bound.type="sd"), col="red" )
lines(F, dat2bernqua(F, X, bound.type="none"), col="green")
points(pp(X), sort(X), pch=16, cex=.75, col="blue" ) #
## End(Not run)
## Not run:
set.seed(13)
par <- parkap(vec2lmom(c(1, .5, .4, .2)))
F <- seq(0.001, 0.999, by=0.001)
X <- sort(rlmomco(100, par))
pp <- pp(X)
pdf("lmomco_example_dat2bernqua.pdf")
plot(qnorm(pp(X)), dat2bernqua(pp, X), col="blue", pch=1,
ylim=c(0,qlmomco(0.9999, par)))
lines(qnorm(F), dat2bernqua(F, sort(X)), col="blue")
lines(qnorm(F), qlmomco(F, par), col="red" )
sampar <- parkap(lmoms(X))
sampar2 <- parkap(lmoms(dat2bernqua(pp, X)))
lines( qnorm(pp(F)), qlmomco(F, sampar ), col="black")
lines( qnorm(pp(F)), qlmomco(F, sampar2), col="blue", lty=2)
points(qnorm(pp(X)), X, col="black", pch=16)
dev.off() #
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.