EDFtest: Goodness of Fit Based on Empirical Distribution Function

Documented in AD.exp.pvalue AD.extremevalue.pvalue AD.gamma.pvalue AD.laplace.pvalue AD.logistic.pvalue AD.normal.pvalue AD.uniform.pvalue AD.weibull.pvalue

#' P-value of Anderson-Darling statistic
#'
#' @description
#' Compute the P-value of the given Anderson-Darling statistic \eqn{A^2}
#' using \code{\link[CompQuadForm]{imhof}} function in \code{CompQuadForm}.
#'
#' @details
#' Parameters must be estimated by maximum likelihood (ML) in order
#' for the P-values computed here to be asymptotically valid.
#' They are computed using the fact that when parameters are
#' estimated by maximum likelihood and the null hypothesis
#' is true, the asymptotic distribution of the GOF statistic
#' is the distribution of an infinite weighted sum
#' of weighted chi-square random variables on 1 degree of freedom.
#' The weights are eigenvalues of an integral equation. They
#' depend on the distribution being tested, the statistic being used,
#' and in some cases on the actual parameter values. These weights
#' are then computed approximately by discretization of the integral
#' equation; when that equation depends on one or more parameter
#' values we use the MLE in the equation.
#'
#' Some notes on the specific distributions: For the Normal, Logistic,
#' Laplace, Extreme Value, Weibull and Exponential distributions, the
#' limiting distributions do not depend on the parameters. For the
#' Gamma distribution, the shape parameter affects the limiting
#' distribution. The tests remain asymptotically valid when the MLE
#' is used to approximate the limit distribution.
#'
#' The Exponential distribution is a special case of the Weibull and
#' Gamma families arising when the shape is known to be 1. Knowing a
#' parameter and therefore not estimating it affects the distribution
#' of the test statistic and the functions provided for the Exponential
#' distribution allow for this.
#'
#' If a data set X_1,...,X_n follows the Weibull distribution then
#' Y_1 = log(X_1), ... ,Y_n = log(X_n) follows the Extreme Value
#' distribution and vice versa. The two procedures give identical
#' test statistics and P-values, in principal.
#'
#' Some of the models have more than one common parametrization. For
#' the Exponential, Gamma, and Weibull distributions, some writers
#' use a rate parameter and some use the scale parameter which is
#' the inverse of the rate. Our code uses the scale parameter.
#'
#' For the Laplace distribution, some writers use the density
#' \eqn{f(x)= exp(-|x-\mu|/\beta)/(2\beta)} in which \eqn{\beta}
#' is a scale parameter. Others use the
#' standard deviation \eqn{\sigma = \beta/\sqrt{2}}. Our code
#' uses the scale parameter.
#'
#' For the Uniform distribution, we offer code for the two parameter
#' Uniform distribution on the range \eqn{\theta_1} to \eqn{\theta_2}.
#' These are estimated by the sample minimum and sample maximum.
#' The probability integral transforms of the remaining n-2 points
#' are then tested for uniformity on the range 0 to 1. This procedure
#' is justified because the these probability integral transforms
#' have exactly this distribution if the original data had a uniform
#' distribution over any interval.
#'
#' It is not unusual to test the hypothesis that a sample follows the
#' standard uniform distribution on [0,1]. In this case the parameters
#' \emph{should not} be estimated. Instead use \code{AD(z)} or \code{CvM(z)} or
#' \code{Watson(z)} to compute the statistic values and then get P-values from
#' \code{AD.uniform.pvalue(a)} or \code{CvM.uniform.pvalue(w)} or
#' \code{Watson.uniform.pvalue(u)} whichever is wanted.
#'
#' @param a Anderson-Darling statistic \eqn{A^2} with a given distribution.
#' @param neig Number of eigenvalues used for \code{\link[CompQuadForm]{imhof}}.
#' @param verbose Logical; if TRUE, print warning messages.
#' @param shape The shape parameter of Gamma distribution.
#'
#' @return P-value of the Anderson-Darling statistic.
#'
#' @seealso
#' \code{\link{AD}} for calculating Anderson-Darling statistic;
#' \code{\link{CvM.pvalue}} for calculating P-value of Cramér-von Mises statistic;
#' \code{\link{Watson.pvalue}} for calculating P-value of Watson statistic.
#'
#' @name AD.pvalue
#' @examples
#' x0=runif(n=100,min=-1,max=1)
#' asq0 = AD.uniform(x0)
#' AD.uniform.pvalue(asq0)
#'
#' x1=rnorm(n=100,mean=0,sd=1)
#' asq1 = AD.normal(x1)
#' AD.normal.pvalue(asq1)
#'
#' x2=rgamma(n=100,shape=1,scale=1)
#' asq2 = AD.gamma(x2)
#' AD.gamma.pvalue(asq2,1)
#'
#' x3=rlogis(n=100,location=0,scale=1)
#' asq3 = AD.logistic(x3)
#' AD.logistic.pvalue(asq3)
#'
#' x4=rmutil::rlaplace(n=100,m=0,s=1)
#' asq4 = AD.laplace(x4)
#' AD.laplace.pvalue(asq4)
#'
#' x5=rweibull(n=100,shape=1,scale=1)
#' asq5 = AD.weibull(x5)
#' AD.weibull.pvalue(asq5)
#' x5_log=log(x5)
#' AD.extremevalue.pvalue(AD.extremevalue(x5_log))
#'
#' x6=rexp(n=100,rate=1/2)
#' asq6 = AD.exp(x6)
#' AD.exp.pvalue(asq6)
NULL

#' @export
#' @rdname AD.pvalue
AD.uniform.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.uniform.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.normal.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.normal.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.gamma.pvalue = function(a,shape,neig = 100,verbose=FALSE){
  e = AD.gamma.eigen(neig,shape=shape)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.logistic.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.logistic.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.laplace.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.laplace.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.weibull.pvalue = function(a,neig=100,verbose=FALSE){
  e=AD.weibull.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.extremevalue.pvalue = function(a,neig=100,verbose=FALSE){
  AD.weibull.pvalue(a=a,neig=neig,verbose=verbose)
}

#' @export
#' @rdname AD.pvalue
AD.exp.pvalue = function(a,neig=100,verbose=FALSE){
  e=AD.exp.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7) #play with eps and limit
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

# Helpers -----------------------------------------------------------------

AD.uniform.eigen = function(n){
  M=AD.uniform.covmat(n)
  e=eigen(M)$values/n
  e/sum(e)
}

AD.uniform.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.uniform.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.normal.eigen = function(n){
  M=AD.normal.covmat(n)
  e=eigen(M)$values/n
  e
}

AD.normal.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.normal.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.gamma.eigen=function(n,shape){
  # I don't have code for this yet
  # mean = 1 -?
  s=1:n
  s=s/(n+1)
  M=AD.gamma.covmat(n,shape)
  e=eigen(M)$values/n
  e # *mean/sum(e)
}

AD.gamma.covmat=function(n,shape){
  s=1:n
  s=s/(n+1)
  CvM.gamma.covmat(n,shape=shape)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.logistic.eigen = function(n){
  mean.asq.logistic = 1-(2*pi^2-3)/(2*(pi^2+3))  # from Maple
  M=AD.logistic.covmat(n)
  e=eigen(M)$values/n
  e*mean.asq.logistic/sum(e)
}

AD.logistic.covmat=function(n){
  s=1:n
  s=s/(n+1)
  t=s
  CvM.logistic.covmat(n)/sqrt(outer(s*(1-s),t*(1-t)))
}

AD.laplace.eigen = function(n){
  mean = 1-.5351471355520514227 # from Maple
  M=AD.laplace.covmat(n)
  e=eigen(M)$values/n
  e  * mean / sum(e)
}

AD.laplace.covmat=function(n){
  s=1:n
  s=s/(n+1)
  s2=sqrt(s*(1-s))
  CvM.laplace.covmat(n)/outer(s2,s2)
}

AD.weibull.eigen = function(n){
  mean.asq.weibull = 0.3868394505  # from Maple
  M=AD.weibull.covmat(n)
  e=eigen(M)$values/n
  e*mean.asq.weibull/sum(e)
}

AD.weibull.covmat=function(n){
  s=1:n
  s=s/(n+1)
  t=s
  CvM.weibull.covmat(n)/sqrt(outer(s*(1-s),t*(1-t)))
}

AD.exp.eigen=function(n){
  mean = 0.595886194 # 3 - 2*zeta(3) from Maple
  s=1:n
  s=s/(n+1)
  M=AD.exp.covmat(n)
  e=eigen(M)$values/n
  e * mean / sum(e)
}

AD.exp.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.exp.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}

LiYao-sfu/EDFtest documentation built on Dec. 18, 2021, 4:35 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

LiYao-sfu/EDFtest
Goodness of Fit Based on Empirical Distribution Function

R/AD.pvalue.R
In LiYao-sfu/EDFtest: Goodness of Fit Based on Empirical Distribution Function

Documented in AD.exp.pvalue AD.extremevalue.pvalue AD.gamma.pvalue AD.laplace.pvalue AD.logistic.pvalue AD.normal.pvalue AD.uniform.pvalue AD.weibull.pvalue

R Package Documentation

Browse R Packages

We want your feedback!

LiYao-sfu/EDFtest Goodness of Fit Based on Empirical Distribution Function

R/AD.pvalue.R In LiYao-sfu/EDFtest: Goodness of Fit Based on Empirical Distribution Function

Documented in AD.exp.pvalue AD.extremevalue.pvalue AD.gamma.pvalue AD.laplace.pvalue AD.logistic.pvalue AD.normal.pvalue AD.uniform.pvalue AD.weibull.pvalue

R Package Documentation

Browse R Packages

We want your feedback!

LiYao-sfu/EDFtest
Goodness of Fit Based on Empirical Distribution Function

R/AD.pvalue.R
In LiYao-sfu/EDFtest: Goodness of Fit Based on Empirical Distribution Function