R/AD.pvalue.R

Defines functions AD.exp.covmat AD.exp.eigen AD.weibull.covmat AD.weibull.eigen AD.laplace.covmat AD.laplace.eigen AD.logistic.covmat AD.logistic.eigen AD.gamma.covmat AD.gamma.eigen AD.normal.covmat AD.normal.eigen AD.uniform.covmat AD.uniform.eigen AD.exp.pvalue AD.extremevalue.pvalue AD.weibull.pvalue AD.laplace.pvalue AD.logistic.pvalue AD.gamma.pvalue AD.normal.pvalue AD.uniform.pvalue

Documented in AD.exp.pvalue AD.extremevalue.pvalue AD.gamma.pvalue AD.laplace.pvalue AD.logistic.pvalue AD.normal.pvalue AD.uniform.pvalue AD.weibull.pvalue

#' P-value of Anderson-Darling statistic
#'
#' @description
#' Compute the P-value of the given Anderson-Darling statistic \eqn{A^2}
#' using \code{\link[CompQuadForm]{imhof}} function in \code{CompQuadForm}.
#'
#' @details
#' Parameters must be estimated by maximum likelihood (ML) in order
#' for the P-values computed here to be asymptotically valid.
#' They are computed using the fact that when parameters are
#' estimated by maximum likelihood and the null hypothesis
#' is true, the asymptotic distribution of the GOF statistic
#' is the distribution of an infinite weighted sum
#' of weighted chi-square random variables on 1 degree of freedom.
#' The weights are eigenvalues of an integral equation. They
#' depend on the distribution being tested, the statistic being used,
#' and in some cases on the actual parameter values. These weights
#' are then computed approximately by discretization of the integral
#' equation; when that equation depends on one or more parameter
#' values we use the MLE in the equation.
#'
#' Some notes on the specific distributions: For the Normal, Logistic,
#' Laplace, Extreme Value, Weibull and Exponential distributions, the
#' limiting distributions do not depend on the parameters. For the
#' Gamma distribution, the shape parameter affects the limiting
#' distribution. The tests remain asymptotically valid when the MLE
#' is used to approximate the limit distribution.
#'
#' The Exponential distribution is a special case of the Weibull and
#' Gamma families arising when the shape is known to be 1. Knowing a
#' parameter and therefore not estimating it affects the distribution
#' of the test statistic and the functions provided for the Exponential
#' distribution allow for this.
#'
#' If a data set X_1,...,X_n follows the Weibull distribution then
#' Y_1 = log(X_1), ... ,Y_n = log(X_n) follows the Extreme Value
#' distribution and vice versa. The two procedures give identical
#' test statistics and P-values, in principal.
#'
#' Some of the models have more than one common parametrization. For
#' the Exponential, Gamma, and Weibull distributions, some writers
#' use a rate parameter and some use the scale parameter which is
#' the inverse of the rate. Our code uses the scale parameter.
#'
#' For the Laplace distribution, some writers use the density
#' \eqn{f(x)= exp(-|x-\mu|/\beta)/(2\beta)} in which \eqn{\beta}
#' is a scale parameter. Others use the
#' standard deviation \eqn{\sigma = \beta/\sqrt{2}}. Our code
#' uses the scale parameter.
#'
#' For the Uniform distribution, we offer code for the two parameter
#' Uniform distribution on the range \eqn{\theta_1} to \eqn{\theta_2}.
#' These are estimated by the sample minimum and sample maximum.
#' The probability integral transforms of the remaining n-2 points
#' are then tested for uniformity on the range 0 to 1. This procedure
#' is justified because the these probability integral transforms
#' have exactly this distribution if the original data had a uniform
#' distribution over any interval.
#'
#' It is not unusual to test the hypothesis that a sample follows the
#' standard uniform distribution on [0,1]. In this case the parameters
#' \emph{should not} be estimated. Instead use \code{AD(z)} or \code{CvM(z)} or
#' \code{Watson(z)} to compute the statistic values and then get P-values from
#' \code{AD.uniform.pvalue(a)} or \code{CvM.uniform.pvalue(w)} or
#' \code{Watson.uniform.pvalue(u)} whichever is wanted.
#'
#' @param a Anderson-Darling statistic \eqn{A^2} with a given distribution.
#' @param neig Number of eigenvalues used for \code{\link[CompQuadForm]{imhof}}.
#' @param verbose Logical; if TRUE, print warning messages.
#' @param shape The shape parameter of Gamma distribution.
#'
#' @return P-value of the Anderson-Darling statistic.
#'
#' @seealso
#' \code{\link{AD}} for calculating Anderson-Darling statistic;
#' \code{\link{CvM.pvalue}} for calculating P-value of Cramér-von Mises statistic;
#' \code{\link{Watson.pvalue}} for calculating P-value of Watson statistic.
#'
#' @name AD.pvalue
#' @examples
#' x0=runif(n=100,min=-1,max=1)
#' asq0 = AD.uniform(x0)
#' AD.uniform.pvalue(asq0)
#'
#' x1=rnorm(n=100,mean=0,sd=1)
#' asq1 = AD.normal(x1)
#' AD.normal.pvalue(asq1)
#'
#' x2=rgamma(n=100,shape=1,scale=1)
#' asq2 = AD.gamma(x2)
#' AD.gamma.pvalue(asq2,1)
#'
#' x3=rlogis(n=100,location=0,scale=1)
#' asq3 = AD.logistic(x3)
#' AD.logistic.pvalue(asq3)
#'
#' x4=rmutil::rlaplace(n=100,m=0,s=1)
#' asq4 = AD.laplace(x4)
#' AD.laplace.pvalue(asq4)
#'
#' x5=rweibull(n=100,shape=1,scale=1)
#' asq5 = AD.weibull(x5)
#' AD.weibull.pvalue(asq5)
#' x5_log=log(x5)
#' AD.extremevalue.pvalue(AD.extremevalue(x5_log))
#'
#' x6=rexp(n=100,rate=1/2)
#' asq6 = AD.exp(x6)
#' AD.exp.pvalue(asq6)
NULL

#' @export
#' @rdname AD.pvalue
AD.uniform.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.uniform.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.normal.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.normal.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.gamma.pvalue = function(a,shape,neig = 100,verbose=FALSE){
  e = AD.gamma.eigen(neig,shape=shape)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.logistic.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.logistic.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.laplace.pvalue = function(a,neig=100,verbose=FALSE){
  e = AD.laplace.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.weibull.pvalue = function(a,neig=100,verbose=FALSE){
  e=AD.weibull.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7)
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

#' @export
#' @rdname AD.pvalue
AD.extremevalue.pvalue = function(a,neig=100,verbose=FALSE){
  AD.weibull.pvalue(a=a,neig=neig,verbose=verbose)
}

#' @export
#' @rdname AD.pvalue
AD.exp.pvalue = function(a,neig=100,verbose=FALSE){
  e=AD.exp.eigen(neig)
  plb=pchisq(a/max(e),df=1,lower.tail = FALSE)
  warn=getOption("warn")
  im = imhof(a,lambda=e,epsabs = 1e-9,limit=2^7) #play with eps and limit
  options(warn=warn)
  aerror=im$abserr
  p=im$Qq
  if(p<0&&verbose)cat("for A = ",a," and neig = ",neig,
                      " imhof returned a negative probability\n")
  if(p<plb){
    p=plb
    if(verbose) cat("for A = ",a," and neig = ",neig,
                    " p was replaced by a lower bound on p: ",plb, "\n")
  }
  list(P=p,error=aerror)
}

# Helpers -----------------------------------------------------------------

AD.uniform.eigen = function(n){
  M=AD.uniform.covmat(n)
  e=eigen(M)$values/n
  e/sum(e)
}

AD.uniform.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.uniform.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.normal.eigen = function(n){
  M=AD.normal.covmat(n)
  e=eigen(M)$values/n
  e
}

AD.normal.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.normal.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.gamma.eigen=function(n,shape){
  # I don't have code for this yet
  # mean = 1 -?
  s=1:n
  s=s/(n+1)
  M=AD.gamma.covmat(n,shape)
  e=eigen(M)$values/n
  e # *mean/sum(e)
}

AD.gamma.covmat=function(n,shape){
  s=1:n
  s=s/(n+1)
  CvM.gamma.covmat(n,shape=shape)/sqrt(outer(s*(1-s),s*(1-s)))
}

AD.logistic.eigen = function(n){
  mean.asq.logistic = 1-(2*pi^2-3)/(2*(pi^2+3))  # from Maple
  M=AD.logistic.covmat(n)
  e=eigen(M)$values/n
  e*mean.asq.logistic/sum(e)
}

AD.logistic.covmat=function(n){
  s=1:n
  s=s/(n+1)
  t=s
  CvM.logistic.covmat(n)/sqrt(outer(s*(1-s),t*(1-t)))
}

AD.laplace.eigen = function(n){
  mean = 1-.5351471355520514227 # from Maple
  M=AD.laplace.covmat(n)
  e=eigen(M)$values/n
  e  * mean / sum(e)
}

AD.laplace.covmat=function(n){
  s=1:n
  s=s/(n+1)
  s2=sqrt(s*(1-s))
  CvM.laplace.covmat(n)/outer(s2,s2)
}

AD.weibull.eigen = function(n){
  mean.asq.weibull = 0.3868394505  # from Maple
  M=AD.weibull.covmat(n)
  e=eigen(M)$values/n
  e*mean.asq.weibull/sum(e)
}

AD.weibull.covmat=function(n){
  s=1:n
  s=s/(n+1)
  t=s
  CvM.weibull.covmat(n)/sqrt(outer(s*(1-s),t*(1-t)))
}

AD.exp.eigen=function(n){
  mean = 0.595886194 # 3 - 2*zeta(3) from Maple
  s=1:n
  s=s/(n+1)
  M=AD.exp.covmat(n)
  e=eigen(M)$values/n
  e * mean / sum(e)
}

AD.exp.covmat=function(n){
  s=1:n
  s=s/(n+1)
  CvM.exp.covmat(n)/sqrt(outer(s*(1-s),s*(1-s)))
}
LiYao-sfu/EDFtest documentation built on Dec. 18, 2021, 4:35 a.m.