e_dist: Computes the energy distance of a point set
In support: Support Points

View source: R/criteria.R

e_dist

R Documentation

Computes the energy distance of a point set

Description

e_dist computes the energy distance between points D and a target distribution (or big dataset) F. The cross-term E[||X-X'||], X,X'~F is NOT computed in e_dist for computational efficiency, since this is not needed for optimizing D. The target distribution or big dataset can be set using dist.str or dist.samp, respectively.

Usage

  e_dist(D, dist.str=NA, dist.param=vector("list",ncol(D)),
       nsamp=1e6, dist.samp=NA)

Arguments

`D`	An `n` x `p` point set.
`dist.str`	A `p`-length string vector for target distribution (assuming independence). Current options include uniform, normal, exponential, gamma, lognormal, student-t, weibull, cauchy and beta. Exactly one of `dist.str` or `dist.samp` should be `NA`.
`dist.param`	A `p`-length list for desired distribution parameters in `dist.str`. The following parameters are supported: Uniform: Minimum, maximum; Normal: Mean, standard deviation; Exponential: Rate parameter; Gamma: Shape parameter, scale parameter; Lognormal: Log-mean, Log-standard deviation; Student-t: Degree-of-freedom; Weibull: Shape parameter, scale parameter; Cauchy: Location parameter, scale parameter; Beta: Shape parameter 1, shape parameter 2.
`nsamp`	Number of samples to draw from `dist.str` for comparison.
`dist.samp`	An `N` x `p` matrix for the target big dataset (e.g., MCMC chain). Exactly one of `dist.str` or `dist.samp` should be `NA`.

References

Szekely, G. J. and Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249-1272.

Examples

    ## Not run: 
    #############################################################
    # Generate 25 SPs for the 2-d i.i.d. N(0,1) distribution
    #############################################################
    n <- 25 #number of points
    p <- 2 #dimension
    D <- sp(n,p,dist.str=rep("normal",p))
    Drnd <- matrix(rnorm(n*p),ncol=p)
    e_dist(D$sp,dist.str=rep("normal",p)) #smaller
    e_dist(Drnd,dist.str=rep("normal",p))
  
  
    #############################################################
    # Support points for big data reduction: Franke's function
    #############################################################
  #  library(MHadaptive) # Package archived, but you can use your favorite MCMC sampler
    
  #  #Generate MCMC samples
  #  li_func <- franke2d #Desired log-posterior
  #  ini <- c(0.5,0.5) #Initial point for MCMc
  #  NN <- 1e5 #Number of MCMC samples desired
  #  burnin <- NN/2 #Number of burn-in runs
  #  mcmc_franke <- Metro_Hastings(li_func, pars=ini, prop_sigma=0.05*diag(2),
  #                           iterations=NN, burn_in=burnin)
    data(mcmc_franke) # Loading MCMC sample from data
    
    #Use modified Franke's function as posterior
    franke2d <- function(xx){
      if ((xx[1]>1)||(xx[1]<0)||(xx[2]>1)||(xx[2]<0)){
        return(-Inf)
      }
      else{
        x1 <- xx[1]
        x2 <- xx[2]
        
        term1 <- 0.75 * exp(-(9*x1-2)^2/4 - (9*x2-2)^2/4)
        term2 <- 0.75 * exp(-(9*x1+1)^2/49 - (9*x2+1)/10)
        term3 <- 0.5 * exp(-(9*x1-7)^2/4 - (9*x2-3)^2/4)
        term4 <- -0.2 * exp(-(9*x1-4)^2 - (9*x2-7)^2)
        
        y <- term1 + term2 + term3 + term4
        return(2*log(y))
      }
    }
    
    #Generate ncur SPs
    ncur <- 50
    D <- sp(ncur,2,dist.samp=mcmc_franke$trace)$sp
    Drnd <- mcmc_franke$trace[sample(1:nrow(mcmc_franke$trace),n,F),]
    e_dist(D,dist.samp=mcmc_franke$trace) #smaller
    e_dist(Drnd,dist.samp=mcmc_franke$trace)
    
## End(Not run)

support documentation built on Aug. 8, 2025, 6:05 p.m.