twosample_power: Power Estimation for Multivariate Two-Sample Tests
In MD2sample: Various Methods for the Two Sample Problem in D>1 Dimensions

Description

Estimates the power of various two-sample tests by simulation, using Rcpp and parallel computing.

Usage

twosample_power(
  f,
  ...,
  TS,
  TSextra,
  alpha = 0.05,
  B = 1000,
  nbins = c(5, 5),
  minexpcount = 5,
  Ranges = matrix(c(-Inf, Inf, -Inf, Inf), 2, 2),
  samplingmethod = "Binomial",
  rnull,
  With.p.value = FALSE,
  DoTransform = TRUE,
  SuppressMessages = FALSE,
  LargeSampleOnly = FALSE,
  maxProcessor,
  doMethods = "all"
)

Arguments

`f`	function that generates the data: a list with data sets x and y for continuous data, or a matrix with columns vals_x, vals_y, x and y for discrete data.
`...`	additional arguments passed to f, up to 2.
`TS`	routine to calculate test statistics for new tests.
`TSextra`	additional info passed to TS, if necessary.
`alpha`	=0.05, the type I error probability of the hypothesis test.
`B`	=1000, number of simulation runs.
`nbins`	=c(5, 5), number of bins for the chi-square test if Dim=2.
`minexpcount`	=5, lowest required count per bin for the chi-square test.
`Ranges`	=matrix(c(-Inf, Inf, -Inf, Inf),2,2), a 2x2 matrix with lower and upper bounds.
`samplingmethod`	="Binomial" for Binomial sampling or "independence" for independence sampling in the discrete data case.
`rnull`	function to generate new data sets for parametric bootstrap.
`With.p.value`	=FALSE, does the user-supplied routine return p values?
`DoTransform`	=TRUE, should the data be transformed to the unit hypercube?
`SuppressMessages`	=FALSE, should informational messages be suppressed?
`LargeSampleOnly`	=FALSE, should only methods with large-sample theory be run?
`maxProcessor`	number of cores to use. If missing, the number of physical cores minus 1 is used. If set to 1, no parallel processing is done.
`doMethods`	="all", which methods should be included?
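
The default for `Ranges` is built with `matrix()`, which fills column by column, so the first row holds the lower bounds and the second row the upper bounds. A minimal sketch in base R of a restricted range, assuming (this page does not say so explicitly) that column j holds the bounds for dimension j; the name `Ranges01` is only for illustration:

```r
# Default: no restriction in either dimension
Ranges <- matrix(c(-Inf, Inf, -Inf, Inf), 2, 2)

# Hypothetical restriction: first coordinate in [0, 1], second unbounded.
# Assumes column j = (lower, upper) for dimension j, as the default suggests.
Ranges01 <- matrix(c(0, 1, -Inf, Inf), 2, 2)
print(Ranges01)
```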

Details

For details, consult vignette("MD2sample","MD2sample").

Value

A numeric matrix or vector of power values.
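
As a rough illustration of what a simulated power value means, the sketch below estimates power in base R as the fraction of B simulated data sets on which a test rejects at level alpha. It is not the package's implementation: `estimate_power`, `gen_shift` and `p_first_coord` are hypothetical names, and a Welch t-test on the first coordinate stands in for the multivariate tests.

```r
# Generic Monte Carlo power estimate (illustration only, not MD2sample code)
set.seed(1)

estimate_power <- function(gen, pvalue_fun, B = 200, alpha = 0.05) {
  rejections <- 0
  for (i in seq_len(B)) {
    dat <- gen()  # list with components x and y, as f returns for continuous data
    if (pvalue_fun(dat$x, dat$y) < alpha) rejections <- rejections + 1
  }
  rejections / B  # proportion of runs in which the null hypothesis was rejected
}

# Hypothetical generator: two bivariate normal samples, the second shifted by delta
gen_shift <- function(delta) function() {
  list(x = matrix(rnorm(200), 100, 2),
       y = matrix(rnorm(240, mean = delta), 120, 2))
}

# Stand-in test: Welch t-test on the first coordinate only
p_first_coord <- function(x, y) t.test(x[, 1], y[, 1])$p.value

power_null <- estimate_power(gen_shift(0), p_first_coord)  # close to alpha
power_alt  <- estimate_power(gen_shift(1), p_first_coord)  # close to 1
```

Under the null hypothesis (delta = 0) the estimate should be near alpha, which is a useful sanity check for any test before reading its power under an alternative.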

Examples

#Note: the resulting power estimates are meaningless because of the extremely
#low number of simulation runs B, which is required by the CRAN timing rules.
#
#Power of the tests when one data set comes from a standard multivariate normal
#distribution and the other from a multivariate normal with correlation a
f=function(a=0) {
 S=diag(2) 
 x=mvtnorm::rmvnorm(100, sigma = S)
 S[1,2]=a
 S[2,1]=a
 y=mvtnorm::rmvnorm(120, sigma = S)
 list(x=x, y=y)
}
twosample_power(f, c(0, 0.5), B=10, maxProcessor=1)
#Power of a user-supplied test. The example is the (included) chi-square test:
TSextra=list(which="statistics", nbins=rbind(c(3,3), c(4,4)))
twosample_power(f, c(0, 0.5), TS=chiTS.cont, TSextra=TSextra, B=10, maxProcessor=1)
#Same example, but this time the user-supplied routine returns p values:
TSextra=list(which="pvalues", nbins=c(4,4))
twosample_power(f, c(0, 0.5), TS=chiTS.cont, TSextra=TSextra, B=10, 
             With.p.value=TRUE, maxProcessor=1)
#Example for discrete data
g=function(p1, p2) {
  x = table(sample(1:4, size=1000, replace = TRUE))
  y = table(sample(1:4, size=500, replace = TRUE, prob=c(p1,p2,1,1)))
  cbind(vals_x=rep(1:2,2),  vals_y=rep(1:2, each=2), x=x, y=y)
}  
twosample_power(g, 1.5, 1.6, B=10, maxProcessor=1)
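
The note about the small B can be made concrete. If the B simulation runs are independent, an estimated power p is a binomial proportion with standard error sqrt(p(1-p)/B). A minimal sketch in base R (`mc_se` is a hypothetical helper, not part of the package):

```r
# Standard error of an estimated power p based on B independent simulation runs
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

se_small   <- mc_se(0.5, 10)    # B = 10, as in the examples: about 0.158
se_default <- mc_se(0.5, 1000)  # default B = 1000: about 0.0158
```

With B = 10 the uncertainty is of the same order as most power differences of interest, so the examples only demonstrate the calling syntax.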

MD2sample documentation built on Aug. 8, 2025, 7:10 p.m.