hdsci: Construct Simultaneous Confidence Interval
In linulysses/hdanova: High-dimensional ANOVA and Simultaneous Confidence Intervals


View source: R/sci.R

Construct a (1-alpha) simultaneous confidence interval (SCI) for the mean or the difference of means of high-dimensional vectors.

hdsci(
  X,
  alpha = 0.05,
  side = "both",
  tau = 1/(1 + exp(-0.8 * seq(-6, 5, by = 1))),
  B = ceiling(50/alpha),
  pairs = NULL,
  Sig = NULL,
  verbose = F,
  tau.method = "MGB",
  R = 10 * ceiling(1/alpha),
  ncore = 1,
  cuda = T,
  nblock = 32,
  tpb = 64,
  seed = sample.int(2^30, 1)
)

`X`	a matrix (one-sample) or a list of matrices (multiple-samples), with each row representing an observation.
`alpha`	significance level; default value: 0.05.
`side`	one of `'lower'`, `'upper'` or `'both'`; default value: `'both'`.
`tau`	real number(s) in the interval `[0,1)` specifying the decay parameter; the value is selected automatically if `tau` is set to `NULL` or multiple values are provided; default value: `1/(1+exp(-0.8*seq(-6,5,by=1)))`, which is also the grid used when `tau=NULL`.
`B`	the number of bootstrap replicates; default value: `ceiling(50/alpha)`.
`pairs`	a matrix with two columns, only used when there are more than two populations, where each row specifies a pair of populations for which the SCI is constructed; default value: `NULL`, so that SCIs for all pairs are constructed.
`Sig`	a matrix (one-sample) or a list of matrices (multiple-samples), each of which is the covariance matrix of a sample; default value: `NULL`, so that it is automatically estimated from data.
`verbose`	TRUE/FALSE, indicator of whether to output diagnostic information or report progress; default value: FALSE.
`tau.method`	the method to select tau; possible values are 'MGB' (default), 'MGBA', 'RMGB', 'RMGBA', 'WB' and 'WBA' (see details).
`R`	the number of Monte Carlo replicates for estimating the empirical size; default value: `10*ceiling(1/alpha)`.
`ncore`	the number of CPU cores to be used; default value: 1.
`cuda`	T/F to indicate whether to use CUDA GPU implementation when the package `hdanova.cuda` is installed. This option takes effect only when `ncore=1`.
`nblock`	the number of blocks in CUDA computation; default value: 32.
`tpb`	the number of threads per block; the maximum total number of parallel GPU threads is then `nblock*tpb`; default value: 64.
`seed`	the seed for random number generator
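As a quick check of the defaults listed above, the following base R sketch evaluates the default `tau` grid and the default `B` and `R` for `alpha = 0.05`. It only reproduces the expressions shown in the usage and does not call `hdsci`.

```r
alpha <- 0.05

# default candidate grid for the decay parameter (expression from the usage)
tau <- 1 / (1 + exp(-0.8 * seq(-6, 5, by = 1)))

# default numbers of bootstrap and Monte Carlo replicates
B <- ceiling(50 / alpha)
R <- 10 * ceiling(1 / alpha)

length(tau)   # 12 candidate values
range(tau)    # all candidates lie strictly inside (0, 1)
c(B = B, R = R)
```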

Six methods to select the decay parameter tau are provided. Since constructing an SCI is equivalent to a hypothesis testing problem, all of them first identify a set of good candidates that give rise to a test respecting the specified level alpha, and then select the candidate that minimizes the p-value. The methods differ in how they identify the good candidates.

MGB: for this method, conditional on the data X, R=10*ceiling(1/alpha) i.i.d. zero-mean multivariate Gaussian samples (called MGB samples here) are drawn, where the covariance of each sample is equal to the sample covariance matrix Sig of the data X. For each candidate value in tau, 1) the empirical distribution of the corresponding max/min statistic is obtained by reusing the same bootstrapped sample, 2) the corresponding p-value is obtained, and 3) the size is estimated by applying the test to all MGB samples. The candidate values with the empirical size closest to alpha are considered as good candidates.
MGBA: a slightly more aggressive version of MGB, where the candidate values with the estimated empirical size no larger than alpha are considered good candidates.
RMGB: this method is similar to MGB, except that for each MGB sample, the covariance matrix is the sample covariance matrix of a resampled (with replacement) data X.
RMGBA: a slightly more aggressive version of RMGB, where the candidate values with the estimated empirical size no larger than alpha are considered good candidates.
WB: for this method, conditional on X, R=10*ceiling(1/alpha) i.i.d. samples (called WB samples here) are drawn by resampling X with replacement. For each candidate value in tau, 1) the corresponding p-value is obtained, and 2) the size is estimated by applying the test to all WB samples without reusing the bootstrapped sample. The candidate values with the empirical size closest to alpha are considered as good candidates.
WBA: a slightly more aggressive version of WB, where the candidate values with the estimated empirical size no larger than alpha are considered good candidates.
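The three ways of drawing calibration samples described above can be sketched in base R as follows. This is only an illustration of the sampling step under simplifying assumptions: the toy data, the helper `draw_gaussian` and the Cholesky-based Gaussian draw are hypothetical choices made here, not the package's internal code, and any centering or reuse of bootstrap draws done by `hdsci` is not shown.

```r
set.seed(1)
n <- 50; p <- 10; R <- 200
X <- matrix(rnorm(n * p), n, p)   # toy data, one observation per row

# hypothetical helper: n zero-mean Gaussian rows with covariance Sig
draw_gaussian <- function(n, Sig) {
  Z <- matrix(rnorm(n * ncol(Sig)), n, ncol(Sig))
  Z %*% chol(Sig)                 # chol() gives U with t(U) %*% U = Sig
}

# MGB: every sample uses the sample covariance of X
Sig <- cov(X)
mgb <- replicate(R, draw_gaussian(n, Sig), simplify = FALSE)

# RMGB: each sample uses the covariance of a resampled (with replacement) X
rmgb <- replicate(R, {
  Xb <- X[sample.int(n, n, replace = TRUE), , drop = FALSE]
  draw_gaussian(n, cov(Xb))
}, simplify = FALSE)

# WB: each sample is X resampled with replacement
wb <- replicate(R, X[sample.int(n, n, replace = TRUE), , drop = FALSE],
                simplify = FALSE)
```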

Among these methods, MGB and MGBA are recommended, since they are computationally more efficient and often yield good performance. MGBA might have a slightly larger empirical size. The WB and WBA methods may be affected by outliers, in which case they become more conservative. RMGB is computationally slightly slower than WB, but is less sensitive to outliers.
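The two-step selection rule (identify good candidates, then minimize the p-value) can be sketched as below. The helper `select_tau`, its arguments and the numbers are made up for illustration; in particular, the fallback used when no candidate has estimated size at most alpha is an assumption of this sketch, not documented behaviour of `hdsci`.

```r
# hypothetical helper: pick tau from estimated sizes and p-values
select_tau <- function(tau, size, pval, alpha, aggressive = FALSE) {
  if (aggressive) {
    # MGBA / RMGBA / WBA rule: estimated size no larger than alpha
    good <- which(size <= alpha)
    # assumed fallback when no candidate qualifies
    if (length(good) == 0) good <- which.min(abs(size - alpha))
  } else {
    # MGB / RMGB / WB rule: estimated size closest to alpha (ties kept)
    d <- abs(size - alpha)
    good <- which(d == min(d))
  }
  # among the good candidates, take the one with the smallest p-value
  tau[good[which.min(pval[good])]]
}

tau  <- c(0.1, 0.3, 0.5, 0.7, 0.9)            # made-up candidates
size <- c(0.020, 0.045, 0.050, 0.050, 0.080)  # made-up estimated sizes
pval <- c(0.300, 0.010, 0.040, 0.020, 0.005)  # made-up p-values

select_tau(tau, size, pval, alpha = 0.05)                     # 0.7
select_tau(tau, size, pval, alpha = 0.05, aggressive = TRUE)  # 0.3
```

With these numbers the aggressive rule admits more candidates (every value with estimated size at most 0.05), so it can reach a smaller p-value than the rule that keeps only the candidates closest to alpha.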

a list of the following objects:

sci

the constructed SCI, which is a list of the following objects:

sci.lower: a vector (when <= two samples) or a list of vectors (when >= 3 samples) specifying the lower bound of the SCI for the mean (one-sample) or the difference of means of each pair of samples.
sci.upper: a vector (when <= two samples) or a list of vectors (when >= 3 samples) specifying the upper bound of the SCI.
pairs: a matrix of two columns, each row containing a pair of indices of the samples for which the SCI of the difference in means is constructed.
tau: the decay parameter that is used to construct the SCI.
Mn: the sorted (in increasing order) bootstrapped max statistic.
Ln: the sorted (in increasing order) bootstrapped min statistic.
side: the input side.
alpha: the input alpha.

tau

a vector of candidate values of the decay parameter.

sci.tau

a list of sci objects corresponding to the candidate values in tau.

selected.tau

the selected value of the decay parameter from tau.

side

the input side.

alpha

the input alpha.

pairs

a matrix of two columns, each row containing a pair of indices of the samples for which the SCI of the difference in means is constructed.

sigma2

a vector (for one sample) or a list (for multiple samples) of vectors containing variance for each coordinate.

Lopes2020hdanova

Lin2020hdanova

 
# simulate a dataset of 4 samples
X <- lapply(1:4, function(g) MASS::mvrnorm(30,rep(0,10),diag((1:10)^(-0.5*g))))

# construct SCIs for the differences of mean vectors for pairs={(1,3),(2,4)}
hdsci(X,alpha=0.05,pairs=matrix(1:4,2,2))$sci
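A one-sample call might look like the following sketch. It assumes the `hdanova` package is installed and that the returned object has the fields listed in the value description (`selected.tau`, `sci$sci.lower`, `sci$sci.upper`); the call is guarded so that the snippet is skipped when the package is unavailable, and `cuda = FALSE` is passed to stay on the CPU path.

```r
set.seed(1)
X1 <- matrix(rnorm(30 * 10), 30, 10)   # one sample: 30 observations, 10 coordinates

if (requireNamespace("hdanova", quietly = TRUE)) {
  res <- hdanova::hdsci(X1, alpha = 0.05, side = "both", cuda = FALSE)

  # decay parameter selected from the candidate grid
  print(res$selected.tau)

  # coordinates whose simultaneous interval excludes zero
  print(which(res$sci$sci.lower > 0 | res$sci$sci.upper < 0))
}
```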