sampleH: samples within the acceptance region defined by the kernel...


View source: R/RcppExports.R

Description

To approximate the distribution of the test statistics, we iteratively sample replicates of the response within the acceptance region of the selection event and compute the test statistics on each replicate. Constraining the samples to this region yields a valid post-selection distribution of the test statistics. The constrained sampling is performed by a hit-and-run sampler based on the hypersphere directions algorithm (see References).
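The idea of a hit-and-run sampler with hypersphere directions can be sketched in base R. This is an illustration only, not the package's implementation: it assumes that each matrix Q in A encodes a constraint of the form y' Q y >= 0 and that the target is N(mu, sigma^2 I); in_region and hit_and_run_sketch are hypothetical names.

```r
# Hypothetical helper: TRUE when y satisfies t(y) %*% Q %*% y >= 0 for every
# matrix Q in A. The ">= 0" sign convention is an assumption of this sketch.
in_region <- function(y, A) {
  all(vapply(A, function(Q) sum(y * (Q %*% y)) >= 0, logical(1)))
}

# Hypothetical base-R sketch of a hit-and-run sampler (not the package code).
hit_and_run_sketch <- function(A, initial, n_replicates, mu = 0, sigma = 1,
                               n_iter = 1e5, burn_in = 1000) {
  stopifnot(in_region(initial, A))
  n <- length(initial)
  y <- initial
  out <- matrix(NA_real_, nrow = n, ncol = n_replicates)
  for (it in seq_len(burn_in + n_replicates)) {
    # Direction drawn uniformly on the unit hypersphere
    theta <- rnorm(n)
    theta <- theta / sqrt(sum(theta^2))
    # Along the line y + lambda * theta, a N(mu, sigma^2 I) target gives
    # lambda ~ N(-theta'(y - mu), sigma^2)
    centre <- -sum(theta * (y - mu))
    for (attempt in seq_len(n_iter)) {
      lambda <- rnorm(1, mean = centre, sd = sigma)
      candidate <- y + lambda * theta
      if (in_region(candidate, A)) {
        y <- candidate
        break
      }
    }
    # If no step is accepted within n_iter attempts, the chain stays at y
    if (it > burn_in) out[, it - burn_in] <- y
  }
  out
}

set.seed(1)
A_toy <- list(diag(c(1, -1, 0)))   # toy region: y1^2 >= y2^2
draws <- hit_and_run_sketch(A_toy, initial = c(2, 1, 0),
                            n_replicates = 200, burn_in = 50, n_iter = 1000)
dim(draws)   # one replicate per column
```

Every proposed step is rejected until it lands in the region, so each retained column satisfies the toy constraint by construction.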

Usage

sampleH(
  A,
  initial,
  n_replicates,
  mu = 0,
  sigma = 1,
  n_iter = 1e+05,
  burn_in = 1000
)

Arguments

A

list of matrices modeling the quadratic constraints of the selection event

initial

initialization sample. This sample must belong to the acceptance region given by A. In practice, this parameter is set to the outcome of the original dataset.

n_replicates

total number of replicates to be generated

mu

mean of the outcome

sigma

standard deviation of the outcome

n_iter

maximum number of rejections for the parameter λ in a single iteration

burn_in

number of burn-in iterations
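The requirement that initial belong to the acceptance region can be checked before sampling. The sketch below uses base R and assumes, as a convention not stated on this page, that each matrix Q in A encodes the constraint y' Q y >= 0; in_region is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of kernelPSI): TRUE when y satisfies
# t(y) %*% Q %*% y >= 0 for every matrix Q in A. The ">= 0" sign
# convention is an assumption made for this sketch.
in_region <- function(y, A) {
  all(vapply(A, function(Q) sum(y * (Q %*% y)) >= 0, logical(1)))
}

# Toy constraint in dimension 2: y1^2 - y2^2 >= 0
A_toy <- list(diag(c(1, -1)))
inside_point  <- in_region(c(2, 1), A_toy)   # 4 - 1 >= 0
outside_point <- in_region(c(1, 2), A_toy)   # 1 - 4 <  0
```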

Details

Given the iterative nature of the sampler, large values of n_replicates and burn_in are needed to approximate the distributions of the test statistics accurately.

For high-dimensional responses, and depending on the initialization, the sampler may scale poorly when generating tens of thousands of replicates, because each iteration includes an intermediate rejection sampling step.
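One way to gauge the cost of the rejection step on a toy problem is to estimate how often a proposed step lands back in the acceptance region. The sketch below is base R and purely illustrative: acceptance_rate is a hypothetical helper, the constraints are assumed to have the form y' Q y >= 0, and the target is assumed to be N(mu, sigma^2 I).

```r
# Hypothetical helper (not part of kernelPSI): fraction of proposed steps
# that stay in the region, averaged over random directions from the point y.
acceptance_rate <- function(A, y, mu = 0, sigma = 1, n_dir = 200, n_prop = 50) {
  inside <- function(z) {
    all(vapply(A, function(Q) sum(z * (Q %*% z)) >= 0, logical(1)))
  }
  accepted <- 0
  for (d in seq_len(n_dir)) {
    theta <- rnorm(length(y))
    theta <- theta / sqrt(sum(theta^2))   # uniform direction on the sphere
    # Step sizes proposed along theta for a Gaussian target
    lambda <- rnorm(n_prop, mean = -sum(theta * (y - mu)), sd = sigma)
    for (l in lambda) accepted <- accepted + inside(y + l * theta)
  }
  accepted / (n_dir * n_prop)
}

set.seed(1)
y0 <- c(2, 1, 0, 0)
one_constraint  <- list(diag(c(1, -1, 0, 0)))        # y1^2 >= y2^2
two_constraints <- c(one_constraint,
                     list(diag(c(1, 0, -1, -1))))    # and y1^2 >= y3^2 + y4^2
rate_one <- acceptance_rate(one_constraint, y0)
rate_two <- acceptance_rate(two_constraints, y0)
```

A low rate means many proposals are discarded per retained replicate, which is where the n_iter bound comes into play.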

Value

a matrix with n_replicates columns where each column contains a sample within the acceptance region
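Because each column is one replicate of the response, replicates of a test statistic can be obtained by applying the statistic column by column. The sketch below uses a random stand-in for the returned matrix and a hypothetical quadratic statistic; the empirical p-value with a +1 correction is a choice made here for illustration, not necessarily the one used by the package.

```r
set.seed(1)
n <- 30
n_replicates <- 50
# Stand-in for the matrix returned by sampleH: one replicate per column
samples <- matrix(rnorm(n * n_replicates), nrow = n, ncol = n_replicates)
Y <- rnorm(n)                               # stand-in for the observed outcome

# Hypothetical statistic: quadratic form in a positive semi-definite kernel
X <- matrix(rnorm(n * 5), nrow = n)
K <- tcrossprod(X) / ncol(X)
statistic <- function(y) sum(y * (K %*% y))

stat_obs <- statistic(Y)
stat_rep <- apply(samples, 2, statistic)    # one value per replicate
# Empirical p-value with a +1 correction (an assumption of this sketch)
p_value <- (1 + sum(stat_rep >= stat_obs)) / (1 + n_replicates)
```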

References

Berbee, H. C. P., Boender, C. G. E., Rinnooy Kan, A. H. G., Scheffer, C. L., Smith, R. L., & Telgen, J. (1987). Hit-and-run algorithms for the identification of non-redundant linear inequalities. Mathematical Programming, 37(2), 184–207.

Belisle, C. J. P., Romeijn, H. E., & Smith, R. L. (1993). Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2), 255–266.

Examples

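library(kernelPSI)

n <- 30
p <- 20
# Five linear kernels, each built from a random n x p design matrix
K <- replicate(5, matrix(rnorm(n * p), nrow = n, ncol = p), simplify = FALSE)
K <- lapply(K, function(X) tcrossprod(X) / ncol(X))
# Outcome and its linear kernel
Y <- rnorm(n)
L <- tcrossprod(Y)
# Kernel selection, then the quadratic constraints modeling the selection event
selection <- FOHSIC(K, L, 2)
constraintQ <- forwardQ(K, select = selection)
# Draw 50 replicates of the outcome within the acceptance region
samples <- sampleH(A = constraintQ, initial = Y,
                   n_replicates = 50, burn_in = 20)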

kernelPSI documentation built on Dec. 8, 2019, 1:07 a.m.