sampleH: samples within the acceptance region defined by the kernel...
In kernelPSI: Post-Selection Inference for Nonlinear Variable Selection


To approximate the distribution of the test statistics, we iteratively sample replicates of the response within the acceptance region of the selection event and compute the test statistics on each replicate. Constraining the samples to this region yields a valid post-selection distribution of the test statistics. The constrained sampling is performed by a hit-and-run sampler based on the hypersphere directions algorithm (see references).
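The hit-and-run idea can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's implementation: the helper names `in_region` and `hit_and_run_sketch` are hypothetical, and the sketch assumes the acceptance region is the set of vectors y with t(y) %*% Q %*% y >= 0 for every matrix Q in `A`, with a Gaussian N(mu, sigma^2 I) outcome.

```r
# Minimal hit-and-run sketch in base R (NOT the kernelPSI implementation).
# Assumption: acceptance region = {y : t(y) %*% Q %*% y >= 0 for every Q in A}.
in_region <- function(y, A) {
  all(vapply(A, function(Q) drop(crossprod(y, Q %*% y)) >= 0, logical(1)))
}

hit_and_run_sketch <- function(A, initial, n_replicates, mu = 0, sigma = 1,
                               n_iter = 1e5, burn_in = 1000) {
  stopifnot(in_region(initial, A))  # the chain must start inside the region
  n <- length(initial)
  y <- initial
  out <- matrix(NA_real_, nrow = n, ncol = n_replicates)
  for (step in seq_len(burn_in + n_replicates)) {
    # Draw a direction uniformly on the unit hypersphere.
    d <- rnorm(n)
    d <- d / sqrt(sum(d^2))
    # Along the line y + lambda * d, the N(mu, sigma^2 I) density implies
    # lambda ~ N(-sum(d * (y - mu)), sigma^2).
    centre <- -sum(d * (y - mu))
    # Rejection step: redraw lambda until the candidate lies in the region,
    # at most n_iter times; if every draw is rejected, y is left unchanged.
    for (attempt in seq_len(n_iter)) {
      candidate <- y + rnorm(1, mean = centre, sd = sigma) * d
      if (in_region(candidate, A)) {
        y <- candidate
        break
      }
    }
    if (step > burn_in) out[, step - burn_in] <- y
  }
  out
}

# Toy usage with a single hypothetical constraint matrix.
set.seed(1)
Q <- diag(c(1, 1, 1, -1, -1))
draws <- hit_and_run_sketch(list(Q), initial = c(1, 1, 1, 0, 0),
                            n_replicates = 20, burn_in = 10)
```

The inner loop is the rejection step: when the acceptance region is a small fraction of the sampled line, many draws of λ are discarded before one is accepted.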

sampleH(
  A,
  initial,
  n_replicates,
  mu = 0,
  sigma = 1,
  n_iter = 1e+05,
  burn_in = 1000
)

`A`	list of matrices modeling the quadratic constraints of the selection event
`initial`	initialization sample. This sample must belong to the acceptance region given by `A`. In practice, this parameter is set to the outcome of the original dataset.
`n_replicates`	total number of replicates to be generated
`mu`	mean of the outcome
`sigma`	standard deviation of the outcome
`n_iter`	maximum number of rejections for the parameter λ in a single iteration
`burn_in`	number of burn-in iterations

Because the sampler is iterative, large values of `n_replicates` and `burn_in` are needed to approximate the distributions of the test statistics accurately.

For high-dimensional responses, and depending on the initialization, the sampler may not scale well to tens of thousands of replicates because of an intermediate rejection sampling step.

A matrix with `n_replicates` columns, where each column contains a sample within the acceptance region.
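As a rough illustration of how a matrix of this shape might be used downstream, the sketch below computes one statistic per column and an empirical p-value. Everything in it is an assumption made for illustration: the `samples` matrix is filled with random draws as a stand-in for real `sampleH` output, and `quad_stat` is a hypothetical statistic, not one provided by the package.

```r
# Stand-in for the output of sampleH(): one replicate per column.
# (Random draws purely for illustration; real replicates come from sampleH.)
set.seed(42)
n <- 30
samples <- matrix(rnorm(n * 50), nrow = n, ncol = 50)
Y <- rnorm(n)                        # stand-in for the observed response
X <- matrix(rnorm(n * 5), nrow = n)
K <- tcrossprod(X) / ncol(X)         # linear-kernel Gram matrix

# Hypothetical statistic: the quadratic form t(y) %*% K %*% y.
quad_stat <- function(y, K) drop(crossprod(y, K %*% y))

observed   <- quad_stat(Y, K)
replicated <- apply(samples, 2, quad_stat, K = K)  # one value per column

# Empirical p-value with the usual +1 correction.
p_value <- (1 + sum(replicated >= observed)) / (1 + length(replicated))
```

The +1 correction keeps the empirical p-value strictly positive even when no replicate exceeds the observed statistic.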

Berbee, H. C. P., Boender, C. G. E., Rinnooy Kan, A. H. G., Scheffer, C. L., Smith, R. L., & Telgen, J. (1987). Hit-and-run algorithms for the identification of nonredundant linear inequalities. Mathematical Programming, 37(2), 184–207.

Bélisle, C. J. P., Romeijn, H. E., & Smith, R. L. (1993). Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research, 18(2), 255–266.

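library(kernelPSI)

n <- 30
p <- 20
# Five random design matrices, each converted to a linear-kernel Gram matrix
K <- replicate(5, matrix(rnorm(n * p), nrow = n, ncol = p), simplify = FALSE)
K <- lapply(K, function(X) tcrossprod(X) / ncol(X))
Y <- rnorm(n)
L <- Y %*% t(Y)  # Gram matrix of the outcome
# Kernel selection with FOHSIC, then the quadratic constraints of that selection
selection <- FOHSIC(K, L, 2)
constraintQ <- forwardQ(K, select = selection)
# Sample replicates of Y within the acceptance region, starting from Y itself
samples <- sampleH(A = constraintQ, initial = Y,
                   n_replicates = 50, burn_in = 20)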