SIR_threshold_bootstrap: SIR optimally thresholded on bootstrapped replications

View source: R/SIR_threshold_bootstrap.R

SIR_threshold_bootstrap    R Documentation

SIR optimally thresholded on bootstrapped replications

Description

Applies single-index SIR with optimal soft or hard thresholding, using H slices, to 'n_replications' bootstrapped replications of (X,Y). The optimal number of selected variables is the number that occurs most often across the replications. From this number, we obtain the corresponding \hat{b} and \lambda_{opt}, namely those that yield the same number of selected variables in the result of 'SIR_threshold_opt'.

Usage

SIR_threshold_bootstrap(
  Y,
  X,
  H = 10,
  thresholding = "hard",
  n_replications = 50,
  graph = TRUE,
  output = TRUE,
  n_lambda = 100,
  k = 2,
  choice = ""
)

Arguments

Y

A numeric vector representing the dependent variable (a response vector).

X

A matrix representing the quantitative explanatory variables (bound by column).

H

The chosen number of slices (default is 10).

thresholding

The thresholding method, either "hard" or "soft" (default is "hard").
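As a rough illustration of the two options, the sketch below applies the textbook hard and soft thresholding operators to a small matrix. The helper names hard_threshold and soft_threshold are hypothetical and not part of the package, and the strict inequality in the hard rule is an assumption; the package may treat entries exactly equal to lambda differently.

```r
# Hypothetical helpers (not package functions): textbook thresholding operators.
hard_threshold <- function(M, lambda) {
  # Keep entries whose magnitude exceeds lambda, zero out the rest.
  M * (abs(M) > lambda)
}
soft_threshold <- function(M, lambda) {
  # Shrink every entry toward zero by lambda; entries within lambda of zero become zero.
  sign(M) * pmax(abs(M) - lambda, 0)
}

# Small illustrative matrix (made-up values).
M <- matrix(c(3, -0.5, -0.5, 2), nrow = 2)
M_hard <- hard_threshold(M, lambda = 1)
M_soft <- soft_threshold(M, lambda = 1)
print(M_hard)
print(M_soft)
```

Hard thresholding leaves the surviving entries untouched, while soft thresholding also shrinks them by lambda.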

n_replications

The number of bootstrapped replications of (X,Y) used to estimate the model (default is 50).

graph

A boolean, set to TRUE to plot graphs (default is TRUE).

output

A boolean, set to TRUE to print information (default is TRUE).

n_lambda

The number of lambda values to test. The n_lambda tested values are uniformly spaced between 0 and the maximum value of the interest matrix (default is 100).
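A minimal sketch of such a grid, using a toy matrix in place of the interest matrix. Reading "maximum value" as the largest absolute entry is an assumption made here for illustration; the package may define the upper bound differently.

```r
# Toy symmetric matrix standing in for the interest matrix (made-up values).
M <- matrix(c(0.9, 0.2, 0.0,
              0.2, 0.5, 0.1,
              0.0, 0.1, 0.3), nrow = 3, byrow = TRUE)
n_lambda <- 5

# Assumption: the upper bound of the grid is the largest absolute entry of M.
lambdas <- seq(0, max(abs(M)), length.out = n_lambda)
print(lambdas)
```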

k

Multiplication factor of the bootstrapped sample size (default is 2; k = 1 keeps the same size as the original data).
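To make the role of k concrete, the following sketch draws one bootstrap replication of size k * n. It assumes that the pairs (X, Y) are resampled jointly, by row, with replacement; the package's internal resampling may differ in its details.

```r
set.seed(42)
n <- 6
X <- matrix(rnorm(n * 2), nrow = n)  # toy design matrix
Y <- rnorm(n)                        # toy response
k <- 2

# Draw k * n row indices with replacement, then resample X and Y with the same indices
# so that each response stays attached to its own row of X.
idx <- sample(n, size = k * n, replace = TRUE)
X_boot <- X[idx, , drop = FALSE]
Y_boot <- Y[idx]
```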

choice

The graph to plot:

  • "estim_ind" Plot the estimated index by the SIR model versus Y.

  • "size" Plot the size of the models across the replications.

  • "selec_var" Plot the occurrence of the selected variables across the replications.

  • "coefs_b" Plot the value of b across the replications.

  • "lambdas_replic" Plot the optimal lambdas across the replications.

  • "" Plot all graphs (default).

Value

An object of class SIR_threshold_bootstrap, with attributes:

b

The optimal estimated EDR (effective dimension reduction) direction, which is the principal eigenvector of the interest matrix.
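For orientation, the sketch below computes a principal eigenvector in the classic, unthresholded SIR setting, where the interest matrix is commonly taken to be the inverse covariance of X times the between-slice covariance of the slice means. This is the textbook construction on simulated data, not necessarily the exact computation performed by the package, and all variable names are illustrative.

```r
set.seed(1)
n <- 200; p <- 5; H <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, 1, 0, 0, 0)
Y <- as.vector((X %*% beta)^3 + rnorm(n))

# Assign observations to H slices of equal size according to the rank of Y.
slice <- ceiling(rank(Y, ties.method = "first") * H / n)

# Between-slice covariance of the slice means of X.
x_bar <- colMeans(X)
Gamma <- matrix(0, p, p)
for (h in seq_len(H)) {
  in_h <- slice == h
  d <- colMeans(X[in_h, , drop = FALSE]) - x_bar
  Gamma <- Gamma + mean(in_h) * tcrossprod(d)
}

# Textbook interest matrix and its principal eigenvector, scaled to unit norm.
M <- solve(cov(X), Gamma)
b <- Re(eigen(M)$vectors[, 1])
b <- b / sqrt(sum(b^2))

# Estimated index, to be compared with the true index X %*% beta.
index <- as.vector(X %*% b)
```

The sign of b is arbitrary, so only the absolute correlation between the estimated and true indices is meaningful.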

lambda_opt

The optimal lambda.

vec_nb_var_selec

Vector that contains the number of selected variables for each replication.

occurrences_var

Vector that contains, at index i, the number of replications in which the i-th variable was selected.
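A small sketch of how such counts could be derived from per-replication estimates of b. The matrix below is made up, its orientation (one row per replication, one column per variable) is an assumption, and a variable is treated as selected when its coefficient is nonzero.

```r
# Hypothetical estimates of b: one row per replication, one column per variable.
mat_b <- rbind(
  c(0.7, 0.7, 0.0, 0.0),
  c(0.6, 0.8, 0.0, 0.1),
  c(1.0, 0.0, 0.0, 0.0)
)

# Number of replications in which each variable has a nonzero coefficient.
occurrences_var <- colSums(mat_b != 0)

# Number of variables with a nonzero coefficient in each replication.
vec_nb_var_selec <- rowSums(mat_b != 0)

print(occurrences_var)
print(vec_nb_var_selec)
```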

call

Unevaluated call to the function.

nb_var_selec_opt

Optimal number of selected variables, that is, the number of selected variables that occurs most often across the replications.
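The "most frequent value" rule can be sketched in base R as follows. The input vector is invented for illustration, and the tie-breaking behaviour (the smallest model size wins) is a property of this sketch rather than a documented feature of the package.

```r
# Made-up model sizes observed over seven replications.
vec_nb_var_selec <- c(5, 6, 5, 4, 5, 6, 7)

# Tabulate the sizes; which.max returns the first maximum,
# so in case of a tie the smallest size is retained (assumption of this sketch).
counts <- table(vec_nb_var_selec)
nb_var_selec_opt <- as.integer(names(counts)[which.max(counts)])
print(nb_var_selec_opt)
```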

list_relevant_variables

A list that contains the variables selected by the model.

n

Sample size.

p

The number of variables in X.

H

The chosen number of slices.

n_replications

The number of bootstrapped replications of (X,Y) used to estimate the model.

thresholding

The thresholding method used.

X_reduced

The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b.
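As an illustration of what this restriction amounts to, the sketch below keeps only the columns of X whose coefficient in b is nonzero. The data and the vector b are made up, and identifying the selected variables with the nonzero entries of b is an assumption of this sketch.

```r
set.seed(3)
X <- matrix(rnorm(5 * 4), nrow = 5,
            dimnames = list(NULL, paste0("X", 1:4)))
b <- c(0.8, 0, 0.6, 0)  # illustrative thresholded direction

# Keep the columns with a nonzero coefficient; drop = FALSE preserves the matrix shape
# even when a single variable is selected.
selected <- which(b != 0)
X_reduced <- X[, selected, drop = FALSE]
print(colnames(X_reduced))
```

The reduced matrix can then be passed, together with Y, to a new SIR fit.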

mat_b

Contains the estimate of b for each bootstrapped replication.

lambdas_opt_boot

Contains the optimal lambda found by SIR_threshold_opt at each replication.

index_pred

The index Xb' estimated by SIR.

Y

The response vector.

M1

The interest matrix thresholded with the optimal lambda.

Examples


# Generate data
set.seed(8)
n <- 170
beta <- c(1, 1, 1, 1, 1, rep(0, 15))
X <- mvtnorm::rmvnorm(n, sigma = diag(1, 20))
eps <- rnorm(n, sd = 8)
Y <- (X %*% beta)^3 + eps

# Apply SIR with hard thresholding
SIR_threshold_bootstrap(Y, X, H = 10, n_lambda = 300, thresholding = "hard",
                        n_replications = 30, k = 2)


SIRthresholded documentation built on July 10, 2023, 2:03 a.m.