View source: R/SIR_threshold_bootstrap.R
SIR_threshold_bootstrap | R Documentation |
Apply a single-index optimally soft/hard thresholded SIR with H slices on 'n_replications' bootstrapped replications of (X,Y). The optimal number of selected variables is the number of selected variables that occurred most often among the replications performed. From this, we can get the corresponding \hat{b} and \lambda_{opt} that produce the same number of selected variables in the result of 'SIR_threshold_opt'.
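The "came back most often" rule described above amounts to taking the mode of the per-replication counts. A minimal base R sketch follows, using made-up counts; the tie-breaking behaviour shown is an assumption of this sketch, not something the documentation specifies.

```r
# Hypothetical numbers of selected variables, one per bootstrap replication
vec_nb_var_selec <- c(5, 4, 5, 6, 5, 4, 5, 7, 5, 6)

# Frequency of each distinct count (table() orders the counts increasingly)
counts <- table(vec_nb_var_selec)

# Most frequent count; which.max() returns the first maximum, so a tie
# would go to the smallest count (a choice made for this sketch only)
nb_var_selec_opt <- as.integer(names(counts)[which.max(counts)])

print(counts)
print(nb_var_selec_opt)
```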
SIR_threshold_bootstrap(
Y,
X,
H = 10,
thresholding = "hard",
n_replications = 50,
graph = TRUE,
output = TRUE,
n_lambda = 100,
k = 2,
choice = ""
)
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
thresholding |
The thresholding method, either hard or soft (default is hard). |
n_replications |
The number of bootstrapped replications of (X,Y) done to estimate the model (default is 50). |
graph |
A boolean, set to TRUE to plot graphs (default is TRUE). |
output |
A boolean, set to TRUE to print information (default is TRUE). |
n_lambda |
The number of lambda values to test. The n_lambda tested lambdas are uniformly distributed between 0 and the maximum value of the interest matrix (default is 100). |
k |
Multiplication factor of the bootstrapped sample size (default is 2; k = 1 keeps the same size as the original data). |
choice |
the graph to plot:
|
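To make the `thresholding` and `n_lambda` arguments concrete, here is a small sketch of a lambda grid and of the standard hard and soft thresholding operators applied to a toy matrix. The matrix values, the helper names `hard_threshold` and `soft_threshold`, and the reading of "maximum value" as the largest absolute entry are all assumptions made for illustration; the package's internal implementation may differ.

```r
# Toy symmetric matrix standing in for the interest matrix (made-up values)
M <- matrix(c( 2.0, -0.3,  0.8,
              -0.3,  1.5, -0.1,
               0.8, -0.1,  0.5), nrow = 3, byrow = TRUE)

# Evenly spaced grid of n_lambda values between 0 and the largest
# absolute entry of M (assumed interpretation of "maximum value")
n_lambda <- 5
lambdas <- seq(0, max(abs(M)), length.out = n_lambda)

# Hard thresholding: entries with absolute value <= lambda are set to 0,
# the others are kept unchanged
hard_threshold <- function(M, lambda) M * (abs(M) > lambda)

# Soft thresholding: every entry is shrunk towards 0 by lambda,
# and entries with absolute value <= lambda become 0
soft_threshold <- function(M, lambda) sign(M) * pmax(abs(M) - lambda, 0)

lambda <- lambdas[2]
M_hard <- hard_threshold(M, lambda)
M_soft <- soft_threshold(M, lambda)

print(lambdas)
print(M_hard)
print(M_soft)
```

Larger values of lambda zero out more entries, which is why the number of selected variables depends on the lambda that is retained.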
An object of class SIR_threshold_bootstrap, with attributes:
b |
This is the optimal estimated EDR direction, which is the principal eigenvector of the interest matrix. |
lambda_opt |
The optimal lambda. |
vec_nb_var_selec |
Vector that contains the number of selected variables for each replication. |
occurrences_var |
Vector that contains at index i the number of times the i-th variable has been selected in a replication. |
call |
Unevaluated call to the function. |
nb_var_selec_opt |
Optimal number of selected variables, that is, the number of selected variables that occurred most often among the replications performed. |
list_relevant_variables |
A list that contains the variables selected by the model. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
n_replications |
The number of bootstrapped replications of (X,Y) done to estimate the model. |
thresholding |
The thresholding method used. |
X_reduced |
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b. |
mat_b |
Contains the estimate of b at each bootstrapped replication. |
lambdas_opt_boot |
Contains the optimal lambda found by SIR_threshold_opt at each replication. |
index_pred |
The index Xb' estimated by SIR. |
Y |
The response vector. |
M1 |
The interest matrix thresholded with the optimal lambda. |
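As an illustration of how vec_nb_var_selec and occurrences_var relate to the per-replication estimates, the sketch below derives both from a small made-up matrix of estimates. It assumes one row per replication, one column per variable, and that a variable counts as selected when its coefficient is non-zero; the actual layout of mat_b in the package is not stated here.

```r
# Hypothetical estimates of b: 3 replications (rows) x 4 variables (columns)
mat_b_toy <- rbind(
  c(0.6, 0.5, 0.0, 0.0),
  c(0.7, 0.0, 0.0, 0.1),
  c(0.5, 0.6, 0.0, 0.0)
)

# A variable is treated as selected when its coefficient is non-zero
# (assumption of this sketch)
selected <- mat_b_toy != 0

# Number of replications in which each variable was selected
occurrences_var_toy <- colSums(selected)

# Number of variables selected in each replication
vec_nb_var_selec_toy <- rowSums(selected)

print(occurrences_var_toy)
print(vec_nb_var_selec_toy)
```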
# Generate data
set.seed(8)
n <- 170
beta <- c(1, 1, 1, 1, 1, rep(0, 15))
X <- mvtnorm::rmvnorm(n, sigma = diag(1, 20))
eps <- rnorm(n, sd = 8)
Y <- (X %*% beta)^3 + eps
# Apply SIR with hard thresholding
SIR_threshold_bootstrap(Y, X, H = 10, n_lambda = 300, thresholding = "hard", n_replications = 30, k = 2)
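The call above uses k = 2, so each bootstrapped sample is twice the size of the original data. A minimal sketch of what one such replication of (X,Y) could look like in base R is given below; the toy data and variable names are hypothetical, and resampling rows with replacement is the usual bootstrap scheme rather than a confirmed detail of the package.

```r
set.seed(1)

# Toy data standing in for (X, Y): 170 observations, 20 variables
n_toy <- 170
p_toy <- 20
X_toy <- matrix(rnorm(n_toy * p_toy), nrow = n_toy, ncol = p_toy)
Y_toy <- rnorm(n_toy)

# Multiplication factor of the bootstrapped sample size
k_toy <- 2

# Draw k * n row indices with replacement, then take the same rows
# from X and Y so that each (x_i, y_i) pair stays together
idx <- sample(seq_len(n_toy), size = k_toy * n_toy, replace = TRUE)
X_boot <- X_toy[idx, , drop = FALSE]
Y_boot <- Y_toy[idx]

print(dim(X_boot))
print(length(Y_boot))
```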