View source: R/SIR_threshold_opt.R
SIR_threshold_opt  R Documentation 
Apply a single-index SIR on (X, Y) with H slices, with a soft/hard thresholding of the interest matrix \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n by an optimal parameter \lambda_{opt}. The parameter \lambda_{opt} is found automatically among a vector of n_lambda values of \lambda, ranging from 0 to the maximum value of \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n. For each feature of X, the number of \lambda values for which this feature is selected is stored (in a vector of size p). This vector is sorted in a decreasing way. Then, using strucchange::breakpoints, a breakpoint is found in this sorted vector. The coefficients of the variables to the left of the breakpoint tend to be automatically set to 0 by the thresholding operation based on \lambda_{opt}, so these variables should be removed (useless variables). Finally, \lambda_{opt} corresponds to the first \lambda such that the associated \hat{b} has the same number of zeros as the breakpoint's value.
For example, for X \in R^{10} and n_lambda = 100, this sorted vector can look like this:
X10   X3   X8   X5   X7   X9   X4   X6   X2   X1
  2    3    3    4    4    4    6   10   95  100
Here, the breakpoint would be 8.
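The breakpoint in the example above can be reproduced with a small base-R sketch. It is only an illustration: the package relies on strucchange::breakpoints, whereas the hypothetical helper find_single_breakpoint below simply searches for the single split that minimises the residual sum of squares of a two-segment constant fit. The lambdas grid and the vect_nb_zeros values at the end are made-up toy numbers, used only to show how the first matching \lambda would be picked.

```r
# Counts from the example above: number of lambdas that select each variable.
counts <- c(X10 = 2, X3 = 3, X8 = 3, X5 = 4, X7 = 4, X9 = 4,
            X4 = 6, X6 = 10, X2 = 95, X1 = 100)

# Hypothetical helper (not part of the package): find one breakpoint by
# exhaustive search, minimising the total residual sum of squares of a
# piecewise-constant fit with two segments of at least min_seg points each.
find_single_breakpoint <- function(v, min_seg = 2) {
  n <- length(v)
  rss <- function(x) sum((x - mean(x))^2)
  candidates <- min_seg:(n - min_seg)
  total_rss <- sapply(candidates,
                      function(k) rss(v[1:k]) + rss(v[(k + 1):n]))
  candidates[which.min(total_rss)]
}

bp <- find_single_breakpoint(counts)
bp                           # 8, as in the example above
names(counts)[seq_len(bp)]   # variables to the left of the breakpoint

# Toy (made-up) numbers of zeros in b along a small grid of lambdas.
lambdas <- seq(0, 1, length.out = 6)
vect_nb_zeros <- c(0, 3, 8, 8, 9, 10)

# First lambda whose estimate has as many zeros as the breakpoint's value.
lambda_opt <- lambdas[which(vect_nb_zeros == bp)[1]]
lambda_opt
```

With real data, the fit_bp element of the returned object holds the actual strucchange fit, so this helper is only useful for building intuition.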
SIR_threshold_opt(
Y,
X,
H = 10,
n_lambda = 100,
thresholding = "hard",
graph = TRUE,
output = TRUE,
choice = ""
)
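As a rough illustration of what the thresholding and n_lambda arguments in the call above control, the sketch below applies the usual element-wise hard and soft thresholding operators to a toy matrix and builds an evenly spaced grid of \lambda values. The helper names hard_threshold and soft_threshold, the toy matrix M, and the use of the largest absolute entry as the upper end of the grid are assumptions made for this example; they are not taken from the package source.

```r
# Usual element-wise definitions (assumed here, not copied from the package).
# Hard thresholding: keep entries whose absolute value exceeds lambda.
hard_threshold <- function(M, lambda) M * (abs(M) > lambda)
# Soft thresholding: shrink every entry towards 0 by lambda, flooring at 0.
soft_threshold <- function(M, lambda) sign(M) * pmax(abs(M) - lambda, 0)

# Toy 2 x 2 matrix standing in for the interest matrix.
M <- matrix(c(0.9, -0.2, 0.05, -0.6), nrow = 2)

# Grid of n_lambda values evenly spaced between 0 and the largest entry
# (taking the absolute value is an assumption of this sketch).
n_lambda <- 5
lambdas <- seq(0, max(abs(M)), length.out = n_lambda)

M_hard <- hard_threshold(M, 0.3)   # small entries set to 0, others unchanged
M_soft <- soft_threshold(M, 0.3)   # small entries set to 0, others shrunk
M_hard
M_soft
```

Hard thresholding leaves the surviving entries untouched, while soft thresholding also shrinks them, which is the practical difference between the two options.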
Y 
A numeric vector representing the dependent variable (a response vector). 
X 
A matrix representing the quantitative explanatory variables (bound by column). 
H 
The chosen number of slices (default is 10). 
n_lambda 
The number of lambdas to test. The n_lambda tested lambdas are uniformly distributed between 0 and the maximum value of the interest matrix (default is 100). 
thresholding 
The thresholding method, either "hard" or "soft" (default is "hard"). 
graph 
A boolean, set to TRUE to plot graphs (default is TRUE). 
output 
A boolean, set to TRUE to print information (default is TRUE). 
choice 
the graph to plot:

An object of class SIR_threshold_opt, with attributes:
b 
This is the optimal estimated EDR direction, which is the principal eigenvector of the interest matrix. 
lambdas 
A vector that contains the tested lambdas. 
lambda_opt 
The optimal lambda. 
mat_b 
A matrix of size p x n_lambda whose columns contain the estimate of beta for each lambda. 
n_lambda 
The number of lambdas tested. 
vect_nb_zeros 
The number of 0 in b for each lambda. 
list_relevant_variables 
A list that contains the variables selected by the model. 
fit_bp 
An object of class breakpoints from the strucchange package, containing information about the breakpoint from which the optimal lambda is deduced. 
indices_useless_var 
A vector of p items: each variable is associated with the number of lambdas that select this variable. 
vect_cos_squared 
A vector that contains, for each lambda, the squared cosine between the vanilla SIR direction and the thresholded SIR direction. 
Y 
The response vector. 
n 
Sample size. 
p 
The number of variables in X. 
H 
The chosen number of slices. 
M1 
The interest matrix thresholded with the optimal lambda. 
thresholding 
The thresholding method used. 
call 
Unevaluated call to the function. 
X_reduced 
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b. 
index_pred 
The index Xb' estimated by SIR. 
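To make the vect_nb_zeros and vect_cos_squared elements more concrete, here is a minimal sketch of how such quantities can be computed from a matrix of estimates. The helper cos_squared and all numeric values are made up for illustration; the sketch assumes the usual definition of the squared cosine between two vectors and is not taken from the package source.

```r
# Squared cosine of the angle between two direction vectors
# (hypothetical helper; standard definition assumed).
cos_squared <- function(b1, b2) {
  as.numeric(crossprod(b1, b2))^2 / (sum(b1^2) * sum(b2^2))
}

# Made-up direction standing in for the vanilla (unthresholded) SIR estimate.
b_sir <- c(0.7, 0.7, 0.1, -0.05)

# Made-up p x n_lambda matrix: one estimated direction per column (per lambda).
mat_b <- cbind(c(0.7, 0.7, 0.1, -0.05),
               c(0.7, 0.7, 0.1, 0),
               c(0.7, 0.7, 0,   0))

# Number of zero coefficients for each lambda (one value per column).
vect_nb_zeros <- colSums(mat_b == 0)

# Squared cosine between the vanilla direction and each thresholded one.
vect_cos_squared <- apply(mat_b, 2, cos_squared, b2 = b_sir)

vect_nb_zeros
vect_cos_squared
```

A squared cosine close to 1 means the thresholded direction is nearly collinear with the vanilla one, so little information was lost by setting coefficients to 0.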
# Generate data
set.seed(2)
n <- 200
beta <- c(1, 1, rep(0, 8))
X <- mvtnorm::rmvnorm(n, sigma = diag(1, 10))
eps <- rnorm(n)
Y <- (X %*% beta)^3 + eps
# Apply SIR with soft thresholding
SIR_threshold_opt(Y, X, H = 10, n_lambda = 300, thresholding = "soft")