splitFlip: R Documentation
This function computes resampling-based standardized scores for high-dimensional linear regression.
```r
splitFlip(X, Y, Q = 50, B = 200, target = NULL, varSel = selLasso, varSelArgs = NULL, exact = FALSE, maxRepeat = 20, seed = NULL)
```
| Argument | Description |
|---|---|
| X | numeric design matrix (including the intercept), where columns correspond to variables and rows to observations. |
| Y | numeric response vector. |
| Q | number of data splits. |
| B | number of sign flips. |
| target | maximum number of variables to be selected. |
| varSel | a function to perform variable selection. It must have at least three arguments: |
| varSelArgs | named list of further arguments for varSel. |
| exact | logical, whether the standardized scores are computed exactly (TRUE) or approximately (FALSE, the default). |
| maxRepeat | maximum number of split trials. |
| seed | seed used to make the random splits and sign flips reproducible. |
The data are randomly split Q times into two subsets of equal size. For each split, the first subset is used to perform variable selection, while the second is used to compute the effective score of each variable under each of B random sign flips (including the identity).
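A single split can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the data are simulated, the helper `selectTop` is hypothetical and ranks variables by absolute marginal correlation as a stand-in for Lasso selection, and the effective score of a selected variable is assumed to be the sum of products of the variable and the response, both residualized on an intercept plus the other selected variables within the second subset.

```r
set.seed(1)
n <- 40; m <- 10; B <- 200; target <- 3
X <- matrix(rnorm(n * m), n, m)
Y <- 2 * X[, 1] + rnorm(n)

# Hypothetical selector: top `target` variables by |marginal correlation|
# (a stand-in for Lasso selection, not the package's selLasso)
selectTop <- function(X, Y, target) {
  order(abs(cor(X, Y)), decreasing = TRUE)[seq_len(target)]
}

# Split the observations into two halves of equal size
idx1 <- sample(n, n / 2)
idx2 <- setdiff(seq_len(n), idx1)

# Variable selection on the first half only
sel <- selectTop(X[idx1, ], Y[idx1], target)

# Sign flips: B x n2 matrix of +1/-1; the first row is the identity
n2 <- length(idx2)
flips <- matrix(sample(c(-1, 1), B * n2, replace = TRUE), B, n2)
flips[1, ] <- 1

# Effective scores on the second half (assumed form: residualize x_j and Y
# on an intercept plus the other selected variables, then multiply)
scores <- matrix(0, B, m)  # unselected variables keep a score of zero
for (j in sel) {
  Z <- cbind(1, X[idx2, setdiff(sel, j), drop = FALSE])
  xr <- qr.resid(qr(Z), X[idx2, j])
  yr <- qr.resid(qr(Z), Y[idx2])
  scores[, j] <- flips %*% (xr * yr)  # one score per sign flip
}
```

The result `scores` has one row per flip and one column per variable, with zero columns for variables not selected in this split.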
If a variable is not selected in a split, its score for that split is set to zero. For each variable and each sign flip, the standardized score is defined as (an approximation of) the sum of the effective scores over the Q splits, divided by its variance.
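The aggregation across splits can be sketched as below. The inputs are hypothetical: `S` is a simulated Q x B x m array of per-split effective scores (zero for a variable that is never selected), and `v` holds one variance estimate of the summed score per variable. How the package computes or approximates that variance is not described here, so `v` is simply taken as given.

```r
set.seed(2)
Q <- 5; B <- 4; m <- 3
# Hypothetical per-split scores: Q splits x B flips x m variables,
# with variable 3 never selected (score zero in every split)
S <- array(rnorm(Q * B * m), dim = c(Q, B, m))
S[, , 3] <- 0
# Hypothetical variance estimates of the summed score, one per variable
v <- c(2.5, 1.8, 1)

# Sum over the Q splits: a B x m matrix (flips x variables)
total <- apply(S, c(2, 3), sum)
# Divide each variable's column by its variance, as the text describes
G <- sweep(total, 2, v, "/")
```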
If too many variables are selected in a split (more than half the sample size), a warning is issued and the data are randomly split again. After maxRepeat trials in which too many variables are selected, the function stops with an error.
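The retry rule can be sketched as a loop. The function `splitWithRetry` is hypothetical and only mirrors the behaviour described above; `selector` stands for any function of `X` and `Y` that returns indices of selected variables, and the warning and error messages are illustrative rather than the package's own.

```r
# Hypothetical helper illustrating the split-and-retry rule
splitWithRetry <- function(X, Y, selector, maxRepeat = 20) {
  n <- nrow(X)
  for (trial in seq_len(maxRepeat)) {
    idx1 <- sample(n, n %/% 2)  # first half, used for selection
    sel <- selector(X[idx1, , drop = FALSE], Y[idx1])
    # Accept the split unless more than half the sample size is selected
    if (length(sel) <= n / 2) {
      return(list(idx1 = idx1, sel = sel, trials = trial))
    }
    warning("too many variables selected; splitting again")
  }
  stop("too many variables selected in ", maxRepeat, " trials")
}

# Usage: a selector picking 2 variables succeeds on the first trial,
# while one always picking 8 of n = 10 exhausts maxRepeat and errors
set.seed(4)
X <- matrix(rnorm(100), 10, 10)
Y <- rnorm(10)
ok <- splitWithRetry(X, Y, function(X, Y) 1:2)
res <- tryCatch(
  suppressWarnings(splitWithRetry(X, Y, function(X, Y) 1:8, maxRepeat = 3)),
  error = function(e) "error"
)
```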
splitFlip returns a numeric matrix of standardized scores, where columns correspond to variables and rows to the B random sign flips. The first flip is the identity.
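To illustrate how the returned matrix is laid out (not the package's `maxT` procedure, which is what the examples use for multiplicity control), the sketch below takes a simulated stand-in `G` with flips in rows and variables in columns, reads the observed scores from the first row, and computes an unadjusted two-sided sign-flip p-value per variable as the proportion of flips whose absolute score is at least the observed one.

```r
set.seed(3)
B <- 200; m <- 5
# Simulated stand-in for splitFlip output: B flips x m variables
G <- matrix(rnorm(B * m), B, m)
G[1, 1] <- 10  # make variable 1 look strongly non-null under the identity flip

observed <- G[1, ]  # the first row corresponds to the identity flip
# Unadjusted two-sided p-values; the identity flip is counted,
# so every p-value is at least 1/B
pvals <- colMeans(abs(G) >= matrix(abs(observed), B, m, byrow = TRUE))
```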
Author: Anna Vesely.
```r
# generate linear regression data with 20 variables and 10 observations
res <- simData(m1=2, m=20, n=10, rho=0.5, type="toeplitz", SNR=5, seed=42)
X <- res$X # design matrix
Y <- res$Y # response vector
active <- res$active # indices of active variables

# choose target as twice the number of active variables
target <- 2*length(active)

# standardized scores using the approximate method with Lasso selection of target variables
G1 <- splitFlip(X, Y, target=target, seed=42)

# maxT algorithm
maxT(G1, alpha=0.1)

# standardized scores using the exact method with oracle selection of target variables
G2 <- splitFlip(X, Y, target=target, varSel=selOracle, varSelArgs=list(toSel=active), exact=TRUE, seed=42)

# maxT algorithm
maxT(G2, alpha=0.1)
```