View source: R/variable_selection.R
SerialRegression | R Documentation |
Runs stability selection regression models with different combinations of parameters controlling the sparsity of the underlying selection algorithm (e.g. penalty parameter for regularised models) and thresholds in selection proportions. These two parameters are jointly calibrated by maximising the stability score of the model (possibly under a constraint on the expected number of falsely stably selected features). This function uses a serial implementation and requires the grid of parameters controlling the underlying algorithm as input (for internal use only).
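The joint calibration described above picks one (Lambda, pi) pair from a two-dimensional grid. Below is a minimal base-R sketch of that selection step. It assumes the stability scores and PFER upper bounds have already been computed on the grid, laid out like the `S_2d` and `PFER_2d` outputs (rows = values in `Lambda`, columns = thresholds in `pi_list`). The helper `calibrate_grid` is hypothetical and is not part of the package. The toy score values are made up for illustration. A constraint on `FDP_thr` would be handled the same way, with an FDP matrix in place of the PFER matrix.

```r
# Hypothetical helper (not part of the package): pick the (Lambda, pi) pair
# with the highest stability score, ignoring pairs whose PFER bound exceeds PFER_thr.
calibrate_grid <- function(S_2d, PFER_2d, pi_list, PFER_thr = Inf) {
  stopifnot(identical(dim(S_2d), dim(PFER_2d)), ncol(S_2d) == length(pi_list))
  # Mask combinations that violate the error-control constraint
  S_masked <- S_2d
  S_masked[PFER_2d > PFER_thr] <- NA
  if (all(is.na(S_masked))) stop("no parameter combination satisfies the constraint")
  # Row/column position of the first maximum (column-major order); which() drops NAs
  best <- which(S_masked == max(S_masked, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(lambda_index = unname(best[1]),
       pi = pi_list[best[2]],
       score = S_masked[best[1], best[2]])
}

# Toy grid: 2 values of Lambda (rows) x 3 thresholds (columns)
S_2d    <- matrix(c(1, 4, 2,
                    3, 5, 0), nrow = 2, byrow = TRUE)
PFER_2d <- matrix(c(0.5, 1, 3,
                    0.5, 4, 2), nrow = 2, byrow = TRUE)
pi_list <- c(0.6, 0.7, 0.8)

res   <- calibrate_grid(S_2d, PFER_2d, pi_list)                # unconstrained
res_c <- calibrate_grid(S_2d, PFER_2d, pi_list, PFER_thr = 2)  # PFER <= 2
```

Without a constraint, the highest score (5) sits in row 2. With `PFER_thr = 2`, that cell is excluded because its PFER is 4, so the choice falls back to row 1, which has a score of 4.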
SerialRegression(
xdata,
ydata = NULL,
Lambda,
pi_list = seq(0.6, 0.9, by = 0.01),
K = 100,
tau = 0.5,
seed = 1,
n_cat = 3,
family = "gaussian",
implementation = PenalisedRegression,
resampling = "subsampling",
cpss = FALSE,
PFER_method = "MB",
PFER_thr = Inf,
FDP_thr = Inf,
group_x = NULL,
group_penalisation = FALSE,
output_data = FALSE,
verbose = TRUE,
...
)
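Because `Lambda` must be supplied as a matrix of candidate values rather than generated internally, it can help to see what such an input might look like. The sketch below is a base-R illustration under stated assumptions. It uses a single penalty parameter, so the matrix has one column, with values spaced on a log scale from `lambda_max` down to `lambda_max * ratio`. This is a common convention for lasso-type paths, not necessarily how the package builds its own grid. The helper `make_lambda_grid` and its arguments are hypothetical.

```r
# Hypothetical helper (not part of the package): a decreasing, log-spaced grid
# of penalty values stored as a one-column matrix, one row per candidate value.
make_lambda_grid <- function(lambda_max, ratio = 1e-3, n_lambda = 10) {
  stopifnot(lambda_max > 0, ratio > 0, ratio < 1, n_lambda >= 2)
  lambda <- exp(seq(log(lambda_max), log(lambda_max * ratio), length.out = n_lambda))
  matrix(lambda, ncol = 1)
}

Lambda  <- make_lambda_grid(lambda_max = 1, ratio = 0.01, n_lambda = 5)
pi_list <- seq(0.6, 0.9, by = 0.01)   # default thresholds from the signature
```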
xdata |
matrix of predictors with observations as rows and variables as columns. |
ydata |
optional vector or matrix of outcome(s). If |
Lambda |
matrix of parameters controlling the level of sparsity in the
underlying feature selection algorithm specified in implementation. |
pi_list |
vector of thresholds in selection proportions. If
|
K |
number of resampling iterations. |
tau |
subsample size. Only used if resampling = "subsampling" and cpss = FALSE. |
seed |
value of the seed used to initialise the random number generator and
ensure reproducibility of the results (see set.seed). |
n_cat |
computation options for the stability score. Default is
n_cat = 3. |
family |
type of regression model. This argument is defined as in
|
implementation |
function to use for variable selection. Possible
functions are: |
resampling |
resampling approach. Possible values are:
|
cpss |
logical indicating if complementary pair stability selection
should be done. For this, the algorithm is applied to two non-overlapping
subsets, each containing half of the observations. A feature is considered
selected if it is selected in both subsamples. With this method, the data are split
|
PFER_method |
method used to compute the upper bound of the expected
number of false positives (or Per Family Error Rate, PFER). If
|
PFER_thr |
threshold in PFER for constrained calibration by error
control. If |
FDP_thr |
threshold in the expected proportion of falsely selected
features (or False Discovery Proportion) for constrained calibration by
error control. If |
group_x |
vector encoding the grouping structure among predictors. This
argument indicates the number of variables in each group. Only used for
models with group penalisation (e.g. |
group_penalisation |
logical indicating if a group penalisation should
be considered in the stability score. The use of
|
output_data |
logical indicating if the input datasets xdata and ydata should be included in the output. |
verbose |
logical indicating if a loading bar and messages should be printed. |
... |
additional parameters passed to the functions provided in
|
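The arguments resampling, tau, K, seed, and cpss together define which observations each of the K models is fitted on. The base-R sketch below generates such index sets. For plain subsampling, it draws floor(tau * n) rows without replacement per iteration. For complementary pairs, it assumes K / 2 random splits of the rows into two disjoint halves, which gives K subsamples in total. The helper `draw_resamples` and the K / 2 pairing are illustrative assumptions, not the package's internal code. Bootstrap resampling is not shown.

```r
# Hypothetical helper (not part of the package): row indices for K resamples.
# cpss = FALSE: K subsamples of floor(tau * n) rows, drawn without replacement.
# cpss = TRUE: K / 2 random splits of the rows into two disjoint halves
# (an assumed way of producing complementary pairs).
draw_resamples <- function(n, K = 100, tau = 0.5, seed = 1, cpss = FALSE) {
  set.seed(seed)  # note: resets the global random number generator
  if (!cpss) {
    size <- floor(tau * n)
    return(lapply(seq_len(K), function(k) sample.int(n, size)))
  }
  stopifnot(K %% 2 == 0, n >= 2)
  half <- floor(n / 2)
  out <- vector("list", K)
  for (j in seq_len(K / 2)) {
    perm <- sample.int(n)                           # random permutation of rows
    out[[2 * j - 1]] <- perm[seq_len(half)]         # first half
    out[[2 * j]]     <- perm[(half + 1):(2 * half)] # disjoint second half
  }
  out
}

subs  <- draw_resamples(n = 20, K = 10, tau = 0.5)
pairs <- draw_resamples(n = 21, K = 4, cpss = TRUE)  # odd n: one row left out per split
```

Under cpss, a feature would then count as selected for a pair only if it is selected in both halves, as described for the cpss argument.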
A list with:
S |
a matrix of the best stability scores for different parameters controlling the level of sparsity in the underlying algorithm. |
Lambda |
a matrix of parameters controlling the level of sparsity in the underlying algorithm. |
Q |
a matrix of the average number of features selected by the underlying algorithm with different parameters controlling the level of sparsity. |
Q_s |
a matrix of the calibrated number of stably selected features with different parameters controlling the level of sparsity. |
P |
a matrix of calibrated thresholds in selection proportions for different parameters controlling the level of sparsity in the underlying algorithm. |
PFER |
a matrix of upper-bounds in PFER of calibrated stability selection models with different parameters controlling the level of sparsity. |
FDP |
a matrix of upper-bounds in FDP of calibrated stability selection models with different parameters controlling the level of sparsity. |
S_2d |
a matrix of stability scores obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions. |
PFER_2d |
a matrix of upper bounds in PFER obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions. |
FDP_2d |
a matrix of upper bounds in FDP obtained with different combinations of parameters. Columns correspond to different thresholds in selection proportions. |
selprop |
a matrix of selection proportions. Columns correspond to
predictors from xdata. |
Beta |
an array of model coefficients.
Columns correspond to predictors from xdata. |
method |
a list with |
params |
a list with values used for arguments |
For all matrices and arrays returned, the rows are ordered in the same way and correspond to parameter values stored in Lambda.
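To connect selprop, Q, Q_s, PFER, and FDP, the base-R sketch below computes these summaries for a single value of the sparsity parameter. The input is a K x p binary matrix recording which features were selected in each resampling iteration. The PFER bound is the one from Meinshausen and Bühlmann (2010), q^2 / ((2 pi - 1) p), where q is the average number of selected features. This bound applies for thresholds above 0.5. Several parts of the sketch are assumptions rather than documented behaviour of this function: "stably selected" means a selection proportion of at least pi, and the FDP bound is approximated as PFER / Q_s, with 0 used when nothing is stably selected. The helper `summarise_selection` is hypothetical, and the alternative PFER_method = "SS" is not covered.

```r
# Hypothetical helper (not part of the package): summaries at one value of the
# sparsity parameter. sel[k, j] = 1 if feature j was selected in resample k.
summarise_selection <- function(sel, pi_list) {
  stopifnot(all(sel %in% c(0, 1)), all(pi_list > 0.5), all(pi_list <= 1))
  p <- ncol(sel)
  selprop <- colMeans(sel)                                  # selection proportion per feature
  q <- mean(rowSums(sel))                                   # average number selected (Q)
  Q_s <- sapply(pi_list, function(pi) sum(selprop >= pi))   # stably selected count per threshold
  PFER <- q^2 / ((2 * pi_list - 1) * p)                     # Meinshausen and Buhlmann upper bound
  FDP <- ifelse(Q_s > 0, PFER / Q_s, 0)                     # assumed FDP approximation
  list(selprop = selprop, Q = q, Q_s = Q_s, PFER = PFER, FDP = FDP)
}

# Toy example: K = 4 resamples, p = 4 features
sel <- rbind(c(1, 1, 0, 0),
             c(1, 0, 0, 0),
             c(1, 1, 1, 0),
             c(1, 1, 0, 0))
out <- summarise_selection(sel, pi_list = c(0.75, 1))
```

In the full function, these quantities would be computed for every row of Lambda and stacked, which yields one row per parameter value as noted above.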