StabilizedRegression: StabilizedRegression

View source: R/StabilizedRegression.R

StabilizedRegression {StabilizedRegression}    R Documentation

StabilizedRegression

Description

Stabilized regression (SR) based on linear ordinary least squares (OLS) regression.

Usage

StabilizedRegression(
  X,
  Y,
  A,
  pars = list(m = ncol(X), B = 100, alpha_stab = 0.05, alpha_pred = 0.05,
    size_weight = "linear", compute_predictive_model = TRUE,
    use_resampling = FALSE, prescreen_size = NA,
    prescreen_type = "correlation", stab_test = "exact", pred_score = "mse",
    topk = 1, variable_importance = "scaled_coefficient"),
  verbose = 0,
  seed = NA
)

Arguments

X

predictor matrix. Numeric matrix of size n times d, where columns correspond to individual predictors.

Y

response variable. Numeric vector of length n.

A

stabilizing variable. Numeric vector of length n which can be interpreted as a factor.

pars

list of additional parameters:
  m (default ncol(X)): integer specifying the largest possible subset size.
  B (default 100): integer specifying the number of random subsets to sample; if NA, all subsets are used.
  alpha_stab (default 0.05): value between 0 and 1 specifying the stability cutoff.
  alpha_pred (default 0.05): value between 0 and 1 specifying the predictive cutoff.
  size_weight (default "linear"): one of the strings "linear", "constant", "quadratic" or "rbf", or a numeric weight vector specifying a probability for each potential set size from 1 to m.
  compute_predictive_model (default TRUE): boolean specifying whether to also compute SR (pred) and SR (diff).
  prescreen_size (default NA): integer specifying the number of variables to screen down to before applying SR; if NA, no screening is applied.
  prescreen_type (default "correlation"): one of the strings "correlation", "ols", "lasso", "deconfounding", "correlation_env" or "deconfounding_env", specifying the type of screening.
  stab_test (default "exact"): stability test to use. Either "exact" for a Bonferroni-corrected version of Chow's test, "mean_sres" for a mean test based on resampling of the scaled residuals, or "meanvar_sres" for a mean and variance test based on resampling of the scaled residuals.
  pred_score (default "mse"): prediction score. Either "mse" for the mean squared error, "mse_env" for the environment-wise best mean squared error, "aic" for the Akaike information criterion, or "bic" for the Bayesian information criterion.
  topk (default 1): integer tuning parameter that can be used to increase the number of predictive sets; higher values lead to more sets being accepted based on the predictive cutoff.
  variable_importance (default "scaled_coefficient"): type of variable ranking. Either "weighted" for a weighted average over all selected subsets, "scaled_coefficient" for a ranking based on the scaled average regression parameter, or "permutation" for a permutation-based ranking.
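To make the roles of m, B and size_weight more concrete, the following base-R sketch draws candidate predictor subsets. The helper sample_subsets is hypothetical and not part of the package. Normalizing a numeric size_weight vector into probabilities and drawing predictors uniformly for a given size are assumptions, and the string options such as "linear" are not reproduced.

```r
# Hypothetical helper (not part of the package): candidate predictor subsets
# for d predictors, with at most m predictors per subset.
sample_subsets <- function(d, m = d, B = 100, size_weight = rep(1, m)) {
  if (is.na(B)) {
    # B = NA: enumerate every non-empty subset with 1 to m predictors
    all_sets <- lapply(seq_len(m), function(k) combn(d, k, simplify = FALSE))
    return(unlist(all_sets, recursive = FALSE))
  }
  # Assumption: numeric size weights are normalized into probabilities
  probs <- size_weight / sum(size_weight)
  sizes <- sample(seq_len(m), size = B, replace = TRUE, prob = probs)
  # For each sampled size k, draw k distinct predictor indices uniformly
  lapply(sizes, function(k) sort(sample.int(d, k)))
}
```

For example, sample_subsets(3, B = NA) enumerates all 2^3 - 1 = 7 non-empty subsets of three predictors; the number of subsets grows exponentially in ncol(X), which is why sampling B subsets is the default.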

verbose

0 for no output, 1 for text output and 2 for text and diagnostic plots.

seed

seed value that is fixed at the beginning of the function.

Details

Performs a linear regression of a response Y on a set of predictors X while ensuring stability across different values of a stabilizing variable A.
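One way to read "stability across different values of A" is as a test that the regression on a given predictor subset does not change across environments. The base-R sketch below runs a classical Chow test comparing each environment against the remaining data and combines the environment-wise p-values with a Bonferroni correction, loosely mirroring the description of stab_test = "exact". The helpers chow_pvalue and stability_pvalue are hypothetical, and the one-versus-rest design is an assumption, not the package's documented procedure.

```r
# Hypothetical Chow test: does the regression of y on X (with intercept)
# differ between the rows where `group` is TRUE and where it is FALSE?
chow_pvalue <- function(y, X, group) {
  Xi <- cbind(1, X)            # design matrix with intercept column
  k <- ncol(Xi)                # number of regression parameters
  n <- length(y)
  # Residual sum of squares of an OLS fit on the rows in idx
  rss <- function(idx) {
    sum(lm.fit(Xi[idx, , drop = FALSE], y[idx])$residuals^2)
  }
  rss_pooled <- rss(seq_len(n))
  rss_split <- rss(which(group)) + rss(which(!group))
  f_stat <- ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
  pf(f_stat, df1 = k, df2 = n - 2 * k, lower.tail = FALSE)
}

# Hypothetical Bonferroni-corrected stability p-value for the subset S
stability_pvalue <- function(y, X, A, S) {
  A <- as.factor(A)
  XS <- as.matrix(X)[, S, drop = FALSE]
  # One Chow test per environment: environment e versus all other rows
  pvals <- vapply(levels(A), function(e) chow_pvalue(y, XS, A == e), numeric(1))
  min(1, length(pvals) * min(pvals))
}
```

Under this reading, a subset S would count as stable if stability_pvalue(Y, X, A, S) exceeds alpha_stab. With only two environments both one-versus-rest comparisons are the same test, so the correction is conservative in that case.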

Value

Object of class 'StabilizedRegression' consisting of the following elements:

learner_list

List of all fitted linear OLS regressions (fitted R6 'linear_regression' objects).

weighting

Weighting of the individual regressions in SR.
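Assuming that SR predicts with a weighted average of the individual OLS predictions, the combined model is itself linear: its coefficients are the same weighted average of the subset coefficients, with zeros for predictors a subset excludes. The hypothetical helper aggregate_coefficients illustrates this assumption in base R; it is not the package's coefficients() method.

```r
# Hypothetical sketch: weighted average of zero-padded OLS coefficients
# over predictor subsets (each subset is a vector of column indices of X).
aggregate_coefficients <- function(y, X, subsets, weighting) {
  X <- as.matrix(X)
  d <- ncol(X)
  w <- weighting / sum(weighting)          # assumption: normalize weights to sum to 1
  coef_mat <- vapply(subsets, function(S) {
    beta <- numeric(d + 1)                 # intercept + d slopes, all zero by default
    fit <- lm.fit(cbind(1, X[, S, drop = FALSE]), y)
    beta[c(1, S + 1)] <- fit$coefficients  # fill intercept and slopes of predictors in S
    beta
  }, numeric(d + 1))
  drop(coef_mat %*% w)                     # one column per subset, averaged with weights w
}
```

With a single subset containing all predictors, this reduces to the ordinary OLS coefficients, which is a quick sanity check on the zero-padding.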

weighting_pred

Weighting of the individual regressions in SR (pred). Only computed if compute_predictive_model is TRUE.

variable_importance

Variable importance measure for all predictors based on SR.
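For the "weighted" option of variable_importance, a natural reading of "a weighted average of all selected subsets" is that each predictor's importance equals the total normalized weight of the subsets that contain it. The base-R helper weighted_importance below sketches that reading; the function name and the normalization of the weights are assumptions.

```r
# Hypothetical "weighted" variable importance: for each predictor, the total
# (normalized) weight of the subsets that contain it.
weighted_importance <- function(subsets, weighting, d) {
  w <- weighting / sum(weighting)
  # Indicator matrix: one row per subset, one column per predictor
  ind <- matrix(0, nrow = length(subsets), ncol = d)
  for (i in seq_along(subsets)) ind[i, subsets[[i]]] <- 1
  colSums(ind * w)   # w is recycled down columns, so row i is scaled by w[i]
}
```

For example, with subsets {1}, {1, 2} and {3} weighted 2, 1 and 1, the importances are 0.75, 0.25 and 0.25.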

variable_importance_pred

Variable importance measure for all predictors based on SR (pred). Only computed if compute_predictive_model is TRUE.

variable_importance_diff

Variable importance measure for all predictors based on the difference between SR and SR (pred). Only computed if compute_predictive_model is TRUE.

Author(s)

Niklas Pfister

References

Pfister, N., E. Williams, R. Aebersold, J. Peters and P. Bühlmann (2019). Stabilizing Variable Selection and Regression. arXiv preprint arXiv:1911.01850.

Examples

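## Example
library(StabilizedRegression)

# Simulate two environments: Y depends linearly on X1, while X2 depends on
# X1 and Y and has a mean shift of 0.4 in the second environment
set.seed(1)
X1 <- rnorm(200)
Y <- X1 + rnorm(200)
X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100) + 2)

X <- cbind(X1, X2)
# Stabilizing variable: environment indicator (first 100 vs. last 100 samples)
A <- as.factor(rep(c(0, 1), each = 100))

# Fit SR using all subsets (B = NA) and, for comparison, plain OLS
fit_sr <- StabilizedRegression(X, Y, A, pars = list(B = NA))
fit_lm <- lm(Y ~ X)

# Compare the coefficients of SR, SR (pred) and OLS
print(paste("Coefficients of SR:", toString(coefficients(fit_sr))))
print(paste("Coefficients of SR (pred):", toString(coefficients(fit_sr, predictive_model = TRUE))))
print(paste("Coefficients of OLS:", toString(coefficients(fit_lm))))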

StabilizedRegression documentation built on June 30, 2022, 9:06 a.m.