In fenguoerbian/RSAVS: Robust Subgroup Analysis and Variable Selection

RSAVS_Path_PureR

R Documentation

Robust subgroup analysis and variable selection simultaneously

Description

This function is implemented purely in R. It carries out robust subgroup analysis and variable selection simultaneously, and it supports several types of loss functions and penalties.

Usage

RSAVS_Path_PureR(
  y_vec,
  x_mat,
  l_type = "L1",
  l_param = NULL,
  p1_type = "S",
  p1_param = c(2, 3.7),
  p2_type = "S",
  p2_param = c(2, 3.7),
  lam1_vec,
  lam2_vec,
  min_lam1_ratio = 0.03,
  min_lam2_ratio = 0.03,
  lam1_len,
  lam2_len,
  const_r123,
  const_abc = rep(1, 3),
  initial_values,
  phi = 1,
  tol = 0.001,
  max_iter = 10,
  cd_max_iter = 1,
  cd_tol = 0.001,
  subgroup_benchmark = FALSE,
  update_mu = NULL
)

Arguments

`y_vec`	numerical vector of responses. `n = length(y_vec)` is the number of observations.
`x_mat`	numerical matrix of covariates. Each row corresponds to one observation, and `p = ncol(x_mat)` is the number of covariates.
`l_type`	character string, type of loss function. One of "L1" (L1 loss, i.e. absolute value loss), "L2" (L2 loss, i.e. squared error loss), or "Huber" (Huber loss, whose parameter is given in `l_param`). Default value is "L1".
`l_param`	numeric vector containing necessary parameters of the corresponding loss function. The default value is `NULL`.
`p1_type`, `p2_type`	character strings indicating the penalty types for subgroup identification and variable selection. One of "S" (SCAD), "M" (MCP), or "L" (Lasso). Default values for both parameters are "S".
`p1_param`, `p2_param`	numerical vectors providing necessary parameters for the corresponding penalties. For Lasso, `lam = p_param[1]`. For SCAD and MCP, `lam = p_param[1]` and `gamma = p_param[2]`. Default values for both parameters are `c(2, 3.7)`. Note: this function searches the whole `lam1_vec` by `lam2_vec` grid for the best solution, so the `lambda` values provided in these parameters serve only as placeholders and are ignored and overwritten in the actual computation.
`lam1_vec`, `lam2_vec`	numerical vectors of user-supplied lambda values. `lam1_vec` should preferably be sorted in increasing order.
`lam1_len`, `lam2_len`	integers, lengths of the auto-generated lambda vectors.
`const_r123`	a length-3 numerical vector, providing the scalars needed in the augmented Lagrangian part of the ADMM algorithm.
`const_abc`	a length-3 numerical vector, providing the scalars that adjust the weights of the regression function, the penalty for subgroup identification, and the penalty for variable selection in the overall objective function. Defaults to `c(1, 1, 1)`.
`phi`	numerical variable. A parameter needed for mBIC.
`tol`	numerical, convergence tolerance for the algorithm.
`max_iter`	integer, maximum number of iterations of the algorithm.
`cd_max_iter`	integer, maximum number of iterations of the coordinate descent update of `mu` and `beta`. If set to 0, the analytical solution (instead of the coordinate descent algorithm) is used to update `mu` and `beta`.
`cd_tol`	numerical, convergence tolerance for the coordinate descent part when updating `mu` and `beta`.
`subgroup_benchmark`	logical. Whether this call should be taken as a benchmark of subgroup identification. If `TRUE`, the penalty for variable selection is suppressed to a minimal value.
`update_mu`	list of parameters for updating `mu_vec` in the algorithm into a meaningful subgroup structure. Defaults to `NULL`, which means no update is performed. The update of `mu_vec` is carried out through `RSAVS_Determine_Mu`, and the necessary parameters in `update_mu` are `UseS` (logical, whether `s_vec` should be used to provide subgroup structure information) and `round_digits` (non-negative integer, the number of digits to round to when merging `mu_vec`). Please refer to `RSAVS_Determine_Mu` for more details about how the algorithm works.
`min_lam1_ratio`, `min_lam2_ratio`	the ratio between the minimal and maximal lambda, equal to (minimal lambda) / (maximal lambda). The default value is 0.03.
`initial_values`	list of vectors, providing initial values for the algorithm.
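The loss options selected by `l_type` and the penalty options selected by `p1_type` and `p2_type` are named after standard functions from the literature. The block below is a minimal sketch of the usual textbook definitions, not the package's internal code: the helper names `loss_fun` and `pen_fun`, the Huber threshold name `delta`, and its default value are assumptions made for illustration, and the package may scale these functions differently.

```r
# Textbook loss functions evaluated on a residual vector r.
# `delta` is a hypothetical name for the Huber threshold (cf. `l_param`).
loss_fun <- function(r, l_type = "L1", delta = 1.345) {
  switch(l_type,
    L1    = abs(r),                                  # absolute value loss
    L2    = r^2,                                     # squared error loss
    Huber = ifelse(abs(r) <= delta,
                   r^2 / 2,                          # quadratic near zero
                   delta * (abs(r) - delta / 2)),    # linear in the tails
    stop("unknown l_type")
  )
}

# Textbook penalty functions evaluated on a coefficient vector x.
# lam and gamma play the roles of p_param[1] and p_param[2].
pen_fun <- function(x, p_type = "S", lam = 2, gamma = 3.7) {
  a <- abs(x)
  switch(p_type,
    L = lam * a,                                     # Lasso
    S = ifelse(a <= lam,                             # SCAD (requires gamma > 2)
               lam * a,
               ifelse(a <= gamma * lam,
                      (2 * gamma * lam * a - a^2 - lam^2) / (2 * (gamma - 1)),
                      lam^2 * (gamma + 1) / 2)),
    M = ifelse(a <= gamma * lam,                     # MCP (requires gamma > 1)
               lam * a - a^2 / (2 * gamma),
               gamma * lam^2 / 2),
    stop("unknown p_type")
  )
}

# Compare the three penalties on a small grid of values.
x_grid <- c(0, 1, 5, 10)
sapply(c(L = "L", S = "S", M = "M"), function(type) pen_fun(x_grid, type))
```

Unlike Lasso, both SCAD and MCP level off to a constant for large arguments, which is why they need the extra `gamma` parameter in `p1_param` and `p2_param`.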

Examples

# a toy example
# first we generate data
n <- 200    # number of observations
q <- 5    # number of active covariates
p <- 50    # number of total covariates
k <- 2    # number of subgroups

# k subgroup effect, centered at 0
group_center <- seq(from = 0, to = 2 * (k - 1), by = 2) - (k - 1)
# covariate effect vector
beta_true <- c(rep(1, q), rep(0, p - q))
# subgroup effect vector    
alpha_true <- sample(group_center, size = n, replace = TRUE)    
x_mat <- matrix(rnorm(n * p), nrow = n, ncol = p)    # covariate matrix
err_vec <- rnorm(n, sd = 0.1)    # error term
y_vec <- as.vector(alpha_true + x_mat %*% beta_true + err_vec)    # response as a plain numeric vector

# a simple analysis using default loss and penalties
res <- RSAVS_Path_PureR(y_vec = y_vec, x_mat = x_mat, 
                        lam1_len = 10, lam2_len = 8, 
                        phi = 5)
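
Instead of relying on `lam1_len` and `lam2_len`, custom lambda vectors can be supplied through `lam1_vec` and `lam2_vec`. The following is a rough sketch of how such vectors might be built from a maximal lambda and a ratio in the sense of `min_lam1_ratio` and `min_lam2_ratio`. The helper `make_lam_vec`, the log-scale spacing, and the maximal value of 1 are assumptions for illustration; how the package generates its own vectors is not stated here.

```r
# Build an increasing lambda vector between lam_max * min_lam_ratio and lam_max.
# Log-scale spacing is an assumption, not necessarily what the package does.
make_lam_vec <- function(lam_max, min_lam_ratio = 0.03, lam_len = 10) {
  exp(seq(from = log(lam_max * min_lam_ratio),
          to = log(lam_max),
          length.out = lam_len))
}

# Hypothetical maximal lambda of 1 for both penalties.
lam1_vec <- make_lam_vec(lam_max = 1, lam_len = 10)
lam2_vec <- make_lam_vec(lam_max = 1, lam_len = 8)

# Every (lam1, lam2) pair on the grid is a candidate solution.
lam_grid <- expand.grid(lam1 = lam1_vec, lam2 = lam2_vec)
dim(lam_grid)
```

Both vectors are in increasing order, matching the preferred ordering for `lam1_vec`.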

fenguoerbian/RSAVS documentation built on Oct. 25, 2024, 3:16 p.m.