locus | R Documentation |
Variational approximation procedure fitting sparse multivariate regression models for combined selection of predictors and associated responses in high-dimensional set-ups. Dependence across responses linked to the same predictors is modelled through the model hierarchical structure. The responses can be purely continuous, purely binary (logit or probit link fits), or a mix of continuous and binary variables.
locus(
Y,
X,
p0_av,
Z = NULL,
link = "identity",
ind_bin = NULL,
list_hyper = NULL,
list_init = NULL,
list_cv = NULL,
list_blocks = NULL,
list_groups = NULL,
list_struct = NULL,
user_seed = NULL,
tol = 0.1,
maxit = 1000,
anneal = NULL,
save_hyper = FALSE,
save_init = FALSE,
full_output = FALSE,
verbose = TRUE,
checkpoint_path = NULL
)
Y |
Response data matrix of dimension n x d, where n is the number of samples and d is the number of response variables. |
X |
Input matrix of dimension n x p, where p is the number of candidate
predictors. |
p0_av |
Prior average number of predictors (or groups of predictors if
|
Z |
Covariate matrix of dimension n x q, where q is the number of
covariates. Variables in |
link |
Response link. Must be " |
ind_bin |
If |
list_hyper |
An object of class " |
list_init |
An object of class " |
list_cv |
An object of class " |
list_blocks |
An object of class " |
list_groups |
An object of class " |
list_struct |
An object of class " |
user_seed |
Seed set for reproducible default choices of hyperparameters
(if |
tol |
Tolerance for the stopping criterion. |
maxit |
Maximum number of iterations allowed. |
anneal |
Parameters for the annealing scheme. Must be a vector whose first
entry sets the type of ladder (1 = geometric spacing, 2 = harmonic
spacing, 3 = linear spacing), whose second entry is the initial temperature,
and whose third entry is the ladder size. If |
save_hyper |
If |
save_init |
If |
full_output |
If |
verbose |
If |
checkpoint_path |
Path where to save temporary checkpoint outputs.
Default is |
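To make the three ladder types accepted by anneal more concrete, here is a minimal sketch of how such ladders could be built. The helper make_ladder is hypothetical and is not part of locus; it assumes the ladder is expressed as inverse temperatures running from 1 / (initial temperature) up to 1, and the package's internal construction may differ.

```r
# Hypothetical helper, not a locus function: builds an increasing ladder of
# inverse temperatures from 1 / temp_init up to 1 (the untempered target).
make_ladder <- function(type, temp_init, size) {
  k_min <- 1 / temp_init
  s <- seq(0, 1, length.out = size)
  switch(as.character(type),
         "1" = k_min^(1 - s),                          # geometric: constant ratio
         "2" = 1 / (temp_init - (temp_init - 1) * s),  # harmonic: temperatures evenly spaced
         "3" = k_min + (1 - k_min) * s,                # linear: inverse temperatures evenly spaced
         stop("type must be 1, 2 or 3"))
}

# Same layout as the anneal argument: type, initial temperature, ladder size.
anneal <- c(1, 2, 10)
ladder <- make_ladder(anneal[1], anneal[2], anneal[3])
round(ladder, 3)
```

Under these assumptions, a higher initial temperature flattens the objective more strongly at the start, and a larger ladder size makes the return to the untempered problem more gradual.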
The optimization uses efficient block coordinate ascent schemes. Convergence is ensured because the objective (ELBO) is multiconcave for the selected blocks, i.e., it is concave in each block of parameters whose updates are made simultaneously; see Xu and Yin (2013), listed in the references below.
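The idea of block coordinate ascent on a multiconcave objective can be illustrated on a toy problem. The sketch below is not the locus algorithm: the objective f, the starting point, and the closed-form updates are assumptions chosen so that each block update is an exact maximizer, which makes the objective non-decreasing across iterations. The stopping rule mimics the roles of tol and maxit.

```r
# Toy objective (hypothetical, not the locus ELBO): concave in x for fixed y
# and concave in y for fixed x, but not jointly concave.
f <- function(x, y) -(x * y - 1)^2 - 0.1 * (x^2 + y^2)

x <- 2; y <- 0.5            # arbitrary starting point
tol <- 1e-8; maxit <- 1000  # stopping rule on the objective increase
obj <- f(x, y)
obj_path <- obj
converged <- FALSE

for (it in seq_len(maxit)) {
  x <- y / (y^2 + 0.1)      # exact maximizer over x with y held fixed
  y <- x / (x^2 + 0.1)      # exact maximizer over y with x held fixed
  obj_new <- f(x, y)
  obj_path <- c(obj_path, obj_new)
  if (obj_new - obj < tol) {
    converged <- TRUE
    break
  }
  obj <- obj_new
}

c(iterations = it, objective = obj_new)
```

Because each update maximizes the objective over one block exactly, the successive values stored in obj_path never decrease; the loop stops once the gain falls below tol or maxit is reached.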
The continuous response variables in Y (if any) will be centered before application of the variational algorithm, and the candidate predictors and covariates in X and Z, respectively, will be standardized. An intercept will be added if link is "logit", "probit" or "mix" (do not supply it in X or Z).
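The preprocessing described above can be approximated in base R as follows. This is a minimal sketch, not the package's internal code: the toy matrices and the use of scale() are assumptions made for illustration.

```r
# Toy data standing in for the user's inputs (hypothetical values).
set.seed(1)
n <- 50
Y <- matrix(rnorm(n * 3, mean = 5), nrow = n)                # continuous responses
X <- matrix(rbinom(n * 4, size = 2, prob = 0.25), nrow = n)  # candidate predictors

# Center the continuous responses: subtract each column mean.
Y_c <- scale(Y, center = TRUE, scale = FALSE)

# Standardize the predictors: zero mean and unit standard deviation per column.
X_s <- scale(X, center = TRUE, scale = TRUE)

max(abs(colMeans(Y_c)))  # numerically zero
apply(X_s, 2, sd)        # each column has standard deviation 1
```

A constant column cannot be standardized because its standard deviation is zero, which is consistent with constant variables being reported as removed in the output (rmvd_cst_x, rmvd_cst_z).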
An object of class "vb" containing the following variational
estimates and settings:
gam_vb |
Posterior inclusion probability matrix of dimension p x d. Entry (s, t) corresponds to the posterior probability of association between candidate predictor s and response t. |
alpha_vb |
Matrix of dimension q x d whose entries are the posterior
mean regression coefficients for the covariates provided
in |
om_vb |
Vector of length p containing the posterior mean of omega. Entry s controls the proportion of responses associated with candidate predictor s. |
converged |
A boolean indicating whether the algorithm has converged
before reaching |
it |
Final number of iterations. |
lb_opt |
Optimized variational lower bound for the marginal log-likelihood (ELBO). |
diff_lb |
Difference in ELBO between the last and penultimate
iterations. This may be useful diagnostic information when
convergence has not been reached before |
p_star |
Vector of length 1 or p defining the applied sparsity control. |
rmvd_cst_x, rmvd_cst_z |
Vectors containing the indices of constant
variables in |
rmvd_coll_x, rmvd_coll_z |
Vectors containing the indices of variables
in |
list_hyper, list_init |
If |
group_labels |
If |
... |
Other specific outputs are possible depending on the model used. |
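As an illustration of how the posterior inclusion probabilities might be used downstream, the sketch below thresholds a mock p x d matrix standing in for gam_vb. The matrix values, the row and column names, and the 0.5 threshold are assumptions made for this example, not recommendations from the package.

```r
# Mock posterior inclusion probabilities (hypothetical values) for
# p = 4 candidate predictors (rows) and d = 3 responses (columns).
gam_vb <- matrix(c(0.95, 0.02, 0.10,
                   0.01, 0.03, 0.02,
                   0.60, 0.88, 0.04,
                   0.05, 0.01, 0.99),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("x", 1:4), paste0("y", 1:3)))

# Predictor-response pairs whose inclusion probability exceeds the threshold.
thres <- 0.5
selected <- which(gam_vb > thres, arr.ind = TRUE)
selected

# Number of responses flagged for each candidate predictor.
n_resp <- rowSums(gam_vb > thres)
n_resp
```

Each row of selected gives the (s, t) indices of a flagged pair, matching the entry convention described for gam_vb.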
H. Ruffieux, A. C. Davison, J. Hager, I. Irincheeva. Efficient inference for genetic association studies with multiple outcomes. Biostatistics, 2017.
Y. Xu and W. Yin. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6, pp. 1758-1789, 2013.
set_hyper, set_init, set_cv, set_blocks, set_groups and set_struct.
seed <- 123; set.seed(seed)
###################
## Simulate data ##
###################
## Examples using small problem sizes:
##
n <- 200; p <- 250; p0 <- 25; d <- 30; d0 <- 25; q <- 3
## Candidate predictors (subject to selection)
##
# Here we simulate common genetic variants (but any type of candidate
# predictors can be supplied).
# 0 = homozygous, major allele, 1 = heterozygous, 2 = homozygous, minor allele
#
X_act <- matrix(rbinom(n * p0, size = 2, prob = 0.25), nrow = n)
X_inact <- matrix(rbinom(n * (p - p0), size = 2, prob = 0.25), nrow = n)
shuff_x_ind <- sample(p)
X <- cbind(X_act, X_inact)[, shuff_x_ind]
bool_x_act <- shuff_x_ind <= p0
pat_act <- beta <- matrix(0, nrow = p0, ncol = d0)
pat_act[sample(p0*d0, floor(p0*d0/5))] <- 1
beta[as.logical(pat_act)] <- rnorm(sum(pat_act))
## Covariates (not subject to selection)
##
Z <- matrix(rnorm(n * q), nrow = n)
alpha <- matrix(rnorm(q * d), nrow = q)
## Gaussian responses
##
Y_act <- matrix(rnorm(n * d0, mean = X_act %*% beta, sd = 0.5), nrow = n)
Y_inact <- matrix(rnorm(n * (d - d0), sd = 0.5), nrow = n)
shuff_y_ind <- sample(d)
Y <- cbind(Y_act, Y_inact)[, shuff_y_ind] + Z %*% alpha
## Binary responses
##
Y_bin <- ifelse(Y > 0, 1, 0)
########################
## Infer associations ##
########################
## Continuous responses
##
# We take p0_av = p0 (known here); this choice may, in some cases, result in
# (too) conservative variable selections. In practice, it is advised to set
# p0_av as a slightly overestimated guess of p0, or perform cross-validation
# using function `set_cv'.
# No covariate
#
vb_g <- locus(Y = Y, X = X, p0_av = p0, link = "identity", user_seed = seed)
# With covariates
#
vb_g_z <- locus(Y = Y, X = X, p0_av = p0, Z = Z, link = "identity",
                user_seed = seed)
## Binary responses
##
vb_logit <- locus(Y = Y_bin, X = X, p0_av = p0, Z = Z, link = "logit",
                  user_seed = seed)
vb_probit <- locus(Y = Y_bin, X = X, p0_av = p0, Z = Z, link = "probit",
                   user_seed = seed)
## Mix of continuous and binary responses
##
Y_mix <- cbind(Y, Y_bin)
ind_bin <- (d+1):(2*d)
vb_mix <- locus(Y = Y_mix, X = X, p0_av = p0, Z = Z, link = "mix",
                ind_bin = ind_bin, user_seed = seed)