moretrees: A package for fitting Multi-Outcome Regression...

View source: R/moretrees_wrapper.R

Description

Fits the MOReTreeS model to normally distributed outcome data (moretrees_normal) or binary outcome data (moretrees_logistic).

All the details go here!

Usage

moretrees(
  X,
  W = NULL,
  y,
  outcomes,
  tr,
  random_init = FALSE,
  initial_values = NULL,
  method = "tree",
  W_method = "shared",
  family = "bernoulli",
  ci_level = 0.95,
  get_ml = FALSE,
  update_hyper = TRUE,
  update_hyper_freq = 50,
  print_freq = update_hyper_freq,
  hyper_fixed = NULL,
  tol = 0.00000001,
  max_iter = 5000,
  nrestarts = 3,
  keep_restarts = nrestarts > 1,
  parallel = nrestarts > 1,
  log_restarts = nrestarts > 1,
  log_dir = getwd(),
  hyper_random_init = list(omega_max = 100, tau_max = 100, sigma2_max = 100),
  vi_random_init = list(eta_sd = 10, mu_sd = 10, delta_sd = 10)
)

Arguments

X

An n x K matrix of exposure data, where K is the dimension of the exposure. Grouping of the outcomes will be based on their relationships with the variables in X.

W

Matrix of covariates of dimension n x m. Coefficients for these variables do not affect grouping of the outcomes.

y

Vector of length n containing outcomes data. If family = "bernoulli", y must be an integer vector where 1 = success, 0 = failure. If family = "gaussian", y must be a numeric vector containing continuous data.

outcomes

Character vector of length n. outcomes[i] is a string indicating the outcome experienced by unit i.
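
The shapes and types described for X, W, y and outcomes can be illustrated with simulated inputs. This is a minimal sketch in base R: the sample size, dimensions and outcome names (a1, a2, b1) are hypothetical, and the data carry no real signal.

```r
set.seed(1)
n <- 200   # number of units (hypothetical)
K <- 2     # exposure dimension (hypothetical)
m <- 3     # number of covariates (hypothetical)

# n x K exposure matrix and n x m covariate matrix
X <- matrix(rnorm(n * K), nrow = n, ncol = K)
W <- matrix(rnorm(n * m), nrow = n, ncol = m)

# One outcome label per unit; names are made up for illustration
outcome_names <- c("a1", "a2", "b1")
outcomes <- sample(outcome_names, n, replace = TRUE)

# family = "bernoulli": integer vector with 1 = success, 0 = failure
y <- rbinom(n, size = 1, prob = 0.5)

# family = "gaussian": numeric vector of continuous data
y_gauss <- rnorm(n)
```

Every input has length (or row count) n, so row i of X and W, y[i] and outcomes[i] all refer to the same unit.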

tr

A directed igraph object. This is a tree representing the relationships among the outcomes. The leaves represent individual outcomes, and internal nodes represent outcome categories consisting of their leaf descendants. All nodes of tr must have unique names as given by names(V(tr)). The names of the leaves must be equal to the unique elements of outcomes.
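
The requirements on tr can be checked before fitting. The sketch below builds a small hypothetical tree with the igraph package and verifies that it is directed, that node names are unique, and that the leaf names match the unique elements of outcomes. The node names and the outcomes vector are made up for illustration.

```r
library(igraph)

# Hypothetical tree: root -> {A, B}; A -> {a1, a2}; B -> {b1}
edges <- matrix(
  c("root", "A",
    "root", "B",
    "A",    "a1",
    "A",    "a2",
    "B",    "b1"),
  ncol = 2, byrow = TRUE
)
tr <- graph_from_edgelist(edges, directed = TRUE)

# Leaves are the nodes with no outgoing edges
node_names <- names(V(tr))
leaves <- node_names[degree(tr, mode = "out") == 0]

# Hypothetical outcomes vector; its unique values must equal the leaf names
outcomes <- c("a1", "b1", "a2", "a1")

tree_ok <- is_directed(tr) &&
  !anyDuplicated(node_names) &&
  setequal(leaves, unique(outcomes))
```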

method

Either "matrix" or "tree". "matrix" uses a transformation of the design matrix to fit the MOReTreeS model; "tree" uses the information in tr. "matrix" may be more efficient for small trees, while "tree" may be more efficient for large trees.

W_method

If W_method = "shared", information about the effect of the variables in W will be shared across the outcomes according to the tree structure. If W_method = "individual", the effect of W will be estimated separately for each outcome (no information sharing).

family

A string specifying the distribution of the outcomes: either "bernoulli" (for classification) or "gaussian" (for regression).

ci_level

A number between 0 and 1 giving the desired level of the credible intervals. For example, ci_level = 0.95 (the default) returns 95% credible intervals.
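
To make the meaning of ci_level concrete, the sketch below converts a level into the two tail probabilities of an equal-tailed interval and applies them to a normal distribution. The equal-tailed construction, the normal form, and the posterior mean and standard deviation are assumptions for illustration; this page does not state how moretrees computes its intervals.

```r
ci_level <- 0.95

# Tail probabilities of an equal-tailed interval (an assumption)
alpha <- 1 - ci_level
probs <- c(alpha / 2, 1 - alpha / 2)

# Hypothetical posterior summary for one coefficient
post_mean <- 0.4
post_sd <- 0.1

# Lower and upper limits under a normal approximation
ci <- qnorm(probs, mean = post_mean, sd = post_sd)
```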

get_ml

If TRUE, moretrees will also return the maximum likelihood estimates of the coefficients for each outcome group discovered by the model. The default is FALSE.

update_hyper

If TRUE, hyperparameters are updated during fitting. The default is TRUE.

update_hyper_freq

How frequently to update hyperparameters. Default = every 50 iterations.

print_freq

How often (in iterations) to print the iteration number. Defaults to update_hyper_freq.
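
As an illustration of how update_hyper_freq and print_freq relate to the iteration count, the sketch below lists the iterations at which an event with a given frequency would occur. It assumes the event fires whenever the iteration number is a multiple of the frequency; the exact rule used inside moretrees is not stated on this page, and max_iter is shortened here for readability.

```r
max_iter <- 200            # shortened for illustration
update_hyper_freq <- 50    # default from the usage above
print_freq <- update_hyper_freq

iters <- seq_len(max_iter)

# Iterations that are multiples of each frequency (assumed rule)
update_iters <- iters[iters %% update_hyper_freq == 0]
print_iters <- iters[iters %% print_freq == 0]
```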

hyper_fixed

Fixed values of hyperparameters to use if update_hyper = FALSE. If family = "bernoulli", this should be a list including the following elements:
tau (prior variance for sparse node coefficients)
rho (prior node selection probability for sparse node coefficients)
omega (prior variance for non-sparse node coefficients)
If family = "gaussian", in addition to the above, the list should also include:
sigma2 (variance of residuals)
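
A hyper_fixed list with the element names described above might be built as follows. The values for tau, rho and omega are the ones used in the Examples section; the value of sigma2 is a hypothetical placeholder.

```r
# For family = "bernoulli": tau, rho and omega
hyper_bernoulli <- list(tau = 3, rho = 0.5, omega = 2)

# For family = "gaussian": the same elements plus sigma2
# (sigma2 = 1 is a made-up value for illustration)
hyper_gaussian <- c(hyper_bernoulli, list(sigma2 = 1))
```

Either list would then be passed as hyper_fixed together with update_hyper = FALSE.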

tol

Convergence tolerance for ELBO. Default = 1E-8.
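
The interplay of tol and max_iter can be sketched with a toy loop that stops once the ELBO changes by less than tol between iterations, or when the iteration cap is reached. The stopping rule (absolute change) and the function elbo_at are assumptions for illustration, not the package's implementation.

```r
tol <- 1e-8
max_iter <- 5000

# Toy ELBO that increases towards 0 (a stand-in, not a real model)
elbo_at <- function(i) -100 * exp(-i / 5)

elbo_prev <- -Inf
for (i in seq_len(max_iter)) {
  elbo <- elbo_at(i)
  # Stop when the absolute change in the ELBO falls below tol
  if (abs(elbo - elbo_prev) < tol) break
  elbo_prev <- elbo
}
iterations_used <- i
converged <- iterations_used < max_iter
```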

nrestarts

Number of random re-starts of the VI algorithm. The result that gives the highest ELBO will be returned. It is recommended to choose nrestarts > 1. The default is 3.

keep_restarts

If TRUE, the results from all random restarts will be returned. If FALSE, only the restart with the highest ELBO is returned.
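
The selection rule described for nrestarts and keep_restarts can be sketched as below: each restart yields a fit with an ELBO, and the fit with the highest ELBO is kept as the main result. The function run_restart is a hypothetical stand-in that returns a random ELBO; it does not fit a model.

```r
set.seed(42)
nrestarts <- 3
keep_restarts <- nrestarts > 1

# Hypothetical stand-in for one run of the VI algorithm
run_restart <- function(r) {
  list(restart = r, elbo = -1000 + rnorm(1, sd = 50))
}

fits <- lapply(seq_len(nrestarts), run_restart)
elbos <- vapply(fits, function(f) f$elbo, numeric(1))

# Keep the restart with the highest ELBO
best <- fits[[which.max(elbos)]]

# Optionally retain the other restarts as well
others <- if (keep_restarts) fits[-which.max(elbos)] else NULL
```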

parallel

If TRUE, the random restarts will be run in parallel. It is recommended to first set the number of cores using doParallel::registerDoParallel(). Otherwise, the default number of cores specified by the doParallel package will be used.
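
Registering a parallel backend before calling moretrees with parallel = TRUE might look like this. The choice of two cores is arbitrary, and the sketch assumes the doParallel package is installed.

```r
library(doParallel)  # also attaches foreach

# Register a backend with 2 workers (arbitrary choice)
registerDoParallel(cores = 2)

# Number of workers the foreach backend will use
n_workers <- getDoParWorkers()

# Release the implicit cluster when finished
stopImplicitCluster()
```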

log_restarts

If TRUE, progress of each random restart will be logged to a text file in log_dir.

log_dir

Directory for logging progress of random restarts. Default is the working directory.

hyper_random_init

If update_hyper = TRUE, this is a list containing the maximum values of the hyperparameters. Each hyperparameter will be initialised uniformly at random between 0 and the maximum value given by the corresponding list element below. If multiple random restarts are being used, it is recommended to use a large range for these initial values so that the parameter space can be explored more effectively. The list contains the following elements:
tau_max (maximum of prior sparse node variance)
omega_max (maximum of prior non-sparse node variance)
sigma2_max (maximum of residual error variance; for gaussian data only)
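
The uniform initialisation described above can be sketched in base R. The list uses the default maxima from the usage section; the draws below only illustrate the rule "uniform between 0 and the maximum" and are not the package's internal code.

```r
set.seed(1)
hyper_random_init <- list(omega_max = 100, tau_max = 100, sigma2_max = 100)

# One uniform draw per hyperparameter, between 0 and its maximum
hyper_init <- list(
  omega  = runif(1, min = 0, max = hyper_random_init$omega_max),
  tau    = runif(1, min = 0, max = hyper_random_init$tau_max),
  sigma2 = runif(1, min = 0, max = hyper_random_init$sigma2_max)  # gaussian only
)
```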

vi_random_init

A list with parameters that determine the distributions from which the initial VI parameters will be randomly chosen. All parameters will be randomly selected from independent normal distributions with the standard deviations given by the list elements below. If multiple random restarts are being used, it is recommended to use large standard deviations for these initial values so that the parameter space can be explored more effectively. The list contains the following elements:
mu_sd (standard deviation for posterior means of sparse node coefficients)
delta_sd (standard deviation for posterior means of non-sparse node coefficients)
xi_sd (standard deviation for auxiliary parameters xi; for bernoulli data only)
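
Random initial values of this kind can be sketched as independent normal draws. The zero means, the number of nodes and the array shapes are assumptions made for illustration; only the use of independent normal distributions with the given standard deviations comes from the description above.

```r
set.seed(1)
vi_random_init <- list(mu_sd = 10, delta_sd = 10)

# Hypothetical sizes: nodes in the tree, exposure dimension, covariates
n_nodes <- 6
K <- 2
m <- 3

# Initial posterior means for sparse node coefficients (mean 0 assumed)
mu_init <- matrix(
  rnorm(n_nodes * K, mean = 0, sd = vi_random_init$mu_sd),
  nrow = n_nodes, ncol = K
)

# Initial posterior means for non-sparse node coefficients (mean 0 assumed)
delta_init <- matrix(
  rnorm(n_nodes * m, mean = 0, sd = vi_random_init$delta_sd),
  nrow = n_nodes, ncol = m
)
```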

max_iter

Maximum number of iterations of the VI algorithm. The default is 5000.

Value

A list containing the following elements:
1. estimated coefficients and credible intervals;
2. outputs from the variational inference algorithm.

Here are some functions

All about functions!

Model Description

Describe MOReTreeS model and all parameters here.

See Also

Other MOReTreeS functions: moretrees_compute_betas(), moretrees_compute_thetas(), moretrees_design_matrix(), moretrees_design_tree(), moretrees_init_W_logistic(), moretrees_init_logistic(), moretrees_init_rand()

Examples

# Load the example dataset shipped with the package
load(system.file("extdata", "example_data.Rdata", package = "moretrees"))

mod <- moretrees(
  X = X,
  W = W,
  y = y,
  outcomes = outcomes,
  W_method = "shared",
  tr = ccs_tree(7)$tr,
  family = "bernoulli",
  update_hyper = TRUE,
  update_hyper_freq = 1,
  # hyper_fixed is documented as used only when update_hyper = FALSE
  hyper_fixed = list(tau = 3,
                     rho = 0.5,
                     omega = 2),
  tol = 1E-8,
  max_iter = 4,  # a small iteration cap for illustration
  print_freq = 1,
  nrestarts = 1,
  get_ml = FALSE,
  log_dir = "."
)

# Extract elements of the returned list
beta_est <- mod$beta_est
beta_moretrees <- mod$beta_moretrees
beta_ml <- mod$beta_ml  # maximum likelihood estimates are requested with get_ml = TRUE
theta_est <- mod$theta_est
mod_restarts <- mod$mod_restarts
mod1 <- mod$mod

IQSS/moretrees documentation built on March 20, 2020, 8:44 p.m.