moretrees: fitting Multi-Outcome Regression with...


View source: R/moretrees_wrapper.R

Description

Fits a MOReTreeS model to matched case-control or case-crossover data. The posterior is approximated via variational inference. Returns estimated outcome groups and group-specific coefficient estimates with credible intervals. See vignette('moretrees') for model details and example usage.

Usage

moretrees(Xcase, Xcontrol, Wcase = NULL, Wcontrol = NULL, outcomes, tr,
  ci_level = 0.95, get_ml = TRUE, update_hyper_freq = 50,
  print_freq = 50, hyper_fixed = NULL, tol = 1e-08,
  tol_hyper = 1e-04, max_iter = 5000, nrestarts = 3,
  keep_restarts = TRUE, parallel = TRUE, log_restarts = FALSE,
  log_dir = ".", vi_params_init = list(), hyperparams_init = list(),
  random_init = FALSE, random_init_vals = list(omega_lims = c(0.5,
  1.5), tau_lims = c(0.5, 1.5), eta_sd_frac = 0.2, mu_sd_frac = 0.2,
  delta_sd_frac = 0.2, u_sd_frac = 0.2))

Arguments

Xcase

An n x K matrix of exposure data for cases, where K is the dimension of the exposure. Grouping of the outcomes is based on their associations with variables in Xcase. Rows of Xcase correspond to individual cases, columns correspond to variables.

Xcontrol

An n x K matrix of exposure data for controls; row i in Xcontrol is the matched control for case i.

Wcase

An n x m matrix of covariate data for cases, where m is the number of covariates. Coefficients for these variables do not affect grouping of the outcomes. Rows of Wcase correspond to individual cases, columns correspond to variables.

Wcontrol

An n x m matrix of covariate data for controls; row i in Wcontrol is the matched control for case i.

outcomes

Character vector of length n. outcomes[i] is a string indicating the outcome experienced by unit i.
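The shape requirements for the data arguments can be sketched with simulated inputs. This is only an illustration: every value below is a random placeholder, the outcome labels are hypothetical, and the checks are ones implied by the argument descriptions rather than ones taken from the package source.

```r
# Simulated placeholder inputs; none of these values come from the package.
set.seed(1)
n <- 6   # number of matched case-control pairs
K <- 2   # exposure dimension
m <- 1   # number of covariates

Xcase    <- matrix(rnorm(n * K), nrow = n, ncol = K)
Xcontrol <- matrix(rnorm(n * K), nrow = n, ncol = K)  # row i is the control for case i
Wcase    <- matrix(rnorm(n * m), nrow = n, ncol = m)
Wcontrol <- matrix(rnorm(n * m), nrow = n, ncol = m)

# Hypothetical outcome labels, one per matched pair
outcomes <- c("A1", "A1", "A2", "B1", "B1", "B2")

# Consistency checks implied by the argument descriptions
stopifnot(
  identical(dim(Xcase), dim(Xcontrol)),
  identical(dim(Wcase), dim(Wcontrol)),
  nrow(Xcase) == length(outcomes),
  is.character(outcomes)
)
```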

tr

A directed igraph object. This is a tree representing the relationships among the outcomes. The leaves represent individual outcomes, and internal nodes represent outcome categories consisting of their leaf descendants. All nodes of tr must have unique names as given by names(V(tr)). The names of the leaves must be equal to the unique elements of outcomes. The vertices of tr, V(tr), may have an attribute levels containing integer values from 1 to max(V(tr)$levels). In this case, the levels attribute specifies groups of nodes that share common hyperparameters rho[f], tau[f], and omega[f]. If V(tr)$levels is NULL, the default is two levels of hyperparameters: one for all leaf nodes, and one for all internal nodes.
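A small tree satisfying these requirements might be built as follows. This is a sketch under assumptions: the node names and hierarchy are hypothetical, and the numbering of the levels attribute (1 for internal nodes, 2 for leaves) is an arbitrary choice made for illustration.

```r
library(igraph)

# Hypothetical hierarchy: root -> {A, B}; A -> {A1, A2}; B -> {B1, B2}
edges <- matrix(c("root", "A",
                  "root", "B",
                  "A",    "A1",
                  "A",    "A2",
                  "B",    "B1",
                  "B",    "B2"),
                ncol = 2, byrow = TRUE)
tr <- graph_from_edgelist(edges, directed = TRUE)

# Leaves are the nodes with no outgoing edges
leaves <- V(tr)$name[degree(tr, mode = "out") == 0]

# Optional levels attribute; the numbering here is an illustrative assumption
V(tr)$levels <- ifelse(V(tr)$name %in% leaves, 2L, 1L)

# Hypothetical outcome vector whose unique values must match the leaf names
outcomes <- c("A1", "A1", "A2", "B1", "B1", "B2")
stopifnot(
  setequal(leaves, unique(outcomes)),
  !any(duplicated(V(tr)$name))
)
```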

ci_level

A number between 0 and 1 giving the desired credible interval level. For example, ci_level = 0.95 (the default) returns a 95% credible interval.

get_ml

If TRUE, moretrees will also return the maximum likelihood estimates of the coefficients for each outcome group discovered by the model. Default is TRUE.

update_hyper_freq

How frequently to update hyperparameters. Default = every 50 iterations.

print_freq

How often to print the iteration number and the current value of epsilon (the difference in objective function value between the two most recent iterations). Default is every 50 iterations.

hyper_fixed

Fixed values of hyperprior parameters for rho. This should be a list with two elements: a and b, both numeric vectors of length L, representing the parameters of the beta prior on rho for each level, where L is the number of levels. Default is list(a = rep(1, L), b = rep(1, L)) (uniform hyperprior).
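As a brief sketch, the documented default can be written out explicitly for L = 2 levels, together with a hypothetical alternative. The alternative values are illustrative only and are not a recommendation from the package authors.

```r
L <- 2  # two levels is the documented default when V(tr)$levels is NULL

# Documented default: Beta(1, 1) hyperprior on rho at every level
hyper_fixed <- list(a = rep(1, L), b = rep(1, L))

# Beta(1, 1) has density 1 everywhere on (0, 1), i.e. it is uniform
beta11_density <- dbeta(c(0.1, 0.5, 0.9), shape1 = 1, shape2 = 1)

# Hypothetical alternative: Beta(1, 5) at the second level puts more
# prior mass on small values of rho for that level
hyper_alt <- list(a = c(1, 1), b = c(1, 5))
```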

tol

Convergence tolerance for the objective function. Default is 1E-8.

tol_hyper

The convergence tolerance for the objective function between subsequent hyperparameter updates. Typically a more generous tolerance than tol. Default is 1E-4.

max_iter

Maximum number of iterations of the VI algorithm. Default is 5000.
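How tol, max_iter, and print_freq interact can be illustrated with a generic iterative loop. This is not the package's VI algorithm: the update step and objective below are toy stand-ins chosen only to show the stopping rule, with epsilon computed as the change in objective between the two most recent iterations.

```r
tol        <- 1e-08
max_iter   <- 5000
print_freq <- 50

x       <- 0
obj_old <- -(x - 2)^2              # toy objective, maximised at x = 2

for (i in seq_len(max_iter)) {
  x       <- x + 0.1 * (2 - x)     # toy update step moving x toward 2
  obj_new <- -(x - 2)^2
  epsilon <- abs(obj_new - obj_old)

  # Report progress every print_freq iterations
  if (i %% print_freq == 0) {
    cat("Iteration", i, "; epsilon =", epsilon, "\n")
  }

  # Stop once the objective has changed by less than tol
  if (epsilon < tol) break
  obj_old <- obj_new
}

converged <- epsilon < tol
```

If the loop reaches max_iter without epsilon falling below tol, converged is FALSE, which is the situation the max_iter cap is meant to bound.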

nrestarts

Number of random re-starts of the VI algorithm. The result that gives the highest value of the objective function will be returned. It is recommended to choose nrestarts > 1. The default is 3.

keep_restarts

If TRUE, the results from all random restarts will be returned. If FALSE, only the restart with the highest objective function is returned. Default is TRUE.

parallel

If TRUE, the random restarts will be run in parallel. It is recommended to first set the number of cores using doParallel::registerDoParallel(). Otherwise, the default number of cores specified by the doParallel package will be used. Default is TRUE.
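Registering cores before a parallel run might look like the sketch below. The core count is a hypothetical value to adapt to your machine, and the guard on requireNamespace() is added here only so the snippet runs when doParallel is not installed.

```r
n_cores <- 2  # hypothetical; choose based on the available hardware

if (requireNamespace("doParallel", quietly = TRUE)) {
  # Register a parallel backend with the chosen number of cores
  doParallel::registerDoParallel(cores = n_cores)

  # foreach reports how many workers are currently registered
  n_workers <- foreach::getDoParWorkers()

  # ... a call such as moretrees(..., parallel = TRUE) would go here ...

  # Release any implicitly created cluster when finished
  doParallel::stopImplicitCluster()
}
```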

log_restarts

If TRUE, when nrestarts > 1 progress of each random restart will be logged to a text file in log_dir. If FALSE and nrestarts > 1, progress will not be shown. If nrestarts = 1, progress will always be printed to the console. Default is FALSE.

log_dir

Directory for logging progress of random restarts. Default is the working directory.

vi_params_init, hyperparams_init

Named lists containing initial values for the variational parameters and hyperparameters. Supplying good initial values can be challenging, and moretrees() provides a way to guess initial values based on transformations of conditional logistic regression estimates of the effect sizes for each individual outcome (see moretrees_init_logistic()). The most common use for vi_params_init and hyperparams_init is to supply starting values based on previous output from moretrees(); see vignette('moretrees') for examples. The user can provide initial values for all parameters or a subset. When initial values for one or more parameters are not supplied, the missing values will be filled in by moretrees_init_logistic().

random_init

If TRUE, some random variability will be added to the initial values. The default is FALSE, unless nrestarts > 1, in which case random_init will be set to TRUE and a warning message will be printed. The amount of variability is determined by random_init_vals.

random_init_vals

If random_init = TRUE, this is a list containing the following parameters for randomly perturbing the initial values:

tau_lims

a vector of length 2, where tau_lims[1] is between 0 and 1, and tau_lims[2] > 1. The initial values for the hyperparameter tau will be chosen uniformly at random in the range (tau_init * tau_lims[1], tau_init * tau_lims[2]), where tau_init is the initial value for tau either supplied in hyperparams_init or guessed using moretrees_init_logistic().

omega_lims

a vector of length 2, where omega_lims[1] is between 0 and 1, and omega_lims[2] > 1. The initial values for the hyperparameter omega will be chosen uniformly at random in the range (omega_init * omega_lims[1], omega_init * omega_lims[2]), where omega_init is the initial value for omega either supplied in hyperparams_init or guessed using moretrees_init_logistic().

eta_sd_frac

a value between 0 and 1. The initial values for the auxiliary parameters eta will have a normal random variate added to them with standard deviation equal to eta_sd_frac multiplied by the initial value for eta either supplied in hyperparams_init or guessed using moretrees_init_logistic(). Absolute values are then taken for any values of eta that are < 0.

mu_sd_frac

a value between 0 and 1. The initial values for mu will have a normal random variate added to them with standard deviation equal to mu_sd_frac multiplied by the absolute value of the initial value for mu either supplied in vi_params_init or guessed using moretrees_init_logistic().

delta_sd_frac

a value between 0 and 1. The initial values for delta will have a normal random variate added to them with standard deviation equal to delta_sd_frac multiplied by the absolute value of the initial value for delta either supplied in vi_params_init or guessed using moretrees_init_logistic().

u_sd_frac

a value between 0 and 1. The initial value for the node inclusion probabilities will first be transformed to the log odds scale to obtain u. A normal random variate will be added to u with standard deviation equal to u_sd_frac multiplied by the absolute value of the initial value for u either supplied in vi_params_init or guessed using moretrees_init_logistic(). u will then be transformed back to the probability scale.
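The perturbation rules described for tau, mu, and u can be sketched in base R as follows. This follows the textual descriptions only and is not taken from the package source; the starting values (tau_init, mu_init, prob_init) are hypothetical.

```r
set.seed(42)

# Documented defaults for random_init_vals
random_init_vals <- list(omega_lims = c(0.5, 1.5), tau_lims = c(0.5, 1.5),
                         eta_sd_frac = 0.2, mu_sd_frac = 0.2,
                         delta_sd_frac = 0.2, u_sd_frac = 0.2)

# Hypothetical starting values
tau_init  <- c(1.0, 2.0)
mu_init   <- c(-0.5, 0.0, 0.8)
prob_init <- c(0.2, 0.5, 0.9)   # node inclusion probabilities

# tau: uniform draw in (tau_init * tau_lims[1], tau_init * tau_lims[2])
tau_new <- runif(length(tau_init),
                 min = tau_init * random_init_vals$tau_lims[1],
                 max = tau_init * random_init_vals$tau_lims[2])

# mu: add normal noise with sd = mu_sd_frac * |mu_init|
mu_new <- mu_init + rnorm(length(mu_init), mean = 0,
                          sd = random_init_vals$mu_sd_frac * abs(mu_init))

# u: move to the log odds scale, perturb, then map back to probabilities
u_init   <- qlogis(prob_init)
u_new    <- u_init + rnorm(length(u_init), mean = 0,
                           sd = random_init_vals$u_sd_frac * abs(u_init))
prob_new <- plogis(u_new)
```

Note that under these rules an initial value of exactly zero (for mu or u) receives noise with standard deviation zero and is therefore left unchanged.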

Value

A list containing the following elements:

beta_est

estimated exposure coefficients and credible intervals for each outcome. This is a data frame where the variables est1, cil1, ciu1 correspond to the estimated coefficient and lower and upper credible interval bounds for the variable in the first column of Xcase/Xcontrol. est2, cil2, ciu2 correspond to the second column in Xcase/Xcontrol, and so on. The variable group indicates to which estimated group each outcome belongs.

beta_moretrees

estimated exposure coefficients and credible intervals for each outcome group. This is the same information as in beta_est, but presented by group. Outcomes is a list of the outcomes in each group and n_obs is the number of matched pairs corresponding to those outcomes.

theta_est

estimated covariate coefficients and credible intervals for each outcome. This is a matrix where the columns est1, cil1, ciu1 correspond to the estimated coefficient and lower and upper credible interval bounds for the variable in the first column of Wcase/Wcontrol. est2, cil2, ciu2 correspond to the second column in Wcase/Wcontrol, and so on.

beta_ml, theta_ml

Results from running separate, classic conditional logistic regression models on the data from observations corresponding to each outcome group shown in beta_moretrees.

mod

outputs from variational inference algorithm

mod_restarts

outputs from other random restarts of the algorithm, if nrestarts > 1
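The column naming convention for beta_est can be illustrated with a mock data frame. Everything here is a placeholder: the numbers are made up, and storing outcome names as row names is an assumption made for this sketch rather than a documented property of the returned object.

```r
# Mock object mimicking the documented column layout for K = 2 exposures.
# All numbers are made-up placeholders, not output from moretrees().
beta_est <- data.frame(
  est1 = c(0.10, 0.10, 0.30), cil1 = c(0.02, 0.02, 0.15), ciu1 = c(0.18, 0.18, 0.45),
  est2 = c(0.00, 0.00, 0.20), cil2 = c(-0.05, -0.05, 0.10), ciu2 = c(0.05, 0.05, 0.30),
  group = c(1, 1, 2),
  row.names = c("A1", "A2", "B1")   # assumption: outcomes stored as row names
)

# Columns for the k-th exposure follow the pattern est<k>, cil<k>, ciu<k>
k    <- 2
cols <- paste0(c("est", "cil", "ciu"), k)
beta_k <- beta_est[, c("group", cols)]

# Outcomes belonging to each estimated group
groups <- split(rownames(beta_est), beta_est$group)
```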

Examples

vignette('moretrees')

emgthomas/moretrees_pkg documentation built on June 20, 2020, 6:13 p.m.