cvma: Cross-validated maximal association measures


Description

A flexible interface for computing cross-validation-based measures of maximal association. In an outer layer of V-fold cross-validation, the training samples are used to train a prediction algorithm for each outcome. Multiple algorithms may be ensembled using stacking (also known as super learning) based on (V-2)-fold cross-validation. An inner layer of (V-1)-fold cross-validation is used to find, within a user-specified class of outcome combinations, the combination that maximizes a user-specified prediction criterion. The outer-layer validation sample is then used to compute a user-specified cross-validated measure of how well the prediction algorithm predicts the combined outcome that was computed in the training sample.

Several common choices are included for outcome combinations (a convex combination of outcomes, or the single outcome that is most associated) and for prediction criteria (nonparametric R^2, negative log-likelihood, and area under the ROC curve); users may also specify their own. The function returns the cross-validated summary measure for the maximally combined outcome and, if desired, the cross-validated summary measure for each outcome.
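The nested scheme described above can be illustrated with a small base-R sketch. This is not the package's implementation: it assumes a single learner (lm(), so the (V-2)-fold stacking step is unnecessary), two outcomes, R^2 as the criterion, and a simple grid search over one convex weight. The helper names r2 and fit_predict are hypothetical.

```r
set.seed(1)
n <- 200
X <- data.frame(x1 = runif(n, 0, 5), x2 = runif(n, 0, 5))
Y <- data.frame(Y1 = rnorm(n, X$x1 + X$x2, 1),
                Y2 = rnorm(n, X$x1 + X$x2, 3))
V <- 5
outer_fold <- sample(rep(seq_len(V), length.out = n))

# R^2 = 1 - (mean squared error) / (variance of the outcome)
r2 <- function(y, pred) 1 - mean((y - pred)^2) / mean((y - mean(y))^2)

# Hypothetical helper: fit one linear model per outcome on the rows in
# `train` and return a matrix of predictions for the rows in `test`
fit_predict <- function(train, test) {
  sapply(names(Y), function(y_name) {
    dat <- cbind(X[train, , drop = FALSE], .y = Y[train, y_name])
    predict(lm(.y ~ ., data = dat), newdata = X[test, , drop = FALSE])
  })
}

w_grid <- seq(0, 1, by = 0.05)  # weight on Y1; Y2 receives 1 - w
outer_r2 <- numeric(V)
chosen_w <- numeric(V)
for (v in seq_len(V)) {
  train <- which(outer_fold != v)
  valid <- which(outer_fold == v)

  # Inner layer: V - 1 folds inside the training sample, each model
  # fit on the remaining V - 2 folds
  inner_pred <- matrix(NA_real_, length(train), ncol(Y))
  for (u in setdiff(seq_len(V), v)) {
    held <- outer_fold[train] == u
    inner_pred[held, ] <- fit_predict(train[!held], train[held])
  }

  # Pick the convex weight with the largest inner cross-validated R^2
  y_train <- as.matrix(Y[train, ])
  inner_r2 <- sapply(w_grid, function(w) {
    r2(y_train %*% c(w, 1 - w), inner_pred %*% c(w, 1 - w))
  })
  chosen_w[v] <- w_grid[which.max(inner_r2)]

  # Outer layer: refit on the whole training sample and evaluate the
  # combined outcome on the held-out validation sample
  w <- c(chosen_w[v], 1 - chosen_w[v])
  valid_pred <- fit_predict(train, valid)
  outer_r2[v] <- r2(as.matrix(Y[valid, ]) %*% w, valid_pred %*% w)
}

mean(outer_r2)  # cross-validated R^2 for the maximally combined outcome
chosen_w        # weight placed on Y1 in each outer fold
```

The key design point is that the weights are chosen using only training-sample data, so the outer validation sample gives an honest estimate of performance for the combined outcome.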

Usage

cvma(Y, X, V = 5, learners, sl_control = list(ensemble_fn =
  "ensemble_linear", optim_risk_fn = "optim_risk_sl_se", weight_fn =
  "weight_sl_convex", cv_risk_fn = "cv_risk_sl_r2", family = gaussian(),
  alpha = 0.05), y_weight_control = list(ensemble_fn = "ensemble_linear",
  weight_fn = "weight_y_convex", optim_risk_fn = "optim_risk_y_r2",
  cv_risk_fn = "cv_risk_y_r2", alpha = 0.05),
  return_control = list(outer_weight = TRUE, outer_sl = TRUE, inner_sl =
  FALSE, all_y = TRUE, all_learner_assoc = TRUE, all_learner_fits = FALSE),
  scale = FALSE)

Arguments

Y

A matrix or data.frame of outcomes

X

A matrix or data.frame of predictors

V

Number of outer folds of cross-validation. The nested layers use V-1 and V-2 folds, so V must be at least four (this leaves at least two folds for the innermost layer).

learners

A character vector of super learner wrapper names, e.g., c("SL.glm", "SL.mean"). See SuperLearner::listWrappers().

sl_control

A list with named entries ensemble_fn, optim_risk_fn, weight_fn, cv_risk_fn, family, and (as shown in Usage) alpha. Available functions can be viewed with sl_control_options(). See ?sl_control_options for details on how users may supply their own functions.

y_weight_control

A list with named entries ensemble_fn, optim_risk_fn, weight_fn, cv_risk_fn, and (as shown in Usage) alpha. Available functions can be viewed with y_weight_control_options(). See ?y_weight_control_options for details on how users may supply their own functions.

return_control

A list with named entries outer_weight (whether to return outcome weights for the outer-most fold of CV; default TRUE), outer_sl (whether to return the super learner fit for each outcome on all the data; default TRUE), all_y (whether to return cross-validated performance metrics for all outcomes; default TRUE), all_learner_assoc (whether to return cross-validated performance metrics for all learners; default TRUE), and all_learner_fits (whether to return all learner fits; default FALSE). Returning all learner fits is memory intensive, but can be helpful if association measures based on different outcome weighting schemes are desired.

TO DO: For all the control options, it would be nice if one could supply just one of the entries and have the others set to their defaults, a la SuperLearner control or caret trControl.
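Until partial control lists are supported, one workaround is to copy the defaults from the Usage section and override individual entries with modifyList() from the utils package. The sketch below is a suggestion rather than a documented cvma feature; the object names default_return_control and my_return_control are hypothetical, and the final call is left commented out because it requires the cvma package.

```r
# Defaults copied from the Usage section
default_return_control <- list(
  outer_weight = TRUE, outer_sl = TRUE, inner_sl = FALSE,
  all_y = TRUE, all_learner_assoc = TRUE, all_learner_fits = FALSE
)

# Override a single entry; modifyList() keeps every other default
my_return_control <- modifyList(default_return_control,
                                list(all_learner_fits = TRUE))
str(my_return_control)

# Assumed usage, with Y and X as in the Examples section:
# fit <- cvma(Y = Y, X = X, V = 5, learners = c("SL.glm", "SL.mean"),
#             return_control = my_return_control)
```

The same pattern applies to sl_control and y_weight_control, starting from their defaults in Usage.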

scale

Whether to standardize each outcome to have mean zero and standard deviation one (default FALSE).
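For intuition, the standardization described here can be reproduced by hand with base R's scale(). Whether cvma calls scale() internally is an assumption; the small data frame below is made up purely for illustration.

```r
# Toy outcomes on very different scales (illustrative values only)
Y <- data.frame(Y1 = c(1, 2, 3, 4), Y2 = c(10, 30, 20, 60))

# Center each column at zero and divide by its standard deviation
Y_std <- as.data.frame(scale(Y))

colMeans(Y_std)    # approximately zero for each outcome
sapply(Y_std, sd)  # one for each outcome
```

Standardizing can matter when outcomes are combined with weights, since otherwise an outcome measured on a larger scale may dominate the combination.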

Details

TO DO: Figure out how future works (e.g., can plan() be specified internally or externally?)

Value

cv_assoc returns the risk for the entire procedure. cv_assoc_all_y returns the cross-validated performance metric for each outcome, including the confidence interval, p-value, and influence curve. all_learner_assoc returns, for each outcome and learner, the cross-validated metric, confidence interval, associated p-value, and influence curve. sl_fit returns the super learner fit for each outcome, with the associated learner risks, on all the data; in addition, it returns the fit for all learners based on all folds. outer_weight returns the outcome weights obtained using the outer-most fold of CV. inner_weight returns the outcome weights obtained using the inner-most fold of CV. Additionally, all_learner_fits returns all learner fits.

TO DO: Should cvma have $cv_measure, $ci_low, $ci_high, and $p_value returned? These are shown when the objects are printed, so it may be natural for users to think that they are named entries in the cvma object.

See Also

predict method

Examples

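set.seed(1234)
library(SuperLearner)
library(future)
library(cvma)
# Simulate two predictors and two outcomes; Y2 has the same mean as Y1
# but more noise
X <- data.frame(x1 = runif(n = 100, 0, 5), x2 = runif(n = 100, 0, 5))
Y1 <- rnorm(100, X$x1 + X$x2, 1)
Y2 <- rnorm(100, X$x1 + X$x2, 3)
Y <- data.frame(Y1 = Y1, Y2 = Y2)
# Five outer folds; a GLM and the sample mean as candidate learners
fit <- cvma(Y = Y, X = X, V = 5,
            learners = c("SL.glm", "SL.mean"))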

benkeser/cvma documentation built on May 5, 2019, 1:37 p.m.