view.contribution: Evaluate the contribution of data views in making predictions

View source: R/view.contribution.R


Evaluate the contribution of data views in making predictions

Description

Evaluate the contribution of each data view in making predictions. The function has two options. If force is set to NULL, the data view contribution is benchmarked by the null model. If force is set to a list of data views, the contribution is benchmarked by the model fit on this list of data views, and the function evaluates the marginal contribution of each additional data view on top of this benchmarking list of views. The function returns a table showing the percentage improvement in reducing error, relative to the benchmarking model, achieved by each data view.
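
As a minimal sketch of the two call patterns (x1, x2, and y below are placeholder objects; see the Examples section for a complete, runnable simulation):

# Benchmarked by the null model (force = NULL, the default)
view.contribution(x_list = list(x1 = x1, x2 = x2), y, rho = 0.3,
                  family = gaussian(), eval_data = 'train')

# Benchmarked by the model fit on x1 alone; reports the marginal
# contribution of x2 on top of x1
view.contribution(x_list = list(x1 = x1, x2 = x2), y, rho = 0.3,
                  family = gaussian(), eval_data = 'train',
                  force = list(x1 = x1))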

Usage

view.contribution(
  x_list,
  y,
  family = gaussian(),
  rho,
  s = c("lambda.min", "lambda.1se"),
  eval_data = c("train", "test"),
  weights = NULL,
  type.measure = c("default", "mse", "deviance", "class", "auc", "mae", "C"),
  x_list_test = NULL,
  test_y = NULL,
  nfolds = 10,
  foldid = NULL,
  force = NULL,
  ...
)

Arguments

x_list

a list of x matrices with the same number of rows nobs

y

the quantitative response with length equal to nobs, the (same) number of rows in each x matrix

family

A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.)

rho

the weight on the agreement penalty; rho=0 is a form of early fusion, and rho=1 is a form of late fusion. We recommend first trying a few values of rho, including 0, 0.1, 0.25, 0.5, and 1; sometimes a rho larger than 1 can also be helpful.
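
For instance, a sketch of scanning the recommended values on the training data (assuming train_X, train_Z, and train_y as constructed in the Examples section below):

for (r in c(0, 0.1, 0.25, 0.5, 1)) {
  print(view.contribution(x_list = list(x = train_X, z = train_Z), train_y,
                          rho = r, eval_data = 'train', family = gaussian()))
}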

s

Value(s) of the penalty parameter lambda at which predictions are required. Default is the value s="lambda.1se" stored on the CV object. Alternatively s="lambda.min" can be used. If s is numeric, it is taken as the value(s) of lambda to be used. (For historical reasons we use the symbol 's' rather than 'lambda' to reference this parameter)
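
If a specific numeric penalty is preferred over the named rules, a call of the following form should work (the value 0.1 is an arbitrary illustration; train_X, train_Z, and train_y are as in the Examples section below):

view.contribution(x_list = list(x = train_X, z = train_Z), train_y, rho = 0.3,
                  eval_data = 'train', family = gaussian(), s = 0.1)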

eval_data

If train, we evaluate the contribution of data views on the training data using the cross-validation error; if test, we evaluate the contribution on the test data. Default is train. If set to test, users need to provide the test data, i.e. x_list_test and test_y.

weights

Observation weights; defaults to 1 per observation

type.measure

loss to use for cross-validation. Several options are available, though not all apply to all models. The default is type.measure="deviance", which uses squared error for gaussian models (a.k.a. type.measure="mse" there), deviance for logistic and Poisson regression, and partial likelihood for the Cox model. type.measure="class" applies to binomial and multinomial logistic regression only, and gives misclassification error. type.measure="auc" is for two-class logistic regression only, and gives area under the ROC curve. type.measure="mse" or type.measure="mae" (mean absolute error) can be used by all models except the Cox model; they measure the deviation of the fitted mean from the response. type.measure="C" is Harrell's concordance measure, only available for Cox models.
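
For a binary outcome, misclassification error could be requested as in the sketch below (binary_y is a hypothetical 0/1 response used only for illustration; train_X and train_Z are as in the Examples section):

binary_y <- rbinom(nrow(train_X), 1, 0.5)  # hypothetical binary response
view.contribution(x_list = list(x = train_X, z = train_Z), binary_y, rho = 0.3,
                  eval_data = 'train', family = binomial(), type.measure = 'class')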

x_list_test

A list of x matrices in the test data for evaluation.

test_y

The quantitative response in the test data with length equal to the number of rows in each x matrix of the test data.

nfolds

number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3

foldid

an optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
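
The Examples section below constructs such a vector; passing it keeps the fold assignment fixed across calls, e.g. (a sketch, with train_X, train_Z, and train_y as in the Examples):

foldid = sample(rep_len(1:10, nrow(train_X)))
view.contribution(x_list = list(x = train_X, z = train_Z), train_y, rho = 0.3,
                  eval_data = 'train', family = gaussian(), foldid = foldid)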

force

If NULL, the data view contribution is benchmarked by the null model. If users want to benchmark by the model fit on a specified list of data views, force needs to be set to this list of benchmarking data views, i.e. a list of x matrices. The function then evaluates the marginal contribution of each additional data view, i.e. the data views in x_list but not in force, on top of the benchmarking views.

...

Other arguments that can be passed to multiview

Value

a data frame consisting of the view, error metric, and percentage improvement.
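
The result can be handled like any other data frame; for example (a sketch that assumes the percentage improvement sits in the third column, as the description above suggests, with train_X, train_Z, and train_y as in the Examples):

contrib = view.contribution(x_list = list(x = train_X, z = train_Z), train_y,
                            rho = 0.3, eval_data = 'train', family = gaussian())
str(contrib)
contrib[order(contrib[[3]], decreasing = TRUE), ]  # rank views by improvement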

Examples

set.seed(3)
# Simulate data based on the factor model
x = matrix(rnorm(200*20), 200, 20)
z = matrix(rnorm(200*20), 200, 20)
w = matrix(rnorm(200*20), 200, 20)
U = matrix(rep(0, 200*10), 200, 10) # latent factors
for (m in seq(10)){
    u = rnorm(200)
    x[, m] = x[, m] + u
    z[, m] = z[, m] + u
    w[, m] = w[, m] + u
    U[, m] = U[, m] + u}
beta_U = c(rep(2, 5),rep(-2, 5))
y = U %*% beta_U + 3 * rnorm(200)

# Split training and test sets
smp_size_train = floor(0.9 * nrow(x))
train_ind = sort(sample(seq_len(nrow(x)), size = smp_size_train))
test_ind = setdiff(seq_len(nrow(x)), train_ind)
train_X = scale(x[train_ind, ])
test_X = scale(x[test_ind, ])
train_Z <- scale(z[train_ind, ])
test_Z <- scale(z[test_ind, ])
train_W <- scale(w[train_ind, ])
test_W <- scale(w[test_ind, ])
train_y <- y[train_ind, ]
test_y <- y[test_ind, ]
foldid = sample(rep_len(1:10, dim(train_X)[1]))

# Benchmarked by the null model:
rho = 0.3
view.contribution(x_list=list(x=train_X,z=train_Z), train_y, rho = rho,
                  eval_data = 'train', family = gaussian())
view.contribution(x_list=list(x=train_X,z=train_Z), train_y, rho = rho,
                  eval_data = 'test', family = gaussian(),
                  x_list_test=list(x=test_X,z=test_Z), test_y=test_y)

# Force option -- benchmarked by the model trained on a specified list of data views:
view.contribution(x_list=list(x=train_X,z=train_Z,w=train_W), train_y, rho = rho,
                  eval_data = 'train', family = gaussian(), force=list(x=train_X))

