cv.multiview | R Documentation |
Does k-fold cross-validation (CV) for multiview and produces a CV curve.
cv.multiview(
x_list,
y,
family = gaussian(),
rho = 0,
weights = NULL,
offset = NULL,
lambda = NULL,
type.measure = c("default", "mse", "deviance", "class", "auc", "mae", "C"),
nfolds = 10,
foldid = NULL,
alignment = c("lambda", "fraction"),
grouped = TRUE,
keep = FALSE,
trace.it = 0,
...
)
x_list |
a list of |
y |
the quantitative response with length equal to |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
rho |
the weight on the agreement penalty, default 0. |
weights |
Observation weights; defaults to 1 per observation |
offset |
Offset vector (matrix) as in |
lambda |
A user supplied |
type.measure |
loss to use for cross-validation. The usage above lists six
measures besides "default", not all available for all models. The default is
|
nfolds |
number of folds - default is 10. Although |
foldid |
an optional vector of values between 1 and |
alignment |
This is an experimental argument, designed to fix
the problems users were having with CV, with possible values
|
grouped |
This is an experimental argument, with default
|
keep |
If |
trace.it |
If |
... |
Other arguments that can be passed to |
The current code can be slow for "large" data sets, e.g. when the number of features is larger than 1000. It can be helpful to see the progress of multiview as it runs; to do this, set trace.it = 1 in the call to multiview or cv.multiview. With this, multiview prints out its progress along the way. One can also pre-filter the features to a smaller set, using the exclude option, with a filter function.
If there are missing values in the feature matrices, we recommend centering the columns of each feature matrix and then filling in the missing values with 0.
For example,
x <- scale(x,TRUE,FALSE)
x[is.na(x)] <- 0
z <- scale(z,TRUE,FALSE)
z[is.na(z)] <- 0
Then run multiview in the usual way. It will exploit the assumed shared latent factors to make efficient use of the available data.
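When there are more than two views, the same centering and zero-filling can be applied to every matrix in the list at once. The following is a minimal sketch in base R; the helper name center_and_zero_fill and the toy matrices are hypothetical and not part of the multiview package.

```r
# Hypothetical helper: center each column, then replace missing values with 0.
# scale() computes the column means with NAs ignored, so the observed entries
# are centered and the filled-in zeros sit at the (new) column mean.
center_and_zero_fill <- function(m) {
  m <- scale(m, center = TRUE, scale = FALSE)
  m[is.na(m)] <- 0
  m
}

# Toy data with one missing entry per view (illustrative only)
set.seed(1)
x <- matrix(rnorm(20), 5, 4)
x[2, 3] <- NA
z <- matrix(rnorm(15), 5, 3)
z[4, 1] <- NA

# Apply the same preprocessing to every view in the list
x_list <- lapply(list(x = x, z = z), center_and_zero_fill)
```

The resulting x_list has no missing values and can be passed as the first argument in the usual way.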
The function runs multiview nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed. Note that cv.multiview does NOT search for values of rho. A specific value should be supplied, else rho=0 is assumed by default. If users would like to cross-validate rho as well, they should call cv.multiview with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.multiview with different values of rho.
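The procedure for cross-validating rho can be sketched as follows. This is an illustration under stated assumptions, not a recommended recipe: the grid rho_grid is hypothetical, the simulated data mirror the examples on this page, and comparing the minimum of cvm across values of rho is only one reasonable selection rule.

```r
library(multiview)

# Simulated two-view data, following the factor-model example on this page
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 20), n, 20)
z <- matrix(rnorm(n * 20), n, 20)
U <- matrix(rnorm(n * 5), n, 5)
for (m in seq(5)) {
  u <- rnorm(n)
  x[, m] <- x[, m] + u
  z[, m] <- z[, m] + u
  U[, m] <- U[, m] + u
}
x <- scale(x, center = TRUE, scale = FALSE)
z <- scale(z, center = TRUE, scale = FALSE)
y <- U %*% rep(0.1, 5) + 0.1 * rnorm(n)

# Pre-compute one fold assignment so every rho is scored on the same folds
nfolds <- 10
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Hypothetical grid of rho values to compare
rho_grid <- c(0, 0.3, 0.6, 0.9)

# One cv.multiview call per rho, all sharing the same foldid
fits <- lapply(rho_grid, function(r) {
  cv.multiview(list(x = x, z = z), y, rho = r, foldid = foldid)
})

# Smallest mean cross-validated error reached along each lambda path
cv_err <- vapply(fits, function(f) min(f$cvm), numeric(1))

# Pick the rho with the smallest error (assumed selection rule)
best_rho <- rho_grid[which.min(cv_err)]
best_fit <- fits[[which.min(cv_err)]]
```

Because foldid is fixed, differences in cv_err reflect the choice of rho rather than a different random split of the observations.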
An object of class "cv.multiview" is returned, which is a list with the ingredients of the cross-validation fit.
lambda |
the values of |
cvm |
The mean cross-validated error - a vector of length
|
cvsd |
estimate of standard error of
|
cvup |
upper curve = |
cvlo |
lower
curve = |
nzero |
number of non-zero coefficients
at each |
name |
a text string indicating type of measure (for plotting purposes). |
multiview.fit |
a fitted multiview object for the full data. |
lambda.min |
value of
|
lambda.1se |
largest
value of |
fit.preval |
if |
foldid |
if |
index |
a one column matrix with the indices of |
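As a hedged illustration of how the components listed above fit together, the sketch below fits a small model and reads the cross-validation summary at lambda.min. The simulated data and the object names fit, i_min and summary_at_min are hypothetical; only the component names (lambda, cvm, cvsd, nzero, lambda.min) come from the list above.

```r
library(multiview)

# Small simulated two-view problem (illustrative only)
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 20), n, 20)
z <- matrix(rnorm(n * 20), n, 20)
y <- x[, 1:5] %*% rep(0.5, 5) + 0.1 * rnorm(n)

fit <- cv.multiview(list(x = x, z = z), y, rho = 0.3)

# Position of lambda.min within the lambda sequence
i_min <- which.min(abs(fit$lambda - fit$lambda.min))

# Cross-validation summary at that position
summary_at_min <- c(
  lambda   = fit$lambda.min,
  cv_error = unname(fit$cvm[i_min]),
  cv_sd    = unname(fit$cvsd[i_min]),
  nonzero  = unname(fit$nzero[i_min])
)
summary_at_min
```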
# Gaussian
# Generate data based on a factor model
set.seed(1)
x = matrix(rnorm(100*20), 100, 20)
z = matrix(rnorm(100*20), 100, 20)
U = matrix(rnorm(100*5), 100, 5)
for (m in seq(5)) {
  u = rnorm(100)
  x[, m] = x[, m] + u
  z[, m] = z[, m] + u
  U[, m] = U[, m] + u
}
x = scale(x, center = TRUE, scale = FALSE)
z = scale(z, center = TRUE, scale = FALSE)
beta_U = c(rep(0.1, 5))
y = U %*% beta_U + 0.1 * rnorm(100)
fit1 = cv.multiview(list(x=x,z=z), y, rho = 0.3)
# plot the cross-validation curve
plot(fit1)
# extract coefficients
coef(fit1, s="lambda.min")
# extract ordered coefficients
coef_ordered(fit1, s="lambda.min")
# make predictions
predict(fit1, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min")
# Binomial
by = 1 * (y > median(y))
fit2 = cv.multiview(list(x=x,z=z), by, family = binomial(), rho = 0.9)
predict(fit2, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
plot(fit2)
coef(fit2, s="lambda.min")
coef_ordered(fit2, s="lambda.min")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = cv.multiview(list(x=x,z=z), py, family = poisson(), rho = 0.6)
predict(fit3, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
plot(fit3)
coef(fit3, s="lambda.min")
coef_ordered(fit3, s="lambda.min")