# the code in this chunk enables us to truncate the print output for each # chunk using the `out.lines` option # save the built-in output hook hook_output <- knitr::knit_hooks$get("output") # set a new output hook to truncate text output knitr::knit_hooks$set(output = function(x, options) { if (!is.null(n <- options$out.lines)) { x <- xfun::split_lines(x) if (length(x) > n) { # truncate the output x <- c(head(x, n), "....\n") } x <- paste(x, collapse = "\n") } hook_output(x, options) })
multiview
is a package that fits a supervised learning model called cooperative learning for multiple sets of features ("views"), as described in @cooperative.
The method combines the usual squared error loss of predictions, or more generally, deviance loss, with an "agreement" penalty to encourage the predictions from different data views to agree.
By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches.
Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error.
In addition, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity.
This vignette describes the basic usage of multiview in R.
multiview
is written based on the glmnet
package and maintains the features from glmnet
.
The package includes functions for cross-validation, and making predictions and plots.
For two data views, consider feature matrices $X \in \mathcal R^{n\times p_x}$, $Z \in \mathcal R^{n\times p_z}$, and our target $y \in \mathcal R^{n}$.
We assume that the columns of $X$ and $Z$ have been standardized, and $y$ has mean 0 (hence we can omit the intercept below).
For a fixed value of the hyperparameter $\rho\geq 0$, multiview
finds $\beta_x \in \mathcal R^{p_x}$
and $\beta_Z \in \mathcal R^{p_z}$ that minimize:
$$\frac{1}{2} ||y-X\theta_x- Z\theta_z||^2+ \frac{\rho}{2}||(X\theta_x- Z\theta_z)||^2+
\lambda \Bigl[(1-\alpha)(||\theta_x||_1+||\theta_z||_1)+ \alpha(||\theta_x||_2^2/2+||\theta_z||_2^2/2)\Bigr].
$$
Here the agreement penalty is controlled by $\rho$. When the weight on the agreement term $\rho$ is set to 0, cooperative learning reduces to a form of early fusion: we simply concatenate the columns of different views and apply lasso or another regularized regression method. Moreover, we show in our paper that when $\rho$ is set to 1, the solutions are the average of the marginal fits for $X$ and $Z$, which is a simple form of late fusion.
The elastic net penalty is controlled by $\alpha$, bridging the gap between lasso regression ($\alpha=1$, the default) and ridge regression ($\alpha=0$), and the tuning parameter $\lambda$ controls the strength of the penalty. We can compute a regularization path of solutions indexed by $\lambda$.
For more than two views, this generalizes easily.
Assume that we have M
data matrices $X_1 \in \mathcal R^{n\times p_1},X_2\in \mathcal R^{n\times p_2},\ldots, X_M\in \mathcal R^{n\times p_M}$, multiview
solves the problem
$$\frac{1}{2} ||y- \sum_{m=1}^{M} X_m\rho_m||^2+ \frac{\rho}{2}\sum_{m<m'} ||(X_m\rho_m- X_{m'}\rho_{m'})||^2 + \sum_{m=1}^{M} \lambda_m \Bigl[(1-\alpha)\|\rho_m||_1 + \alpha||\rho_m||_2^2/2\Bigr].$$
multiview
fits linear, logistic, and poisson models.
The purpose of this section is to give users a general sense of the package. We will briefly go over the main functions, basic operations and outputs. After this section, users may have a better idea of what functions are available, which ones to use, or at least where to seek help.
First, we load the multiview
package:
library(multiview)
family = gaussian()
The default model used in the package is the Gaussian linear model or "least squares", which we will demonstrate in this section. We load a set of data created beforehand for illustration:
quick_example <- readRDS(system.file("exdata", "example.RDS", package = "multiview")) x <- quick_example$x z <- quick_example$z y <- quick_example$y
The command loads an input matrix x
and a response vector y
from this saved R data archive.
We fit the model using the most basic call to multiview
.
fit <- multiview(list(x,z), y, family=gaussian(), rho=0)
fit
is an object of class multiview
that contains all the relevant information of the fitted model for further use.
Note that when rho
= 0, cooperative learning reduces to a simple form of early fusion, where we simply concatenate the columns of $X$ and $Z$ and apply lasso.
We do not encourage users to extract the components directly. Instead, various methods are provided for the object such as plot
, print
, coef
and predict
that enable us to execute those tasks more elegantly.
multiview
provides various arguments for users to customize the fit and maintains most of the arguments from glmnet
: we introduce some commonly used arguments here. (For more information, type ?multiview
and ?glmnet
.)
rho
is for the agreement penalty. $\rho = 0$ is early fusion, and $\rho = 1$ gives a form of late fusion. We recommend trying a few values of rho
including 0, 0.1, 0.25, 0.5, and 1 first; sometimes rho
larger than 1 can also be helpful.
alpha
is for the elastic net mixing parameter $\alpha$, with range $\alpha \in [0,1]$. $\alpha = 1$ is lasso regression (default) and $\alpha = 0$ is ridge regression.
weights
is for the observation weights, default is 1 for each observation.
nlambda
is the number of $\lambda$ values in the sequence (default is 100).
lambda
can be provided if the user wants to specify the lambda sequence, but typical usage is for the program to construct the lambda sequence on its own. When automatically generated, the $\lambda$ sequence is determined by lambda.max
and lambda.min.ratio
. The latter is the ratio of the smallest value of the generated $\lambda$ sequence (say lambda.min
) to lambda.max
. The program generates nlambda
values linear on the log scale from lambda.max
down to lambda.min
. lambda.max
is not user-specified but is computed from the input $x$ and $y$: it is the smallest value for lambda
such that all the coefficients are zero. For alpha = 0
(ridge) lambda.max
would be $\infty$: in this case we pick a value corresponding to a small value for alpha
close to zero.)
standardize
is a logical flag for x
variable standardization prior to fitting the model sequence. Default is standardize = TRUE
. The coefficients are always returned on the original scale. If the variables are in the same units already, you might not want to standardize.
In addition, if there are missing values in the feature matrices: we recommend that you center the columns of each feature matrix, and then fill in the missing values with 0.
For example,
x <- scale(x,TRUE,FALSE) x[is.na(x)] <- 0 z <- scale(z,TRUE,FALSE) z[is.na(z)] <- 0
Then you can run multiview
in the usual way. It will exploit the assumed shared latent factors
to make efficient use of the available data.
We can visualize the coefficients by executing the plot
method:
plot(fit)
Each curve corresponds to a variable, with the color indicating the data view the variable comes from.
It shows the path of its coefficient against the $\ell_1$-norm of the whole coefficient vector as $\lambda$ varies. The axis above indicates the number of nonzero coefficients at the current $\lambda$. Users may also wish to annotate the curves: this can be done by setting label = TRUE
in the plot command.
A summary of the path at each step is displayed if we just enter the object name or use the print
function:
print(fit)
It shows from left to right the number of nonzero coefficients,
the percent (of null) deviance explained (%dev
) and the value of
$\lambda$ (Lambda
).
We can make predictions at specific $\lambda$'s with new input data:
set.seed(1) nx <- matrix(rnorm(5 * 50), 5, 50) nz <- matrix(rnorm(5 * 50), 5, 50) predict(fit, newx = list(nx, nz), s = c(0.1, 0.05))
Here s
is for specifying the value(s) of $\lambda$ at which to make predictions.
The type
argument allows users to choose the type of prediction returned:
"link" returns the fitted values.
"response" gives the same output as "link" for the gaussian()
family.
"coefficients" returns the model coefficients.
"nonzero" returns a list of the indices of the nonzero coefficients for each value of s
.
We can obtain the model coefficients at one or more $\lambda$'s within the range of the sequence:
coef(fit, s = 0.1)
Here s
is for specifying the value(s) of $\lambda$ at which to extract coefficients.
We can also obtain a ranked list of standardized model coefficients, where we rank the model coefficients after standardizing each coefficient by the standard deviation of the corresponding feature. Users have to provide the function with a specific value of $\lambda$ at which to extract and rank coefficients.
coef_ordered(fit, s=0.1)
The table shows from left to right the data view each coefficient comes from, the column index of the feature in the corresponding data view, the coefficient after being standardized by the standard deviation of the corresponding feature, and the original fitted coefficient. Here the features are ordered by the magnitude of their standardized coefficients, and we have truncated the printout for brevity.
The function multiview
returns a sequence of models for the users to choose from. In many cases, users may prefer the software to select one of them. Cross-validation is perhaps the simplest and most widely used method for that task. cv.multiview
is the main function to do cross-validation here, along with various supporting methods such as plotting and prediction.
In addition to all the glmnet
parameters, cv.multiview
has its special parameters including nfolds
(the number of folds), foldid
(user-supplied folds), and type.measure
(the loss used for cross-validation):
"deviance" or "mse" for squared loss, and
"mae" uses mean absolute error.
As an example,
cvfit <- cv.multiview(list(x,z), y, rho=0.1, family = gaussian(), type.measure = "mse", nfolds = 20)
cv.multiview
returns a cv.multiview
object, a list with all the ingredients of the cross-validated fit. As with multiview
, we do not encourage users to extract the components directly except for viewing the selected values of $\lambda$. The package provides well-designed functions for potential tasks. For example, we can plot the object:
plot(cvfit)
This plots the cross-validation curve (red dotted line) along with upper and lower standard deviation curves along the $\lambda$ sequence (error bars). Two special values along the $\lambda$ sequence are indicated by the vertical dotted lines. lambda.min
is the value of $\lambda$ that gives minimum mean cross-validated error, while lambda.1se
is the value of $\lambda$ that gives the most regularized model such that the cross-validated error is within one standard error of the minimum.
The coef
and predict
methods for cv.multiview
objects are similar to those for a glmnet
object, except that two special strings are also supported by s
(the values of $\lambda$ requested):
"lambda.min": the $\lambda$ at which the smallest MSE is achieved.
"lambda.1se": the largest $\lambda$ at which the MSE is within one standard error of the smallest MSE (default).
We can use the following code to get the value of lambda.min
and the model coefficients at that value of $\lambda$:
cvfit$lambda.min coef(cvfit, s = "lambda.min")
To get the corresponding values at lambda.1se
, simply replace
lambda.min
with lambda.1se
above, or omit the s
argument, since
lambda.1se
is the default.
Predictions can be made based on the fitted cv.multiview
object as well. The code below gives predictions for the new input matrix newx
at lambda.min
:
predict(cvfit, newx = list(x[1:5,],z[1:5,]), s = "lambda.min")
With multiple data views, we can evaluate the contribution of each data view in making predictions using view.contribution
. The function has two options.
The first option is to set force
to NULL
, and then the evaluation of data view contribution will be benchmarked by the null model. Specifically, it evaluates the marginal contribution of each data view in improving predictions as compared to the null model, as well as the model using all data views with cooperative learning as compared to the null model.
The function takes in a value of $\rho$ for cooperative learning.
It returns a table showing the percentage improvement in reducing error as compared to the null model made by each individual data view and by cooperative learning using all data views.
quick_example <- readRDS(system.file("exdata", "example_contribution.RDS", package = "multiview")) rho <- 0.3 view.contribution(x_list=list(x=quick_example$x, z=quick_example$z), quick_example$y, rho = rho, family = gaussian(), eval_data = 'train')
The alternative option is to provide force
with a list of data views as the baseline mode.
Then the function evaluates the marginal contribution of each additional data view on top of this benchmarking list of views using cooperative learning. The function returns a table showing the percentage improvement in reducing error as compared to the benchmarking model made by each additional data view.
view.contribution(x_list=list(x=quick_example$x, z=quick_example$z, w=quick_example$w), quick_example$y, rho = rho, eval_data = 'train', family = gaussian(), force=list(x=quick_example$x))
Here by setting eval_data
to be "train"
, we are evaluating the contribution of each data view based on the cross-validation error on the training set
We can also evaluate the contribution based on the test set performance by setting eval_data
to be "test"
.
In this case, we need to provide the function with the test data for evaluation.
view.contribution(x_list = list(x = quick_example$x, z = quick_example$z), quick_example$y, rho = rho, x_list_test = list(x = quick_example$test_X, z = quick_example$test_Z), test_y = quick_example$test_y, family = gaussian(), eval_data = 'test')
The exclude
argument to multiview
accepts a filtering function, which indicates which variables should be excluded from the fit, i.e. get zeros for their coefficients.
The idea is that variables can be filtered based on some of their properties (e.g. too sparse or with small variance) before any models are fit.
Of course, one could simply subset these out of x, but sometimes exclude is more useful, since it returns a full vector of coefficients, just with the excluded ones set to zero.
When performing cross-validation (CV), this filtering is done separately inside each fold of cv.multiview
.
To give a motivating example: in genomics, where the $X$ matrix can be very wide, we often filter features based on variance, i.e. filter the genes with no variance or low variance in their expression across samples. Here is a function that efficiently computes the column-wise variance for a wide matrix, followed by a filter function that uses it.
uvar <- function(x, means = FALSE) { # if means = TRUE, the means and variances are returned, # otherwise just the variances m <- colMeans(x) n <- nrow(x) x <- x - outer(rep(1,n),m) v <- colSums(x^2) / (n - 1) if (means) list(mean = m, var = v) else v } uvar_multiple <- function(x_list) lapply(x_list, function(x) uvar(x)) vfilter <- function(x_list, q = 0.3, ...) { v <- uvar_multiple(x_list) lapply(v, function(x) which(x < quantile(x, q))) }
Here, the filter function vfilter()
will exclude the fraction $q$ of the variables with the lowest variance.
This function gets invoked inside the call to multiview
, and uses the supplied x_list
to generate the indices.
Note that some of the arguments in the filter function can be omitted, but the ...
must always be there.
set.seed(1) x <- matrix(rnorm(100 * 20), 100, 20) z <- matrix(rnorm(100 * 20), 100, 20) y <- rnorm(100) fit.filter <- multiview(list(x,z), y, exclude = vfilter)
The nice thing with this approach is that cv.multiview
does the right thing: it applies the filter function separately to the feature matrices of each training fold, hence accounting for any bias that may be incurred by the filtering.
cvfit.filter <- cv.multiview(list(x,z), y, exclude = vfilter)
Ever run a job on a big dataset, and wonder how long it will take?
multiview
and cv.multiview
come equipped with a progress bar, which can by displayed by passing trace.it = TRUE to these functions.
fit <- multiview(list(x,z), y, trace.it = TRUE)
##
|=========== | 33%
This display changes in place as the fit is produced. The progress bar is also very helpful with cv.multiview
:
fit <- cv.multiview(list(x,z), y, trace.it = TRUE)
##
|=============================================| 100%
Fold: 1/10
|=============================================| 100%
Fold: 2/10
|=============================================| 100%
Fold: 3/10
|============================= | 70%
If the user wants multiview
and cv.multiview
to always print the progress bar, this can be achieved (for a session) via a call to multiview.control
with the itrace
argument:
multiview.control(itrace = 1)
To reset it, one makes a similar call and sets itrace = 0
.
With the tools introduced so far, users are able to fit the entire elastic net family, including lasso and ridge regression, using squared-error loss.
There are also additional features of the multiview
package that are inherited from the glmnet
package.
The glmnet
vignette "An Introduction to glmnet
" provides more detailed explanations of the function parameters and features.
This vignette is also written based on the vignette for the glmnet
package.
family = binomial()
Logistic regression is a widely-used model when the response is binary. As a generalized linear model, logistic regression has the following objective to be minimized: $$\ell(X\theta_x + Z\theta_z, y)+\frac{\rho}{2}||(X\theta_x- Z\theta_z)||^2 + \lambda \Bigl[(1-\alpha)(||\theta_x||_1+||\theta_z||_1)+ \alpha(||\theta_x||_2^2/2+||\theta_z||_2^2/2)\Bigr],$$ where $\ell$ is the negative log-likelihood (NLL) of the data.
We make the usual quadratic approximation to the objective, reducing the minimization problem to a weighted least squares (WLS) problem. This leads to an iteratively reweighted least squares (IRLS) algorithm consisting of an outer and inner loop.
For illustration purposes, we load the pre-generated input matrices X
and Z
and the response vector y
from the data file, where $y$ is a binary vector.
example_binomial <- readRDS(system.file("exdata", "example_binomial.RDS", package = "multiview")) x <- example_binomial$x z <- example_binomial$z y <- example_binomial$by
Other optional arguments of multiview
for binomial regression are almost same as those for Gaussian family. Don't forget to set family
option to binomial()
. (For this example, we also need to increase the number of IRLS iterations via multiview.control()
to ensure convergence, and we reset it after the call.)
multiview.control(mxitnr = 100) fit <- multiview(list(x=x,z=z), y, family = binomial(), rho = 0) multiview.control(factory = TRUE)
As before, we can print and plot the fitted object, extract the coefficients at specific $\lambda$'s and also make predictions. Prediction is a little different for family = binomial()
, mainly in the function argument type
:
"link" gives the linear predictors.
"response" gives the fitted probabilities.
"class" produces the class label corresponding to the maximum probability.
In the following example, we make prediction of the class labels at $\lambda = 0.05, 0.01$.
predict(fit, newx = list(x[5:10,],z[5:10,]), type = "class", s = c(0.05, 0.01))
For logistic regression, cv.multiview
has similar arguments and usage as Gaussian.
"mse" uses squared loss.
"deviance" uses actual deviance.
"mae" uses mean absolute error.
For example, the code below uses "deviance" as the criterion for 10-fold cross-validation:
cvfit <- cv.multiview(list(x = x, z = z), y, family = binomial(), type.measure = "deviance", rho = 0.5)
As before, we can plot the object and show the optimal values of $\lambda$.
plot(cvfit)
coef
and predict
for the cv.multiview
object for family = binomial()
are similar to the Gaussian case and we omit the details.
family = poisson()
Poisson regression is used to model count data under the assumption of Poisson error, or otherwise non-negative data where the mean and variance are proportional. Like the Gaussian and binomial models, the Poisson distribution is a member of the exponential family of distributions. As before, we minimize the following objective: $$\ell(X\theta_x + Z\theta_z, y)+\frac{\rho}{2}||(X\theta_x- Z\theta_z)||^2 + \lambda \Bigl[(1-\alpha)(||\theta_x||_1+||\theta_z||_1)+ \alpha(||\theta_x||_2^2/2+||\theta_z||_2^2/2)\Bigr],$$ where $\ell$ is the negative log-likelihood (NLL) of the data.
First, we load a pre-generated set of Poisson data:
example_poisson <- readRDS(system.file("exdata", "example_poisson.RDS", package = "multiview")) x <- example_poisson$x z <- example_poisson$z y <- example_poisson$py
We apply the function multiview
with family = poisson()
:
fit <- multiview(list(x=x, z=z), y, family = poisson(), rho = 0)
The optional input arguments of multiview
for "poisson"
family are similar to those for other families.
Again, we plot the coefficients to have a first sense of the result.
plot(fit)
As before, we can extract the coefficients and make predictions at certain $\lambda$'s using coef
and predict
respectively. The optional input arguments are similar to those for other families. For the predict
method, the argument type
has the same meaning as that for family = binomial()
, except that "response" gives the fitted mean (rather than fitted probabilities in the binomial case). For example, we can do the following:
coef(fit, s = 1) predict(fit, newx = list(x[1:5,],z[1:5,]), type = "response", s = c(0.1,1))
We may also use cross-validation to find the optimal $\lambda$'s and thus make inferences. However, the direct has convergence issues with the default control parameter settings as can be seen below.
cvfit <- cv.multiview(list(x,z), y, family = poisson(), rho = 0.1)
This is because the underlying IRLS algorithm requires more than the default number of iterations to converge.
This can be fixed by setting the appropriate multiview.control()
parameter (see help on multiview.control
).
To allow more iterations:
multiview.control(mxitnr = 100) cvfit <- cv.multiview(list(x,z), y, family = poisson(), rho = 0.1)
To set the parameters back to the factory default:
multiview.control(factory = TRUE)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.