knitr::opts_chunk$set(echo=TRUE,eval=!grepl('SunOS',Sys.info()['sysname']))
set.seed(1)
Install the current release from CRAN:
install.packages("joinet")
Or install the latest development version from GitHub:
#install.packages("devtools")
devtools::install_github("rauschenberger/joinet")
Load and attach the package:
library(joinet)
And access the documentation:
?joinet
help(joinet)
browseVignettes("joinet")
For $n$ samples, we simulate $p$ inputs (features, covariates) and $q$ outputs (outcomes, responses). We assume high-dimensional inputs ($p \gg n$) and low-dimensional outputs ($q \ll n$).
n <- 100
q <- 2
p <- 500
We simulate the $p$ inputs from a multivariate normal distribution. For the mean, we use the $p$-dimensional vector mu, where all elements equal zero. For the covariance, we use the $p \times p$ matrix Sigma, where the entry in row $i$ and column $j$ equals rho$^{|i-j|}$. The parameter rho determines the strength of the correlation among the inputs, with small rho leading to weak correlations, and large rho leading to strong correlations (0 < rho < 1). The input matrix X has $n$ rows and $p$ columns.
mu <- rep(0,times=p)
rho <- 0.90
Sigma <- rho^abs(col(diag(p))-row(diag(p)))
X <- MASS::mvrnorm(n=n,mu=mu,Sigma=Sigma)
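As an optional sanity check (not part of the original code), the empirical correlations between neighbouring columns of X should roughly follow the assumed decay rho$^{|i-j|}$:
# compare empirical and theoretical correlations of the first five inputs
round(cor(X[,1:5]),2)
round(Sigma[1:5,1:5],2)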
We simulate the input-output effects from independent Bernoulli distributions. The parameter pi determines the number of effects, with small pi leading to few effects, and large pi leading to many effects (0 < pi < 1). The scalar alpha represents the intercept, and the $p$-dimensional vector beta represents the slopes.
pi <- 0.01
alpha <- 0
beta <- rbinom(n=p,size=1,prob=pi)
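With p = 500 and pi = 0.01, beta contains on average p*pi = 5 non-zero entries. A quick check (not in the original code):
# number of simulated input-output effects (expected value: p*pi = 5)
sum(beta!=0)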
From the intercept alpha, the slopes beta and the inputs X, we calculate the linear predictor, the $n$-dimensional vector eta. Rescale the linear predictor to make the effects weaker or stronger. Set the argument family to "gaussian", "binomial", or "poisson" to define the distribution. The $n \times q$ matrix Y represents the outputs. We assume the outcomes are positively correlated, which holds here because all $q$ outcomes are generated from the same linear predictor eta.
eta <- alpha + X %*% beta
eta <- 1.5*scale(eta)
family <- "gaussian"
if(family=="gaussian"){
  mean <- eta
  Y <- replicate(n=q,expr=rnorm(n=n,mean=mean))
}
if(family=="binomial"){
  prob <- 1/(1+exp(-eta))
  Y <- replicate(n=q,expr=rbinom(n=n,size=1,prob=prob))
}
if(family=="poisson"){
  lambda <- exp(eta)
  Y <- replicate(n=q,expr=rpois(n=n,lambda=lambda))
}
cor(Y)
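To illustrate why the outcomes end up correlated (this check is not in the original code), each column of Y should correlate positively with the shared linear predictor eta:
# correlation of each outcome with the shared linear predictor
round(cor(Y,eta),2)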
The function joinet fits univariate and multivariate regression. Set the argument alpha.base to 0 (ridge) or 1 (lasso).
object <- joinet(Y=Y,X=X,family=family)
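For example, to fit ridge-regularised base learners, set alpha.base to 0 (the object name below is only illustrative):
# elastic net mixing parameter of the base learners: 0 = ridge, 1 = lasso
object_ridge <- joinet(Y=Y,X=X,family=family,alpha.base=0)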
Standard methods are available. The function predict returns the predicted values from the univariate (base) and multivariate (meta) models. The function coef returns the estimated intercepts (alpha) and slopes (beta) from the multivariate model (input-output effects). The function weights returns the weights from stacking (output-output effects).
predict(object,newx=X)
coef(object)
weights(object)
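As a rough illustration (assuming the value of predict is a list with components base and meta holding one column of fitted values per outcome, as described above), the in-sample mean squared errors of the univariate and multivariate predictions could be compared like this:
pred <- predict(object,newx=X)
# in-sample mean squared error per outcome, for base and meta predictions
rbind(base=colMeans((Y-pred$base)^2),
      meta=colMeans((Y-pred$meta)^2))
In-sample errors are optimistic; for an honest comparison use the nested cross-validation provided by cv.joinet below.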
The function cv.joinet compares the predictive performance of univariate (base) and multivariate (meta) regression by nested cross-validation. The argument type.measure determines the loss function.
cv.joinet(Y=Y,X=X,family=family)
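For example, assuming cv.joinet accepts glmnet-style loss names such as "mse" for Gaussian outcomes (an assumption; check the package documentation for the supported values), the loss function could be switched as follows:
# mean squared error as cross-validated loss (assumed glmnet-style name)
cv.joinet(Y=Y,X=X,family=family,type.measure="mse")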
Armin Rauschenberger and Enrico Glaab (2021). "Predicting correlated outcomes from molecular data". Bioinformatics 37(21):3889–3895. doi: 10.1093/bioinformatics/btab576.