Basics

Introduction

The package implements a collection of Generalized Elastic Net (GELnet) solvers, as outlined in the following publication: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004790 This vignette covers the basic usage of the gelnet package. The interface to the solvers was designed to be very flexible, by allowing the user to specify a large number of parameters. At the same time, nearly all of the parameters are given reasonable default values, making the interface easy to use if the additional flexibility is not required.

Building your first GELnet model

Let $X$ be a samples-by-features data matrix and $y$ be a column vector of labels. Typically, $X$ and $y$ are determined by the prediction task at hand, but for the purposes of this tutorial, we are going to generate them randomly:

X <- matrix( rnorm( 1000 ), 20, 50 )
y <- rnorm( 20 )

Let's further assume that we are given a feature-feature relationship matrix $A$. If working with genomic features, $A$ might be the adjacency of a gene-gene interaction network. Again, we are going to generate a random matrix for the purposes of this tutorial:

A <- matrix( sample( 0:1, 50*50, repl=TRUE ), 50, 50 )
A <- A & t(A)  ## Make the matrix symmetric

We are now going to utilize the GELnet toolkit to learn a linear regression model, such that the model weights are more similar for the features that share an interaction on $A$. As discussed in the manuscript, this can be achieved by formulating a feature-feature penalty matrix using either the graph Laplacian or $(I-D)$, where $D$ is the graph diffusion matrix and $I$ is the identity matrix. The gelnet package provides a function to compute the graph Laplacian from the adjacency. Here, we utilize the normalized Laplacian to keep the penalty term on the same scale as the traditional ridge regression:

library( gelnet )
L <- adj2nlapl(A)

The model can now be learned via

model <- gelnet( X, y, 0.1, 1, P = L )

where we set the L1-norm and L2-norm penalties to 0.1 and 1, respectively. The response for new samples is computed via the dot product with the weights:

Xnew <- matrix( rnorm( 500 ), 10, 50 )
Xnew %*% model$w + model$b

Other regression problems

Linear regression is one of the three types of prediction problems supported by the package. The other two are binary logistic regression and one-class logistic regression. The latter is outlined in the following paper: http://psb.stanford.edu/psb-online/proceedings/psb16/sokolov.pdf

The package recognizes the problem type based on the class of the $y$ argument. To train a binary predictor, we have to provide $y$ as a two-level factor, where the first level is treated as the positive class.

y <- factor( y > 0, levels=c(TRUE,FALSE) )
model2 <- gelnet( X, y, 0.1, 1, P=L )

If we were to score the training data using this model, we can observe that the positive samples are receiving higher scores than the negative ones

data.frame( scores= X %*% model2$w + model2$b, labels= y )

However, if there is class imbalance, the scores will tend to be skewed towards the class with more samples. This can be addressed by using an additional flag when training the model

model2bal <- gelnet( X, y, 0.1, 1, P=L, balanced=TRUE )
data.frame( scores= X %*% model2bal$w + model2bal$b, labels= y )

Traditionally, the loss function for logistic regression is averaged over $n$, the number of samples. This causes every sample to make the same contribution to the loss, which is what causes the skew towards the larger class. By using the balanced flag, the problem is reformulated slightly such that the loss is averaged over the positive and negative samples separately, and then the mean of both averages is used as the overall loss.

Finally, we can build a one-class logistic regression model using just the positive samples. To train a one-class model we simply provide NULL for the $y$ argument:

j <- which( y == TRUE )
model1 <- gelnet( X[j,], NULL, 0.1, 1, P=L )

The model can now be used as a detector that recognizes the positive samples

data.frame( scores= X %*% model2bal$w + model2bal$b, labels= y )


Try the gelnet package in your browser

Any scripts or data that you put into this service are public.

gelnet documentation built on May 2, 2019, 2:10 p.m.