
DeepNeuralNetworks4R

An R package implementing a regression algorithm based on deep neural networks for omic data prediction in brain transcriptomics (as a regression model, it can be applied to any problem with a continuous dependent variable).

Developer: Óscar González-Velasco [oscargv@usal.es] - Bioinformatics and functional genomics group, Cancer Research Center Salamanca (CIC-IBMCC)

Citing this package: Oscar González-Velasco, et al., BBA - Gene Regulatory Mechanisms, https://doi.org/10.1016/j.bbagrm.2020.194491

Installation

  1. The package tarball is available for download on GitHub: https://github.com/OscarGVelasco/DeepNeuralNetworks4R/blob/master/DeepNeuralNetworks4R_0.1.0.tar.gz
install.packages("DeepNeuralNetworks4R_0.1.0.tar.gz", repos = NULL, type = "source")
  2. Or install it directly from GitHub using devtools:
devtools::install_github("OscarGVelasco/DeepNeuralNetworks4R")

Example using the data available inside the package

We will use a set of transcriptomic data from human brain samples included in the package as an example of a regression model that uses deep neural networks to predict biological age. It consists of two data frames, training.data and test.data, each containing the gene signal (numeric) and the age of every individual:

# We load the Deep Neural Network package:
library(DeepNeuralNetworks4R)

We will try to predict the age of the individuals based on the expression of 1078 genes, selected because of their involvement in brain aging in the cortex region (Oscar González-Velasco, et al., BBA - Gene Regulatory Mechanisms, https://doi.org/10.1016/j.bbagrm.2020.194491).

# We inspect the data included within the package:
training.data[1:5,1:5]

# We will select the first 3 genes (the most significant genes linked with aging) to build 3 additional
# data matrices, using each of these genes as the centroid:
zscore.targets <- as.list(rownames(training.data))[1:3]

# Print the first 3 genes
zscore.targets

The output above shows an excerpt of the training dataset. Notice that the variables correspond to rows, while the samples correspond to columns.
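To make this layout concrete, here is a minimal sketch that builds a toy dataset with the same orientation. The object toy.data and all of its values are made up for illustration, and it is a plain matrix rather than the data frame shipped with the package; the only things taken from this README are that variables sit in rows, samples in columns, and that the age is the last row.

```r
# Toy data in the layout described above: rows are variables, columns are samples.
# All names and values are hypothetical and only serve as an illustration.
set.seed(1)
n.genes <- 4
n.samples <- 6
toy.data <- matrix(rnorm(n.genes * n.samples),
                   nrow = n.genes,
                   dimnames = list(paste0("gene", 1:n.genes),
                                   paste0("sample", 1:n.samples)))

# Append the response variable (age) as the last row
toy.data <- rbind(toy.data, age = c(25, 34, 47, 58, 66, 79))

# Row indices of the explanatory variables and of the response
x.rows <- 1:(nrow(toy.data) - 1)
y.row <- nrow(toy.data)

dim(toy.data[x.rows, ])  # explanatory block: genes x samples
toy.data[y.row, ]        # one age value per sample
```

The x.rows and y.row indices are built the same way as the x and y arguments used in the calls in the next section.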

Training the regression model

First, we proceed to create the deep neural network model:

model <- deepNeuralNetwork.build(
            x=1:(nrow(training.data)-1),
            y=nrow(training.data),
            outputNeurons = 1,
            HidenLayerNeurons = c(20,20,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10),
            traindata=training.data,
            drawDNN = 0,
            standarization = zscore.targets)

x specifies the index positions (row numbers) of the explanatory variables in the matrix training.data.

y specifies the index position (row number) of the observed variable in the matrix training.data; here it corresponds to the age.

HidenLayerNeurons specifies the number of neurons in each hidden layer. The number of neurons in the input layer equals the number of variables used to build the regression model (deepNeuralNetwork.build calculates this automatically from the x parameter).

deepNeuralNetwork.build creates an object of class DeepNNModel that stores all the information about the DNN model.
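As a rough illustration of how these parameters determine the shape of the network, the sketch below assembles the full list of layer sizes by hand. It does not call the package: the helper count.weights is hypothetical, the input size of 1078 assumes one input neuron per gene mentioned earlier, and the weight count assumes a plain fully connected network without bias terms, which may differ from what deepNeuralNetwork.build does internally.

```r
# Hidden-layer sizes copied from the example call above
hidden.layers <- c(20,20,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10)

# Assumption: one input neuron per explanatory variable (1078 genes)
n.inputs <- 1078
# One output neuron, as set by outputNeurons = 1
n.outputs <- 1

# Full architecture: input layer, hidden layers, output layer
layer.sizes <- c(n.inputs, hidden.layers, n.outputs)

# Hypothetical helper: number of weights between consecutive layers
# of a fully connected network (bias terms not counted)
count.weights <- function(sizes) {
  sum(head(sizes, -1) * tail(sizes, -1))
}

n.weights <- count.weights(layer.sizes)
layer.sizes
n.weights
```

Under these assumptions, most of the weights sit between the input layer and the first hidden layer, because the number of genes is much larger than any hidden layer.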

We train the deep neural network using the following code:

# Train the model and record the elapsed time
timeNN <- system.time(
  model.trained <- deepNeuralNetwork.training(
                        x=1:(nrow(training.data)-1),
                        y=nrow(training.data),
                        model = model, #ddn.model.in.use,
                        traindata=training.data,
                        testdata=test.data,
                        iterations  = 100,
                        lr = 0.001,
                        reg = 0.001,
                        display=1000,
                        maxError = 0.1,
                        standarization = zscore.targets))

Testing the results

Once we have the trained model, we use the deepNeuralNetwork.predict function to predict the response variable for new data:

age.prediction <- deepNeuralNetwork.predict(model.trained = model.trained,
                                            data = test.data[-nrow(test.data),],
                                            standarization = zscore.targets)

Using the mplot_lineal function included in the package, we can plot the predictions of the regression model against the observed values.

mplot_lineal(observed = test.data[nrow(test.data),],
             predicted = age.prediction,
             title = "Biological age prediction using DNN regression from human brain data",
             x.lab="chronological age (years)",y.lab = "transcriptomic age")
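Alongside the plot, it can help to summarize the fit with a few numbers. The following sketch computes common regression metrics in base R. The observed and predicted vectors are made-up values, not results from the package data, and the metric choice is only a suggestion.

```r
# Hypothetical observed and predicted ages (not real results)
observed  <- c(25, 34, 47, 58, 66, 79)
predicted <- c(28, 31, 50, 55, 70, 74)

# Prediction errors
errors <- predicted - observed

# Root mean squared error and mean absolute error, in years
rmse <- sqrt(mean(errors^2))
mae <- mean(abs(errors))

# Coefficient of determination relative to predicting the mean age
r.squared <- 1 - sum(errors^2) / sum((observed - mean(observed))^2)

# Pearson correlation between observed and predicted values
pearson.r <- cor(observed, predicted)

round(c(rmse = rmse, mae = mae, r.squared = r.squared, pearson.r = pearson.r), 3)
```

With the package data, the observed values would come from the last row of test.data and the predicted values from age.prediction; depending on their classes, a conversion such as as.numeric(unlist(...)) may be needed first.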

We can also plot the residuals of the regression model (a good model produces randomly scattered residuals). The plot also labels the detected outliers (a studentized residual is considered an outlier if it exceeds 1.5*IQR):

residuals_plot(observed = test.data[nrow(test.data),],
               predicted = age.prediction)
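For readers who want to see the idea behind the outlier labels, here is a minimal base R sketch. It is not the code used by residuals_plot: it assumes the studentized residuals come from a linear fit of predicted on observed values (via rstudent), and it reads the 1.5*IQR rule as the usual boxplot fences, which is only one possible interpretation. The data are made up, with one sample deliberately far off.

```r
# Hypothetical data: sample6 is predicted far from its observed value
sample.names <- paste0("sample", 1:8)
observed  <- c(20, 30, 40, 50, 60, 70, 80, 90)
predicted <- c(22, 29, 41, 52, 59, 95, 79, 91)

# Linear fit of predicted on observed, then externally studentized residuals
fit <- lm(predicted ~ observed)
stud.res <- unname(rstudent(fit))

# Boxplot-style fences: 1.5 * IQR beyond the first and third quartiles
q <- quantile(stud.res, c(0.25, 0.75))
iqr <- IQR(stud.res)
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Samples whose studentized residual falls outside the fences
outliers <- sample.names[stud.res < lower | stud.res > upper]
outliers
```

A single extreme sample inflates the residual variance estimated for all the others, which is why externally studentized residuals (each computed with its own sample left out) are a common choice for this kind of check.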

Using GPU for large datasets

The DeepNN algorithm has been optimized to be executed on a GPU card using R's matrix/vector arithmetic expressions.
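To illustrate what "matrix/vector arithmetic" means in practice, the sketch below computes the forward pass of a single dense layer for all samples at once with a matrix product. It is not taken from the package source: the layer sizes, the random weights, the samples-in-rows orientation, and the ReLU activation are all assumptions chosen for illustration. The point is that the heavy lifting is a single %*% call, the kind of operation R normally delegates to the underlying BLAS library, which is what a GPU-backed BLAS can take over.

```r
set.seed(42)

# Hypothetical sizes: 4 samples, 5 input variables, 3 hidden neurons
n.samples <- 4
n.inputs <- 5
n.hidden <- 3

# Input matrix (samples in rows for this sketch), weights and biases
X <- matrix(rnorm(n.samples * n.inputs), nrow = n.samples)
W <- matrix(rnorm(n.inputs * n.hidden), nrow = n.inputs)
b <- rnorm(n.hidden)

# Affine transform for every sample in one matrix product,
# then add the bias of each neuron to its column
Z <- sweep(X %*% W, 2, b, "+")

# ReLU activation (an assumption; the package may use another function)
A <- pmax(Z, 0)

dim(A)  # one row per sample, one column per hidden neuron
```

Written this way, the cost of a layer is dominated by one large matrix multiplication instead of nested loops over samples and neurons.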

The following example shows how to run the DNN on a GPU by preloading CUDA's NVBLAS library, together with its configuration file, when launching R:

sudo env LD_PRELOAD=/PATH/TO/CUDA/NVBLAS/libnvblas.so NVBLAS_CONFIG_FILE=/PATH/TO/NVBLAS.CONFIG.FILE/nvblas.conf R CMD BATCH ./regression.deepNN.GPU.r /dev/tty



OscarGVelasco/DeepNeuralNetworks4R documentation built on Jan. 24, 2021, 12:42 a.m.