The {dlm} package provides "deep" versions of (generalized) linear models with an interface designed to follow that of standard regression routines (`lm()`, `glm()`, and `coxph()`) as closely as possible, while adding the ability to fit non-linear relationships among data. The package implements these model-fitting routines through two functions that prepend a "d" to their linear-model analogues:

- `dglm()` regresses variables onto count, continuous, or categorical variables.
- `dcoxph()` regresses variables onto survival data.

In addition, common associated methods, including `summary()`, `update()`, and `predict()`, are implemented, along with others that facilitate validating, examining, characterizing, updating, and predicting with these models.
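As a sketch of the intended workflow (the call below is illustrative; argument names such as `hidden_layers` follow the worked examples later in this README):

```r
library(dlm)

# Fit a deep model of a categorical response on the remaining columns,
# then examine and predict with the usual generic functions.
fit <- dglm(iris, Species ~ ., hidden_layers = c(16, 8))
summary(fit)
predictions <- predict(fit, iris)
```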
Deep learners, regressing tensors of order two (matrices) onto tensors of order one (vectors), can be thought of as non-linear generalizations of linear models. For example, consider the linear model:
$$ y_i = x_{i1} \, \beta_1 + x_{i2} \, \beta_2 + x_{i3} \, \beta_3 + \cdots $$
We can rewrite this as
$$ y_i = f_0 \left(\, x_{i \cdot} \, \right) $$
where $f_0$ is a linear combination of the values in the vector $x_{i\cdot}$ -- a deep learner with no hidden layers. Likewise, a logistic regression can be written as:
$$ \log \left( \frac{y_i}{1 - y_i} \right) = f_0 \left( \, x_{i \cdot} \, \right) $$
where $y_i$ is a zero-one variable. Any generalized linear model, including survival models, fits into a similar construction, and these models can be extended to capture non-linear predictive information with respect to the dependent variable.
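For instance, adding a single hidden layer replaces $f_0$ with a composition (a sketch of the general idea rather than the package's exact architecture):

$$ g \left( \, y_i \, \right) = f_1 \left( \, f_0 \left( \, x_{i \cdot} \, \right) \, \right) $$

where $g$ is the link function, $f_0$ maps $x_{i \cdot}$ to a vector of hidden-node values through a non-linear activation, and $f_1$ combines those values into the prediction. Stacking further layers gives deeper, more flexible models.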
The package is not currently available on CRAN, but you can install the development version from GitHub with the following code:
```r
devtools::install_github("kaneplusplus/dlm")
```
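The models themselves are fit through the keras package, so a working Keras/TensorFlow backend is needed as well. If you do not already have one, keras provides an installer:

```r
keras::install_keras()
```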
```r
library(keras)
library(dlm)

# Regress iris variables onto Species (categorical) and get the
# in-sample prediction accuracy.
fit_dlmcat <- dlr(iris, Species ~ ., hidden_layers = 16, epochs = 100)
sum(iris$Species == dlr_predict(iris, fit_dlmcat)) / nrow(iris)
```
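In-sample accuracy is optimistic. A held-out check can be written the same way; this is a minimal sketch assuming the same `dlr()`/`dlr_predict()` interface used above:

```r
# Fit on 100 randomly chosen rows and score the 50 held-out rows.
set.seed(1)
train <- sample(nrow(iris), 100)
fit_oos <- dlr(iris[train, ], Species ~ ., hidden_layers = 16, epochs = 100)
mean(iris$Species[-train] == dlr_predict(iris[-train, ], fit_oos))
```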
```r
# Regress the remaining iris variables onto Sepal.Length using a linear
# model and get the in-sample prediction error (the standard deviation
# of the residuals).
fit_linear <- lm(Sepal.Length ~ ., iris)
sd(iris$Sepal.Length - predict(fit_linear, iris))

# Perform the same operation with a deep learner with two hidden layers
# of 24 and 2 nodes respectively.
fit_dlmc <- dlr(iris, Sepal.Length ~ ., hidden_layers = c(24, 2))
sd(iris$Sepal.Length - dlr_predict(iris, fit_dlmc))
```
```r
library(survival)
library(dplyr)

data(lung)

# Recode status to 0/1 and handle the categorical variables in lung.
lung <- lung %>%
  mutate(
    status = status - 1,
    sex = as.factor(sex),
    ph.ecog = as.factor(ph.ecog)
  )

# Fit the deep survival model.
dc_fit <- dlcp(
  lung,
  Surv(time, status) ~ sex + age + meal.cal + wt.loss + ph.ecog,
  epochs = 150,
  validation_split = .2
)

# Extract the first-layer weights, which play the role of the Cox
# model's coefficients.
dl_weights <- get_weights(dc_fit$model)[[1]]

# Compare the deep survival model coefficients to coxph. The conf.int
# component is on the hazard-ratio scale; taking logs and dropping the
# exp(-coef) column recovers the coefficients and confidence bounds.
cfit <- summary(
  coxph(Surv(time, status) ~ sex + age + meal.cal + wt.loss + ph.ecog, lung)
)$conf.int
comp <- round(cbind(dl_weights, log(cfit)[, -2]), 3)
colnames(comp) <-
  c("Deep Learner Coefs", "Cox Coef", "Cox Lower .95", "Cox Upper .95")
comp

# Plot the training history.
plot_training_history(dc_fit)
```
`metric_space_embedding()` maps each sample to a low-dimensional representation, here two-dimensional, matching the two-node hidden layer of `fit_dlmc`. Plotting the embedding, colored by Species, shows how the deep learner organizes the data:

```r
library(ggplot2)
library(dplyr)

# Embed each sample in the two-dimensional space given by the model and
# plot the result, colored by Species.
iris %>%
  metric_space_embedding(fit_dlmc) %>%
  `colnames<-`(c("X", "Y")) %>%
  as_tibble() %>%
  mutate(Species = iris$Species) %>%
  ggplot(aes(x = X, y = Y, color = Species)) +
    geom_point() +
    theme_bw()
```