The {dlm} package provides "deep" versions of (generalized) linear models with an interface designed to follow that of standard regression routines (`lm()`, `glm()`, and `coxph()`) as closely as possible, while adding the ability to fit non-linear relationships among data. The package implements these model-fitting routines through two functions that prepend a "d" to their linear-model analogues:

- `dglm()` regresses variables onto count, continuous, or categorical variables.
- `dcoxph()` regresses variables onto survival data.

In addition, common associated methods, including `summary()`, `update()`, and `predict()`, are implemented, along with others that facilitate validating, examining, characterizing, updating, and predicting with these models.
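As a sketch of the intended workflow (the call below is illustrative; argument names such as `hidden_layers` follow the worked examples later in this README):

```r
library(dlm)

# Fit a deep model of a categorical response on the remaining columns,
# then examine and predict with the usual generic functions.
fit <- dglm(iris, Species ~ ., hidden_layers = c(16, 8))
summary(fit)
predictions <- predict(fit, iris)
```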
Deep learners, regressing tensors of order two (matrices) onto tensors of order one (vectors), can be thought of as non-linear generalizations of linear models. For example, consider the linear model:
$$ y_i = x_{i1} \, \beta_1 + x_{i2} \, \beta_2 + x_{i3} \, \beta_3 + \cdots $$
We can rewrite this as
$$ y_i = f_0 \left(\, x_{i \cdot} \, \right) $$
where $f_0$ is a linear combination of the values in the vector $x_{i\cdot}$ -- a deep learner with no hidden layers. Likewise, a logistic regression can be written as:
$$ \log \left( \frac{y_i}{1 - y_i} \right) = f_0 \left( \, x_{i \cdot} \, \right) $$
where $y_i$ is a zero-one variable. Any generalized linear model, including survival models, fits into a similar construction, and these models can be extended to capture non-linear predictive information with respect to the dependent variable.
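For instance, adding a single hidden layer replaces $f_0$ with a composition (a sketch of the general idea rather than the package's exact architecture):

$$ g \left( \, y_i \, \right) = f_1 \left( \, f_0 \left( \, x_{i \cdot} \, \right) \, \right) $$

where $g$ is the link function, $f_0$ maps $x_{i \cdot}$ to a vector of hidden-node values through a non-linear activation, and $f_1$ combines those values into the prediction. Stacking further layers gives deeper, more flexible models.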
The package is not currently available on CRAN, but you can install the development version from GitHub with the following code:
```r
devtools::install_github("kaneplusplus/dlm")
```
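The models themselves are fit through the keras package, so a working Keras/TensorFlow backend is needed as well. If you do not already have one, keras provides an installer:

```r
keras::install_keras()
```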
```r
library(keras)
library(dlm)

# Regress iris variables onto Species (categorical) and get the
# in-sample prediction accuracy.
fit_dlmcat <- dlr(iris, Species ~ ., hidden_layers = 16, epochs = 100)
sum(iris$Species == dlr_predict(iris, fit_dlmcat)) / nrow(iris)
```
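In-sample accuracy is optimistic. A held-out check can be written the same way; this is a minimal sketch assuming the same `dlr()`/`dlr_predict()` interface used above:

```r
# Fit on 100 randomly chosen rows and score the 50 held-out rows.
set.seed(1)
train <- sample(nrow(iris), 100)
fit_oos <- dlr(iris[train, ], Species ~ ., hidden_layers = 16, epochs = 100)
mean(iris$Species[-train] == dlr_predict(iris[-train, ], fit_oos))
```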
```r
# Regress the remaining iris variables onto Sepal.Length using a linear
# model and get the in-sample prediction error (the standard deviation
# of the residuals).
fit_linear <- lm(Sepal.Length ~ ., iris)
sd(iris$Sepal.Length - predict(fit_linear, iris))

# Perform the same operation with a deep learner with two hidden layers
# of 24 and 2 nodes respectively.
fit_dlmc <- dlr(iris, Sepal.Length ~ ., hidden_layers = c(24, 2))
sd(iris$Sepal.Length - dlr_predict(iris, fit_dlmc))
```
```r
library(survival)
library(dplyr)

data(lung)

# Recode status to 0/1 and handle the categorical variables in lung.
lung <- lung %>%
  mutate(
    status = status - 1,
    sex = as.factor(sex),
    ph.ecog = as.factor(ph.ecog)
  )

# Fit the deep survival model.
dc_fit <- dlcp(
  lung,
  Surv(time, status) ~ sex + age + meal.cal + wt.loss + ph.ecog,
  epochs = 150,
  validation_split = .2
)

# Extract the first-layer weights, which play the role of the Cox
# model's coefficients.
dl_weights <- get_weights(dc_fit$model)[[1]]

# Compare the deep survival model coefficients to coxph. The conf.int
# component is on the hazard-ratio scale; taking logs and dropping the
# exp(-coef) column recovers the coefficients and confidence bounds.
cfit <- summary(
  coxph(Surv(time, status) ~ sex + age + meal.cal + wt.loss + ph.ecog, lung)
)$conf.int
comp <- round(cbind(dl_weights, log(cfit)[, -2]), 3)
colnames(comp) <-
  c("Deep Learner Coefs", "Cox Coef", "Cox Lower .95", "Cox Upper .95")
comp

# Plot the training history.
plot_training_history(dc_fit)
```
`metric_space_embedding()` maps each sample to a low-dimensional representation, here two-dimensional, matching the two-node hidden layer of `fit_dlmc`. Plotting the embedding, colored by Species, shows how the deep learner organizes the data:

```r
library(ggplot2)
library(dplyr)

# Embed each sample in the two-dimensional space given by the model and
# plot the result, colored by Species.
iris %>%
  metric_space_embedding(fit_dlmc) %>%
  `colnames<-`(c("X", "Y")) %>%
  as_tibble() %>%
  mutate(Species = iris$Species) %>%
  ggplot(aes(x = X, y = Y, color = Species)) +
    geom_point() +
    theme_bw()
```