deepregression: Fitting Semi-Structured Deep Distributional Regression

View source: R/deepregression.R

deepregression  R Documentation

Fitting Semi-Structured Deep Distributional Regression

Description

Fitting Semi-Structured Deep Distributional Regression

Usage

deepregression(
  y,
  list_of_formulae,
  list_of_deep_models,
  family = c("normal", "bernoulli", "bernoulli_prob", "beta", "betar", "cauchy",
    "chi2", "chi", "exponential", "gamma_gamma", "gamma", "gammar", "gumbel",
    "half_cauchy", "half_normal", "horseshoe", "inverse_gamma", "inverse_gamma_ls",
    "inverse_gaussian", "laplace", "log_normal", "logistic", "multinomial",
    "multinoulli", "negbinom", "negbinom_ls", "pareto", "pareto_ls", "poisson",
    "poisson_lograte", "student_t", "student_t_ls", "truncated_normal", "uniform",
    "zinb", "zip", "transformation_model"),
  train_together = list(),
  data,
  image_var = list(),
  dim_deep = NULL,
  df = NULL,
  lambda_lasso = NULL,
  lambda_ridge = NULL,
  convert_factors = FALSE,
  defaultSmoothing = NULL,
  cv_folds = NULL,
  validation_data = NULL,
  validation_split = ifelse(is.null(validation_data) & is.null(cv_folds), 0.2, 0),
  dist_fun = NULL,
  learning_rate = 0.01,
  optimizer = optimizer_adam(lr = learning_rate),
  fsbatch_optimizer = FALSE,
  fsbatch_options = fsbatch_control(),
  variational = FALSE,
  monitor_metric = list(),
  seed = 1991 - 5 - 4,
  tf_seed = NULL,
  mixture_dist = 0,
  split_fun = split_model,
  posterior_fun = posterior_mean_field,
  prior_fun = prior_trainable,
  null_space_penalty = variational,
  ind_fun = function(x) tfd_independent(x),
  extend_output_dim = 0,
  offset = NULL,
  offset_val = NULL,
  absorb_cons = FALSE,
  anisotropic = TRUE,
  zero_constraint_for_smooths = TRUE,
  orthog_type = c("tf", "manual"),
  orthogonalize = TRUE,
  hat1 = FALSE,
  sp_scale = NROW(y),
  order_bsp = NULL,
  y_basis_fun = function(y) eval_bsp(y, order = order_bsp, supp = range(y)),
  y_basis_fun_prime = function(y) eval_bsp_prime(y, order = order_bsp, supp =
    range(y))/diff(range(y)),
  split_between_shift_and_theta = NULL,
  addconst_interaction = NULL,
  additional_penalty = NULL,
  penalty_summary = k_sum,
  convertfun = as.matrix,
  compile_model = TRUE,
  base_distribution = "normal",
  ...
)

Arguments

y

response variable

list_of_formulae

a named list of right-hand side formulae, one for each parameter of the distribution specified in family; set to ~ 1 if the parameter should be treated as constant. Use the s()-notation from mgcv to specify non-linear structured effects and d(...) for deep learning predictors (predictors in brackets are separated by commas), where d can be replaced by any of the names in list_of_deep_models, e.g., ~ 1 + s(x) + my_deep_mod(a,b,c), where my_deep_mod is the name of the neural net specified in list_of_deep_models and a, b, c are features modeled via this network. A specification is sketched below.
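
For example, for a two-parameter distribution such as "normal", a specification following the description above could look as follows (a sketch; my_deep_mod and the features a, b, c are the illustrative names from the example above):

list_of_formulae = list(
  loc   = ~ 1 + s(x) + my_deep_mod(a, b, c),
  scale = ~ 1
)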

list_of_deep_models

a named list of functions specifying a keras model. See the examples for more details.

family

a character specifying the distribution. For information on possible distributions and their parameters, see make_tfd_dist.

train_together

a list of formulae of the same length as list_of_formulae, specifying deep predictors that should be trained jointly and whose results are then fed into different distribution parameters; use the same name for the deep predictor to indicate which distribution parameters it should be used for. For example, if the second and fourth list entries are ~ lstm_mod(text), the jointly learned lstm_mod network is added to the linear predictors of the second and fourth distribution parameter. These network names should then be excluded from list_of_formulae; see the sketch below.
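
A sketch of the example from the description (lstm_mod and text are illustrative names; it is assumed here that parameters without a jointly trained network can be left as NULL):

train_together = list(NULL, ~ lstm_mod(text), NULL, ~ lstm_mod(text))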

data

data.frame or named list with input features

image_var

named list; names correspond to image variables, and the value of each list item corresponds to the input size of the respective image, e.g., list(image = list(c(200,200,3))).

dim_deep

list with one entry (NULL or an integer vector) per distribution parameter; this is an optional argument to manually specify the input dimensions of the unstructured model part(s) and is required if placeholders are used for unstructured data sources. E.g., if the formula contains nn(images), where images are IDs of images loaded using a generator, the size of the image must be supplied in the respective dim_deep entry.

df

degrees of freedom for all non-linear structural terms; either one common value, or a list of the same length as the number of parameters, in which each list item is itself a list of the same length as the number of smooth terms in the respective formula; elements can be vectors for two- or multidimensional smooth terms. See the sketch below.
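
A sketch for two distribution parameters, where the first formula contains a univariate smooth and a two-dimensional smooth and the second formula a single smooth term (values are illustrative):

df = list(
  list(6, c(4, 4)),  # first parameter: 1D smooth and 2D smooth
  list(6)            # second parameter: one smooth term
)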

lambda_lasso

scalar value or list of length(list_of_formulae); smoothing parameter for lasso regression; can be combined with ridge

lambda_ridge

scalar value or list of length(list_of_formulae); smoothing parameter for ridge regression; can be combined with lasso

defaultSmoothing

function applied to all s-terms, per default (NULL) the minimum df of all possible terms is used.

cv_folds

a list of lists; each list element has two elements, one with training indices and one with testing indices. If a single integer k is given, a simple k-fold cross-validation is used.
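
For example (fold indices are illustrative):

# simple 5-fold cross-validation
cv_folds = 5
# or manually defined folds, each with training and testing indices
cv_folds = list(
  list(1:800, 801:1000),
  list(201:1000, 1:200)
)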

validation_data

data for validation during training.

validation_split

percentage of training data used for validation. Per default 0.2.

dist_fun

a custom distribution applied to the last layer, see make_tfd_dist for more details on how to construct a custom distribution function.
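
A minimal sketch of such a function using tfprobability (assumptions for illustration: x is the tensor of outputs for all distribution parameters with one column per parameter, and the softplus transformation of the scale is an illustrative choice; see make_tfd_dist for the actual conventions):

library(tensorflow)
library(tfprobability)
# hypothetical custom normal distribution with softplus-transformed scale
my_dist <- function(x)
  tfd_normal(loc = x[, 1], scale = 1e-6 + tf$math$softplus(x[, 2]))
# then pass dist_fun = my_dist to deepregression()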

learning_rate

learning rate for optimizer

optimizer

optimizer used. Per default Adam.

fsbatch_optimizer

logical; use a special optimizer conducting a mini-batch variant of the Fellner-Schall algorithm.

fsbatch_options

call to fsbatch_control with list of options for fsbatch_optimizer. See ?fsbatch_control for details.

variational

logical value specifying whether or not to use variational inference. If TRUE, details must be passed via the ellipsis to the initialization function (see deepregression_init).

monitor_metric

Further metrics to monitor

seed

integer value used as a seed in data splitting

tf_seed

a seed for tensorflow (only works with R version >= 2.2.0)

mixture_dist

integer either 0 or >= 2. If 0 (default), no mixture distribution is fitted. If >= 2, a network is constructed that outputs a multivariate response for each of the mixture components.

split_fun

a function separating the deep neural network into two parts so that the orthogonalization can be applied to the first part before applying the second part; per default, the function split_model is used, which assumes a dense layer as the penultimate layer and separates the network into a first part without this layer and a second part consisting only of this single dense layer, which is fed into the output layer.

posterior_fun

function defining the posterior for the variational version of the network

prior_fun

function defining the prior for the variational version of the network

null_space_penalty

logical value; if TRUE, the null space will also be penalized for smooth effects. Per default, this is equal to the value given in variational.

ind_fun

function applied to the model output before calculating the log-likelihood. Per default independence is assumed by applying tfd_independent.

extend_output_dim

integer value >= 0 for extending the output dimension by an additive constant. If set to a value > 0, a multivariate response with dimension 1 + extend_output_dim is defined.

offset

a list of column vectors (i.e. matrix with ncol = 1) or NULLs for each parameter, in case an offset should be added to the additive predictor; if NULL, no offset is used
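
For example, for a two-parameter model where only the first parameter receives an offset (exposure is a hypothetical column of data):

offset = list(matrix(log(data$exposure), ncol = 1), NULL)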

offset_val

a list analogous to offset for the validation data

absorb_cons

logical; adds an identifiability constraint to the basis. See ?mgcv::smoothCon for more details.

anisotropic

whether or not to use anisotropic smoothing (default is TRUE)

zero_constraint_for_smooths

logical; the same as absorb_cons, but done explicitly. If TRUE, a constraint is put on each smooth to have zero mean. Can be a vector of length length(list_of_formulae) with one entry per distribution parameter.

orthog_type

one of two options; If "manual", the QR decomposition is calculated before model fitting, otherwise ("tf") a QR is calculated in each batch iteration via TF. The first only works well for larger batch sizes or ideally batch_size == NROW(y).

orthogonalize

logical; if set to FALSE, orthogonalization is deactivated

hat1

logical; if TRUE, the smoothing parameter is defined by the trace of the hat matrix sum(diag(H)), else sum(diag(2*H-HH))

sp_scale

positive constant for scaling the DRO-calculated penalty (per default NROW(y); see the Usage section)

order_bsp

NULL or integer; order of Bernstein polynomials; if not NULL, a conditional transformation model (CTM) is fitted.

y_basis_fun, y_basis_fun_prime

basis functions for y transformation for CTM case

split_between_shift_and_theta

if family == 'transformation_model' and !is.null(train_together), split_between_shift_and_theta defines how many of the last layer's hidden units are used for the shift term and how many for the theta term (an integer vector of length 2).

addconst_interaction

positive constant; a constant added to the additive predictor of the interaction term. If NULL, terms are left unchanged. If 0 and predictors have negative values in their design matrix, the minimum value of all predictors is added to ensure positivity. If > 0, the minimum value plus the addconst_interaction is added to each predictor in the interaction term.

additional_penalty

a penalty that is added to the negative log-likelihood; must be a function of the keras model's trainable_weights (including necessary subsetting) which is always called model
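
A minimal sketch (it is assumed that the function receives the keras model, called model, as described above; which weights to subset is illustrative and depends on the model structure):

additional_penalty <- function(model)
  k_sum(k_square(model$trainable_weights[[1]]))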

penalty_summary

keras function; summary function for the penalty in the spline layer; default is k_sum. Another option could be k_mean.

convertfun

function to convert objects into a matrix or tensor format.

compile_model

logical; whether to compile the model or not

base_distribution

tfd_distribution or string; the base distribution for transformation models, see ?deeptransformation_init.

...

further arguments passed to the deepregression_init function

Examples

library(deepregression)

n <- 1000
data <- data.frame(matrix(rnorm(4*n), ncol = 4))
colnames(data) <- c("x1","x2","x3","xa")
formula <- ~ 1 + deep_model(x1,x2,x3) + s(xa) + x1

deep_model <- function(x) x %>%
  layer_dense(units = 32, activation = "relu", use_bias = FALSE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1, activation = "linear")

y <- rnorm(n) + data$xa^2 + data$x1

mod <- deepregression(
  list_of_formulae = list(loc = formula, scale = ~ 1),
  data = data, validation_data = list(data, y), y = y,
  list_of_deep_models = list(deep_model = deep_model), 
  df = 6, 
  tf_seed = 1
)

# train for more than 10 epochs to get a better model
mod %>% fit(epochs = 10, early_stopping = TRUE)
mod %>% plot()
mod %>% coef()

