deepregression: Fitting Semi-Structured Deep Distributional Regression

View source: R/deepregression.R

deepregression  R Documentation

Fitting Semi-Structured Deep Distributional Regression

Description

Fitting Semi-Structured Deep Distributional Regression

Usage

deepregression(
  y,
  list_of_formulae,
  list_of_deep_models,
  family = c("normal", "bernoulli", "bernoulli_prob", "beta", "betar", "cauchy",
    "chi2", "chi", "exponential", "gamma_gamma", "gamma", "gammar", "gumbel",
    "half_cauchy", "half_normal", "horseshoe", "inverse_gamma", "inverse_gamma_ls",
    "inverse_gaussian", "laplace", "log_normal", "logistic", "multinomial",
    "multinoulli", "negbinom", "negbinom_ls", "pareto", "pareto_ls", "poisson",
    "poisson_lograte", "student_t", "student_t_ls", "truncated_normal", "uniform",
    "zinb", "zip", "transformation_model"),
  train_together = list(),
  data,
  image_var = list(),
  dim_deep = NULL,
  df = NULL,
  lambda_lasso = NULL,
  lambda_ridge = NULL,
  convert_factors = FALSE,
  defaultSmoothing = NULL,
  cv_folds = NULL,
  validation_data = NULL,
  validation_split = ifelse(is.null(validation_data) & is.null(cv_folds), 0.2, 0),
  dist_fun = NULL,
  learning_rate = 0.01,
  optimizer = optimizer_adam(lr = learning_rate),
  fsbatch_optimizer = FALSE,
  fsbatch_options = fsbatch_control(),
  variational = FALSE,
  monitor_metric = list(),
  seed = 1991 - 5 - 4,
  tf_seed = NULL,
  mixture_dist = 0,
  split_fun = split_model,
  posterior_fun = posterior_mean_field,
  prior_fun = prior_trainable,
  null_space_penalty = variational,
  ind_fun = function(x) tfd_independent(x),
  extend_output_dim = 0,
  offset = NULL,
  offset_val = NULL,
  absorb_cons = FALSE,
  anisotropic = TRUE,
  zero_constraint_for_smooths = TRUE,
  orthog_type = c("tf", "manual"),
  orthogonalize = TRUE,
  hat1 = FALSE,
  sp_scale = NROW(y),
  order_bsp = NULL,
  y_basis_fun = function(y) eval_bsp(y, order = order_bsp, supp = range(y)),
  y_basis_fun_prime = function(y) eval_bsp_prime(y, order = order_bsp, supp =
    range(y))/diff(range(y)),
  split_between_shift_and_theta = NULL,
  addconst_interaction = NULL,
  additional_penalty = NULL,
  penalty_summary = k_sum,
  convertfun = as.matrix,
  compile_model = TRUE,
  base_distribution = "normal",
  ...
)

Arguments

y

response variable

list_of_formulae

a named list of right-hand side formulae, one for each parameter of the distribution specified in family; set to ~ 1 if the parameter should be treated as constant. Use the s()-notation from mgcv to specify non-linear structured effects and d(...) for deep learning predictors (predictors in brackets are separated by commas), where d can be replaced by any of the names in list_of_deep_models, e.g., ~ 1 + s(x) + my_deep_mod(a,b,c), where my_deep_mod is the name of the neural net specified in list_of_deep_models and a, b, c are features modeled via this network. A specification is sketched below.
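
For example, for a two-parameter distribution such as "normal", a specification following the description above could look as follows (a sketch; my_deep_mod and the features a, b, c are the illustrative names from the example above):

list_of_formulae = list(
  loc   = ~ 1 + s(x) + my_deep_mod(a, b, c),
  scale = ~ 1
)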

list_of_deep_models

a named list of functions specifying a keras model. See the examples for more details.

family

a character specifying the distribution. For information on possible distributions and their parameters, see make_tfd_dist.

train_together

a list of formulae of the same length as list_of_formulae, specifying deep predictors that should be trained jointly and whose results are then fed into different distribution parameters; use the same name for the deep predictor to indicate which distribution parameters it should be used for. For example, if the second and fourth list entries are ~ lstm_mod(text), the jointly learned lstm_mod network is added to the linear predictors of the second and fourth distribution parameter. These network names should then be excluded from list_of_formulae; see the sketch below.
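
A sketch of the example from the description (lstm_mod and text are illustrative names; it is assumed here that parameters without a jointly trained network can be left as NULL):

train_together = list(NULL, ~ lstm_mod(text), NULL, ~ lstm_mod(text))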

data

data.frame or named list with input features

image_var

named list; names correspond to image variables, and the value of each list item corresponds to the input size of the respective image, e.g., list(image = list(c(200,200,3))).

dim_deep

list with one entry (NULL or an integer vector) per distribution parameter; this is an optional argument to manually specify the input dimensions of the unstructured model part(s) and is required if placeholders are used for unstructured data sources. E.g., if the formula contains nn(images), where images are IDs of images loaded using a generator, the size of the image must be supplied in the respective dim_deep entry.

df

degrees of freedom for all non-linear structural terms; either one common value, or a list of the same length as the number of parameters, in which each list item is itself a list of the same length as the number of smooth terms in the respective formula; elements can be vectors for two- or multidimensional smooth terms. See the sketch below.
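
A sketch for two distribution parameters, where the first formula contains a univariate smooth and a two-dimensional smooth and the second formula a single smooth term (values are illustrative):

df = list(
  list(6, c(4, 4)),  # first parameter: 1D smooth and 2D smooth
  list(6)            # second parameter: one smooth term
)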

lambda_lasso

scalar value or list of length(list_of_formulae); smoothing parameter for lasso regression; can be combined with ridge

lambda_ridge

scalar value or list of length(list_of_formulae); smoothing parameter for ridge regression; can be combined with lasso

defaultSmoothing

function applied to all s-terms, per default (NULL) the minimum df of all possible terms is used.

cv_folds

a list of lists; each list element has two elements, one with training indices and one with testing indices. If a single integer k is given, a simple k-fold cross-validation is used.
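
For example (fold indices are illustrative):

# simple 5-fold cross-validation
cv_folds = 5
# or manually defined folds, each with training and testing indices
cv_folds = list(
  list(1:800, 801:1000),
  list(201:1000, 1:200)
)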

validation_data

data for validation during training.

validation_split

percentage of training data used for validation. Per default 0.2.

dist_fun

a custom distribution applied to the last layer, see make_tfd_dist for more details on how to construct a custom distribution function.
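
A minimal sketch of such a function using tfprobability (assumptions for illustration: x is the tensor of outputs for all distribution parameters with one column per parameter, and the softplus transformation of the scale is an illustrative choice; see make_tfd_dist for the actual conventions):

library(tensorflow)
library(tfprobability)
# hypothetical custom normal distribution with softplus-transformed scale
my_dist <- function(x)
  tfd_normal(loc = x[, 1], scale = 1e-6 + tf$math$softplus(x[, 2]))
# then pass dist_fun = my_dist to deepregression()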

learning_rate

learning rate for optimizer

optimizer

optimizer used. Per default Adam.

fsbatch_optimizer

logical; use a special optimizer conducting a mini-batch variant of the Fellner-Schall algorithm.

fsbatch_options

call to fsbatch_control with list of options for fsbatch_optimizer. See ?fsbatch_control for details.

variational

logical value specifying whether or not to use variational inference. If TRUE, details must be passed via the ellipsis to the initialization function (see deepregression_init).

monitor_metric

Further metrics to monitor

seed

integer value used as a seed in data splitting

tf_seed

a seed for tensorflow (only works with R version >= 2.2.0)

mixture_dist

integer either 0 or >= 2. If 0 (default), no mixture distribution is fitted. If >= 2, a network is constructed that outputs a multivariate response for each of the mixture components.

split_fun

a function separating the deep neural network into two parts so that the orthogonalization can be applied to the first part before applying the second part; per default, the function split_model is used, which assumes a dense layer as the penultimate layer and separates the network into a first part without this layer and a second part consisting only of this single dense layer, which is fed into the output layer.

posterior_fun

function defining the posterior for the variational version of the network

prior_fun

function defining the prior for the variational version of the network

null_space_penalty

logical value; if TRUE, the null space will also be penalized for smooth effects. Per default, this is equal to the value given in variational.

ind_fun

function applied to the model output before calculating the log-likelihood. Per default independence is assumed by applying tfd_independent.

extend_output_dim

integer value >= 0 for extending the output dimension by an additive constant. If set to a value > 0, a multivariate response with dimension 1 + extend_output_dim is defined.

offset

a list of column vectors (i.e. matrix with ncol = 1) or NULLs for each parameter, in case an offset should be added to the additive predictor; if NULL, no offset is used
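
For example, for a two-parameter model where only the first parameter receives an offset (exposure is a hypothetical column of data):

offset = list(matrix(log(data$exposure), ncol = 1), NULL)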

offset_val

a list analogous to offset for the validation data

absorb_cons

logical; adds an identifiability constraint to the basis. See ?mgcv::smoothCon for more details.

anisotropic

whether or not to use anisotropic smoothing (default is TRUE)

zero_constraint_for_smooths

logical; the same as absorb_cons, but done explicitly. If TRUE, a constraint is put on each smooth to have zero mean. Can be a vector of length length(list_of_formulae) with one entry per distribution parameter.

orthog_type

one of two options; If "manual", the QR decomposition is calculated before model fitting, otherwise ("tf") a QR is calculated in each batch iteration via TF. The first only works well for larger batch sizes or ideally batch_size == NROW(y).

orthogonalize

logical; if set to FALSE, orthogonalization is deactivated

hat1

logical; if TRUE, the smoothing parameter is defined by the trace of the hat matrix sum(diag(H)), else sum(diag(2*H-HH))

sp_scale

positive constant for scaling the DRO-calculated penalty (per default NROW(y); see the Usage section)

order_bsp

NULL or integer; order of Bernstein polynomials; if not NULL, a conditional transformation model (CTM) is fitted.

y_basis_fun, y_basis_fun_prime

basis functions for y transformation for CTM case

split_between_shift_and_theta

if family == 'transformation_model' and !is.null(train_together), split_between_shift_and_theta defines how many of the last layer's hidden units are used for the shift term and how many for the theta term (an integer vector of length 2).

addconst_interaction

positive constant; a constant added to the additive predictor of the interaction term. If NULL, terms are left unchanged. If 0 and predictors have negative values in their design matrix, the minimum value of all predictors is added to ensure positivity. If > 0, the minimum value plus the addconst_interaction is added to each predictor in the interaction term.

additional_penalty

a penalty that is added to the negative log-likelihood; must be a function of the keras model's trainable_weights (including necessary subsetting) which is always called model
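
A minimal sketch (it is assumed that the function receives the keras model, called model, as described above; which weights to subset is illustrative and depends on the model structure):

additional_penalty <- function(model)
  k_sum(k_square(model$trainable_weights[[1]]))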

penalty_summary

keras function; summary function for the penalty in the spline layer; default is k_sum. Another option could be k_mean.

convertfun

function to convert objects into a matrix or tensor format.

compile_model

logical; whether to compile the model or not

base_distribution

tfd_distribution or string; the base distribution for transformation models, see ?deeptransformation_init.

...

further arguments passed to the deepregression_init function

Examples

library(deepregression)

n <- 1000
data <- data.frame(matrix(rnorm(4*n), ncol = 4))
colnames(data) <- c("x1","x2","x3","xa")
formula <- ~ 1 + deep_model(x1,x2,x3) + s(xa) + x1

deep_model <- function(x) x %>%
  layer_dense(units = 32, activation = "relu", use_bias = FALSE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1, activation = "linear")

y <- rnorm(n) + data$xa^2 + data$x1

mod <- deepregression(
  list_of_formulae = list(loc = formula, scale = ~ 1),
  data = data, validation_data = list(data, y), y = y,
  list_of_deep_models = list(deep_model = deep_model), 
  df = 6, 
  tf_seed = 1
)

# train for more than 10 epochs to get a better model
mod %>% fit(epochs = 10, early_stopping = TRUE)
mod %>% plot()
mod %>% coef()

