View source: R/deepregression.R
deepregression {deepregression}    R Documentation
Fitting Semi-Structured Deep Distributional Regression
Usage

deepregression(
  y,
  list_of_formulae,
  list_of_deep_models,
  family = c("normal", "bernoulli", "bernoulli_prob", "beta", "betar",
    "cauchy", "chi2", "chi", "exponential", "gamma_gamma", "gamma",
    "gammar", "gumbel", "half_cauchy", "half_normal", "horseshoe",
    "inverse_gamma", "inverse_gamma_ls", "inverse_gaussian", "laplace",
    "log_normal", "logistic", "multinomial", "multinoulli", "negbinom",
    "negbinom_ls", "pareto", "pareto_ls", "poisson", "poisson_lograte",
    "student_t", "student_t_ls", "truncated_normal", "uniform", "zinb",
    "zip", "transformation_model"),
  train_together = list(),
  data,
  image_var = list(),
  dim_deep = NULL,
  df = NULL,
  lambda_lasso = NULL,
  lambda_ridge = NULL,
  convert_factors = FALSE,
  defaultSmoothing = NULL,
  cv_folds = NULL,
  validation_data = NULL,
  validation_split = ifelse(is.null(validation_data) & is.null(cv_folds), 0.2, 0),
  dist_fun = NULL,
  learning_rate = 0.01,
  optimizer = optimizer_adam(lr = learning_rate),
  fsbatch_optimizer = FALSE,
  fsbatch_options = fsbatch_control(),
  variational = FALSE,
  monitor_metric = list(),
  seed = 1991 - 5 - 4,
  tf_seed = NULL,
  mixture_dist = 0,
  split_fun = split_model,
  posterior_fun = posterior_mean_field,
  prior_fun = prior_trainable,
  null_space_penalty = variational,
  ind_fun = function(x) tfd_independent(x),
  extend_output_dim = 0,
  offset = NULL,
  offset_val = NULL,
  absorb_cons = FALSE,
  anisotropic = TRUE,
  zero_constraint_for_smooths = TRUE,
  orthog_type = c("tf", "manual"),
  orthogonalize = TRUE,
  hat1 = FALSE,
  sp_scale = NROW(y),
  order_bsp = NULL,
  y_basis_fun = function(y) eval_bsp(y, order = order_bsp, supp = range(y)),
  y_basis_fun_prime = function(y) eval_bsp_prime(y, order = order_bsp,
    supp = range(y)) / diff(range(y)),
  split_between_shift_and_theta = NULL,
  addconst_interaction = NULL,
  additional_penalty = NULL,
  penalty_summary = k_sum,
  convertfun = as.matrix,
  compile_model = TRUE,
  base_distribution = "normal",
  ...
)
Arguments

y
    response variable

list_of_formulae
    a named list of right-hand side formulae, one for each parameter of
    the distribution specified in family
list_of_deep_models
    a named list of functions, each specifying a keras model. See the
    examples for more details.
family
    a character specifying the distribution. For information on the
    possible distributions and their parameters, see the package
    documentation.
train_together
    a list of formulae of the same length as list_of_formulae
data
    data.frame or named list with input features
image_var
    named list; names correspond to image variables, and the value of
    each list item gives the input size of the respective image (see
    the sketch under Examples)
dim_deep
    list with one entry per distribution parameter, each NULL or an
    integer vector; an optional argument to manually specify the input
    dimensions for the unstructured model part(s), required if
    placeholders are used for unstructured data sources (see the sketch
    under Examples)
df
    degrees of freedom for all non-linear structural terms; either one
    common value or a nested list: one list per distribution parameter,
    each of the same length as the number of smooth terms in the
    respective formula, where elements can be vectors for two- or
    multidimensional smooth terms (see the sketch under Examples)
lambda_lasso
    scalar value or list of scalar values specifying the lasso penalty
    for linear effects

lambda_ridge
    scalar value or list of scalar values specifying the ridge penalty
    for linear effects
defaultSmoothing
    function applied to all s-terms; per default (NULL), the minimum df
    of all possible terms is used
cv_folds
    a list of lists, each with two elements: training indices and
    testing indices. If a single integer k is given instead, a simple
    k-fold cross-validation is defined (see the sketch under Examples).
validation_data
    data for validation during training

validation_split
    percentage of training data used for validation; 0.2 per default,
    and 0 if validation_data or cv_folds is given (see Usage)
dist_fun
    a custom distribution applied to the last layer; see the package
    documentation for details on defining custom distributions
learning_rate
    learning rate for the optimizer

optimizer
    optimizer used; Adam per default
fsbatch_optimizer
    logical; use a special optimizer conducting a mini-batch variant of
    the Fellner-Schall algorithm

fsbatch_options
    a call to fsbatch_control() specifying options for the
    Fellner-Schall batch optimizer
variational
    logical value specifying whether or not to use variational
    inference. If TRUE, the functions given in posterior_fun and
    prior_fun define the variational approximation.
monitor_metric
    further metrics to monitor during training

seed
    integer value used as a seed in data splitting
tf_seed
    a seed for TensorFlow (only works with version >= 2.2.0 of the
    tensorflow R package)
mixture_dist
    integer, either 0 or >= 2. If 0 (default), no mixture distribution
    is fitted. If >= 2, a network is constructed that outputs a
    multivariate response for each of the mixture components.
split_fun
    a function separating the deep neural network into two parts so
    that the orthogonalization can be applied to the first part before
    applying the second network part; per default, the function
    split_model is used
posterior_fun
    function defining the posterior for the variational version of the
    network

prior_fun
    function defining the prior for the variational version of the
    network
null_space_penalty
    logical value; if TRUE, the null space will also be penalized for
    smooth effects. Per default, this is equal to the value given for
    variational.
ind_fun
    function applied to the model output before calculating the
    log-likelihood. Per default, independence is assumed by applying
    tfd_independent.
extend_output_dim
    integer value >= 0 for extending the output dimension by an
    additive constant. If set to a value > 0, a multivariate response
    with accordingly extended dimension is modelled.
offset
    a list with one entry per parameter, each a column vector (i.e., a
    matrix with ncol = 1) or NULL, in case an offset should be added to
    the additive predictor; if NULL, no offset is used (see the sketch
    under Examples)

offset_val
    a list analogous to offset for the validation data
absorb_cons
    logical; adds an identifiability constraint to the basis. See
    smoothCon in the mgcv package for details.
anisotropic
    whether or not to use anisotropic smoothing (default is TRUE)
zero_constraint_for_smooths
    logical; the same as absorb_cons, but done explicitly: if TRUE, a
    constraint is put on each smooth to have zero mean. Can also be a
    vector of logical values.
orthog_type
    one of "tf" or "manual", determining whether the orthogonalization
    is carried out within TensorFlow or manually
orthogonalize
    logical; if set to TRUE (default), deep model parts are
    orthogonalized with respect to overlapping structured terms to
    ensure identifiability
hat1
    logical; if TRUE, the smoothing parameter is defined via the trace
    of the hat matrix, sum(diag(H)); else via sum(diag(2*H - H %*% H))
sp_scale
    positive constant scaling the penalty calculated via the
    Demmler-Reinsch orthogonalization (DRO); NROW(y) per default (see
    Usage)
order_bsp
    NULL or integer; order of the Bernstein polynomials. If not NULL, a
    conditional transformation model (CTM) is fitted (see the sketch
    under Examples).

y_basis_fun, y_basis_fun_prime
    basis functions for the transformation of y in the CTM case
split_between_shift_and_theta
    specification of how the model output is split between the shift
    term and the Bernstein coefficients (theta) in the CTM case; NULL
    per default
addconst_interaction
    positive constant; a constant added to the additive predictor of
    the interaction term. If NULL (default), nothing is added.
additional_penalty
    a penalty that is added to the negative log-likelihood; must be a
    function of the keras model's trainable weights (see the sketch
    under Examples)
penalty_summary
    keras function; summary function for the penalty in the spline
    layer; default is k_sum
convertfun
    function to convert objects into a matrix or tensor format

compile_model
    logical; whether or not to compile the model
base_distribution
    tfd_distribution or string; the base distribution for
    transformation models ("normal" per default)
...
    further arguments passed on to underlying methods
Examples

library(deepregression)

# simulate toy data
n <- 1000
data <- data.frame(matrix(rnorm(4 * n), nrow = n, ncol = 4))
colnames(data) <- c("x1", "x2", "x3", "xa")
y <- rnorm(n) + data$xa^2 + data$x1

# the term deep_model(x1, x2, x3) refers to the entry of the same name
# in list_of_deep_models below
formula <- ~ 1 + deep_model(x1, x2, x3) + s(xa) + x1

# a function specifying the keras model for the deep part
deep_model <- function(x) x %>%
  layer_dense(units = 32, activation = "relu", use_bias = FALSE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1, activation = "linear")

mod <- deepregression(
  list_of_formulae = list(loc = formula, scale = ~ 1),
  data = data,
  validation_data = list(data, y),
  y = y,
  list_of_deep_models = list(deep_model = deep_model),
  df = 6,
  tf_seed = 1
)

# train for more than 10 epochs to get a better model
mod %>% fit(epochs = 10, early_stopping = TRUE)
mod %>% plot()
mod %>% coef()