EdNetTrain: Train a neural network model


View source: R/EdNetTrain.R

Description

Train a neural network model

Usage

EdNetTrain(
  X,
  Y,
  family=NULL,
  learning_rate=0.05,
  num_epochs,
  hidden_layer_dims=NULL,
  hidden_layer_activations=NULL,
  weight=NULL,
  offset=NULL,
  optimiser="GradientDescent",
  keep_prob=NULL,
  input_keep_prob=NULL,
  tweediePower=ifelse(family=="tweedie", 1.5, NULL),
  alpha=0,
  lambda=0,
  mini_batch_size=NULL,
  dev_set=NULL,
  beta1=ifelse(optimiser %in% c("Momentum", "Adam"), 0.9, NULL),
  beta2=ifelse(optimiser %in% c("RMSProp", "Adam"), 0.999, NULL),
  epsilon=ifelse(optimiser %in% c("RMSProp", "Adam"), 1E-8, NULL),
  initialisation_constant=2,
  print_every_n=NULL,
  seed=1984L,
  plot=TRUE,
  checkpoint=NULL,
  keep=FALSE
)

Arguments

X

A matrix with rows as training examples and columns as input features

Y

A matrix with rows as training examples and columns as target values

family

Type of regression to be performed. One of "binary", "multiclass", "gaussian", "poisson", "gamma", "tweedie". Ignored if starting from a checkpoint model. Alternatively, you can specify a custom family as a named list with the following elements:

"family" - a character string of length 1, for reference only (must be "multiclass" if target values have dimension > 1)

"link.inv" - the inverse link function, used to activate the output layer

"costfun" - a function with parameters 'Y' and 'Y_hat' representing the cost function to be minimised

"gradfun" - a function with parameters 'Y' and 'Y_hat' representing the gradient of the cost function with respect to the linear (pre-activation) matrix in the output layer
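As an illustration, a custom family for Poisson regression with a log link might look like the sketch below. This is an assumption-laden example, not code from the package: the element names follow the description above, and the gradient is left unscaled per example, so it may need dividing by the number of training examples (or weighting) to match the package's internal conventions.

poisson_family <- list(
  family   = "poisson",                                        # for reference only
  link.inv = exp,                                              # inverse of the log link
  costfun  = function(Y, Y_hat) mean(Y_hat - Y * log(Y_hat)),  # Poisson deviance, up to a constant in Y
  gradfun  = function(Y, Y_hat) Y_hat - Y                      # d(cost)/d(linear predictor), since Y_hat = exp(Z)
)

Such a list would then be passed as family = poisson_family in place of one of the built-in family names.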

learning_rate

Learning rate to use.

num_epochs

Number of epochs (complete passes through the training data) to be performed. If using mini-batches, the number of iterations may be much higher.

hidden_layer_dims

Integer vector representing the dimensions of the hidden layers. Should not be specified if starting from a checkpoint model.

hidden_layer_activations

A character vector of the same length as hidden_layer_dims, or of length 1, in which case the same activation function is used for all hidden layers. Should only contain "relu" or "tanh", as these are the only supported activation functions for hidden layers. Should not be specified if starting from a checkpoint model.
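For instance (illustrative values), a network with two hidden layers of 64 and 32 units, both using relu, could be specified as:

hidden_layer_dims        <- c(64L, 32L)   # two hidden layers: 64 units, then 32
hidden_layer_activations <- "relu"        # length 1: recycled across both layers
# equivalent to c("relu", "relu"); mixing, e.g. c("relu", "tanh"), is also allowed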

weight

An optional vector of case weights, equal in length to the number of rows of X and Y.

offset

A matrix with the same dimensions as Y, to be used as an offset model. The offset must be on the linear-predictor scale, since it is applied before the activation function.
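For example, with a log-link family such as "poisson", a natural offset is the log of predictions from a prior model. The prior_pred values below are hypothetical:

prior_pred <- c(1.2, 0.8, 2.5)              # hypothetical prior-model predictions (response scale)
offset <- matrix(log(prior_pred), ncol = 1) # log puts the offset on the linear-predictor scale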

optimiser

Type of optimiser to use. One of "GradientDescent", "Momentum", "RMSProp", "Adam".

keep_prob

Keep probabilities for applying drop-out in hidden layers. Either a constant or a vector the same length as the hidden_layer_dims vector. If NULL no drop-out is applied.

input_keep_prob

Keep probabilities for applying drop-out in the input layer. Needs to be a single constant. If NULL no drop-out is applied.

tweediePower

Tweedie power parameter. Only applicable in Tweedie regression. Should be a number between 1 and 2.

alpha

L1 regularisation term.

lambda

L2 regularisation term.

mini_batch_size

Size of mini-batches to use. If NULL, the full training set is used for each iteration.

dev_set

Integer vector identifying hold-out data. Integers refer to individual training examples (rows) in the order presented in X.
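For example, a random 20% hold-out could be constructed as follows (X here is an illustrative training matrix):

set.seed(1984)
X <- matrix(rnorm(100 * 4), nrow = 100)                  # illustrative training matrix
dev_idx <- sample(nrow(X), size = floor(0.2 * nrow(X)))  # indices of a 20% hold-out
# then call EdNetTrain(..., dev_set = dev_idx)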

beta1

Exponential weighting term for gradients when using Momentum or Adam optimisation.

beta2

Exponential weighting term for squares of gradients when using RMSProp or Adam optimisation.

epsilon

Small number used for numerical stability to prevent division by zero when using RMSProp or Adam optimisation.

initialisation_constant

Weights are initialised randomly with variance k / n, where k is the initialisation_constant and n is the dimension of the previous layer. The default of 2 is recommended for relu activations; change to 1 for tanh, although it can be tuned for any specific learning task.
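Illustratively (this is a sketch of the rule, not the package's internal code), the scheme corresponds to drawing each weight as below; with relu activations and k = 2 this matches He initialisation, and k = 1 approximates Xavier initialisation for tanh:

n_prev <- 64L   # units in the previous layer
n_this <- 32L   # units in the current layer
k <- 2          # initialisation_constant
W <- matrix(rnorm(n_this * n_prev, mean = 0, sd = sqrt(k / n_prev)),
            nrow = n_this, ncol = n_prev)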

print_every_n

Print info to the log every n epochs. If NULL, no printing is done.

seed

Random seed to use for repeatability.

plot

Whether to plot the cost function when printing to the log.

checkpoint

Rather than initialising new parameters, start training from a checkpoint model.

keep

Whether to keep the X and Y data in the final output.

Value

An object of class EdNetModel.

Author(s)

Edwin Graham <edwingraham1984@gmail.com>

Examples

# No example yet
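# Until an official example is added, the following minimal sketch shows a
# plausible call for binary classification on simulated data; all
# hyperparameter values are illustrative, not recommendations.

set.seed(1984)
X <- matrix(rnorm(500 * 3), nrow = 500, ncol = 3)   # 500 examples, 3 features
p <- 1 / (1 + exp(-(X %*% c(1, -2, 0.5))))          # true class probabilities
Y <- matrix(rbinom(500, 1, p), ncol = 1)            # binary targets

model <- EdNetTrain(
  X, Y,
  family                   = "binary",
  learning_rate            = 0.05,
  num_epochs               = 100,
  hidden_layer_dims        = c(8L, 4L),
  hidden_layer_activations = "relu",
  optimiser                = "Adam",
  dev_set                  = sample(nrow(X), 100),  # 100-row hold-out
  print_every_n            = 10
)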
