pipe_keras_timeseries: Neural network model with keras
In jmaspons/MLTools: Pipes and tools for machine learning

View source: R/pipe_keras_timeseries.r

Description

Neural network model with keras

Usage

pipe_keras_timeseries(
  df,
  predInput = NULL,
  responseVars = 1,
  caseClass = NULL,
  idVars = character(),
  weight = "class",
  timevar = NULL,
  responseTime = "LAST",
  regex_time = ".+",
  staticVars = NULL,
  crossValStrategy = c("Kfold", "bootstrap"),
  k = 5,
  replicates = 10,
  crossValRatio = c(train = 0.6, test = 0.2, validate = 0.2),
  hidden_shape.RNN = c(32, 32),
  hidden_shape.static = c(32, 32),
  hidden_shape.main = 32,
  epochs = 500,
  maskNA = NULL,
  batch_size = "all",
  repVi = 5,
  perm_dim = 2:3,
  comb_dims = FALSE,
  summarizePred = TRUE,
  scaleDataset = FALSE,
  NNmodel = FALSE,
  DALEXexplainer = FALSE,
  variableResponse = FALSE,
  save_validateset = FALSE,
  baseFilenameNN = NULL,
  filenameRasterPred = NULL,
  tempdirRaster = NULL,
  nCoresRaster = parallel::detectCores() %/% 2,
  verbose = 0,
  ...
)

Arguments

`df`	a `data.frame` with the data in a long format (time variable in the `timevar` column).
`predInput`	a `data.frame` with the input variables to make predictions. The column names must match the names of the `df` columns.
`responseVars`	response variables as column names or indexes on `df` in wide format (e.g. respVar_time).
`caseClass`	class of the samples used to weight cases. Column names or indexes on `df`, or a vector with the class for each row in `df`.
`idVars`	id column names or indexes on `df`. Should be a unique identifier for a row in wide format, otherwise, values will be averaged.
`weight`	Optional array of the same length as `nrow(df)`, containing weights to apply to the model's loss for each sample.
`timevar`	column name of the variable containing the time.
`responseTime`	a `timevar` value used as a response var for `responseVars` or the default "LAST" for the last timestep available (`max(df[, timevar])`).
`regex_time`	regular expression matching the `timevar` values format.
`staticVars`	predictor variables as column names or indexes on `df` indicating fixed vars that don't change over time.
`crossValStrategy`	`Kfold` or `bootstrap`.
`k`	number of data partitions when `crossValStrategy="Kfold"`.
`replicates`	number of replicates for `crossValStrategy="bootstrap"` and `crossValStrategy="Kfold"` (`replicates * k-1`, 1 fold for validation).
`crossValRatio`	proportion of the dataset used to train, test and validate the model when `crossValStrategy="bootstrap"`. Defaults to `c(train=0.6, test=0.2, validate=0.2)`. If only one value is given, it is taken as the train proportion and the test set is used for validation.
`hidden_shape.RNN`	number of neurons in the hidden layers of the recurrent neural network model (time series data). Can be a vector with values for each hidden layer.
`hidden_shape.static`	number of neurons in the hidden layers of the densely connected neural network model (static data). Can be a vector with values for each hidden layer.
`hidden_shape.main`	number of neurons in the hidden layers of the densely connected neural network model connecting static and time series data. Can be a vector with values for each hidden layer.
`epochs`	parameter for `keras::fit()`.
`maskNA`	value to assign to `NA`s after scaling and passed to `keras::layer_masking()`.
`batch_size`	batch size for the fit and predict functions, as an integer or "all". Larger is better as long as it fits in your available memory.
`repVi`	replicates of the permutations to calculate the importance of the variables. 0 to avoid calculating variable importance.
`perm_dim`	dimension to perform the permutations to calculate the importance of the variables (data dimensions [case, time, variable]). If `perm_dim = 2:3`, it calculates the importance for each combination of the 2nd and 3rd dimensions.
`comb_dims`	for variable importance calculations: if `TRUE`, perform the permutations for each combination of the levels of the variables from the 2nd and 3rd dimensions for input data with 3 dimensions. `FALSE` by default.
`summarizePred`	if `TRUE`, return the mean, sd and se of the predictions. If `FALSE`, return the predictions for each replicate.
`scaleDataset`	if `TRUE`, scale the whole dataset only once instead of the train set at each replicate. Reduces processing time for predictions with large rasters.
`NNmodel`	if `TRUE`, return the serialized model with the result.
`DALEXexplainer`	if `TRUE`, return an explainer for the models from the `DALEX::explain()` function. It doesn't work with multisession future plans.
`variableResponse`	if `TRUE`, return an `aggregated_profiles_explainer` object from `ingredients::partial_dependency()` and the coefficients of the adjusted linear model.
`save_validateset`	save the validateset (independent data not used for training).
`baseFilenameNN`	if not missing, save the NN in hdf5 format on this path with the iteration appended.
`filenameRasterPred`	if not missing, save the predictions in a RasterBrick to this file.
`tempdirRaster`	path to a directory to save temporary raster files.
`nCoresRaster`	number of cores used for parallelized raster operations. Uses half of the available cores by default.
`verbose`	if > 0, print the state; the value is also passed to keras functions.
`...`	extra parameters for `future.apply::future_replicate()` and `ingredients::feature_importance()`.
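
The long layout described for `df`, and the wide `respVar_time` naming mentioned for `responseVars`, can be illustrated with a small base R sketch. The data frame `long` and its columns (`id`, `t`, `temp`, `y`, `elev`) are made up for illustration and are not part of the package. The use of `stats::reshape()` is also an assumption chosen for the demonstration: this page does not say how `pipe_keras_timeseries()` reshapes the data internally.

```r
# Hypothetical toy data in long format: one row per case and time step.
long <- data.frame(
  id   = rep(c("a", "b", "c"), each = 3),       # candidate for idVars
  t    = rep(1:3, times = 3),                   # candidate for timevar
  temp = c(10.1, 11.3, 12.0, 8.4, 9.0, 9.7, 5.2, 6.1, 6.8),  # time-varying predictor
  y    = c(0.2, 0.4, 0.5, 0.1, 0.3, 0.3, 0.0, 0.1, 0.2),     # response
  elev = rep(c(100, 250, 400), each = 3)        # constant per case, like staticVars
)

# With responseTime = "LAST", the documented reference time step is
# max(df[, timevar]).
last_time <- max(long[, "t"])

# Wide layout with one row per case and columns named <variable>_<time>.
# stats::reshape() is used here only to show the naming pattern.
wide <- reshape(
  long,
  idvar     = c("id", "elev"),
  timevar   = "t",
  direction = "wide",
  sep       = "_"
)

# Name of the response column at the last time step in the wide layout.
response_col <- paste("y", last_time, sep = "_")

print(names(wide))
print(wide[, c("id", response_col)])
```

In this layout `id` identifies one row per case in wide format, which is the property the `idVars` description asks for; if it did not, values would be averaged.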