layer: Layers and Loss Functions (RNN)
In multiRL: Reinforcement Learning Tools for Multi-Armed Bandit

layer

R Documentation

Description

Currently supported recurrent layer types and loss functions available in the package.

Recurrent Layer

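"RNN" (Simple Recurrent Neural Network) and "BiRNN": A fully connected recurrent layer in which the output from the previous time step is fed back into the next time step. It is the most basic type of recurrent layer, but it can struggle with long-term dependencies because of the vanishing gradient problem. The "Bi" prefix, here and in the entries below, denotes the bidirectional variant, which processes the sequence in both the forward and the backward direction.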
"GRU" (Gated Recurrent Unit) and "BiGRU": A recurrent unit that uses gating mechanisms to control information flow, enabling it to capture long-range dependencies. GRUs have fewer parameters than LSTMs and are usually cheaper to compute, while often achieving comparable performance.
"LSTM" (Long Short-Term Memory) and "BiLSTM": A recurrent unit with dedicated memory cells and three gates (input, forget, output). LSTMs are good at learning long-term dependencies and are much less affected by the vanishing gradient problem than simple RNNs, which makes them well suited to long sequences.
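
To make the recurrence described above concrete, the following base-R sketch runs a simple RNN over a short sequence and then builds a bidirectional version from two such passes. It assumes that the "Bi" prefix denotes the usual bidirectional construction. The function names (rnn_forward, birnn_forward, make_params) and the weight names are hypothetical and are not part of the multiRL API; the package's own layers may be implemented differently.

```r
# Minimal sketch of a simple (Elman) RNN forward pass in base R.
# All names here are hypothetical and for illustration only.
rnn_forward <- function(x, W_x, W_h, b) {
  # x: matrix with one row per time step (n_steps x n_input)
  n_steps  <- nrow(x)
  n_hidden <- nrow(W_h)
  h   <- numeric(n_hidden)                 # initial hidden state is zero
  out <- matrix(0, n_steps, n_hidden)
  for (i in seq_len(n_steps)) {
    # the previous hidden state h is fed back into the current step
    h <- tanh(drop(W_x %*% x[i, ] + W_h %*% h + b))
    out[i, ] <- h
  }
  out
}

# A bidirectional layer runs one pass forward and one pass over the
# reversed sequence, then concatenates the two hidden states per step.
birnn_forward <- function(x, fwd, bwd) {
  n_steps <- nrow(x)
  h_fwd <- rnn_forward(x, fwd$W_x, fwd$W_h, fwd$b)
  h_bwd <- rnn_forward(x[n_steps:1, , drop = FALSE], bwd$W_x, bwd$W_h, bwd$b)
  cbind(h_fwd, h_bwd[n_steps:1, , drop = FALSE])
}

set.seed(1)
n_input  <- 2
n_hidden <- 3
make_params <- function() {
  list(
    W_x = matrix(rnorm(n_hidden * n_input), n_hidden, n_input),
    W_h = matrix(rnorm(n_hidden * n_hidden), n_hidden, n_hidden),
    b   = numeric(n_hidden)
  )
}

x   <- matrix(rnorm(5 * n_input), 5, n_input)   # toy sequence, 5 time steps
fwd <- make_params()
bwd <- make_params()

h_uni <- rnn_forward(x, fwd$W_x, fwd$W_h, fwd$b)
h_bi  <- birnn_forward(x, fwd, bwd)
dim(h_uni)   # 5 3
dim(h_bi)    # 5 6
```

The bidirectional output has twice as many columns because each time step carries one hidden state from each direction. GRU and LSTM layers replace the single tanh update with gated updates but follow the same step-by-step loop.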

Loss Function

The loss function defines the objective that the model minimizes during training. The choice of loss function is critical as it determines what aspect of the prediction the model prioritizes.

"MSE" (Mean Squared Error): Calculates the average of the squared differences between predicted and true parameter values. By squaring the error, it heavily penalizes large mistakes. It is the standard choice for regression and corresponds to maximum likelihood estimation under normally distributed errors with constant variance. Its sensitivity to outliers can be a drawback.
"MAE" (Mean Absolute Error): Calculates the average of the absolute differences between predicted and true values. It treats all errors equally on a linear scale, making it more robust to outliers than MSE. It is a good choice when the dataset contains anomalies that should not dominate the training process.
"HBR" (Huber Loss): A hybrid of MSE and MAE. It is quadratic (like MSE) for errors below a threshold, which gives a smooth and stable gradient near zero, and linear (like MAE) for errors above the threshold. This makes it less sensitive to outliers than MSE while remaining differentiable at zero, unlike MAE.
"NLL" (Negative Log-Likelihood): This loss is used for probabilistic regression. Instead of predicting a single value for each parameter, the network predicts the parameters of a probability distribution (here, a Gaussian: its mean mu and variance sigma^2). The loss is the negative log-likelihood of the true parameters under the predicted distribution. This allows the model to learn and express its own uncertainty about its predictions.
"QRL" (Quantile Regression Loss): Allows the model to estimate specific quantiles of the parameter distribution, rather than just its mean. This package's implementation predicts the 5th, 50th (median), and 95th percentiles. It uses a "pinball loss" function that is asymmetric, guiding the model to the desired quantile. It is useful for understanding the full range of parameter uncertainty and is naturally robust to outliers.
"MDN" (Mixture Density Network): The most flexible but also the most complex option. An MDN learns to predict the parameters of a mixture of distributions (e.g., a mixture of several Gaussians). This allows it to model complex, multi-modal (multiple peaks), or skewed posterior distributions. The network outputs the means, variances, and mixing weights for each component in the mixture.
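
The losses listed above can be written in a few lines of base R, which may help when comparing how each one reacts to the same errors. This is only a sketch: the function names, the Huber threshold delta = 1, and the toy inputs are assumptions made for illustration and are not taken from the package's implementation.

```r
# Illustrative base-R versions of the losses described above.
# Names, the Huber threshold `delta`, and the toy data are assumptions.
mse_loss <- function(y, y_hat) mean((y - y_hat)^2)
mae_loss <- function(y, y_hat) mean(abs(y - y_hat))

huber_loss <- function(y, y_hat, delta = 1) {
  err <- abs(y - y_hat)
  # quadratic for |error| <= delta, linear beyond it
  mean(ifelse(err <= delta, 0.5 * err^2, delta * (err - 0.5 * delta)))
}

# Gaussian NLL: the network is assumed to output a mean and a variance
gaussian_nll <- function(y, mu, sigma2) {
  mean(0.5 * log(2 * pi * sigma2) + (y - mu)^2 / (2 * sigma2))
}

# Pinball loss for a single quantile level tau in (0, 1)
pinball_loss <- function(y, q_hat, tau) {
  err <- y - q_hat
  mean(pmax(tau * err, (tau - 1) * err))
}

# Mixture-of-Gaussians NLL for one target value y;
# w holds the mixing weights, which must sum to 1
mdn_nll <- function(y, w, mu, sigma2) {
  -log(sum(w * dnorm(y, mean = mu, sd = sqrt(sigma2))))
}

y     <- c(0.2, 0.5, 0.9)   # toy "true" parameter values
y_hat <- c(0.1, 0.5, 3.9)   # the last prediction is an outlier

mse_loss(y, y_hat)      # dominated by the outlier
mae_loss(y, y_hat)      # grows only linearly with the outlier
huber_loss(y, y_hat)    # between the two for this input

# Quantile regression with the 5th, 50th, and 95th percentiles:
# one pinball term per predicted quantile, summed
taus  <- c(0.05, 0.50, 0.95)
q_hat <- c(0.1, 0.5, 0.9)   # hypothetical predicted quantiles
qrl <- sum(mapply(function(q, tau) pinball_loss(y, q, tau), q_hat, taus))

# A two-component mixture evaluated at one target value
mdn_nll(0.5, w = c(0.3, 0.7), mu = c(0.2, 0.6), sigma2 = c(0.01, 0.04))
```

With tau = 0.5 the pinball loss is half the MAE, and a one-component mixture reduces mdn_nll to the Gaussian NLL, which shows how the probabilistic losses generalize the simpler ones.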

Example

 # supported recurrent layer and loss function
 control = list(
   layer = c("RNN", "GRU", "LSTM", "BiRNN", "BiGRU", "BiLSTM"),
   loss = c("MSE", "MAE", "HBR", "NLL", "QRL", "MDN")
 )

multiRL documentation built on June 9, 2026, 5:09 p.m.