View source: R/baseline_wrappers.R
baseline_gaussian | R Documentation
Create a baseline evaluation of a test set.
In modelling, a baseline is a result that is meaningful to compare the results from our models to. In regression, we want our model to be better than a model without any predictors. If our model does not perform better than such a simple model, it's unlikely to be useful.
baseline_gaussian() fits the intercept-only model (y ~ 1) on `n` random subsets of `train_data` and evaluates each model on `test_data`. Additionally, it evaluates a model fitted on all rows in `train_data`.
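As a conceptual sketch of what a single random evaluation amounts to (illustrative only, not the cvms internals; it reuses the train_set, test_set and score objects created in the examples below), the intercept-only baseline simply predicts the mean of the sampled training rows:

# Conceptual sketch, not the cvms implementation:
# fit the intercept-only model on a random subset of the training data
# and evaluate its constant prediction on the test set.
subset_rows <- sample(seq_len(nrow(train_set)), size = 10)
fit <- lm(score ~ 1, data = train_set[subset_rows, ])
preds <- predict(fit, newdata = test_set)
sqrt(mean((test_set$score - preds)^2))  # RMSE of the baseline prediction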
baseline_gaussian(
test_data,
train_data,
dependent_col,
n = 100,
metrics = list(),
random_effects = NULL,
min_training_rows = 5,
min_training_rows_left_out = 3,
REML = FALSE,
parallel = FALSE
)
test_data
    The data to evaluate the fitted baseline models on.

train_data
    The data to fit the baseline models on. The random subsets are drawn from these rows.

dependent_col
    Name of dependent variable in the supplied test and training sets.

n
    The number of random samplings of `train_data` to fit baseline models on.

metrics
    `list` for enabling/disabling metrics. You can enable/disable all metrics at once by including `"all" = TRUE/FALSE` in the list. Also accepts the string `"all"`.

random_effects
    Random effects structure for the baseline model. (Character) E.g. with `"(1|ID)"`, the model becomes `y ~ 1 + (1|ID)`.

min_training_rows
    Minimum number of rows in the random subsets of `train_data`.

min_training_rows_left_out
    Minimum number of rows left out of the random subsets of `train_data`. I.e. a subset will maximally have the size: `nrow(train_data) - min_training_rows_left_out`.

REML
    Whether to use Restricted Maximum Likelihood. (Logical)

parallel
    Whether to run the `n` evaluations in parallel. (Logical) Remember to register a parallel backend first. E.g. with `doParallel::registerDoParallel`.
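As a hedged sketch of how these arguments might be combined (the metric names and the `"all"` switch are assumed to match those listed at ?gaussian_metrics), a call could look like this:

# Sketch: disable all metrics, re-enable RMSE only,
# and add a random intercept per session.
baseline_gaussian(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  metrics = list("all" = FALSE, "RMSE" = TRUE),
  random_effects = "(1|session)",
  min_training_rows = 5,
  min_training_rows_left_out = 3,
  n = 10
)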
Packages used:

Models: stats::lm, lme4::lmer

Results:
r2m : MuMIn::r.squaredGLMM
r2c : MuMIn::r.squaredGLMM
AIC : stats::AIC
AICc : MuMIn::AICc
BIC : stats::BIC
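To make that mapping concrete, here is a minimal sketch of the underlying calls (simplified, not the package's exact code; it assumes the lme4 and MuMIn packages are installed and reuses train_set from the examples below):

# Without random effects, the baseline model is an intercept-only stats::lm fit:
fit_lm <- stats::lm(score ~ 1, data = train_set)
stats::AIC(fit_lm)
stats::BIC(fit_lm)
MuMIn::AICc(fit_lm)

# With a random effects structure, lme4::lmer is used instead:
fit_lmer <- lme4::lmer(score ~ 1 + (1 | session), data = train_set, REML = FALSE)
# r2m (marginal) and r2c (conditional) R-squared:
MuMIn::r.squaredGLMM(fit_lmer)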
A list containing:

a tibble with summarized results (called `summarized_metrics`)

a tibble with random evaluations (`random_evaluations`)
....................................................................
The Summarized Results tibble contains:

The average RMSE, MAE, NRMSE(IQR), RRSE, RAE, RMSLE.

See the additional metrics (disabled by default) at ?gaussian_metrics.

The Measure column indicates the statistical descriptor used on the evaluations. The row where Measure == All_rows is the evaluation when the baseline model is trained on all rows in `train_data`.

The Training Rows column contains the aggregated number of rows used from `train_data` when fitting the baseline models.
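As a usage sketch (reusing the train_set and test_set objects from the examples below), the All_rows evaluation could be pulled out of the summarized results like this:

bsl <- baseline_gaussian(
  test_data = test_set,
  train_data = train_set,
  dependent_col = "score",
  n = 2
)
# The evaluation where the baseline model was trained on all training rows:
dplyr::filter(bsl$summarized_metrics, Measure == "All_rows")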
....................................................................
The Random Evaluations tibble contains:

The non-aggregated metrics.

A nested tibble with the predictions and targets.

A nested tibble with the coefficients of the baseline models.
Number of training rows used when fitting the baseline model on the training set.
A nested Process information object with information about the evaluation.
Name of dependent variable.
Name of fixed effect (bias term only).
Random effects structure (if specified).
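A sketch of how the nested columns might be accessed, using the `bsl` object from the sketch above (the Predictions and Coefficients column names are assumptions; inspect the returned tibble to confirm them):

# Inspect the non-aggregated evaluations
bsl$random_evaluations

# Nested columns for e.g. the first random evaluation
# (column names assumed; check names(bsl$random_evaluations)):
bsl$random_evaluations$Predictions[[1]]
bsl$random_evaluations$Coefficients[[1]]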
Author(s): Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
Other baseline functions: baseline(), baseline_binomial(), baseline_multinomial()
# Attach packages
library(cvms)
library(groupdata2) # partition()
library(dplyr) # %>% arrange()
# Data is part of cvms
data <- participant.scores
# Set seed for reproducibility
set.seed(1)
# Partition data
partitions <- partition(data, p = 0.7, list_out = TRUE)
train_set <- partitions[[1]]
test_set <- partitions[[2]]
# Create baseline evaluations
# Note: usually n=100 is a good setting
baseline_gaussian(
test_data = test_set,
train_data = train_set,
dependent_col = "score",
random_effects = "(1|session)",
n = 2
)
# Parallelize evaluations
# Attach doParallel and register four cores
# Uncomment:
# library(doParallel)
# registerDoParallel(4)
# Make sure to uncomment the parallel argument
baseline_gaussian(
test_data = test_set,
train_data = train_set,
dependent_col = "score",
random_effects = "(1|session)",
n = 4
#, parallel = TRUE # Uncomment
)