knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_chunk$set(echo = TRUE)

Before starting

The packages required and the number notation used throughout.

library(modMetricsR)
library(epiR)
library(ggplot2) 
library(dplyr)
library(flextable)

options(scipen = 999) # normal notation for numbers

Package overview

Examples

Here are examples of use for each of the functions included in the package.

opmetrics

opmetrics stands for "observed-predicted metrics". The function can be run simply by providing the observed values and the values predicted by the model:

m <- lm(Petal.Length ~ Sepal.Length , iris)

opmetrics(iris$Petal.Length, predict(m))

Great, right? But there are too many digits. We can control this with the s = argument, which takes either a number of digits or a standardized format (like s = "std1"), as shown below.

opmetrics(iris$Petal.Length, predict(m), s = "std1")
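Since s = also accepts a number, presumably the number of digits to round to, a call like the following should work as well (the value 2 is only an illustration):

opmetrics(iris$Petal.Length, predict(m), s = 2) # s as a number of digits (assumed)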

If you want a plot of observed against predicted values, just add plot = TRUE. Note that the plot shows observed against predicted (not the other way around), as recommended by [@pineiroHowEvaluateModels2008].

opmetrics(iris$Petal.Length, predict(m), plot = TRUE)

The returned object behaves like any ggplot, so you can add layers and change the theme.

opmetrics(iris$Petal.Length, predict(m), plot = TRUE) +
  labs(x = "Predicted petal length, cm", y = "Observed petal length, cm") +
  theme_bw()

If you want to plot the residuals instead, just add residuals = TRUE. Note that residuals are plotted against the predicted values, as they should be [@st-pierreReassessmentBiasesPredicted2003].

opmetrics(iris$Petal.Length, predict(m), plot = TRUE, residuals = TRUE) +
  geom_hline(yintercept = 0) +
  labs(x = "Predicted petal length, cm", y = "Residual petal length, cm") +
  theme_bw()

Not a great model; maybe adding species would make it look better.

m <- lm(Petal.Length ~ Sepal.Length + Species, iris)

opmetrics(iris$Petal.Length, predict(m), plot = TRUE, residuals = TRUE) +
  geom_hline(yintercept = 0) +
  labs(x = "Predicted petal length, cm", y = "Residual petal length, cm") +
  theme_bw()

metricsloop

With metricsloop it is possible to compare metrics from different models belonging to the same model class, simply by providing the models of interest; the function loops over them.

mods <- c("Sepal.Length",
          "Sepal.Length+Species",
          "Sepal.Length*Species")

metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1")

For better table visualization, the result can be passed to flextable; of course, it is also possible to export it as a csv/spreadsheet (see the sketch after the table code below).

flextable(metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1")) %>%
  autofit()
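As a sketch of the csv export mentioned above, assuming the metricsloop result behaves like a plain data frame (the file name is only an illustration):

write.csv(metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1"),
          "petal_length_metrics.csv", row.names = FALSE) # hypothetical output file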

metricsloopmix

With metricsloopmix it is possible to compare metrics from models belonging to different model classes, simply by providing the models of interest; the function loops over them.

modclass <- c("lm", "glm")
modsid <- c("Sepal.Length", "Sepal.Length, family = Gamma")

modsdata <- data.frame(modclass, modsid)

metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, s = "std1")

It is also possible to use different predict functions by setting PF = TRUE and adding to the MO data frame the columns with the prediction function (and prediction type) intended for each model.

modclass <- c("lm", "glm")
modsid <- c("Sepal.Length", "Sepal.Length, family = Gamma")
modpredict <- c("predict.lm", "predict.glm")
predictype <- c("response", "response")

modsdata <- data.frame(modclass, modsid, modpredict, predictype)

metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, PF = TRUE, s = "std1")

And again, for better table visualization via flextable:

flextable(metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, PF = TRUE, s = "std1")) %>% autofit()

Metrics included

Below is a list of the metrics included, where $Y_i$ are the observed values, $\hat{Y}_i$ are the values predicted by the model, $n$ is the number of observations, $\mu$ is the mean of the observed values, and $\sigma$ is the standard deviation of the observed values.

Mean Square Error (MSE) {-}

Can be used as such, or to calculate the proportions of the decomposed error [@bibbyPredictionImprovedEstimation1977a].

$$ MSE = \frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 }{n} $$
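As a quick check of the formula, here is a minimal hand computation for the iris example above (obs and pred are defined here only for illustration and are reused in the sketches below):

obs  <- iris$Petal.Length   # observed values, Y_i
pred <- predict(m)          # predicted values, Y_hat_i

mse <- mean((obs - pred)^2) # sum of squared residuals divided by n
mse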

Root Mean Square Error (RMSE) {-}

As an absolute value and as a proportion of the mean of the observed values. The main indicator of the goodness of a model [@bibbyPredictionImprovedEstimation1977a].

$$ RMSE = \sqrt{MSE} \text{ or } \sqrt{MSE}/\mu$$
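By hand, reusing obs, pred, and mse from above:

rmse <- sqrt(mse) # absolute RMSE
rmse / mean(obs)  # RMSE as a proportion of the mean observed value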

Mean Absolute Error (MAE) {-}

Similar to the RMSE, but giving lower weight to the residuals further from zero [@chaiRootMeanSquare2014]. As for the RMSE, it can be divided by $\mu$ to express the MAE as a proportion of the mean of the observed values.

$$ MAE = \frac{\sum_{i=1}^n|Y_i-{\hat{Y}_i}|}{n} $$
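By hand, with obs and pred from above:

mae <- mean(abs(obs - pred)) # mean absolute error
mae / mean(obs)              # MAE as a proportion of the mean observed value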

Mean Bias (MB) {-}

Expressed as a percentage of the MSE, and calculated as the square of the mean residual.

$$ MB = \biggl[ \frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)}{n} \biggr]^2 $$
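A hand computation of the mean bias and of its share of the MSE (for the lm fit above the mean residual is essentially zero, so MB is close to zero):

mb <- mean(obs - pred)^2 # square of the mean residual
100 * mb / mse           # mean bias as a percentage of MSE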

Random Bias (DB) {-}

Expressed as a percentage of the MSE.

From the regression of the residuals on the predicted values:

$$ Y_i - \hat{Y}_i = \beta_0 + \beta_1\hat{Y}_i + \epsilon_i $$

$$ DB = \frac{\sum_{i=1}^{n}\epsilon_i^2 }{n} $$
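A minimal sketch of this computation; the regression of residuals on predictions is fitted here with lm(), which is an assumption about how opmetrics implements it:

res <- obs - pred                # residuals, Y_i - Y_hat_i
fit_res <- lm(res ~ pred)        # regression of residuals on predictions (assumed specification)
db <- mean(residuals(fit_res)^2) # mean of the squared regression residuals
100 * db / mse                   # random bias as a percentage of MSE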

Slope Bias (SB) {-}

Expressed as a percentage of the MSE.

$$ SB = MSE - MB - DB $$
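By hand, as the remainder of the decomposition (reusing mse, mb, and db from above):

sb <- mse - mb - db # slope bias
100 * sb / mse      # slope bias as a percentage of MSE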

RMSE-observations standard deviation ratio (RSR) {-}

Calculated as presented in @moriasid.n.ModelEvaluationGuidelines2007.

$$ RSR = RMSE/\sigma $$
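By hand, with rmse and obs from above:

rmse / sd(obs) # RMSE divided by the standard deviation of the observed values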

Concordance Correlation Coefficient (CCC) {-}

Calculated according to @linConcordanceCorrelationCoefficient1989 using the epiR package [@stevensonEpiRToolsAnalysis2023].
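A minimal sketch with epiR; the exact arguments opmetrics passes to epi.ccc are an assumption here:

ccc <- epiR::epi.ccc(obs, pred) # Lin's concordance correlation coefficient
ccc$rho.c                       # point estimate with confidence interval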

Formulas not included but to consider {.hidden .unlisted}

Alternative mean bias formulation

$$ MB = \biggl[ \frac{\sum_{i=1}^{n}Y_i}{n}-\frac{\sum_{i=1}^{n}\hat{Y}_i}{n} \biggr]^2 $$
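A quick numerical check that this formulation agrees with the one used above (sketch, reusing obs, pred, and mb):

mb_alt <- (mean(obs) - mean(pred))^2 # squared difference of the means
all.equal(mb, mb_alt)                # the two formulations give the same value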

References


