knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
knitr::opts_chunk$set(echo = TRUE)
Packages required and number notation.
library(modMetricsR)
library(epiR)
library(ggplot2)
library(dplyr)
library(flextable)
options(scipen = 999) # normal notation for numbers
Here are examples of use for the functions included in the package.
With opmetrics, which stands for "observed-predicted-metrics", the metrics can be computed just by providing the observed and predicted values from the model, as:
m <- lm(Petal.Length ~ Sepal.Length, iris)
opmetrics(iris$Petal.Length, predict(m))
Great, right? But there are too many digits; we can fix that with the s argument, which takes either a number or a standardized format (like s = "std1"):
opmetrics(iris$Petal.Length, predict(m), s = "std1")
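As mentioned above, s also accepts a number; a minimal sketch assuming it is interpreted as the number of digits to round to (see ?opmetrics for the exact meaning):
opmetrics(iris$Petal.Length, predict(m), s = 2)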
If you want a plot of observed against predicted values, just add plot = TRUE. Note that observed values are plotted against predicted values (not the other way around), as pointed out by [@pineiroHowEvaluateModels2008].
opmetrics(iris$Petal.Length, predict(m), plot = TRUE)
The returned object can be used like any ggplot, for example adding layers and changing the theme.
opmetrics(iris$Petal.Length, predict(m), plot = TRUE) +
  labs(x = "Predicted petal length, cm", y = "Observed petal length, cm") +
  theme_bw()
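Since the result is a standard ggplot object, it can also be saved to a file with ggsave(); a minimal sketch, where the file name and size are only illustrative:
p <- opmetrics(iris$Petal.Length, predict(m), plot = TRUE) + theme_bw()
ggsave("observed_vs_predicted.png", p, width = 5, height = 4) # save the plot to disk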
If you want to plot the residuals instead, just add residuals = TRUE. Note that residuals are plotted against predicted values, as they should be [@st-pierreReassessmentBiasesPredicted2003].
opmetrics(iris$Petal.Length, predict(m), plot = TRUE, residuals = TRUE) +
  geom_hline(yintercept = 0) +
  labs(x = "Predicted petal length, cm", y = "Residual petal length, cm") +
  theme_bw()
Not a great model; maybe adding species could make it look better.
m <- lm(Petal.Length ~ Sepal.Length + Species, iris)
opmetrics(iris$Petal.Length, predict(m), plot = TRUE, residuals = TRUE) +
  geom_hline(yintercept = 0) +
  labs(x = "Predicted petal length, cm", y = "Residual petal length, cm") +
  theme_bw()
With metricsloop it is possible to compare metrics from different models belonging to the same model class, just by providing the models of interest, thanks to the loop built into the function.
mods <- c("Sepal.Length", "Sepal.Length+Species", "Sepal.Length*Species")
metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1")
And for better table visualization via flextable; of course, it is also possible to export the result as a csv/spreadsheet.
flextable(metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1")) %>%
  autofit()
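As mentioned above, the result can also be written to a csv/spreadsheet; a minimal sketch, assuming the returned object is a data frame (the file name is illustrative):
res <- metricsloop(d = iris, var = "Petal.Length", MT = "lm", m = mods, PF = "predict", s = "std1")
write.csv(res, "petal_length_metrics.csv", row.names = FALSE) # export the metrics table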
With metricsloopmix it is possible to compare metrics from models belonging to different model classes, just by providing the models of interest, thanks to the loop built into the function.
modclass <- c("lm", "glm")
modsid <- c("Sepal.Length", "Sepal.Length, family = Gamma")
modsdata <- data.frame(modclass, modsid)
metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, s = "std1")
It is also possible to use different predict functions by setting PF = TRUE and adding a third column to the MO data with the prediction function intended for each model (plus a fourth column with the prediction type, as in the example below).
modclass <- c("lm", "glm")
modsid <- c("Sepal.Length", "Sepal.Length, family = Gamma")
modpredict <- c("predict.lm", "predict.glm")
predictype <- c("response", "response")
modsdata <- data.frame(modclass, modsid, modpredict, predictype)
metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, PF = TRUE, s = "std1")
And again for better table visualization via flextable:
flextable(metricsloopmix(d = iris, var = "Petal.Length", MO = modsdata, PF = TRUE, s = "std1")) %>%
  autofit()
Following is a list of the metrics included, where $Y_i$ is the observed value, $\hat{Y}_i$ is the predicted value from the model, $n$ is the number of observations, $\mu$ is the mean of the observed values, and $\sigma$ is the standard deviation of the observed values.
Can be used as such, or to calculate the proportions of the decomposed error [@bibbyPredictionImprovedEstimation1977a].
$$ MSE = \frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2 }{n} $$
Given both as an absolute value and as a proportion of the mean observed values. Main indicator of the goodness of a model [@bibbyPredictionImprovedEstimation1977a].
$$ RMSE = \sqrt{MSE} \text{ or } \sqrt{MSE}/\mu$$
Similar to the RMSE but giving a lower weight to residuals further from zero [@chaiRootMeanSquare2014]. As for the RMSE, it can be divided by $\mu$ to express the MAE as a proportion of the mean observed values.
$$ MAE = \frac{\sum_{i=1}^n|Y_i-{\hat{Y}_i}|}{n} $$
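As a quick check of the formulas above, here is a minimal base-R sketch computing them by hand (reusing the lm fit from the earlier examples; the object names are illustrative only):
obs  <- iris$Petal.Length
pred <- predict(lm(Petal.Length ~ Sepal.Length, iris))
sqrt(mean((obs - pred)^2))             # RMSE
sqrt(mean((obs - pred)^2)) / mean(obs) # RMSE as a proportion of the observed mean
mean(abs(obs - pred))                  # MAE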
Expressed as a percentage of MSE, and calculated as the squared bias.
$$ MB = \biggr[ \frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)}{n} \biggr]^2 $$
Expressed as percentage of MSE.
From the regression:
$$ Y_i - \hat{Y}_i = \beta_0 + \beta_1\hat{Y}_i + \epsilon_i $$
$$ DB = \frac{\sum_{i=1}^{n}\epsilon_i^2 }{n} $$
Expressed as percentage of MSE.
$$ SP = MSE - MB - DB $$
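To make the decomposition concrete, a minimal base-R sketch of how MB, DB and SP could be obtained from the definitions above; this is an interpretation of the formulas, not necessarily the package's internal code (obs and pred are as defined in the sketch above):
res <- obs - pred
MSE <- mean(res^2)
MB  <- mean(res)^2                 # squared mean bias
fit <- lm(res ~ pred)              # regression of the residuals on the predicted values
DB  <- mean(residuals(fit)^2)      # mean squared disturbance from that regression
SP  <- MSE - MB - DB               # remaining systematic component
round(c(MB = MB, DB = DB, SP = SP) / MSE * 100, 1) # each as a percentage of MSE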
Calculated as presented in @moriasid.n.ModelEvaluationGuidelines2007.
$$ RSR = RMSE/\sigma $$
Calculated according to @linConcordanceCorrelationCoefficient1989 using the epiR R package [@stevensonEpiRToolsAnalysis2023].
$$ MB = \biggr[ \frac{\sum_{i=1}^{n}Y_i}{n}-\frac{\sum_{i=1}^{n}\hat{Y}_i}{n} \biggr] ^2 $$