Compares model predictions to the actual values of the response variable. To do this, testing data must be provided that contains both the input variables and the corresponding response variable. For a quantitative response variable, the measure calculated is the mean square prediction error (MSPE). For categorical response variables, an analog of MSPE can be calculated (see Details), but by default the mean log-likelihood per case is computed instead.
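As a rough illustration of what MSPE measures, the sketch below computes it by hand in base R on held-out rows of mtcars. It does not call mod_error(); the seed, the size of the split, and the names test_rows, training, testing, fit, predicted, and mspe are assumptions made for this example only.

```r
# Hand-computed mean square prediction error (MSPE), base R only.
set.seed(101)                                   # arbitrary seed, for reproducibility
test_rows <- sample(seq_len(nrow(mtcars)), size = 10)
training  <- mtcars[-test_rows, ]               # rows used to fit the model
testing   <- mtcars[test_rows, ]                # held-out rows used to score it

fit       <- lm(mpg ~ hp + wt, data = training)
predicted <- predict(fit, newdata = testing)    # model output on the testing inputs

# Mean of the squared differences between actual and predicted response.
mspe <- mean((testing$mpg - predicted)^2)
mspe
```

Scoring on rows that were not used for fitting is what separates prediction error from in-sample error.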
model
    The model whose prediction error is to be estimated.
testdata
    A data frame giving both model inputs and the actual value of the response variable. If no testing data is provided, the training data will be used and a warning issued.
error_type
    The measure of error you are interested in. By default, this is mean-square error for regression models and log-likelihood for classifiers. The choices are:
When the response variable is categorical, the model (called a 'classifier' in such situations) must be capable of computing probabilities for each output rather than just a bare category. This is true for many commonly encountered classifier model architectures.
The analog of the mean squared error for classifiers is the mean of (1-p)^2, where p is the probability assigned by the model to the actual output. This is a rough approximation to the log-likelihood. By default, the log-likelihood will be calculated, but for pedagogical reasons you may prefer (1-p)^2, in which case set error_type = "mse". Classifiers can assign a probability of zero to the actual output, in which case the log-likelihood is -Inf. The "mse" error type avoids this.
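The difference between the two classifier measures can be seen with a few made-up numbers. In this base R sketch, p is a hypothetical vector of probabilities that some classifier assigned to the outcome that actually occurred; the values and the names sq_err and log_lik are assumptions for illustration, not output from mod_error().

```r
# Hypothetical probabilities assigned to the actual outcome in four cases.
# The last case was given probability zero by the (imaginary) classifier.
p <- c(0.9, 0.6, 1.0, 0.0)

sq_err  <- (1 - p)^2   # per-case analog of squared error, always within [0, 1]
log_lik <- log(p)      # per-case log-likelihood; log(0) is -Inf

mean(sq_err)           # stays finite
mean(log_lik)          # -Inf, because of the single zero-probability case
```

A single zero-probability case is enough to drive the log-likelihood to -Inf, while its contribution to the mean of (1-p)^2 is capped at 1.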
library(mosaicModel)
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_error(mod) # In-sample prediction error.
## Not run:
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_error(classifier)
mod_error(classifier, error_type = "LL")
# More typically
inds <- sample(1:nrow(iris), size = 100)
Training <- iris[inds, ]
Testing <- iris[ - inds, ]
classifier <- rpart::rpart(Species ~ ., data = Training)
# This may well assign zero probability to events that appeared in the
# Testing data
mod_error(classifier, testdata = Testing)
mod_error(classifier, testdata = Testing, error_type = "mse")
## End(Not run)
Loading required package: mosaicCore
Loading required package: splines
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:mosaicCore’:
count, tally
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Attaching package: ‘mosaicModel’
The following objects are masked from ‘package:mosaicCore’:
ci.mean, ci.median, ci.sd, coverage
mse
NA
Warning messages:
1: In mod_error(mod) : Calculating error from training data.
2: In mean.default((actual - model_output)^2, na.rm = TRUE) :
argument is not numeric or logical: returning NA
LL
-21.47645
Warning message:
In mod_error(classifier) : Calculating error from training data.
LL
-21.47645
Warning message:
In mod_error(classifier, error_type = "LL") :
Calculating error from training data.
LL
-9.886727
mse
0.05465206