mod_error: Mean square prediction error


Description

Compares model predictions to the actual values of the response variable. To do this, testing data must be provided that contain both the input variables and the corresponding response variable. For a quantitative response variable, the measure calculated is the mean square prediction error (MSPE). For a categorical response variable, an analog of MSPE can be calculated (see Details), but by default a log-likelihood is computed instead.

Usage

mod_error(model, testdata, error_type = c("default", "mse", "sse", "mad",
  "LL", "mLL", "dev", "class_error"))

Arguments

model

The model whose prediction error is to be estimated.

testdata

A data frame giving both the model inputs and the actual values of the response variable. If no testing data are provided, the training data are used and a warning is issued.

error_type

The measure of error you are interested in. By default, this is mean-square error for regression models and log-likelihood for classifiers. The choices are:

  • "mse" – mean square error

  • "sse" – sum of square errors

  • "mad" – mean absolute deviation

  • "LL" – log-likelihood

  • "mLL" – mean log-likelihood (per case in the testing data)

  • "dev" – deviance, up to an additive constant that is often zero. The constant is fixed for a given testing data set regardless of the model, so differences between the deviances of two models are correct.

  • "class_error" – classification error rate.
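
The measures listed above can be sketched in base R using their textbook definitions. This is an illustration under stated assumptions, not the package's implementation: the vectors `actual`, `predicted`, `p`, `actual_class`, and `predicted_class` are made-up values, "mad" is assumed to be the mean of the absolute errors, and deviance is assumed to be -2 times the log-likelihood.

```r
# Hypothetical values for a quantitative response (not from a fitted model).
actual    <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.5, 5.0, 4.0, 8.0)
resid     <- actual - predicted

sse <- sum(resid^2)       # sum of square errors
mse <- mean(resid^2)      # mean square error
mad <- mean(abs(resid))   # mean absolute deviation (assumed definition)

# Hypothetical classifier output: p is the probability the model assigned
# to the class that actually occurred, one value per test case.
p   <- c(0.9, 0.6, 0.8, 0.5)
LL  <- sum(log(p))        # log-likelihood
mLL <- mean(log(p))       # mean log-likelihood per case
dev <- -2 * LL            # deviance (assumed definition, constant omitted)

# Classification error rate: share of cases where the predicted class
# differs from the actual class.
actual_class    <- c("a", "b", "a", "b")
predicted_class <- c("a", "b", "b", "b")
class_error <- mean(actual_class != predicted_class)
```

Because `mLL` is `LL` divided by the number of test cases, the two rank models identically on a fixed testing set; `mLL` is simply easier to compare across testing sets of different sizes.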

Details

When the response variable is categorical, the model (called a 'classifier' in such situations) must be capable of computing probabilities for each output rather than just a bare category. This is true for many commonly encountered classifier model architectures.
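
As a rough sketch of what "computing probabilities for each output" looks like in practice, the following uses `rpart` (the classifier from the Examples) and its `predict()` method with `type = "prob"`. The names `fit`, `probs`, `idx`, and `p` are illustrative, and the by-hand log-likelihood is an assumption about what the measure means, not a copy of the package's code.

```r
# Fit a classification tree on the full iris data (in-sample, for illustration).
fit <- rpart::rpart(Species ~ ., data = iris)

# One row per case, one column per class; each row sums to 1.
probs <- predict(fit, newdata = iris, type = "prob")

# Pick out, for each case, the probability assigned to the class that
# actually occurred, using (row, column) matrix indexing.
idx <- cbind(seq_len(nrow(iris)),
             match(as.character(iris$Species), colnames(probs)))
p <- probs[idx]

# Log-likelihood computed by hand from those probabilities.
ll_by_hand <- sum(log(p))
ll_by_hand
```

On the full iris data this by-hand value should agree with the in-sample `LL` reported in the example output (about -21.48).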

The analog of the mean squared error for classifiers is the mean of (1-p)^2, where p is the probability assigned by the model to the actual output. This is a rough approximation to the log-likelihood. By default, the log-likelihood will be calculated, but for pedagogical reasons you may prefer (1-p)^2, in which case set error_type = "mse". Classifiers can assign a probability of zero to the actual output, in which case the log-likelihood is -Inf. The "mse" error type avoids this.
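
The zero-probability case can be seen with a few made-up numbers. In this sketch, `p` is a hypothetical vector of probabilities assigned to the actual outcomes, with one case given probability zero; `ll` and `mse_analog` are illustrative names.

```r
# Hypothetical probabilities assigned to the actual outcome of four cases.
# The third case was given probability zero by the (imaginary) classifier.
p <- c(0.9, 0.7, 0, 0.8)

# log(0) is -Inf, so a single such case drives the log-likelihood to -Inf.
ll <- sum(log(p))

# The (1 - p)^2 analog stays finite: the zero-probability case
# contributes at most 1 to the sum before averaging.
mse_analog <- mean((1 - p)^2)

ll
mse_analog
```

Each case contributes between 0 and 1 to the (1-p)^2 measure, so one badly predicted case cannot dominate the result the way it does with the log-likelihood.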

Examples

library(mosaicModel)
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_error(mod) # In-sample prediction error.
## Not run: 
classifier <- rpart::rpart(Species ~ ., data = iris)
mod_error(classifier)
mod_error(classifier, error_type = "LL") 
# More typically
inds <- sample(1:nrow(iris), size = 100)
Training <- iris[inds, ]
Testing  <- iris[ - inds, ]
classifier <- rpart::rpart(Species ~ ., data = Training)
# This may well assign zero probability to events that appeared in the
# Testing data 
mod_error(classifier, testdata = Testing)
mod_error(classifier, testdata = Testing, error_type = "mse")

## End(Not run)

Example output

Loading required package: mosaicCore
Loading required package: splines
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:mosaicCore':

    count, tally

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union


Attaching package: 'mosaicModel'

The following objects are masked from 'package:mosaicCore':

    ci.mean, ci.median, ci.sd, coverage

mse 
 NA 
Warning messages:
1: In mod_error(mod) : Calculating error from training data.
2: In mean.default((actual - model_output)^2, na.rm = TRUE) :
  argument is not numeric or logical: returning NA
       LL 
-21.47645 
Warning message:
In mod_error(classifier) : Calculating error from training data.
       LL 
-21.47645 
Warning message:
In mod_error(classifier, error_type = "LL") :
  Calculating error from training data.
       LL 
-9.886727 
       mse 
0.05465206 

mosaicModel documentation built on May 2, 2019, 7:59 a.m.