gof: Compute goodness of fit measures


View source: R/gof.R

Description

gof() computes goodness-of-fit measures for model predictions in general; gof_continuous() is for continuous data and gof_discrete() is for discrete data.

Usage

gof(
  obs,
  pred,
  type = c("loglikelihood", "mse", "wmse", "rmse", "sse", "wsse", "mape", "mdape",
    "accuracy"),
  na.rm = FALSE,
  pdf = NULL,
  response = NULL,
  saturated = FALSE,
  binomial.coef = FALSE,
  ...,
  n = NULL
)

gof_continuous(obs, pred, type, ...)

gof_discrete(obs, pred, type, ...)

Arguments

obs

A numeric vector or matrix, the observed data. Can hold continuous or discrete values. Can be aggregated, in which case you must supply n (see below). The default assumes raw data.

pred

A numeric vector or matrix with predictions, in the same order as obs.

type

A string (default "loglikelihood") specifying the goodness-of-fit or error measure. Allowed values are "loglikelihood" (log likelihood), "sse" (sum of squared errors), "wsse" (weighted sum of squared errors), "mse" (mean squared error), "wmse" (weighted mean squared error), "rmse" (root mean squared error), "mape" (mean absolute percentage error), "mdape" (median absolute percentage error), and "accuracy" (percent of obs equal to pred, after applying a cr_argmax choice rule to probabilistic predictions of discrete data).
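
For orientation, the unweighted error measures named above can be sketched in base R. This is a minimal illustration of the textbook definitions, not the package's own code; in particular, expressing the absolute percentage error as a proportion of obs (rather than multiplying by 100) is an assumption. The data are the observed and predicted values from the Examples section.

```r
# Base-R sketch of the standard (unweighted) error measures.
# Assumption: textbook definitions; gof() itself may differ in detail.
obs  <- c(0.9538, 0.9107, 0.9204, 0.9029, 0.8515, 0.9197,
          0.7970, 0.8228, 0.8191, 0.7277, 0.7276)
pred <- c(.9526, .9168, .8721, .8229, .7736, .7277,
          .6871, .6523, .6232, .5993, .5798)

err   <- obs - pred        # prediction errors
sse   <- sum(err^2)        # sum of squared errors
mse   <- mean(err^2)       # mean squared error
rmse  <- sqrt(mse)         # root mean squared error
ape   <- abs(err / obs)    # absolute percentage error (as a proportion)
mape  <- mean(ape)         # mean absolute percentage error
mdape <- median(ape)       # median absolute percentage error
```

The Examples section quotes an SSE of 0.1695 from Busemeyer and Diederich (2010) for these data, which the sum above should approximately reproduce.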

na.rm

(optional) Logical (default FALSE). TRUE removes every row that contains an NA in either pred or obs from both (list-wise removal).
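
List-wise removal can be illustrated in base R with complete.cases(), which flags the positions where neither input is NA. This is only a sketch of the idea; how gof() implements na.rm internally is not shown in this page.

```r
# Sketch of joint (list-wise) NA removal for two vectors.
obs  <- c(1, NA, 0, 1)
pred <- c(0.9, 0.8, NA, 0.6)

keep       <- complete.cases(obs, pred)  # TRUE where both obs and pred are present
obs_clean  <- obs[keep]
pred_clean <- pred[keep]
```

Positions 2 and 3 are dropped from both vectors, so the remaining values still line up pairwise.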

pdf

(optional) String, the probability density function used in the log likelihood. For allowed values see Loglikelihood().

response

(optional) String, the type of observed data, "discrete" or "continuous". Can be abbreviated. Will be guessed as discrete if obs is a factor or character, and as continuous if pred is not in 0-1.

saturated

(optional) Logical (default FALSE). TRUE returns the saturated log likelihood.
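
As a rough illustration of what a saturated log likelihood is for binary choice data: it is the log likelihood obtained when the predictions equal the observed relative frequencies, which is the best any model can achieve. The helper binom_ll() below is hypothetical and written only for this sketch; it omits the binomial coefficient and is not the package's implementation.

```r
# Sketch: model vs. saturated binomial log likelihood for aggregated data.
# binom_ll() is a hypothetical helper, not part of the package.
binom_ll <- function(obs, pred, n) {
  sum(n * (obs * log(pred) + (1 - obs) * log(1 - pred)))
}

obs  <- c(0.9538, 0.9107, 0.9204, 0.9029, 0.8515, 0.9197,
          0.7970, 0.8228, 0.8191, 0.7277, 0.7276)
pred <- c(.9526, .9168, .8721, .8229, .7736, .7277,
          .6871, .6523, .6232, .5993, .5798)

ll_model     <- binom_ll(obs, pred, n = 200)  # predictions from the model
ll_saturated <- binom_ll(obs, obs,  n = 200)  # predictions = observed frequencies
```

The Examples section quotes -969.9514 and -879.9013 from Busemeyer and Diederich (2010) for these two quantities; the saturated value is always at least as large as the model value.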

binomial.coef

(optional) Logical (default FALSE), TRUE adds the binomial coefficient to a binomial log likelihood.
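
The effect of the binomial coefficient can be sketched in base R: adding log choose(n, k) to the likelihood kernel gives the full binomial log probability that dbinom() returns. The counts and probabilities below are made up for illustration, and the sketch does not claim to mirror how gof() computes this internally.

```r
# Sketch: binomial log likelihood with and without the binomial coefficient.
k <- c(8, 5)        # illustrative counts of "1" responses per condition
n <- c(10, 10)      # illustrative number of observations per condition
p <- c(0.7, 0.6)    # illustrative predicted probabilities

ll_kernel <- sum(k * log(p) + (n - k) * log(1 - p))          # without coefficient
ll_full   <- ll_kernel + sum(lchoose(n, k))                  # with coefficient
ll_dbinom <- sum(dbinom(k, size = n, prob = p, log = TRUE))  # reference value
```

Because the coefficient does not depend on the predictions, it shifts the log likelihood by a constant and leaves comparisons between models fitted to the same data unchanged.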

...

More arguments to be passed on to the fitting functions, see e.g. loglikelihood().

n

(optional) Integer or integer vector (default: 1), number of observations underlying obs if obs, pred or both are aggregated: n=10 means each aggregate represents 10 data points, a vector n=c(10,20) means the first aggregate represents 10, the second 20 data points, etc.
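
To make the meaning of a vector n concrete, the following base-R sketch expands two aggregates into raw 0/1 observations and checks that the aggregated and raw binomial log likelihoods agree. The numbers are illustrative, and the formulas are the standard ones rather than code taken from the package.

```r
# Sketch: aggregated data with n = c(10, 20) vs. the equivalent raw data.
obs  <- c(0.8, 0.5)   # illustrative observed relative frequencies
pred <- c(0.7, 0.6)   # illustrative predictions, one per aggregate
n    <- c(10, 20)     # first aggregate covers 10 data points, second 20

k <- round(obs * n)   # number of "1" responses in each aggregate

# Aggregated log likelihood (binomial coefficient omitted)
ll_agg <- sum(k * log(pred) + (n - k) * log(1 - pred))

# Expand to raw 0/1 observations with one prediction per observation
obs_raw  <- unlist(Map(function(k_i, n_i) rep(c(1, 0), times = c(k_i, n_i - k_i)), k, n))
pred_raw <- rep(pred, times = n)
ll_raw   <- sum(dbinom(obs_raw, size = 1, prob = pred_raw, log = TRUE))
```

Both routes give the same value, which is why supplying n with aggregated obs can stand in for the full raw data set.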

Details

The observations can be discrete or continuous response data. If response = "discrete", then obs can be either a vector with the different choices or a matrix with as many columns as there are choice options, where each column contains 0 if the option was not chosen and 1 if it was chosen.

Predictions can be for individual observations or aggregated. Predictions are individual if n is not supplied: each value in pred predicts the corresponding row in obs or, if pred is a single value, this value predicts the mean of obs. Predictions are aggregated if n is supplied: each value in pred predicts n observations. If obs and pred are equally long and n is supplied, it is assumed that each value in obs represents the mean across n values.
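
The two layouts for discrete observations, a vector of choices or a 0/1 matrix with one column per choice option, can be related with a short base-R sketch. The choice labels are made up, and the conversion shown is only one way to build such a matrix; it is not taken from the package.

```r
# Sketch: turn a vector of choices into a 0/1 indicator matrix
# with one column per choice option.
choices <- c("a", "b", "a", "c")   # illustrative choice vector
opts    <- sort(unique(choices))   # the available choice options

obs_matrix <- outer(choices, opts, "==") * 1   # 1 = chosen, 0 = not chosen
colnames(obs_matrix) <- opts
```

Each row of obs_matrix contains exactly one 1, marking the option chosen on that trial.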

Value

The goodness of fit.

References

Busemeyer, J. R., & Diederich, A. (2010). Nonlinear parameter estimation. In Cognitive Modeling (pp. 43–84). Thousand Oaks, CA: SAGE Publications.

See Also

Other goodness of fit functions: APE(), Accuracy(), MAPE(), MDAPE(), MSE(), RMSE(), SSE()

Examples

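# Mean squared error for two observations and two predictions
gof(c(.33, .66), c(1,0), "mse")
# Log likelihood for the same data
gof(c(.33, .66), c(1,0), "loglikelihood")
# Log likelihood with a discrete response type ("d" abbreviates "discrete")
gof(c(.33, .66), c(1,0), "loglikelihood", options = list(response = "d"))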
# Example from Busemeyer and Diederich (2010)
# Observed relative frequencies of binary choices
obs <- c(0.9538, 0.9107, 0.9204, 0.9029, 0.8515, 0.9197,
          0.7970, 0.8228, 0.8191, 0.7277, 0.7276)
# Predictions for each of the 11 conditions
pred <- c(.9526, .9168, .8721, .8229, .7736, .7277,
          .6871, .6523, .6232, .5993, .5798)


#
# GOF from aggregated data over 200 observations
# --------------------------------------------------------------------------
gof_discrete(obs, pred, "sse", n=200)   # SSE  (paper: 0.1695)
gof_discrete(obs, pred, "wsse", n=200)  # Weighted SSE (paper: 158.4059)
gof_discrete(obs, pred, "logl", n=200)  # Loglik. (paper: -969.9514 < 0.1% diff)
gof_discrete(obs, obs, "loglik", n=200) # Saturated LL (paper: -879.9013)
# 
# GOF from raw data
# --------------------------------------------------------------------------
# Recreate the raw data (observations 0 or 1)
n <- 200 # number of observations
obsraw  <- rep(rep(0:1, 11), round(c(t(cbind(1-obs, obs))) * n))
predraw <- rep(pred, each = n)
gof_discrete(obsraw, predraw, "acc")          # 85% Accuracy
gof_discrete(obsraw, predraw, "sse")          # SSE (not useful w/ raw data)
gof_discrete(obsraw, predraw, "log")          # Loglik (paper: -969.9514)
gof_discrete(obs, obsraw, "loglik", saturated=TRUE, n=200) # Saturated LL (paper: -879.9013)

JanaJarecki/cognitiveutils documentation built on Sept. 9, 2020, 9:11 a.m.