resids: Surrogate Residuals

Description Usage Arguments Value Note References Examples

Description

Surrogate-based residuals for cumulative link and general regression models.

Usage

1
2
3
4
5
resids(object, ...)

## Default S3 method:
resids(object, method = c("latent", "jitter"),
  jitter.scale = c("probability", "response"), nsim = 1L, ...)

Arguments

object

An object of class clm, glm, lrm, orm, polr, vgam (jittering only), or vglm.

...

Additional optional arguments. (Currently ignored.)

method

Character string specifying the type of surrogate to use; for details, see Liu and Zhang (2017). Can be one of "latent" or "jitter".

jitter.scale

Character string specifying the scale on which to perform the jittering. Should be one of "probability" or "response". (Currently ignored for cumulative link models.)

nsim

Integer specifying the number of bootstrap replicates to use. Default is 1L meaning no bootstrap samples.

Value

A numeric vector of class c("numeric", "resid") containing the residuals. Additionally, if nsim > 1, then the result will contain the attributes:

boot.reps

A matrix with nsim columns, one for each bootstrap replicate of the residuals. Note, these are random and do not correspond to the original ordering of the data;

boot.id

A matrix with nsim columns. Each column contains the observation number each residual corresponds to in boot.reps. (This is used for plotting purposes.)

Note

Surrogate residuals require sampling from a continuous distribution; consequently, the result will be different with every call to resids. The internal functions used for sampling from truncated distributions when method = "latent" are based on modified versions of rtrunc and qtrunc.

References

Liu, Dungang and Zhang, Heping. Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach. Journal of the American Statistical Association (accepted). URL http://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1292915?journalCode=uasa20

Nadarajah, Saralees and Kotz, Samuel. R Programs for Truncated Distributions. Journal of Statistical Software, Code Snippet, 16(2), 1-8, 2006. URL https://www.jstatsoft.org/v016/c02.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
#
# Residuals for binary GLMs using the jittering method
#

# Load the MASS package (for the polr function)
library(MASS)

# Simulated probit data with quadratic trend
data(df1)

# Fit logistic regression models (with and without quadratic trend)
fit1 <- polr(y ~ x + I(x ^ 2), data = df1, method = "probit")
fit2 <- polr(y ~ x, data = df1, method = "probit")

# Construct residuals
set.seed(102)  # for reproducibility
res1 <- resids(fit1)
res2 <- resids(fit2)

# Residual-vs-covariate plots
par(mfrow = c(1, 2))
scatter.smooth(df1$x, res1, lpars = list(lwd = 2, col = "red"),
               xlab = expression(x), ylab = "Surrogate residual",
               main = "Correct model")
scatter.smooth(df1$x, res2, lpars = list(lwd = 2, col = "red"),
               xlab = expression(x), ylab = "Surrogate residual",
               main = "Incorrect model")

## Not run: 
#
# Residuals for cumulative link models using the latent method
#

# Load required packages
library(ggplot2)  # for autoplot function
library(MASS)     # for polr function
library(ordinal)  # for clm function

#
# Detecting a misspecified mean structure
#

# Data simulated from a probit model with a quadratic trend
data(df1)
?df1

# Fit a probit model with/without a quadratic trend
fit.bad <- polr(y ~ x, data = df1, method = "probit")
fit.good <- polr(y ~ x + I(x ^ 2), data = df1, method = "probit")

# Some residual plots
p1 <- autoplot(fit.bad, what = "covariate", x = df1$x)
p2 <- autoplot(fit.bad, what = "qq")
p3 <- autoplot(fit.good, what = "covariate", x = df1$x)
p4 <- autoplot(fit.good, what = "qq")

# Display all four plots together (top row corresponds to bad model)
grid.arrange(p1, p2, p3, p4, ncol = 2)

#
# Detecting heteroscedasticity
#

# Data simulated from a probit model with heteroscedasticity.
data(df2)
?df2

# Fit a probit model with/without a quadratic trend
fit <- polr(y ~ x, data = df2, method = "probit")

# Some residual plots
p1 <- autoplot(fit, what = "covariate", x = df1$x)
p2 <- autoplot(fit, what = "qq")
p3 <- autoplot(fit, what = "fitted")

# Display all three plots together
grid.arrange(p1, p2, p3, ncol = 3)

#
# Detecting a misspecified link function
#

# Data simulated from a log-log model with a quadratic trend.
data(df3)
?df3

# Fit models with correctly specified link function
clm.loglog <- clm(y ~ x + I(x ^ 2), data = df3, link = "loglog")
polr.loglog <- polr(y ~ x + I(x ^ 2), data = df3, method = "loglog")

# Fit models with misspecified link function
clm.probit <- clm(y ~ x + I(x ^ 2), data = df3, link = "probit")
polr.probit <- polr(y ~ x + I(x ^ 2), data = df3, method = "probit")

# Q-Q plots of the residuals (with bootstrapping)
p1 <- autoplot(clm.probit, nsim = 50, what = "qq") +
  ggtitle("clm: probit link")
p2 <- autoplot(clm.loglog, nsim = 50, what = "qq") +
  ggtitle("clm: log-log link (correct link function)")
p3 <- autoplot(polr.probit, nsim = 50, what = "qq") +
  ggtitle("polr: probit link")
p4 <- autoplot(polr.loglog, nsim = 50, what = "qq") +
  ggtitle("polr: log-log link (correct link function)")
grid.arrange(p1, p2, p3, p4, ncol = 2)

# We can also try various goodness-of-fit tests
par(mfrow = c(1, 2))
plot(gof(clm.probit, nsim = 50))
plot(gof(clm.loglog, nsim = 50))

## End(Not run)

sure documentation built on May 1, 2019, 10:19 p.m.