cqcheck: Visually checking a fitted quantile model



Description

Given an additive quantile model, fitted using qgam, cqcheck provides plots for checking what proportion of the responses, y, falls below the fitted quantile.

Usage

cqcheck(
  obj,
  v,
  X = NULL,
  y = NULL,
  nbin = c(10, 10),
  bound = NULL,
  lev = 0.05,
  scatter = FALSE,
  ...
)

Arguments

obj

the output of a qgam call.

v

if a 1D plot is required, v should be either a single character string or a numeric vector. In the first case v should be the name of one of the variables in the data frame X. In the second case, the length of v should be equal to the number of rows of X. If a 2D plot is required, v should be either a vector of two character strings or a matrix with two columns.

X

a data frame containing the data used to obtain the conditional quantiles. By default it is NULL, in which case predictions are made using the model matrix in obj$model.

y

vector of responses. Its i-th entry corresponds to the i-th row of X. By default it is NULL, in which case it is internally set to obj$y.

nbin

a vector of integers of length one (1D case) or two (2D case) indicating the number of bins to be used in each direction. Used only if bound is NULL.

bound

in the 1D case it is a numeric vector whose increasing entries represent the bounds of each bin. In the 2D case a list of two vectors should be provided. NULL by default.

lev

the significance level used in the plots; it determines the width of the confidence intervals. Default is 0.05.

scatter

if TRUE a scatterplot is added (using the points function). FALSE by default.

...

extra graphical parameters to be passed to plot().
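
As a rough illustration of the forms that v and bound can take, the following sketch builds one value of each kind using base R only. The data frame dat and all object names are hypothetical, and the commented cqcheck calls assume a fitted model fit and a response vector y that are not created here.

```r
# Illustrative data only; none of these objects come from the package.
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n), z = runif(n))

# 1D case: v as a variable name, or as a numeric vector with one entry per row of X
v_name <- "x"
v_num  <- dat$x^2                 # any derived covariate of length nrow(dat)

# 2D case: v as two variable names, or as a two-column matrix
v_names <- c("x", "z")
v_mat   <- cbind(dat$x, dat$z)

# bound: increasing bin limits. Here the 1D bins hold roughly equal counts,
# which is a choice made for this sketch, not a package default.
bound_1d <- unname(quantile(dat$x, probs = seq(0, 1, length.out = 6)))
bound_2d <- list(bound_1d, seq(0, 1, by = 0.25))

# With a fitted model `fit` and responses `y` (both assumed), one could then try:
# cqcheck(fit, v = v_num, X = dat, y = y, bound = bound_1d)
# cqcheck(fit, v = v_mat, X = dat, y = y, bound = bound_2d)
```

When bound is supplied, nbin is ignored, so the number of bins is implied by the number of limits.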

Details

Having fitted an additive model for, say, quantile qu=0.4, one would expect about 40% of the responses to fall below the fitted quantile. This function allows one to visually compare the empirical proportion of responses (qu_hat) falling below the fit with its theoretical value (qu). In particular, the responses are binned, with the bins being constructed along one or two variables (given by argument v). Let qu_hat[i] be the proportion of responses below the fitted quantile in the i-th bin. This should be approximately equal to qu, for every i.

In the 1D case, when v is a single character string or a numeric vector, cqcheck provides a plot where the horizontal line is qu, the dots correspond to qu_hat[i] and the grey lines are confidence intervals for qu. The confidence intervals are based on qbinom(lev/2, siz, qu); if the dots fall outside them, then qu_hat[i] might be deviating too much from qu.

In the 2D case, when v is a vector of two character strings or a matrix with two columns, we plot a grid of bins. The responses are divided between the bins as before, but now we don't plot the confidence intervals. Instead we report the empirical proportions qu_hat[i] for the non-empty bins, and we colour the bins in red if qu_hat[i]<qu and in green otherwise. If qu_hat[i] falls outside the confidence intervals we put an * next to the numeric qu_hat[i] and we use more intense colours.
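
The 1D check can be mimicked with base R to see what is being compared. This is a minimal sketch, not the code used inside cqcheck: the simulated data, the constant stand-in fit for the fitted quantile, the reading of siz as the number of responses in each bin, and the upper limit taken at 1 - lev/2 are all assumptions made for illustration.

```r
set.seed(42)
qu  <- 0.4                      # target quantile
lev <- 0.05                     # significance level
n   <- 1000
v   <- runif(n)                 # variable along which the responses are binned
y   <- rnorm(n)                 # simulated responses
fit <- rep(qnorm(qu), n)        # stand-in for the fitted quantile (the true one here)

# Equally spaced bins along v
nbin  <- 10
bound <- seq(min(v), max(v), length.out = nbin + 1)
bin   <- cut(v, breaks = bound, include.lowest = TRUE)

# Per-bin counts and empirical proportion of responses below the fit
siz    <- as.vector(table(bin))
below  <- as.vector(tapply(y < fit, bin, sum))
qu_hat <- below / siz

# Binomial limits for the proportion, assuming siz is the bin count
lower <- qbinom(lev / 2, siz, qu) / siz
upper <- qbinom(1 - lev / 2, siz, qu) / siz

# Bins whose empirical proportion falls outside the limits
outside <- qu_hat < lower | qu_hat > upper
data.frame(bin = levels(bin), siz, qu_hat, lower, upper, outside)
```

Because fit is the true quantile in this simulation, most bins should fall inside their limits; replacing fit with a poor guess, such as rep(0, n), should push many of them outside.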

Value

Simply produces a plot.

Author(s)

Matteo Fasiolo <matteo.fasiolo@gmail.com>.

Examples

#######
# Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1)
#######
## Not run: 
library(qgam)
set.seed(15560)
n <- 500
x <- rnorm(n, 0, 1); z <- rnorm(n)
X <- cbind(1, x, x^2, z, x*z)
beta <- c(0, 1, 1, 1, 0.5)
y <- drop(X %*% beta) + rnorm(n) 
dataf <- data.frame(y = y, x = x, z = z)

#### Fit a constant model for median
qu <- 0.5
fit <- qgam(y~1, qu = qu, data = dataf)

# Look at what happens along x: clearly there is a nonlinear pattern here
cqcheck(obj = fit, v = c("x"), X = dataf, y = y) 

#### Add a smooth for x
fit <- qgam(y~s(x), qu = qu, data = dataf)
cqcheck(obj = fit, v = c("x"), X = dataf, y = y) # Better!

# Let's look across x and z. As we move along z (x2 in the plot)
# the colour changes from green to red
cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5))

# The effect looks pretty linear
cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10))

#### Let's add a linear effect for z
fit <- qgam(y~s(x)+z, qu = qu, data = dataf)

# Looks better!
cqcheck(obj = fit, v = c("z"))

# Let's look across x and z again: green prevails on the top-left to bottom-right
# diagonal, while the other diagonal is mainly red.
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))

### Maybe adding an interaction would help?
fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf)

# It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1)
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))

## End(Not run)
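
The colouring and starring rule of the 2D plot, as described in Details, can also be sketched with base R alone. This is an illustration under stated assumptions rather than the package's own code: the data are simulated, fit is a constant stand-in for a fitted median, and the binomial limits use lev/2 and 1 - lev/2 with the bin count as the size.

```r
set.seed(7)
qu  <- 0.5
lev <- 0.05
n   <- 2000
x   <- runif(n)
z   <- runif(n)
y   <- x * z + rnorm(n)          # response with an interaction the fit ignores
fit <- rep(median(y), n)         # constant stand-in for the fitted median

# 5 x 5 grid of bins over (x, z)
bx   <- cut(x, breaks = seq(0, 1, length.out = 6), include.lowest = TRUE)
bz   <- cut(z, breaks = seq(0, 1, length.out = 6), include.lowest = TRUE)
cell <- interaction(bx, bz)      # levels vary fastest along bx

# Per-cell counts and empirical proportions below the fit
siz    <- as.vector(table(cell))
below  <- as.vector(tapply(y < fit, cell, sum))
qu_hat <- below / siz

# Binomial limits, then the colour and star rule from Details
lower  <- qbinom(lev / 2, siz, qu) / siz
upper  <- qbinom(1 - lev / 2, siz, qu) / siz
colour <- ifelse(qu_hat < qu, "red", "green")
flag   <- ifelse(qu_hat < lower | qu_hat > upper, "*", "")

# Rows index bins of x, columns index bins of z
matrix(paste0(round(qu_hat, 2), flag), nrow = nlevels(bx),
       dimnames = list(levels(bx), levels(bz)))
```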

qgam documentation built on Nov. 23, 2021, 1:07 a.m.