Quantile Regression Transformation Models

Share:

Description

These functions are used to fit quantile regression transformation models.

Usage

1
2
3
4
5
6
7
8
9
tsrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
	conditional = FALSE, tau = 0.5, data, subset, weights, na.action,
	method = "fn", ...)
tsrq2(formula, dbounded = FALSE, lambda = NULL, delta = NULL, conditional = FALSE,
	tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
rcrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
	tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
nlrq2(formula, par = NULL, dbounded = FALSE, tau = 0.5, data,
	subset, weights, na.action, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

tsf

transformation to be used. Possible options are mcjI for Proposal I transformation models (default), bc for Box-Cox and ao for Aranda-Ordaz transformation models.

symm

logical flag. If TRUE (default) a symmetric transformation is used.

dbounded

logical flag. If TRUE the response is assumed to be doubly bounded on [a,b]. If FALSE (default) the response is assumed to be singly bounded (ie, strictly positive).

lambda, delta

values of transformation parameters for grid search.

conditional

logical flag. If TRUE, the transformation parameter is assumed to be known and this must be provided via the arguments lambda, delta in vectors of the same length as tau.

par

vector of length p + 2 of initial values for the parameters to be optimized over. The first p values are for the regression coefficients while the last 2 are for the transformation parameters lambda and delta in mcjII. These initial values are passed to optim.

tau

the quantile(s) to be estimated. See rq.

data

a data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector.

na.action

a function which indicates what should happen when the data contain NAs.

method

fitting algorithm for rq (default is Frisch-Newton interior point method "fn").

...

other arguments in summary.rq).

Details

These functions implement quantile regression transformation models as discussed by Geraci and Jones (see references). The general model is assumed to be Q(h(response)) = xb, where Q is the conditional quantile function and h is a one- or two-parameter transformation. A typical model specified in formula has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for the quantile of the transformed response. The response, which is singly or doubly bounded, i.e. response > 0 or 0 <= response <= 1 respectively, undergoes the transformation specified in tsf. If the response is bounded in the generic [a,b] interval, the latter is automatically mapped to [0,1] and no further action is required. If, however, the response is singly bounded and contains negative values, it is left to the user to offset the response or the code will produce an error. The functions tsrq and tsrq2 use a two-stage (TS) estimator (Fitzenberger et al, 2010) for, respectively, one- and two-parameter transformations. The function rcrq (one-parameter tranformations) is based on the residual cusum process estimator proposed by Mu and He (2007), while the function nlrq2 (two-parameter tranformations) is based on Nelder-Mead optimization (Geraci and Jones).

Value

tsrq, tsrq2, rcrq, nlrq2 return an object of class rqt. This is a list that contains as typical components:

the first nt = length(tau) elements of the list store the results from fitting linear quantile models on the tranformed scale of the response.

call

the matched call.

method

the fitting algorithm for rq or optim.

y

the response – untransformed scale.

theta

if dbounded = TRUE, the response mapped to the unit interval.

x

the model matrix.

weights

the weights used in the fitting process (a vector of 1's if weights is missing or NULL).

tau

the order of the estimated quantile(s).

lambda

the estimated parameter lambda.

eta

the estimated parameters lambda and delta in the two-parameter Proposal II tranformation.

lambda.grid

grid of lambda values used for estimation.

delta.grid

grid of delta values used for estimation.

tsf

tranformation used (see also attributes(tsf)).

objective

values of the objective function minimised over the tranformation parameter(s). This is an array of dimension c(nl,nt) or c(nl,nd,nt), where nl = length(lambda.grid), nd = length(delta.grid) and nt = length(tau).

optimum

value of the objective function at solution.

coefficients

quantile regression coefficients – transformed scale.

fitted.values

fitted values.

rejected

proportion of inadmissible observations (Fitzenberger et al, 2010).

terms

the terms used.

term.labels

names of coefficients.

rdf

residual degrees of freedom.

Author(s)

Marco Geraci

References

Aranda-Ordaz FJ. On two families of transformations to additivity for binary response data. Biometrika 1981;68(2):357-363.

Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B-Statistical Methodology 1964;26(2):211-252.

Dehbi H-M, Cortina-Borja M, Geraci M. Aranda-Ordaz quantile regression for student performance assessment. Journal of Applied Statistics 2015. http://dx.doi.org/10.1080/02664763.2015.1025724

Fitzenberger B, Wilke R, Zhang X. Implementing Box-Cox quantile regression. Econometric Reviews 2010;29(2):158-181.

Geraci M and Jones MC. Improved transformation-based quantile regression. Canadian Journal of Statistics 2015;43(1):118-132.

Jones MC. Connecting distributions with power tails on the real line, the half line and the interval. International Statistical Review 2007;75(1):58-69.

Koenker R (2013). quantreg: Quantile Regression. R package version 5.05. URL http://CRAN.R-project.org/package=quantreg.

Mu YM, He XM. Power transformation toward a linear regression quantile. Journal of the American Statistical Association 2007;102(477):269-279.

See Also

predict.rqt, summary.rqt

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
###########################################################
## Example 1 - singly bounded (from Geraci and Jones, 2014)

data(trees)

## Not run: 
require(MASS)

dx <- 0.01

lambda0 <- boxcox(Volume ~ log(Height), data = trees,
	lambda = seq(-0.9, 0.5, by = dx))
lambda0 <- lambda0$x[which.max(lambda0$y)]
trees$z <- bc(trees$Volume,lambda0)
trees$y <- trees$Volume
trees$x <- log(trees$Height)
trees$x <- trees$x - mean(log(trees$Height))

fit.lm <- lm(z ~ x, data = trees)
newd <- data.frame(x = log(seq(min(trees$Height),
	max(trees$Height), by = 0.1)))
newd$x <- newd$x - mean(log(trees$Height))
ylm <- invbc(predict(fit.lm, newdata = newd), lambda0)

lambdas <- list(bc = seq(-10, 10, by=dx),
	mcjIs = seq(0,10,by = dx), mcjIa = seq(0,20,by = dx))

taus <- 1:3/4
fit0 <- tsrq(y ~ x, tsf = "bc", symm = FALSE, data = trees,
	lambda = lambdas$bc, tau = taus)
fit1 <- tsrq(y ~ x, tsf = "mcjI", symm = TRUE, dbounded = FALSE,
	data = trees, lambda = lambdas$mcjIs, tau = taus)
fit2 <- tsrq(y ~ x, tsf = "mcjI", symm = FALSE, dbounded = FALSE,
	data = trees, lambda = lambdas$mcjIa, tau = taus)


par(mfrow = c(1,3), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))

cx.lab <- 2.5
cx.ax <- 2
lw <- 2
cx <- 2
xb <- "log(Height)"
yb <- "Volume"
xl <- range(trees$x)
yl <- c(5,80)

yhat <- predict(fit0, newdata = newd)
plot(y ~ x,data = trees, xlim = xl, ylim = yl, main = "Box-Cox",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

yhat <- predict(fit1, newdata = newd)
plot(y ~ x,data = trees, xlim = xl, ylim = yl, main = "Proposal I (symmetric)",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

yhat <- predict(fit2, newdata = newd)
plot(y ~ x, data = trees, xlim = xl, ylim = yl, main = "Proposal I (asymmetric)",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)

## End(Not run)

###########################################################
## Example 2 - doubly bounded

data(Chemistry)

Chemistry$gcse_gr <- cut(Chemistry$gcse, c(0,seq(4,8,by=0.5)))
with(Chemistry, plot(score ~ gcse_gr, xlab = "GCSE score",
	ylab = "A-level Chemistry score"))

## Not run: 

# The dataset has > 31000 observations and computation can be slow
set.seed(178)
chemsub <- Chemistry[sample(1:nrow(Chemistry), 2000), ]

# Fit symmetric Aranda-Ordaz quantile 0.9
tsrq(score ~ gcse, tsf = "ao", symm = TRUE, lambda = seq(0,2,by=0.01),
	tau = 0.9, data = chemsub)

# Fit symmetric Proposal I quantile 0.9
tsrq(score ~ gcse, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
	lambda = seq(0,2,by=0.01), tau = 0.9, data = chemsub)

# Fit Proposal II quantile 0.9 (Nelder-Mead)
nlrq2(score ~ gcse, dbounded = TRUE, tau = 0.9, data = chemsub)

# Fit Proposal II quantile 0.9 (grid search)
# This is slower than nlrq2 but more stable numerically
tsrq2(score ~ gcse, dbounded = TRUE, lambda = seq(0, 2, by = 0.1),
	delta = seq(0, 2, by = 0.1), tau = 0.9, data = chemsub)


## End(Not run)

###########################################################
## Example 3 - doubly bounded

data(labor)

new <- labor
new$y <- new$pain
new$x <- (new$time-30)/30
new$x_gr <- as.factor(new$x)

par(mfrow = c(2,2), mai = c(1.3,1.3,0.5,0.5), mgp = c(5, 2, 0))

cx.lab <- 2.5
cx.ax <- 2.5
cx <- 2.5
yl <- c(0,0.06)

hist(new$y[new$treatment == 1], xlab = "Pain score", main = "Treated",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = TRUE,
	freq = FALSE, ylim = yl)

plot(y ~ x_gr, new, subset = new$treatment == 1, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
	cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = c(0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()

hist(new$y[new$treatment == 0], xlab = "Pain score", main = "Placebo",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = TRUE, freq = FALSE, ylim = yl)

plot(y ~ x_gr, new, subset = new$treatment == 0, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
	cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()

#

## Not run: 

taus <- c(1:3/4)
ls <- seq(0,3.5,by=0.1)

fit.aos <- tsrq(y ~ x*treatment, tsf = "ao", symm = TRUE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.aoa <- tsrq(y ~ x*treatment, tsf = "ao", symm = FALSE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcjs <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcja <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = FALSE, dbounded = TRUE,
	data = new, tau = taus, lambda = ls)
fit.mcj2 <- tsrq2(y ~ x*treatment, dbounded = TRUE, data = new, tau = taus,
	lambda = seq(0,2,by=0.1), delta = seq(0,1.5,by=0.3))
fit.nlrq <- nlrq2(y ~ x*treatment, par = coef(fit.mcj2, all = TRUE)[,1],
	dbounded = TRUE, data = new, tau = taus)

sel <- 0
selx <- if(sel == 1) "a" else "b"
x <- new$x
nd <- data.frame(x = seq(min(x),max(x),length=200), treatment = sel)
xx <- nd$x+1

par(mfrow = c(2,2), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))
cx.lab <- 3
cx.ax <- 2
lw <- 3
cx <- 1

fit <- fit.aos
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Aranda-Ordaz (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.aoa
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "", ylab = "",
	cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = FALSE,
	main = "Aranda-Ordaz (a)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.mcjs
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
	ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Proposal I (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

fit <- fit.mcj2
yhat <- predict(fit, newdata = nd)

plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
	ylab = "", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
	axes = FALSE, main = "Proposal II", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.