Description Usage Arguments Details Value Author(s) References See Also Examples
These functions are used to fit quantile regression transformation models.
1 2 3 4 5 6 7 8 9 | tsrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
conditional = FALSE, tau = 0.5, data, subset, weights, na.action,
method = "fn", ...)
tsrq2(formula, dbounded = FALSE, lambda = NULL, delta = NULL, conditional = FALSE,
tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
rcrq(formula, tsf = "mcjI", symm = TRUE, dbounded = FALSE, lambda = NULL,
tau = 0.5, data, subset, weights, na.action, method = "fn", ...)
nlrq2(formula, par = NULL, dbounded = FALSE, tau = 0.5, data,
subset, weights, na.action, ...)
|
formula |
an object of class " |
tsf |
transformation to be used. Possible options are |
symm |
logical flag. If |
dbounded |
logical flag. If |
lambda, delta |
values of transformation parameters for grid search. |
conditional |
logical flag. If |
par |
vector of length |
tau |
the quantile(s) to be estimated. See |
data |
a data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. |
na.action |
a function which indicates what should happen when the data contain |
method |
fitting algorithm for |
... |
other arguments in |
These functions implement quantile regression transformation models as discussed by Geraci and Jones (see references). The general model is assumed to be Q(h(response)) = xb, where Q is the conditional quantile function and h is a one- or two-parameter transformation. A typical model specified in formula
has the form response ~ terms
where response
is the (numeric) response vector and terms
is a series of terms which specifies a linear predictor for the quantile of the transformed response. The response
, which is singly or doubly bounded, i.e. response > 0
or 0 <= response <= 1
respectively, undergoes the transformation specified in tsf
. If the response is bounded in the generic [a,b] interval, the latter is automatically mapped to [0,1] and no further action is required. If, however, the response is singly bounded and contains negative values, it is left to the user to offset the response or the code will produce an error. The functions tsrq
and tsrq2
use a two-stage (TS) estimator (Fitzenberger et al, 2010) for, respectively, one- and two-parameter transformations. The function rcrq
(one-parameter tranformations) is based on the residual cusum process estimator proposed by Mu and He (2007), while the function nlrq2
(two-parameter tranformations) is based on Nelder-Mead optimization (Geraci and Jones).
tsrq
, tsrq2
, rcrq
, nlrq2
return an object of class
rqt
. This is a list that contains as typical components:
|
the first |
call |
the matched call. |
method |
the fitting algorithm for |
y |
the response – untransformed scale. |
theta |
if |
x |
the model matrix. |
weights |
the weights used in the fitting process (a vector of 1's if |
tau |
the order of the estimated quantile(s). |
lambda |
the estimated parameter lambda. |
eta |
the estimated parameters lambda and delta in the two-parameter Proposal II tranformation. |
lambda.grid |
grid of lambda values used for estimation. |
delta.grid |
grid of delta values used for estimation. |
tsf |
tranformation used (see also |
objective |
values of the objective function minimised over the tranformation parameter(s). This is an array of dimension |
optimum |
value of the objective function at solution. |
coefficients |
quantile regression coefficients – transformed scale. |
fitted.values |
fitted values. |
rejected |
proportion of inadmissible observations (Fitzenberger et al, 2010). |
terms |
the |
term.labels |
names of coefficients. |
rdf |
residual degrees of freedom. |
Marco Geraci
Aranda-Ordaz FJ. On two families of transformations to additivity for binary response data. Biometrika 1981;68(2):357-363.
Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society Series B-Statistical Methodology 1964;26(2):211-252.
Dehbi H-M, Cortina-Borja M, Geraci M. Aranda-Ordaz quantile regression for student performance assessment. Journal of Applied Statistics 2015. http://dx.doi.org/10.1080/02664763.2015.1025724
Fitzenberger B, Wilke R, Zhang X. Implementing Box-Cox quantile regression. Econometric Reviews 2010;29(2):158-181.
Geraci M and Jones MC. Improved transformation-based quantile regression. Canadian Journal of Statistics 2015;43(1):118-132.
Jones MC. Connecting distributions with power tails on the real line, the half line and the interval. International Statistical Review 2007;75(1):58-69.
Koenker R (2013). quantreg: Quantile Regression. R package version 5.05. URL http://CRAN.R-project.org/package=quantreg.
Mu YM, He XM. Power transformation toward a linear regression quantile. Journal of the American Statistical Association 2007;102(477):269-279.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 | ###########################################################
## Example 1 - singly bounded (from Geraci and Jones, 2014)
data(trees)
## Not run:
require(MASS)
dx <- 0.01
lambda0 <- boxcox(Volume ~ log(Height), data = trees,
lambda = seq(-0.9, 0.5, by = dx))
lambda0 <- lambda0$x[which.max(lambda0$y)]
trees$z <- bc(trees$Volume,lambda0)
trees$y <- trees$Volume
trees$x <- log(trees$Height)
trees$x <- trees$x - mean(log(trees$Height))
fit.lm <- lm(z ~ x, data = trees)
newd <- data.frame(x = log(seq(min(trees$Height),
max(trees$Height), by = 0.1)))
newd$x <- newd$x - mean(log(trees$Height))
ylm <- invbc(predict(fit.lm, newdata = newd), lambda0)
lambdas <- list(bc = seq(-10, 10, by=dx),
mcjIs = seq(0,10,by = dx), mcjIa = seq(0,20,by = dx))
taus <- 1:3/4
fit0 <- tsrq(y ~ x, tsf = "bc", symm = FALSE, data = trees,
lambda = lambdas$bc, tau = taus)
fit1 <- tsrq(y ~ x, tsf = "mcjI", symm = TRUE, dbounded = FALSE,
data = trees, lambda = lambdas$mcjIs, tau = taus)
fit2 <- tsrq(y ~ x, tsf = "mcjI", symm = FALSE, dbounded = FALSE,
data = trees, lambda = lambdas$mcjIa, tau = taus)
par(mfrow = c(1,3), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))
cx.lab <- 2.5
cx.ax <- 2
lw <- 2
cx <- 2
xb <- "log(Height)"
yb <- "Volume"
xl <- range(trees$x)
yl <- c(5,80)
yhat <- predict(fit0, newdata = newd)
plot(y ~ x,data = trees, xlim = xl, ylim = yl, main = "Box-Cox",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)
yhat <- predict(fit1, newdata = newd)
plot(y ~ x,data = trees, xlim = xl, ylim = yl, main = "Proposal I (symmetric)",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)
yhat <- predict(fit2, newdata = newd)
plot(y ~ x, data = trees, xlim = xl, ylim = yl, main = "Proposal I (asymmetric)",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
cex = cx, xlab = xb, ylab = yb)
lines(newd$x, yhat[,1], lwd = lw)
lines(newd$x, yhat[,2], lwd = lw)
lines(newd$x, yhat[,3], lwd = lw)
lines(newd$x, ylm, lwd = lw, lty = 2)
## End(Not run)
###########################################################
## Example 2 - doubly bounded
data(Chemistry)
Chemistry$gcse_gr <- cut(Chemistry$gcse, c(0,seq(4,8,by=0.5)))
with(Chemistry, plot(score ~ gcse_gr, xlab = "GCSE score",
ylab = "A-level Chemistry score"))
## Not run:
# The dataset has > 31000 observations and computation can be slow
set.seed(178)
chemsub <- Chemistry[sample(1:nrow(Chemistry), 2000), ]
# Fit symmetric Aranda-Ordaz quantile 0.9
tsrq(score ~ gcse, tsf = "ao", symm = TRUE, lambda = seq(0,2,by=0.01),
tau = 0.9, data = chemsub)
# Fit symmetric Proposal I quantile 0.9
tsrq(score ~ gcse, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
lambda = seq(0,2,by=0.01), tau = 0.9, data = chemsub)
# Fit Proposal II quantile 0.9 (Nelder-Mead)
nlrq2(score ~ gcse, dbounded = TRUE, tau = 0.9, data = chemsub)
# Fit Proposal II quantile 0.9 (grid search)
# This is slower than nlrq2 but more stable numerically
tsrq2(score ~ gcse, dbounded = TRUE, lambda = seq(0, 2, by = 0.1),
delta = seq(0, 2, by = 0.1), tau = 0.9, data = chemsub)
## End(Not run)
###########################################################
## Example 3 - doubly bounded
data(labor)
new <- labor
new$y <- new$pain
new$x <- (new$time-30)/30
new$x_gr <- as.factor(new$x)
par(mfrow = c(2,2), mai = c(1.3,1.3,0.5,0.5), mgp = c(5, 2, 0))
cx.lab <- 2.5
cx.ax <- 2.5
cx <- 2.5
yl <- c(0,0.06)
hist(new$y[new$treatment == 1], xlab = "Pain score", main = "Treated",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = TRUE,
freq = FALSE, ylim = yl)
plot(y ~ x_gr, new, subset = new$treatment == 1, xlab = "Time (min)",
ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = c(0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()
hist(new$y[new$treatment == 0], xlab = "Pain score", main = "Placebo",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
axes = TRUE, freq = FALSE, ylim = yl)
plot(y ~ x_gr, new, subset = new$treatment == 0, xlab = "Time (min)",
ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax,
cex.main = cx.lab, axes = FALSE, range = 0)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = 2)
axis(2, cex.axis = cx.lab)
box()
#
## Not run:
taus <- c(1:3/4)
ls <- seq(0,3.5,by=0.1)
fit.aos <- tsrq(y ~ x*treatment, tsf = "ao", symm = TRUE, dbounded = TRUE,
data = new, tau = taus, lambda = ls)
fit.aoa <- tsrq(y ~ x*treatment, tsf = "ao", symm = FALSE, dbounded = TRUE,
data = new, tau = taus, lambda = ls)
fit.mcjs <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = TRUE, dbounded = TRUE,
data = new, tau = taus, lambda = ls)
fit.mcja <- tsrq(y ~ x*treatment, tsf = "mcjI", symm = FALSE, dbounded = TRUE,
data = new, tau = taus, lambda = ls)
fit.mcj2 <- tsrq2(y ~ x*treatment, dbounded = TRUE, data = new, tau = taus,
lambda = seq(0,2,by=0.1), delta = seq(0,1.5,by=0.3))
fit.nlrq <- nlrq2(y ~ x*treatment, par = coef(fit.mcj2, all = TRUE)[,1],
dbounded = TRUE, data = new, tau = taus)
sel <- 0
selx <- if(sel == 1) "a" else "b"
x <- new$x
nd <- data.frame(x = seq(min(x),max(x),length=200), treatment = sel)
xx <- nd$x+1
par(mfrow = c(2,2), mar = c(7.1, 7.1, 5.1, 2.1), mgp = c(5, 2, 0))
cx.lab <- 3
cx.ax <- 2
lw <- 3
cx <- 1
fit <- fit.aos
yhat <- predict(fit, newdata = nd)
plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "",
ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
axes = FALSE, main = "Aranda-Ordaz (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()
fit <- fit.aoa
yhat <- predict(fit, newdata = nd)
plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "", ylab = "",
cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab, axes = FALSE,
main = "Aranda-Ordaz (a)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()
fit <- fit.mcjs
yhat <- predict(fit, newdata = nd)
plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
ylab = "Pain score", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
axes = FALSE, main = "Proposal I (s)", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()
fit <- fit.mcj2
yhat <- predict(fit, newdata = nd)
plot(y ~ x_gr, new, subset = new$treatment == sel, xlab = "Time (min)",
ylab = "", cex.lab = cx.lab, cex.axis = cx.ax, cex.main = cx.lab,
axes = FALSE, main = "Proposal II", range = 0, col = grey(4/5))
apply(yhat, 2, function(y,x) lines(x, y, lwd = lw), x = xx)
axis(1, at = 1:6, labels = (0:5)*30 + 30, cex.axis = cx.lab)
axis(2, at = c(0, 25, 50, 75, 100), cex.axis = cx.lab)
box()
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.