iqr: Quantile Regression Coefficients Modeling In qrcm: Quantile Regression Coefficients Modeling

Description

This function implements Frumento and Bottai's (2016) method for quantile regression coefficients modeling (qrcm). Quantile regression coefficients are described by (flexible) parametric functions of the order of the quantile. From version 2.0, this function can also be used with censored and truncated data, as described in Frumento and Bottai (2017).

Usage

 `1` ```iqr(formula, formula.p = ~ slp(p,3), weights, data, s, tol = 1e-6, maxit) ```

Arguments

 `formula` a two-sided formula of the form `y ~ x1 + x2 + ...`: a symbolic description of the quantile regression model. The left side of the formula should be `Surv(time,event)` if the data are right-censored, and `Surv(time,time2,event)` if the data are left-truncated (`time < time2`, `time` can be -Inf). `formula.p` a one-sided formula of the form `~ b1(p, ...) + b2(p, ...) + ...`, describing how quantile regression coefficients depend on p, the order of the quantile. `weights` an optional vector of weights to be used in the fitting process. The weights will always be normalized to sum to the sample size. This implies that, for example, using double weights will not halve the standard errors. `data` an optional data frame, list or environment containing the variables in `formula`. `s` an optional 0/1 matrix that permits excluding some model coefficients (see ‘Examples’). `tol` convergence criterion for numerical optimization. `maxit` maximum number of iterations.

Details

Quantile regression permits modeling conditional quantiles of a response variabile, given a set of covariates. A linear model is used to describe the conditional quantile function:

Q(p | x) = β0(p) + β1(p)*x1 + β2(p)*x2 + ….

The model coefficients β(p) describe the effect of covariates on the p-th quantile of the response variable. Usually, one or more quantiles are estimated, corresponding to different values of p.

Assume that each coefficient can be expressed as a parametric function of p of the form:

β(p | θ) = θ0 + θ1*b1(p) + θ2*b2(p) + …

where b1(p), b2(p), … are known functions of p. If q is the dimension of x = (1, x1, x2, …) and k is that of b(p) = (1, b1(p), b2(p), …), the entire conditional quantile function is described by a q*k matrix θ of model parameters.

Users are required to specify two formulas: `formula` describes the regression model, while `formula.p` identifies the 'basis' b(p). By default, `formula.p = ~ slp(p, k = 3)`, a 3rd-degree shifted Legendre polynomial (see `slp`). Any user-defined function b(p, …) can be used, see ‘Examples’.

Estimation of θ is carried out by minimizing an integrated loss function, corresponding to the integral, over p, of the loss function of standard quantile regression. This motivates the acronym `iqr` (integrated quantile regression).

From version 2.0 of the package, censored and truncated data are also allowed. Data are censored when, instead of a response variable T, one can only observe Y = \min(T,C) and d = I(T ≤ C). Here, C is a censoring variable that is assumed to be conditionally independent of T. Truncation occurs if Y can only be observed when it exceeds another random variable Z. For example, in the prevalent sampling design, subjects with a disease are enrolled; those who died before enrollment are not observed. If the data are censored or truncated, θ is estimated by solving a system of estimating equation as described in Frumento and Bottai (2017).

Value

An object of class “`iqr`”, a list containing the following items:

 `coefficients` a matrix of estimated model parameters describing the fitted quantile function. `converged` logical. The convergence status. `n.it` the number of iterations. `call` the matched call. `obj.function` the value of the minimized integrated loss function (NULL if the data are censored or truncated). `mf` the model frame used. `PDF, CDF` the fitted values of the conditional probability density function (PDF) and cumulative distribution function (CDF). `covar` the estimated covariance matrix. `s` the used ‘s’ matrix.

Use `summary.iqr`, `plot.iqr`, and `predict.iqr` for summary information, plotting, and predictions from the fitted model. The function `test.fit` can be used for goodness-of-fit assessment. The generic accessor functions `coefficients`, `formula`, `terms`, `model.matrix`, `vcov` are available to extract information from the fitted model.

Note

By expressing quantile regression coefficients as functions of p, a parametric model for the conditional quantile function is specified. The induced PDF and CDF can be used as diagnostic tools. Negative values of `PDF` indicate quantile crossing, i.e., the conditional quantile function is not monotonically increasing. Null values of `PDF` indicate observations that lie outside the estimated support of the data, defined by quantiles of order 0 and 1. If null or negative `PDF` values occur for a relatively large proportion of data, the model is probably misspecified or ill-defined. If the model is correct, the fitted `CDF` should approximately follow a Uniform(0,1) distribution. This idea is used to implement a goodness-of-fit test, see `test.fit`.

The intercept can be excluded from `formula`, e.g., `iqr(y ~ -1 + x)`. This, however, implies that when `x = 0`, `y` is always 0. See example 5 in ‘Examples’. The intercept can also be removed from `formula.p`. This is recommended if the data are bounded. For example, for strictly positive data, use `iqr(y ~ 1, formula.p = -1 + slp(p,3))` to force the smallest quantile to be zero. See example 6 in ‘Examples’.

Author(s)

Paolo Frumento [email protected]

References

Frumento, P., and Bottai, M. (2016). Parametric modeling of quantile regression coefficient functions. Biometrics, 72 (1), pp 74-84, doi: 10.1111/biom.12410.

Frumento, P., and Bottai, M. (2017). Parametric modeling of quantile regression coefficient functions with censored and truncated data. Biometrics, doi: 10.1111/biom.12675.

`summary.iqr`, `plot.iqr`, `predict.iqr`, for summary, plotting, and prediction, and `test.fit` for goodness-of-fit assessment. `plf` and `slp` to define b(p) to be a piecewise linear function and a shifted Legendre polynomial basis, respectively.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166``` ``` ##### Using simulated data in all examples ##### Example 1 n <- 1000 x <- runif(n) y <- rnorm(n, 1 + x, 1 + x) # true quantile function: Q(p | x) = beta0(p) + beta1(p)*x, with # beta0(p) = beta1(p) = 1 + qnorm(p) # fit the true model: b(p) = (1 , qnorm(p)) m1 <- iqr(y ~ x, formula.p = ~ I(qnorm(p))) # the fitted quantile regression coefficient functions are # beta0(p) = m1\$coef[1,1] + m1\$coef[1,2]*qnorm(p) # beta1(p) = m1\$coef[2,1] + m1\$coef[2,2]*qnorm(p) # a basis b(p) = (1, p), i.e., beta(p) is assumed to be a linear function of p m2 <- iqr(y ~ x, formula.p = ~ p) # a 'rich' basis b(p) = (1, p, p^2, log(p), log(1 - p)) m3 <- iqr(y ~ x, formula.p = ~ p + I(p^2) + I(log(p)) + I(log(1 - p))) # 'slp' creates an orthogonal spline basis using shifted Legendre polynomials m4 <- iqr(y ~ x, formula.p = ~ slp(p, k = 3)) # note that this is the default # 'plf' creates the basis of a piecewise linear function m5 <- iqr(y ~ x, formula.p = ~ plf(p, knots = c(0.1,0.9))) summary(m1) summary(m1, p = c(0.25,0.5,0.75)) test.fit(m1, R = 25) par(mfrow = c(1,2)); plot(m1, ask = FALSE) # see the documentation for 'summary.iqr', 'test.fit', and 'plot.iqr' ##### Example 2 ### excluding coefficients n <- 1000 x <- runif(n) qy <- function(p,x){(1 + qnorm(p)) + (1 + log(p))*x} # true quantile function: Q(p | x) = beta0(p) + beta1(p)*x, with # beta0(p) = 1 + qnorm(p) # beta1(p) = 1 + log(p) y <- qy(runif(n), x) # to generate y, plug uniform p in qy(p,x) iqr(y ~ x, formula.p = ~ I(qnorm(p)) + I(log(p))) # I would like to exclude log(p) from beta0(p), and qnorm(p) from beta1(p) # I set to 0 the corresponding entries of 's' s <- matrix(1,2,3); s[1,3] <- s[2,2] <- 0 iqr(y ~ x, formula.p = ~ I(qnorm(p)) + I(log(p)), s = s) ##### Example 3 ### excluding coefficients when b(p) is singular n <- 1000 x <- runif(n) qy <- function(p,x){(1 + log(p) - 2*log(1 - p)) + (1 + log(p/(1 - p)))*x} # true quantile function: Q(p | x) = beta0(p) + beta1(p)*x, with # beta0(p) = 1 + log(p) - 2*log(1 - p) # beta1(p) = 1 + log(p/(1 - p)) y <- qy(runif(n), x) # to generate y, plug uniform p in qy(p,x) iqr(y ~ x, formula.p = ~ I(log(p)) + I(log(1 - p)) + I(log(p/(1 - p)))) # log(p/(1 - p)) is dropped due to singularity # I want beta0(p) to be a function of log(p) and log(1 - p), # and beta1(p) to depend on log(p/(1 - p)) alone s <- matrix(1,2,4); s[2,2:3] <- 0 iqr(y ~ x, formula.p = ~ I(log(p)) + I(log(1 - p)) + I(log(p/(1 - p))), s = s) # log(p/(1 - p)) is not dropped ##### Example 4 ### using slp to test deviations from normality n <- 1000 x <- runif(n) y <- rnorm(n, 2 + x) # the true model is normal, i.e., b(p) = (1, qnorm(p)) summary(iqr(y ~ x, formula.p = ~ I(qnorm(p)) + slp(p,3))) # if slp(p,3) is not significant, no deviation from normality ##### Example 5 ### formula without intercept n <- 1000 x <- runif(n) y <- runif(n, 0,x) # True quantile function: Q(p | x) = p*x, i.e., beta0(p) = 0, beta1(p) = p # When x = 0, all quantiles of y are 0, i.e., the distribution is degenerated # To explicitly model this, remove the intercept from 'formula' iqr(y ~ -1 + x, formula.p = ~ p) # the true model does not have intercept in b(p) either: iqr(y ~ -1 + x, formula.p = ~ -1 + p) ##### Example 6 ### no covariates, strictly positive outcome n <- 1000 y <- rgamma(n, 3,1) # you know that Q(0) = 0 # remove intercept from 'formula.p', and use b(p) such that b(0) = 0 summary(iqr(y ~ 1, formula.p = ~ -1 + slp(p,5))) # shifted Legendre polynomials summary(iqr(y ~ 1, formula.p = ~ -1 + sin(p*pi/2) + I(qbeta(p,2,4)))) # unusual basis summary(iqr(y ~ 1, formula.p = ~ -1 + I(sqrt(p))*I(log(1 - p)))) # you can include interactions ##### Example 7 ### revisiting the classical linear model n <- 1000 x <- runif(n) y <- 2 + 3*x + rnorm(n,0,1) # beta0 = 2, beta1 = 3 iqr(y ~ x, formula.p = ~ I(qnorm(p)), s = matrix(c(1,1,1,0),2)) # first column of coefficients: (beta0, beta1) # top-right coefficient: residual standard deviation ##### Example 8 ### censored data n <- 1000 x <- runif(n,0,5) u <- runif(n) beta0 <- -log(1 - u) beta1 <- 0.2*log(1 - u) t <- beta0 + beta1*x # time variable c <- rexp(n,2) # censoring variable y <- pmin(t,c) # observed events d <- (t <= c) # 1 = event, 0 = censored iqr(Surv(y,d) ~ x, formula.p = ~ I(log(1 - p))) ##### Example 8 (cont.) ### censored and truncated data z <- rexp(n,10) # truncation variable w <- which(y > z) # only observe z,y,d,x when y > z z <- z[w]; y <- y[w]; d <- d[w]; x <- x[w] iqr(Surv(z,y,d) ~ x, formula.p = ~ I(log(1 - p))) ```