# gcrq: Growth charts regression quantiles with automatic smoothness... In quantregGrowth: Growth Charts via Smooth Regression Quantiles with Automatic Smoothness Estimation and Additive Terms

## Description

Modelling unspecified nonlinear relationships between covariates and quantiles of the response conditional distribution. Typical example is estimation nonparametric growth charts (via quantile regression). Quantile curves are estimated via B-splines with a L1 penalty on the spline coefficient differences, while non-crossing and possible monotonicity and concavity restrictions are set to obtain estimates more biologically plausible. Linear terms can be specified in the model formula. Multiple smooth terms, including varying coefficients, with automatic selection of corresponding smoothing parameters are allowed.

## Usage

 ```1 2 3 4 5``` ```gcrq(formula, tau=c(.1,.25,.5,.75,.9), data, subset, weights, na.action, transf=NULL, y=TRUE, n.boot=0, eps=0.0001, display=FALSE, method=c("REML","ML"), df.opt=2, lambda0=.1, h=0.8, lambda.max=2000, tol=0.01, it.max=20, single.lambda=TRUE, foldid=NULL, nfolds=10, lambda.ridge=0, contrasts=NULL, ...) ```

## Arguments

 `formula` a standard R formula to specify the response in the left hand side, and the covariates in the right hand side. See Details. `tau` a numeric vector to specify the quantile curves of interest. Default to probability values (.1,.25,.5,.75,.9). `data` the dataframe where the variables required by the formula, subset and weights arguments are stored. `subset` optional. A vector specifying a subset of observations to be used in the fitting process. `weights` optional. A numeric vector specifying weights to be assigned to the observations in the fitting process. Currently unimplemented. `na.action` a function which indicates how the possible ‘NA’s are handled. `transf` an optional character string (with `"y"` as argument) meaning a function to apply to the response variable before fitting. E.g. `"log(y+0.1)"`. It can be useful to guarantee fitted values within a specified range. If provided, the resulting oject fit will include the corresponding inverse (numerically computed) to be used to back transform predictions (see argument `transform` in `predict.gcrq` and `plot.gcrq`). `y` logical. If `TRUE` (default) the returned object includes also the responses vector.
 `n.boot` Number of nonparametric (cases resampling) bootstrap samples to be used. The covariance matrix estimates is obtained as empirical covariance matrix of the bootstrap dostributions. Notice that the smoothing parameter (if relevant) is assumed fixed. Namely it does change throughout the bootstrap replicates. Set `n.boot>0` if you plan to plot the fitted quantiles along with pointwise confidence intervals. `eps` A small positive constant to ensure noncrossing curves (i.e. the minimum distance between two consecutive curves). Use it at your risk! If `eps` is large, the resulting fitted quantile curves could appear unreasonable. `display` Logical. Should the iterative process be printed? Ignored if all the smoothing parameters specified in `ps` terms are fixed. `method` character, `"ML"` or `"REML"` affecting the smoothing parameter estimation. Default is `"REML"` which appears to provide better performance in simulation studies. Ignored if no smoothing parameter has to be estimated. `df.opt` How the model and term-specific degrees of freedom have to be computed. `df.opt=1` means via the null penalized coefficients, and `df.opt=2` via the trace of the approximate hat matrix. Ignored if no smoothing parameter has to be estimated. `lambda0` the starting value for the lambdas to be estimated. Ignored if all the smoothing parameters specified in `ps` terms are fixed. `h` The step halving factor affecting estimation of the smoothing parameters. Lower values lead to slower updates in the lambda values. Ignored if all the smoothing parameters specified in `ps` terms are fixed. `lambda.max` The upper bound for lambda estimation. Ignored if all the smoothing parameters specified in `ps` terms are fixed. `tol` The tolerance value to declare convergence. Ignored if all the smoothing parameters specified in `ps` terms are fixed. `it.max` The maximum number of iterations in lambdas estimation. Ignored if all the smoothing parameters specified in `ps` terms are fixed. `single.lambda` Logical. Should the smoothing parameter (for each smooth term) to be the same across the quantile curves being estimated? Ignored when just a single quantile curve is being estimated. `foldid` optional. A numeric vector identifying the group labels to perform cross validation to select the smoothing parameter. Ignored if the `lambda` argument in `ps()` is not a vector. `nfolds` optional. If `foldid` is not provided, it is scalar specifying the number of ‘folds’ (groups) which should be used to perform cross validation to select the smoothing parameter. Default to 10, but it is ignored if the `lambda` argument in `ps()` is not a vector. `lambda.ridge` Numerical value (typically very small) to stabilize model estimation. `contrasts` an optional list. See argument `contrasts.arg` in `model.matrix.default`. `...` further arguments.

## Details

The function fits regression quantiles at specified percentiles given in `tau` as a function of covariates specified in the `formula` argument. The `formula` may include linear terms and one or several `ps` terms to model nonlinear relationships with quantitative covariates, usually age in growth charts. When the `lambda` argument in `ps()` is a negative scalar, the smoothing parameter is estimated iteratively as discussed in Muggeo et al. (2020). If a positive scalar, it represents the actual smoothing parameter value. When it is a vector (and the model includes a single `ps` term), 'K-fold' cross validation is performed to select the ‘optimal’ lambda value and the model is fitted at such selected lambda value. To select the smoothing parameter via CV, `foldid` or `nfolds` may be supplied. If provided `foldid` overwrites `nfolds`, otherwise `foldid` is obtained via random extraction, namely `sample(rep(seq(nfolds), length = n))`. However selection of smoothing parameter via CV is allowed only with a unique ps() term in the formula.

## Value

This function returns an object of class `gcrq`, that is a list with the following components

 `coefficients` The matrix of estimated regression parameters; the number of columns equals the number of the fitted quantile curves. `x` the design matrix of the final fit (including the dummy rows used by penalty). `df` a vector reporting the df values for each quantile curve. See the section 'Warning' below. `rho` a vector including the values of the objective functions at the solution for each quantile curve. `fitted.values` a matrix of fitted quantiles (a column for each `tau` value) `residuals` a matrix of residuals (a column for each `tau` value) `D.matrix` the penalty matrix (multiplied by the smoothing parameter value). `D.matrix.nolambda` the penalty matrix. `pLin` number of linear covariates in the model. `info.smooth` some information on the smoothing term (if included in the formula via `ps`). `BB` further information on the smoothing term (if present in the formula via `ps`), including stuff useful for plotting via `plot.gcrq()`. `Bderiv` if the smooth term is included, the first derivative of the B spline basis. `boot.coef` The array including the estimated coefficients at different bootstrap samples (provided that `n.boot`>0 has been set). `y` the response vector (if gcrq() has been called with `y=TRUE`). `contrasts` the contrasts used, when the model contains a factor. `xlevels` the levels of the factors (when included) used in fitting. `taus` a vector of values between 0 and 1 indicating the estimated quantile curves. `call` the matched call.

## Warning

Currently, standard errors are obtained via the nonparametric bootstrap (case resampling) or the sandwich formula. Both methods ignore uncertainty in the smoothing parameter selection.

Using `ps(.., center=TRUE)` in the formula leads to lower uncertainty in the fitted curve, but noncrossing is no longer guaranteed.

## Note

This function is based upon the package quantreg by R. Koenker. Currently methods specific to the class `"gcrq"` are `print.gcrq`, `summary.gcrq`, `vcov.gcrq`, `plot.gcrq`, and `predict.gcrq`.

If the sample is not large, and/or the basis rank is large (i.e. a large number of columns) and/or there are relatively few distinct values in the covariate distribution, the fitting algorithm may fail returning error messages like the following

`> Error info = 20 in stepy2: singular design`

To remedy it, it suffices to change some arguments in `ps()`: to decrease `ndx` or `deg` (even by a small amount) or to increase (even by a small amount) the lambda value. Sometimes even by changing slightly the tau probability value (for instance from 0.80 to 0.79) can bypass the aforementioned errors.

## Author(s)

Vito M. R. Muggeo, vito.muggeo@unipa.it

## References

V.M.R. Muggeo, F. Torretta, P.H.C. Eilers, M. Sciandra, M. Attanasio (2020). Multiple smoothing parameters selection in additive regression quantiles, Statistical Modelling, to appear.

V. M. R. Muggeo, M. Sciandra, A. Tomasello, S. Calvo (2013). Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology, Environ Ecol Stat, 20, 519-531.

V. M. R. Muggeo (2018). Using the R package quantregGrowth: some examples.
https://www.researchgate.net/publication/323573492

`ps`, `plot.gcrq`, `predict.gcrq`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63``` ```## Not run: #An additive examples.. from ?mgcv::gam d<-mgcv::gamSim(n=200, eg=1) o<-gcrq(y ~ ps(x0) + ps(x1)+ ps(x2) + ps(x3), data=d, tau=.5, n.boot=50) plot(o, res=TRUE, col=2, conf.level=.9, shade=TRUE, split=TRUE) #some simple examples involving just a single smooth data(growthData) #load data tauss<-seq(.1,.9,by=.1) #fix the percentiles of interest m1<-gcrq(y~ps(x), tau=tauss, data=growthData) #lambda estimated.. m2<-gcrq(y~ps(x, lambda=0), tau=tauss, data=growthData) #unpenalized.. very wiggly curves #strongly penalized models m3<-gcrq(y~ps(x, lambda=1000, d=2), tau=tauss, data=growthData) #linear m4<-gcrq(y~ps(x, lambda=1000, d=3), tau=tauss, data=growthData) #quadratic #penalized model with monotonicity restrictions m5<-gcrq(y~ps(x, monotone=1, lambda=10), tau=tauss, data=growthData) #monotonicity constraints,lambda estimated, and varying penalty m6<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)"), tau=tauss, data=growthData) m6a<-gcrq(y~ps(x, monotone=1, lambda=10, var.pen="(1:k)^2"), tau=tauss, data=growthData) par(mfrow=c(2,3)) plot(m1, pch=20, res=TRUE) plot(m2, pch=20, res=TRUE) plot(m3, add=TRUE, lwd=2) plot(m4, pch=20, res=TRUE) plot(m5, pch=20, res=TRUE, legend=TRUE, col=2) plot(m6, lwd=2, col=3) plot(m6a, lwd=2, col=4) #select lambda via 'K-fold' CV (only with a single smooth term) m7<-gcrq(y~ps(x, lambda=seq(0.02,100,l=20)), tau=tauss, data=growthData) par(mfrow=c(1,2)) plot(m7, cv=TRUE) #display CV score versus lambda values plot(m7, res=TRUE, grid=list(x=5, y=8), col=4) #fitted curves at the best lambda value #===VC example n=50 x<-1:n/n y0<-10+sin(2*pi*x) y1<-seq(7,11,l=n) y<-c(y0,y1)+rnorm(2*n)*.1 x<-c(x,x) z<-rep(0:1, each=n) #include as a continuous covariate.. o<-gcrq(y~ ps(x, ndx=5) + ps(x,by=z,ndx=5), n.boot=0, tau=.5) predict(o, newdata=data.frame(x=c(.3,.7), z=c(0,1))) #.. as a factor g<-factor(z) o1<-gcrq(y~ 0+ps(x,by=g,ndx=5), tau=.5) #you have to remove the model intercept! predict(o1, newdata=data.frame(x=c(.3,.7), g=factor(c(0,1)))) ## End(Not run) ```