# lms.bcn: LMS Quantile Regression with a Box-Cox Transformation to... In VGAM: Vector Generalized Linear and Additive Models

## Description

LMS quantile regression with the Box-Cox transformation to normality.

## Usage

 ```1 2 3 4``` ```lms.bcn(percentiles = c(25, 50, 75), zero = c("lambda", "sigma"), llambda = "identitylink", lmu = "identitylink", lsigma = "loglink", idf.mu = 4, idf.sigma = 2, ilambda = 1, isigma = NULL, tol0 = 0.001) ```

## Arguments

 `percentiles` A numerical vector containing values between 0 and 100, which are the quantiles. They will be returned as ‘fitted values’. `zero` Can be an integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,3}. The default value usually increases the chance of successful convergence. Setting `zero = NULL` means they all are functions of the covariates. For more information see `CommonVGAMffArguments`. `llambda, lmu, lsigma` Parameter link functions applied to the first, second and third linear/additive predictors. See `Links` for more choices, and `CommonVGAMffArguments`. `idf.mu` Degrees of freedom for the cubic smoothing spline fit applied to get an initial estimate of mu. See `vsmooth.spline`. `idf.sigma` Degrees of freedom for the cubic smoothing spline fit applied to get an initial estimate of sigma. See `vsmooth.spline`. This argument may be assigned `NULL` to get an initial value using some other algorithm. `ilambda` Initial value for lambda. If necessary, it is recycled to be a vector of length n where n is the number of (independent) observations. `isigma` Optional initial value for sigma. If necessary, it is recycled to be a vector of length n. The default value, `NULL`, means an initial value is computed in the `@initialize` slot of the family function. `tol0` Small positive number, the tolerance for testing if lambda is equal to zero.

## Details

Given a value of the covariate, this function applies a Box-Cox transformation to the response to best obtain normality. The parameters chosen to do this are estimated by maximum likelihood or penalized maximum likelihood.

In more detail, the basic idea behind this method is that, for a fixed value of x, a Box-Cox transformation of the response Y is applied to obtain standard normality. The 3 parameters (lambda, mu, sigma, which start with the letters “L-M-S” respectively, hence its name) are chosen to maximize a penalized log-likelihood (with `vgam`). Then the appropriate quantiles of the standard normal distribution are back-transformed onto the original scale to get the desired quantiles. The three parameters may vary as a smooth function of x.

The Box-Cox power transformation here of the Y, given x, is

Z = [(Y / mu(x))^{lambda(x)} - 1] / (sigma(x) * lambda(x))

for lambda(x) != 0. (The singularity at lambda(x) = 0 is handled by a simple function involving a logarithm.) Then Z is assumed to have a standard normal distribution. The parameter sigma(x) must be positive, therefore VGAM chooses eta(x)^T = (lambda(x), mu(x), log(sigma(x))) by default. The parameter mu is also positive, but while log(mu) is available, it is not the default because mu is more directly interpretable. Given the estimated linear/additive predictors, the 100*alpha percentile can be estimated by inverting the Box-Cox power transformation at the 100*alpha percentile of the standard normal distribution.

Of the three functions, it is often a good idea to allow mu(x) to be more flexible because the functions lambda(x) and sigma(x) usually vary more smoothly with x. This is somewhat reflected in the default value for the argument `zero`, viz. `zero = c(1, 3)`.

## Value

An object of class `"vglmff"` (see `vglmff-class`). The object is used by modelling functions such as `vglm`, `rrvglm` and `vgam`.

## Warning

The computations are not simple, therefore convergence may fail. Set `trace = TRUE` to monitor convergence if it isn't set already. Convergence failure will occur if, e.g., the response is bimodal at any particular value of x. In case of convergence failure, try different starting values. Also, the estimate may diverge quickly near the solution, in which case try prematurely stopping the iterations by assigning `maxits` to be the iteration number corresponding to the highest likelihood value.

One trick is to fit a simple model and use it to provide initial values for a more complex model; see in the examples below.

## Note

The response must be positive because the Box-Cox transformation cannot handle negative values. In theory, the LMS-Yeo-Johnson-normal method can handle both positive and negative values.

In general, the lambda and sigma functions should be more smoother than the mean function. Having `zero = 1`, `zero = 3` or `zero = c(1, 3)` is often a good idea. See the example below.

Thomas W. Yee

## References

Cole, T. J. and Green, P. J. (1992). Smoothing Reference Centile Curves: The LMS Method and Penalized Likelihood. Statistics in Medicine, 11, 1305–1319.

Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, London: Chapman & Hall.

Yee, T. W. (2004). Quantile regression via vector generalized additive models. Statistics in Medicine, 23, 2295–2315.

`lms.bcg`, `lms.yjn`, `qtplot.lmscreg`, `deplot.lmscreg`, `cdf.lmscreg`, `eCDF`, `extlogF1`, `alaplace1`, `amlnormal`, `denorm`, `CommonVGAMffArguments`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55``` ```## Not run: require("VGAMdata") mysubset <- subset(xs.nz, sex == "M" & ethnicity == "Maori" & study1) mysubset <- transform(mysubset, BMI = weight / height^2) BMIdata <- na.omit(mysubset) BMIdata <- subset(BMIdata, BMI < 80 & age < 65, select = c(age, BMI)) # Delete an outlier summary(BMIdata) fit <- vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), data = BMIdata) par(mfrow = c(1, 2)) plot(fit, scol = "blue", se = TRUE) # The two centered smooths head(predict(fit)) head(fitted(fit)) head(BMIdata) head(cdf(fit)) # Person 46 is probably overweight, given his age 100 * colMeans(depvar(fit, drop = TRUE) < fitted(fit)) # Empirical proportions # Correct for "vgam" objects but not very elegant: fit@family@linkinv(eta = predict(fit, data.frame(age = 60)), extra = list(percentiles = c(10, 50))) if (FALSE) { # These work for "vglm" objects: fit2 <- vglm(BMI ~ bs(age, df = 4), lms.bcn(zero = 3), data = BMIdata) predict(fit2, percentiles = c(10, 50), newdata = data.frame(age = 60), type = "response") # A specific age head(fitted(fit2, percentiles = c(10, 50))) # Get different percentiles } # Convergence problems? Try this trick: fit0 is a simpler model used for fit1 fit0 <- vgam(BMI ~ s(age, df = 4), lms.bcn(zero = c(1, 3)), data = BMIdata) fit1 <- vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), data = BMIdata, etastart = predict(fit0)) ## End(Not run) ## Not run: # Quantile plot par(bty = "l", mar = c(5, 4, 4, 3) + 0.1, xpd = TRUE) qtplot(fit, percentiles = c(5, 50, 90, 99), main = "Quantiles", xlim = c(15, 66), las = 1, ylab = "BMI", lwd = 2, lcol = 4) # Density plot ygrid <- seq(15, 43, len = 100) # BMI ranges par(mfrow = c(1, 1), lwd = 2) (aa <- deplot(fit, x0 = 20, y = ygrid, xlab = "BMI", col = "black", main = "Density functions at Age = 20 (black), 42 (red) and 55 (blue)")) aa <- deplot(fit, x0 = 42, y = ygrid, add = TRUE, llty = 2, col = "red") aa <- deplot(fit, x0 = 55, y = ygrid, add = TRUE, llty = 4, col = "blue", Attach = TRUE) aa@post\$deplot # Contains density function values ## End(Not run) ```