# extlogF1: Extended log-F Distribution Family Function In VGAM: Vector Generalized Linear and Additive Models

## Description

Maximum likelihood estimation of the 1-parameter extended log-F distribution.

## Usage

```
extlogF1(tau = c(0.25, 0.5, 0.75), parallel = TRUE ~ 0,
         seppar = 0, tol0 = -0.001,
         llocation = "identitylink", ilocation = NULL,
         lambda.arg = NULL, scale.arg = 1, ishrinkage = 0.95,
         digt = 4, idf.mu = 3, imethod = 1)
```

## Arguments

- `tau`: Numeric, the desired quantiles. A strictly increasing sequence; each value must be in (0, 1). The default values are the three quartiles, matching `lms.bcn`.

- `parallel`: Similar to `alaplace1`, applying to the location parameters. One can try to fix the quantile-crossing problem after fitting the model by calling `fix.crossing`. Use `is.crossing` to see if there is a problem. The default for `parallel` is totally `FALSE`, i.e., `FALSE` for every variable including the intercept.

  Quantile crossing can occur when values of `tau` are too close, given the data. How the quantiles are modelled with respect to the covariates also has a big effect: if they are too flexible or too inflexible then the problem is likely to occur. For example, using `bs` with `df = 10` is likely to create problems.

  Setting `parallel = TRUE` results in a totally parallel model; all quantiles are parallel, and this assumption can be too strong for some data sets. In contrast, `fix.crossing` only repairs the quantiles that cross. So one must carefully choose the values of `tau` for the original fit.

- `seppar`, `tol0`: Numeric, both of unit length: the separation and shift parameters. `seppar` is nonnegative; `tol0` may be negative, as described below. If `seppar` is positive then any crossing quantile is penalized by the difference cubed multiplied by `seppar`. The penalty is subtracted from the log-likelihood. The shift parameter ensures that the result is strictly noncrossing when `seppar` is large enough; otherwise, if `tol0 = 0` and `seppar` is large, the crossing quantiles remain crossed: the offending amount becomes small but never exactly 0. Informally, `tol0` pushes the adjustment enough so that `is.crossing` should return `FALSE`.

  If `tol0` is positive then that is the shift in absolute terms. But `tol0` may be assigned a negative value, in which case it is interpreted multiplicatively relative to the midspread of the response: `tol0 <- abs(tol0) * midspread`. Regardless, `fit@extra$tol0` is the amount in absolute terms.

  If avoiding the quantile-crossing problem is of concern to you, try increasing `seppar` to decrease the amount of crossing. It is probably best to choose the smallest value of `seppar` for which `is.crossing` returns `FALSE`. Increasing `tol0`, relatively or absolutely, means the fitted quantiles are allowed to move apart more. However, `tau` must be considered when choosing `tol0`.

- `llocation`, `ilocation`: See `Links` for more choices and `CommonVGAMffArguments` for more information. Choosing `loglink` should usually be good for counts, and choosing `logitlink` should be a reasonable choice for proportions. However, avoid choosing `tau` values close to the boundary; for example, if p0 is the proportion of 0s then choose p0 << tau. For proportions, grouped data is much better than ungrouped data, and the bigger the groups the finer the granularity, so that the empirical proportion can approximate `tau` more closely.

- `lambda.arg`: Positive tuning parameter which controls the sharpness of the cusp. The limit as it approaches 0 is probably very similar to `dalap`. The default is to choose the value internally. If `scale.arg` increases, then `lambda.arg` probably needs to increase accordingly. If `lambda.arg` is too large then the empirical quantiles may not be very close to `tau`. If `lambda.arg` is too close to 0 then convergence will be poor: local solutions may be found, and numerical problems in general may occur. Monitoring convergence is recommended when varying `lambda.arg`.

- `scale.arg`: Positive scale parameter, sometimes called `scale`. The transformation used is `(y - location) / scale`. This function should be okay for response variables having a moderate range (0–100, say), but if the range is very different from this then experimenting with this argument is a good idea.

- `ishrinkage`, `idf.mu`, `digt`: Similar to `alaplace1`.

- `imethod`: Initialization method. Either the value 1, 2, or .... See `CommonVGAMffArguments` for more information.
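The interplay between `seppar` and `tol0` can be illustrated with a small base R sketch. This is not VGAM's internal code: the helper names `resolve_tol0` and `crossing_penalty` are hypothetical, the midspread is assumed to be the interquartile range of the response, and the exact form of the cubed penalty is an assumption based on the description above.

```r
# Hypothetical helpers for illustration only; not part of VGAM.
resolve_tol0 <- function(tol0, y) {
  # Assumption: "midspread" means the interquartile range of the response.
  midspread <- unname(diff(quantile(y, probs = c(0.25, 0.75))))
  if (tol0 < 0) abs(tol0) * midspread else tol0
}

crossing_penalty <- function(qmat, seppar, tol0) {
  # qmat: n x M matrix of fitted quantiles, columns ordered by increasing tau.
  # Assumed form: adjacent quantiles that are not at least tol0 apart are
  # penalized by the cubed shortfall multiplied by seppar.
  lower <- qmat[, -ncol(qmat), drop = FALSE]
  upper <- qmat[, -1, drop = FALSE]
  shortfall <- lower + tol0 - upper
  seppar * sum(pmax(shortfall, 0)^3)
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
tol0.abs <- resolve_tol0(-0.001, y)  # negative value is relative to midspread

qmat <- rbind(c(1.0, 2.0),   # well separated: contributes nothing
              c(3.0, 2.5))   # crossed by 0.5: contributes 10 * 0.5^3
pen <- crossing_penalty(qmat, seppar = 10, tol0 = 0)
```

Under these assumptions a negative `tol0` scales with the spread of the response, so the same default can be used for responses measured on different scales.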

## Details

This is an experimental family function for quantile regression. Fasiolo et al. (2020) propose an extended log-F distribution (ELF); however, this family function only estimates the location parameter. The distribution has a scale parameter which can be supplied (the default value is unity). One location parameter is estimated for each `tau` value, and these are the estimated quantiles. For quantile regression it is not necessary to estimate the scale parameter since the log-likelihood function is triangle shaped.
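The "triangle shaped" remark can be made concrete with the check (pinball) loss that underlies the asymmetric Laplace log-likelihood. The base R sketch below, using the hypothetical helper `check_loss`, suggests that the location minimizing the summed loss is a sample quantile and that rescaling the residuals does not move the minimizer. It is an illustration of the idea, not the code used by `extlogF1`.

```r
# Check (pinball) loss: piecewise linear with a kink at 0.
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(123)
y <- rnorm(501)
tau <- 0.25

# Summed loss as a function of the location, for a fixed scale.
objective <- function(mu, scale = 1) sum(check_loss((y - mu) / scale, tau))

opt1 <- optimize(objective, interval = range(y), scale = 1)$minimum
opt5 <- optimize(objective, interval = range(y), scale = 5)$minimum
q.emp <- unname(quantile(y, probs = tau))

# The two minimizers and the empirical quantile should nearly coincide.
c(scale1 = opt1, scale5 = opt5, empirical = q.emp)
```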

The ELF is used as an approximation of the asymmetric Laplace distribution (ALD). The latter cannot be estimated properly using Fisher scoring/IRLS, but the ELF holds promise because it has continuous derivatives and therefore fewer problems with the regularity conditions. Because the ELF is fitted to data to obtain an empirical result, the convergence behaviour may not be gentle and smooth. Hence there is a function-specific control function called `extlogF1.control` which has something like `stepsize = 0.5` and `maxits = 100`. It has been found that slowing down the rate of convergence produces greater stability during the estimation process. Regardless, convergence should always be monitored carefully.
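To see why the ELF behaves like a smoothed ALD, consider the loss implied by the ELF density in the parameterization of Fasiolo et al. (2020): up to an additive constant, the negative log-density in the standardized residual z is `(tau - 1) * z + lambda * log(1 + exp(z / lambda))`. The base R sketch below compares this with the check loss. The helper names are hypothetical, and whether `lambda.arg` in `extlogF1` corresponds exactly to this `lambda` is an assumption.

```r
# Check (pinball) loss of the ALD.
check_loss <- function(z, tau) z * (tau - (z < 0))

# Numerically stable log(1 + exp(x)).
softplus <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))

# ELF loss (negative log-density up to a constant), assumed parameterization.
elf_loss <- function(z, tau, lambda) (tau - 1) * z + lambda * softplus(z / lambda)

z <- seq(-3, 3, by = 0.01)
tau <- 0.25
lambdas <- c(1, 0.1, 0.01)

# Largest absolute difference between the two losses for each lambda.
gap <- sapply(lambdas, function(lam)
  max(abs(elf_loss(z, tau, lam) - check_loss(z, tau))))
gap
```

The difference between the two losses is `lambda * log1p(exp(-abs(z) / lambda))`, which is largest at the cusp `z = 0`, where it equals `lambda * log(2)`. Smaller values of `lambda` therefore track the ALD more closely but make the cusp sharper, which fits the warning above about convergence when `lambda.arg` is close to 0.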

This function accepts a vector response but not a matrix response.

## Value

An object of class `"vglmff"` (see `vglmff-class`). The object is used by modelling functions such as `vglm` and `vgam`.

## Note

Changes will occur in the future to fine-tune things. In general, setting `trace = TRUE` is strongly encouraged because it is necessary to check that convergence occurs properly.
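A minimal sketch of monitoring convergence follows, using simulated data similar to the Examples. The comparison of the iteration count with the iteration limit is an informal check suggested here, not an official diagnostic; it assumes the usual `iter` and `control` slots of a fitted `vglm` object.

```r
library(VGAM)

set.seed(2020)
nn <- 500
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)))

# trace = TRUE prints the convergence criterion at each iteration.
fit <- vglm(y1 ~ x2, extlogF1(tau = c(0.25, 0.75)),
            data = edata, trace = TRUE)

# Stopping exactly at the iteration limit suggests non-convergence.
fit@iter           # iterations actually used
fit@control$maxit  # iteration limit in force for this fit
```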

If `seppar > 0` then `logLik(fit)` will return the penalized log-likelihood.
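The following sketch fits the same model with and without a separation penalty so that the two log-likelihood values can be inspected side by side. The value `seppar = 1` is arbitrary and chosen only for illustration; because the second value is penalized, the two numbers are not directly comparable.

```r
library(VGAM)

set.seed(2020)
nn <- 500
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)))
mytau <- c(0.25, 0.5, 0.75)

fit0 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata)              # no penalty
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau, seppar = 1), data = edata)  # penalized

ll <- c(unpenalized = logLik(fit0), penalized = logLik(fit1))
ll
c(is.crossing(fit0), is.crossing(fit1))
```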

## Author(s)

Thomas W. Yee

## References

Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2020). Fast calibrated additive quantile regression. J. Amer. Statist. Assoc., in press.

Yee, T. W. (2020). On quantile regression based on the 1-parameter extended log-F distribution. In preparation.

## See Also

`dextlogF`, `is.crossing`, `fix.crossing`, `eCDF`, `vglm.control`, `logF`, `alaplace1`, `dalap`, `lms.bcn`.

## Examples

```
library(VGAM)  # also attaches splines, which provides bs()

nn <- 1000; mytau <- c(0.25, 0.75)
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)),
                   y2 = cos(x2) / (1 + abs(x2)) + rnorm(nn, sd = exp(-1)))
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata)  # trace = TRUE
fit2 <- vglm(y2 ~ bs(x2, 6), extlogF1(tau = mytau), data = edata)
coef(fit1, matrix = TRUE)
fit2@extra$percentile  # Empirical percentiles here
summary(fit2)
c(is.crossing(fit1), is.crossing(fit2))
head(fitted(fit1))
## Not run:
plot(y2 ~ x2, edata, col = "blue")
matlines(with(edata, x2), fitted(fit2), col = "orange", lty = 1, lwd = 2)
## End(Not run)
```