fqr: Fast Quantile Regression


View source: R/fqr.R

Description

Fast Quantile Regression

Usage

fit_fqr(
  X,
  y,
  tau,
  se = TRUE,
  init_beta = rep(0, ncol(X)),
  smoothing_window = .Machine$double.eps,
  maxiter = 100,
  beta_tol = 1e-05,
  check_tol = 1e-05,
  intercept = 1,
  nsubsamples = 100,
  nwarmup_samples = 1000,
  warm_start = 1
)

fqr(
  formula,
  data,
  tau = 0.5,
  se = TRUE,
  smoothing_window = .Machine$double.eps,
  maxiter = 1000,
  beta_tol = 1e-05,
  check_tol = 1e-05,
  nwarmup_samples = pmin(pmax(100, 0.1 * nrow(data)), nrow(data)),
  warm_start = 1,
  nsubsamples = 100
)

Arguments

X

design matrix

y

outcome variable

tau

vector of target quantile(s)

se

whether to calculate standard errors

init_beta

initial coefficients for gradient descent (optional; per the usage above, the default is a vector of zeros, one per column of X)

smoothing_window

half-width of the neighborhood around 0 in which the check loss is smoothed with a tilted least-squares loss function

maxiter

maximum number of allowed iterations for gradient descent

beta_tol

stopping criterion based on largest value of the gradient

check_tol

stopping criterion based on the change in the value of the check function between iterations

intercept

column index of the intercept in X (defaults to 1); use 0 to indicate no intercept

nsubsamples

number of subsamples to use when calculating standard errors

nwarmup_samples

number of samples to use for warmup regression

warm_start

whether to run an initial warmup regression on a subsample (1) or start full-data gradient descent from the default initial values (0)

formula

regression formula

data

data to use when fitting regression

Details

This package performs quantile regression by approximating the check loss function with a least-squares loss function in a small neighborhood around 0. Since the only point where the check function is not differentiable is 0, this smoothing lets first-order gradient descent methods work.
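
To make the smoothing concrete, here is a minimal sketch of such a loss (an illustration of the technique only, not the package's internals; h plays the role of the smoothing_window argument):

# Sketch of a smoothed check loss: outside [-h, h] it is the usual
# tilted absolute loss; inside, a tilted quadratic whose value and
# slope match at the boundary, so the loss is differentiable everywhere.
smoothed_check <- function(u, tau, h = 1e-4) {
  tilt <- ifelse(u >= 0, tau, 1 - tau)
  ifelse(abs(u) > h,
         tilt * abs(u),                   # usual check loss
         tilt * (u^2 / (2 * h) + h / 2))  # quadratic near zero
}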

This package uses "accelerated" gradient descent, which updates the coefficient guess not only by a step size times the gradient, but also by a momentum term built from the prior changes in the coefficients, which leads to faster convergence. Gradient-based methods work at scale (both in observations and in dimension) and are much faster than the interior point algorithms in the quantreg package for large problems, though they are sometimes less exact for small ones.
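
As an illustration, a single update under a Nesterov-style momentum scheme might look like the following (a generic sketch of this class of methods; accelerated_step is not a function from the package):

# One accelerated gradient step: the new guess combines a plain
# gradient step with the momentum carried over from the previous move.
accelerated_step <- function(beta, beta_prev, grad_fn,
                             step = 1e-3, momentum = 0.9) {
  velocity <- beta - beta_prev             # prior change in coefficients
  lookahead <- beta + momentum * velocity  # momentum "lookahead" point
  list(beta = lookahead - step * grad_fn(lookahead),
       beta_prev = beta)
}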

The algorithm employs two early stopping rules: the 'check_tol' argument stops the descent based on the scaled change in the check-function loss between iterations, and the 'beta_tol' argument stops it based on the largest value of the gradient vector.
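
A minimal sketch of the two tests (converged is a hypothetical helper for illustration, not exported by fqr):

# Stop when either rule fires: the largest gradient entry falls below
# beta_tol, or the relative change in the check loss falls below check_tol.
converged <- function(grad, loss, loss_prev,
                      beta_tol = 1e-5, check_tol = 1e-5) {
  max(abs(grad)) < beta_tol ||
    abs(loss_prev - loss) / (abs(loss_prev) + 1e-12) < check_tol
}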

Before using the full dataset, the optimizer "warms up" on a random subsample of the data; 'nwarmup_samples' controls the size of that subsample. 'warm_start' is an integer that controls whether the warmup happens at all (leaving it on is _strongly_ recommended).
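
For example, the warmup stage can be tuned or disabled through the documented arguments (values chosen only for illustration):

# warm up on a 25-row subsample before the full-data fit
fit_warm <- fqr(area ~ peri, data = rock, tau = 0.5, nwarmup_samples = 25)

# skip the warmup and start full-data gradient descent from defaults
fit_cold <- fqr(area ~ peri, data = rock, tau = 0.5, warm_start = 0)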

Examples

fit <- fqr(area ~ peri, data = rock, tau = c(0.25, 0.5, 0.75))

# print coefficients & SEs
print(fit)

# grab coefficient vector
coef(fit)

# predict values
predict(fit)

# predict values with new data
predict(fit, newdata = head(rock))
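
# The same model can also be fit through the lower-level matrix
# interface -- a brief sketch, assuming an explicit intercept column
# in position 1 (the default for the 'intercept' argument):
X <- cbind(1, rock$peri)
fit_mat <- fit_fqr(X, rock$area, tau = 0.5)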
