fk_regression: Fast univariate kernel regression

View source: R/fk_regression.R

fk_regressionR Documentation

Fast univariate kernel regression

Description

Uses recursive formulation for kernel sums as described in Hofmeyr (2021) for exact evaluation of kernel regression estimate. Both local linear and Nadaraya-Watson (Watson, 1964; Nadaraya, 1964) estimators are available. Binning approximation also available for faster computation if needed. Default is exact evaluation on a grid, but evaluation at sample points is also possible.

Usage

fk_regression(x, y, h = 'amise', beta = NULL, from = NULL,
    to = NULL, ngrid = 1000, nbin = NULL, type = 'loc-lin')

Arguments

x

vector of covariates.

y

vector of response values.

h

(optional) bandwidth to be used in estimate. Can be either positive numeric, or one of "amise" for a rough estimate of the asymptotic mean integrated square error minimiser, or "cv" for leave-one-out cross validation error minimiser based on squared error. Default is "amise". Cross validation not available for binning approximation.

beta

(optional) numeric vector of kernel coefficients. See Hofmeyr (2019) for details. The default is the smooth order one kernel described in the paper.

from

(optional) lower end of evaluation interval if evaluation on a grid is desired. Default is min(x)

to

(optional) upper end of evaluation interval if evaluation on a grid is desired. Default is max(x)

ngrid

(optional) integer number of grid points for evaluation. Default is 1000.

nbin

(optional) integer number of bins if binning estimator is to be used. The default is to compute the exact density on a grid of 1000 points.

type

(optional) one of "loc-lin" and "NW" if local linear or Nadaraya-Watson is desired, respectively. Default is local linear estimator.

Value

A named list with fields

$x

the vector of points at which the regression function is estimated.

$y

the estimated function values.

$h

the value of the bandwidth.

References

Hofmeyr, D.P. (2021) "Fast exact evaluation of univariate kernel sums", IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(2), 447-458.

Nadaraya, E.A. (1964) "On estimating regression." Theory of Probability and Its Applications,9(1), 141–142.

Watson, G.S. (1964) "Smooth regression analysis." Sankhya: The Indian Journal of Statistics, Series A, pp. 359–372.

Examples

op <- par(no.readonly = TRUE)

set.seed(1)

n <- 2000
x <- rbeta(n, 2, 2) * 10
y <- 3 * sin(2 * x) + 10 * (x > 5) * (x - 5)
y <- y + rnorm(n) + (rgamma(n, 2, 2) - 1) * (abs(x - 5) + 3)

xs <- seq(min(x), max(x), length = 1000)
ftrue <- 3 * sin(2 * xs) + 10 * (xs > 5) * (xs - 5)

fhat_loc_lin <- fk_regression(x, y)
fhat_NW <- fk_regression(x, y, type = 'NW')

par(mfrow = c(2, 2))
plot(fhat_loc_lin, main = 'Local linear estimator with amise bandwidth')

plot(fhat_NW, main = 'NW estimator with amise bandwidth')


fhat_loc_lin <- fk_regression(x, y, h = 'cv')
fhat_NW <- fk_regression(x, y, type = 'NW', h = 'cv')

plot(fhat_loc_lin, main = 'Local linear estimator with cv bandwidth')

plot(fhat_NW, main = 'NW estimator with cv bandwidth')

par(op)

FKSUM documentation built on April 15, 2023, 5:06 p.m.