Nothing
#' Cross-Validation for bandwidth selection of local linear quantile regression
#'
#' \code{llqrcv} estimates the bandwidth necessary for the local linear fit of
#' the \eqn{\tau}th conditional quantile of \code{y} given \code{x}. The
#' estimation is performed using the Cross-Validation criterion.
#'
#' A grid of bandwidth values is created and the local linear fit is estimated
#' using all the data points except for one point, which is used to make the
#' prediction. This procedure is repeated \code{n} times, where \code{n} is the
#' number of observations. Then, the bandwidth is selected as the one with the
#' smallest average error.
#'
#' When the dimension of the predictor variable is large compared with the sample
#' size, local linear fitting meets the 'curse of dimensionality' problem. In
#' situations like that, the grid bandwidth values might be too small and cause
#' the function to fail. For these cases, we advice the user to directly use the
#' \code{llqr} function of the package and specify a bandwidth in the function.
#'
#' @param x A design matrix (n x p). The rows represent observations and the
#' columns represent predictor variables.
#' @param y A vector of the response variable.
#' @param tau A quantile level, a number strictly between 0 and 1.
#' @return \code{llqrcv} returns the optimal bandwidth selected using
#' Cross-Validation criterion for the local linear fit of the \eqn{\tau}th
#' conditional quantile of \code{y} given \code{x}.
#' @include llqr.R
#' @examples
#' set.seed(1234)
#' n <- 100
#' x <- rnorm(n)
#' error <- rnorm(n)
#' y <- x^2 + error
#' tau <- 0.5
#' llqrcv(x, y, tau = tau)
#' @export
llqrcv <- function(x, y, tau=0.5) {
x <- as.matrix(x)
y <- as.matrix(y)
# compatibility checks
# checks if y is univariate
if (dim(y)[2] > 1) {
stop(paste("y needs to be a univariate response. y is a", dim(y)[2], "-dimensional response in this case."))
}
# checks if the number of observations for x and y agree
if (length(y) != dim(x)[1]) {
stop(paste("number of observations of y (", length(y), ") not equal to the number of rows of x (", dim(x)[1], ").", sep = ""))
}
# checks for NAs
if (sum(is.na(y)) > 0 | sum(is.na(x)) > 0) {
stop(paste("Data include missing values."))
}
# checks if n>p
if (length(y) <= dim(x)[2]) {
stop(paste("number of observations of y (", length(y), ") should be greater than the number of columns of x (", dim(x)[2], ").", sep = ""))
}
n <- length(y)
# create grid values for bandwidth
h_lower <- min(n^ (- 1 / 5), min(2, sd(y)) * n^ (- 1 / 5))
h_upper <- max(n^ (- 1 / 5), min(2, sd(y)) * n^ (- 1 / 5))
h <- seq(h_lower, h_upper, by = 0.1)
qhat <- matrix(0, n, length(h))
cv <- list()
for (i in 1:length(h)) {
for (j in 1:n) {
qhat[j, i] <- llqr(x = x[-j, ], y = y[-j], tau = tau, h = h[i],
x0 = x[j, ])$ll_est
}
cv[i] <- mean((y - qhat[, i]) * (tau - (y < qhat[, i])))
}
min(h[which(unlist(cv) == min(unlist(cv)))])
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.