qreglwr: Locally Weighted Quantile Regression
In McSpatial: Nonparametric spatial data analysis

Description

Estimates a model of the form y = f(x) using locally weighted quantile regression for a set of user-provided quantiles. x can include either one or two variables. Returns estimated values, derivatives, and standard errors for both f(x) and df(x)/dx.

Usage

qreglwr(form,taumat=c(.10,.25,.50,.75,.90), window=.25,bandwidth=0,
  kern="tcub", distance="Mahal",target=NULL,data=NULL)

Arguments

`form`	Model formula.
`taumat`	Vector of target quantiles. Default: taumat=c(.10,.25,.50,.75,.90).
`window`	Window size. Default: 0.25.
`bandwidth`	Fixed bandwidth. Default: 0 (not used).
`kern`	Kernel weighting function. Default: "tcub" (tri-cube). Options: "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".
`distance`	Distance measure used when x includes two variables: "Euclid", "Mahal", or "Latlong" for Euclidean, Mahalanobis, or great-circle geographic distance. May be abbreviated to the first letter, which must be capitalized. Default: "Mahal". Note: qreglwr uses the first two letters of the variable names to determine which variable is latitude and which is longitude, so the data set must either be attached first or specified using the data option; formula terms such as data$latitude will not work.
`target`	If target = NULL, the maketarget command forms target points using the values specified for window, bandwidth, and kern. If target = "alldata", every observation is used as a target value for x. Alternatively, a set of target values can be supplied directly.
`data`	A data frame containing the data. Default: NULL, in which case the variables in form are taken from the current R environment (for example, an attached data set).

Details

qreglwr serves as an interface to the quantreg package. It passes kernel weights to the "weights" option of quantreg's rq function to estimate quantile regressions at a series of target values of x. x may include either one or two variables. The target values are found using locfit's adaptive decision tree approach, and the predictions are then interpolated to the full set of x values using the smooth12 command. If target = "alldata", the procedure is applied to every value of x rather than to a set of target points.
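As a rough illustration of the final interpolation step in the one-variable case, the sketch below spreads target-point estimates (rows = targets, columns = quantiles) to every observation using base R's approx(). Linear interpolation is an assumption standing in for smooth12, which may use a different method, and interp_to_obs is a hypothetical name.

```r
# Hypothetical helper: interpolate estimates made at target points to all observations.
# Assumes one explanatory variable and linear interpolation, as a simple stand-in
# for McSpatial's smooth12; rule = 2 carries the end values past the outermost targets.
interp_to_obs <- function(xtarget, ytarget, x) {
  ytarget <- as.matrix(ytarget)          # rows = targets, columns = quantiles
  o <- order(xtarget)
  cols <- lapply(seq_len(ncol(ytarget)), function(j)
    approx(xtarget[o], ytarget[o, j], xout = x, rule = 2)$y)
  do.call(cbind, cols)                   # n x (number of quantiles)
}

# Three targets, two quantiles, five observations
xtarget <- c(0, 5, 10)
ytarget <- cbind(c(1, 2, 3), c(2, 4, 6))
yhat <- interp_to_obs(xtarget, ytarget, x = c(0, 2.5, 5, 7.5, 12))
print(yhat)
```

The same interpolation would be applied to the derivative and standard-error matrices, column by column.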

The weights at a target value x0 are given by K(ψ/h), where ψ measures the distance between x and x0 and h is the bandwidth or window. When x includes a single variable, ψ = x - x0. When x includes two variables, the definition of ψ depends on the distance option. If distance = "Mahal" or distance = "Euclid", the ith row x_i of the matrix X = (x1, x2) is replaced by the scalar sqrt(x_i V x_i'), where x_i' is the transpose of the row vector x_i. Under the "Mahal" option, V is the inverse of cov(X); under the "Euclid" option, V is the inverse of diag(cov(X)). Because this transformation reduces x from two dimensions to one, the kernel weighting function again takes the simple form K((x - x0)/(sd(x)*h)), with h specified by the bandwidth or window option.
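The two-variable weighting can be sketched in base R. This sketch assumes the quadratic form is applied to each observation's deviation from the target, so that psi_i = sqrt((x_i - x0) V (x_i - x0)'); stats::mahalanobis() computes that quadratic form (without the square root) when it is given the matrix whose inverse is V. The helper names and the tri-cube choice are illustrative, not McSpatial's internals.

```r
# Tri-cube kernel from the table of kernel functions
tricube <- function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1)

# Hypothetical helper: distance of each row of X from the target x0.
# "Mahal": V = solve(cov(X)); "Euclid": V = solve(diag(diag(cov(X)))).
# mahalanobis(X, x0, S) returns (x_i - x0) S^{-1} (x_i - x0)', so passing
# S = cov(X) or S = diag(diag(cov(X))) applies the corresponding V.
target_distance <- function(X, x0, distance = c("Mahal", "Euclid")) {
  distance <- match.arg(distance)
  S <- cov(X)
  if (distance == "Euclid") S <- diag(diag(S))
  sqrt(mahalanobis(X, center = x0, cov = S))
}

set.seed(1)
X <- cbind(x1 = rnorm(50), x2 = rnorm(50))
x0 <- X[1, ]
psi <- target_distance(X, x0, "Euclid")
w <- tricube(psi / 1.5)   # kernel weights K(psi / h) with an arbitrary h = 1.5
```

Under "Euclid" each coordinate is simply divided by its standard deviation before taking the usual Euclidean distance; "Mahal" additionally accounts for the correlation between x1 and x2.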

The great-circle formula is used to define K when distance = "Latlong". In this case, the explanatory variables must be specified as ~latitude+longitude (or ~lo+la, ~lat+long, etc.), with both variables expressed in degrees (e.g., -87.627800 and 41.881998 for the longitude and latitude of one observation). The order in which latitude and longitude are listed does not matter; the function looks only at the first two letters of each name to determine which variable is latitude and which is longitude. Note that the great-circle distance is left in miles rather than being standardized. Thus, when distance = "Latlong", either specify the window option or choose a bandwidth expressed in miles. The kernel weighting function becomes K(distance/h) under the "Latlong" option.
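A minimal sketch of the "Latlong" pieces, assuming case-insensitive matching of the "la"/"lo" prefixes and the haversine formula with a mean Earth radius of 3959 miles; McSpatial's own distance routine may use a different spherical formula, and both helper names are hypothetical.

```r
# Hypothetical helper: pick out latitude and longitude by the first two letters of
# each variable name ("la" / "lo"). Case-insensitive matching is an assumption.
find_latlong <- function(vars) {
  prefix <- tolower(substr(vars, 1, 2))
  lat <- vars[prefix == "la"]
  lon <- vars[prefix == "lo"]
  if (length(lat) != 1 || length(lon) != 1)
    stop("expected exactly one latitude ('la...') and one longitude ('lo...') variable")
  c(lat = lat, lon = lon)
}

# Great-circle distance in miles from (lat0, lon0), coordinates in degrees.
# Haversine formula with a mean Earth radius of 3959 miles (assumed values).
great_circle_miles <- function(lat, lon, lat0, lon0, radius = 3959) {
  rad <- pi / 180
  dlat <- (lat - lat0) * rad
  dlon <- (lon - lon0) * rad
  a <- sin(dlat / 2)^2 + cos(lat0 * rad) * cos(lat * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

vars <- find_latlong(c("LONGITUDE", "LATITUDE"))
d <- great_circle_miles(lat = c(41.881998, 42.881998), lon = c(-87.627800, -87.627800),
                        lat0 = 41.881998, lon0 = -87.627800)
# Tri-cube K(distance/h) with h = 100 miles: h must be on the miles scale here
w <- 70/81 * (1 - abs(d / 100)^3)^3 * (abs(d / 100) < 1)
```

One degree of latitude is roughly 69 miles, which shows why a bandwidth chosen for standardized distances would be far too small on this scale.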

Since qreglwr estimates weighted quantile regressions of the dependent variable, y, on x - x0, the intercept provides an estimate of the quantile of y at x0, and the slope coefficient β provides an estimate of the slope of the quantile line, dy/dx, at x0. quantreg's standard error for the intercept is stored in ytarget.se (target points) and yhat.se (all observations). The standard errors for the slopes are stored in dtarget1.se, dtarget2.se, dhat1.se, and dhat2.se.
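The local fit at a single target can be sketched without quantreg. The sketch below minimizes the kernel-weighted check loss of y on (x - x0) by enumerating lines through pairs of positive-weight observations; because an optimal two-parameter quantile regression passes through two observations, the search is exact, but it is O(n^3) and is a swapped-in teaching substitute for quantreg's solver (qreglwr itself uses quantreg). The helper names are hypothetical, and no standard errors are computed.

```r
# Check (pinball) loss used by quantile regression: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Tri-cube kernel from the table of kernel functions
tricube <- function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1)

# Hypothetical helper: weighted local linear quantile regression of y on (x - x0).
# The intercept estimates the tau-th quantile of y at x0; the slope estimates dy/dx there.
local_qfit <- function(y, x, x0, tau, w) {
  keep <- w > 0
  y <- y[keep]; z <- x[keep] - x0; w <- w[keep]
  n <- length(y)
  if (n < 2) stop("need at least two observations with positive weight")
  best <- c(intercept = NA_real_, slope = NA_real_)
  best_obj <- Inf
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (z[i] == z[j]) next                 # no unique line through these two points
      slope <- (y[j] - y[i]) / (z[j] - z[i])
      intercept <- y[i] - slope * z[i]
      obj <- sum(w * check_loss(y - intercept - slope * z, tau))
      if (obj < best_obj) {
        best_obj <- obj
        best <- c(intercept = intercept, slope = slope)
      }
    }
  }
  best
}

# Noise-free line y = 2 + 3x: at x0 = 0.5 the fit should recover y = 3.5 and dy/dx = 3
x <- seq(0, 1, length.out = 21)
y <- 2 + 3 * x
fit <- local_qfit(y, x, x0 = 0.5, tau = 0.5, w = tricube((x - 0.5) / 0.3))
```

Repeating this fit for every target and every value in taumat would fill one row of ytarget and dtarget1 per target and one column per quantile.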

When target = "alldata", each data point in turn is used as a target point, x0. Fixed bandwidths may prove too small in regions where x is sparse, so a nearest-neighbor approach (e.g., window = .50) is generally preferable. Estimation can be very slow when target = "alldata". Otherwise, the maketarget command identifies the target points, and the smooth12 command interpolates the coefficient estimates and standard errors to the full set of observations.
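The nearest-neighbor idea can be sketched as follows, assuming that a window share translates into h = the distance from x0 to its ceiling(window * n)-th nearest observation, so that roughly that share of the data receives positive weight wherever x0 sits. This is an illustrative reading of the window option; nn_bandwidth is a hypothetical name.

```r
# Hypothetical helper: nearest-neighbor bandwidth implied by a window share.
# Assumes h is the distance from x0 to its ceiling(window * n)-th nearest observation.
nn_bandwidth <- function(psi, window) {
  k <- ceiling(window * length(psi))
  sort(abs(psi))[k]
}

tricube <- function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1)

x <- 1:100
x0 <- 50
h <- nn_bandwidth(x - x0, window = 0.25)   # 25th smallest |x - x0|
w <- tricube((x - x0) / h)                 # positive only where |x - x0| < h
```

In a sparse region the same window yields a larger h, which is why this approach avoids the empty neighborhoods a fixed bandwidth can produce.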

Available kernel weighting functions include the following:

Kernel	Call abbreviation	Kernel function K(z)
Rectangular	"rect"	1/2 I(|z| < 1)
Triangular	"tria"	(1 - |z|) I(|z| < 1)
Epanechnikov	"epan"	3/4 (1 - z^2) I(|z| < 1)
Bi-Square	"bisq"	15/16 (1 - z^2)^2 I(|z| < 1)
Tri-Cube	"tcub"	70/81 (1 - |z|^3)^3 I(|z| < 1)
Tri-Weight	"trwt"	35/32 (1 - z^2)^3 I(|z| < 1)
Gaussian	"gauss"	(2pi)^(-1/2) exp(-z^2/2)

I(·) denotes the indicator function, equal to 1 when its argument is true and 0 otherwise.
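The kernels in the table can be written as vectorized R functions and checked numerically; each constant makes K integrate to 1. The list name kernels is illustrative. Because multiplying all weights by a constant does not change a weighted quantile regression, the constants affect only the scale of K, not the estimates.

```r
# The kernels in the table as vectorized R functions; I(|z| < 1) is written (abs(z) < 1)
kernels <- list(
  rect  = function(z) 1/2 * (abs(z) < 1),
  tria  = function(z) (1 - abs(z)) * (abs(z) < 1),
  epan  = function(z) 3/4 * (1 - z^2) * (abs(z) < 1),
  bisq  = function(z) 15/16 * (1 - z^2)^2 * (abs(z) < 1),
  tcub  = function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1),
  trwt  = function(z) 35/32 * (1 - z^2)^3 * (abs(z) < 1),
  gauss = function(z) exp(-z^2 / 2) / sqrt(2 * pi)
)

# Compactly supported kernels only need [-1, 1]; the Gaussian needs the whole line
areas <- sapply(names(kernels), function(k) {
  lim <- if (k == "gauss") Inf else 1
  integrate(kernels[[k]], -lim, lim)$value
})
print(round(areas, 6))
```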

Value

`target`	The target points for the original estimation of the function.
`ytarget`	The matrix of predicted values of y at the target points, by quantile. Rows represent targets; columns are quantiles.
`dtarget1`	The matrix of estimated derivatives dy/dx1 at the target points, by quantile. Rows represent targets; columns are quantiles.
`dtarget2`	The matrix of estimated derivatives dy/dx2 at the target points, by quantile. Rows represent targets; columns are quantiles. All zeros if the model has only one explanatory variable.
`ytarget.se`	The matrix of standard errors for the predicted values of y at the target points, by quantile. Rows represent targets; columns are quantiles.
`dtarget1.se`	The matrix of standard errors for the derivatives dy/dx1 at the target points, by quantile. Rows represent targets; columns are quantiles.
`dtarget2.se`	The matrix of standard errors for the derivatives dy/dx2 at the target points, by quantile. Rows represent targets; columns are quantiles. All zeros if the model has only one explanatory variable.
`yhat`	The matrix of predicted values of y for the full data set, by quantile. Dimension = n x length(taumat).
`dhat1`	The matrix of estimated derivatives dy/dx1 for the full data set, by quantile. Dimension = n x length(taumat).
`dhat2`	The matrix of estimated derivatives dy/dx2 for the full data set, by quantile. Dimension = n x length(taumat). All zeros if the model has only one explanatory variable.
`yhat.se`	The matrix of standard errors for the predicted values of y for the full data set, by quantile. Dimension = n x length(taumat).
`dhat1.se`	The matrix of standard errors for the estimated derivatives dy/dx1 for the full data set, by quantile. Dimension = n x length(taumat).
`dhat2.se`	The matrix of standard errors for the estimated derivatives dy/dx2 for the full data set, by quantile. Dimension = n x length(taumat). All zeros if the model has only one explanatory variable.

References

Cleveland, William S. and Susan J. Devlin, "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting," Journal of the American Statistical Association 83 (1988), 596-610.

Koenker, Roger. Quantile Regression. New York: Cambridge University Press, 2005. Chapter 7 and Appendix A.9.

Loader, Clive. Local Regression and Likelihood. New York: Springer, 1999.

See Also

lwr

Examples

library(McSpatial)
data(cookdata)
# Keep Chicago tracts; obs indexes tracts within the Chicago subset so that
# estimates can be mapped back to the Chicago census tract map below
cookdata <- cookdata[cookdata$CHICAGO==1,]
cookdata$obs <- seq_len(nrow(cookdata))
cookdata <- cookdata[cookdata$POPULATION>0,]
par(ask=TRUE)

# lndens = f(dcbd): LWR estimate and the 10th, 50th, and 90th conditional quantiles
fit <- lwr(LNDENS~DCBD,window=.20,data=cookdata)
fit1 <- qreglwr(LNDENS~DCBD,taumat=c(.10,.50,.90),window=.30,kern="rect",data=cookdata)
o <- order(cookdata$DCBD)
ylim <- range(fit$yhat, fit1$yhat)
plot(cookdata$DCBD[o], fit$yhat[o], type="l", ylim=ylim,
  xlab="Distance to CBD", ylab="Log of Population Density")
lines(cookdata$DCBD[o], fit1$yhat[o,1], col="red", lty="dashed")
lines(cookdata$DCBD[o], fit1$yhat[o,2], col="red")
lines(cookdata$DCBD[o], fit1$yhat[o,3], col="red", lty="dashed")
legend("topright", c("LWR", "tau = .50", "tau = .10, .90"), col=c("black","red", "red"),
  lwd=1, lty=c("solid","solid","dashed"))

## Not run: 
library(sp)            # spplot
library(maptools)      # readShapePoly
library(RColorBrewer)  # brewer.pal
cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
  package="McSpatial"))
cmap <- cmap[cmap$CHICAGO==1,]
# lndens = f(longitude, latitude); weights are a function of Mahalanobis distance (the default)
fit <- qreglwr(LNDENS~LONGITUDE+LATITUDE,taumat=c(.10,.50,.90),window=.20,data=cookdata)
cmap$lwr10[cookdata$obs] <- fit$yhat[,1]
cmap$lwr50[cookdata$obs] <- fit$yhat[,2]
cmap$lwr90[cookdata$obs] <- fit$yhat[,3]
cmap$lwr1090[cookdata$obs] <- fit$yhat[,3] - fit$yhat[,1]
brks <- seq(min(cmap$lwr10,na.rm=TRUE),max(cmap$lwr10,na.rm=TRUE),length=9)
spplot(cmap,"lwr10",at=brks,col.regions=rev(brewer.pal(8,"RdBu")),
   main="Log Density Estimates, tau = .10")
brks <- seq(min(cmap$lwr50,na.rm=TRUE),max(cmap$lwr50,na.rm=TRUE),length=9)
spplot(cmap,"lwr50",at=brks,col.regions=rev(brewer.pal(8,"RdBu")),
   main="Log Density Estimates, tau = .50")
brks <- seq(min(cmap$lwr90,na.rm=TRUE),max(cmap$lwr90,na.rm=TRUE),length=9)
spplot(cmap,"lwr90",at=brks,col.regions=rev(brewer.pal(8,"RdBu")),
   main="Log Density Estimates, tau = .90")
brks <- seq(min(cmap$lwr1090,na.rm=TRUE),max(cmap$lwr1090,na.rm=TRUE),length=9)
spplot(cmap,"lwr1090",at=brks,col.regions=rev(brewer.pal(8,"RdBu")),
   main="Difference in Log Density, tau = .90 - .10")

## End(Not run)