Estimates a semi-parametric model with the form y = X β + f(z) + u, where f(z) is either fully nonparametric with f(z) = f(z_1) or conditionally parametric with f(z) = z_2 λ (z_1).
form |
Model formula. Specifies the base parametric form of the model, y = X β. Any number of variables can be included in X. Format: semip(y~x1+x2..., ...). |
nonpar |
List of variables in z_1. Formats: semip(..., nonpar=~z1a, ...) or semip(..., nonpar=~z1a+z1b, ...). Important: note the "~" before the first z1 variable. At most two variables can be included in z_1. |
conpar |
List of variables in z_2. By default, conpar = NULL and f(z) has the fully nonparametric form f(z) = f(z_1); in this case the variables in z_1 are taken from the list provided by nonpar. If a list of variables is provided for conpar, the conditionally parametric form f(z) = z_2 λ (z_1) is assumed for f(z), and the variables for z_2 are provided by conpar. Any number of variables can be included in conpar. Format: semip(..., conpar=~z2a+z2b+z2c+..., ...). Important: note the "~" before the first z2 variable. |
window1 |
Window size for the LWR or CPAR regressions of y and x on z. Default = .25. |
window2 |
Window size for the LWR or CPAR regression of y-X β on z. Default = .25. |
bandwidth1 |
Bandwidth for the LWR or CPAR regressions of y and x on z. Default: not specified. |
bandwidth2 |
Bandwidth for the LWR or CPAR regression of y-X β on z. Default: not specified. |
kern |
Kernel weighting functions. Default is the tri-cube. Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss". |
distance |
Options: "Euclid", "Mahal", or "Latlong" for Euclidean, Mahalanobis, or "great-circle" geographic distance. May be abbreviated to the first letter but must be capitalized. Note: semip looks at the first two letters of the variable names to determine which variable is latitude and which is longitude, so the data set must be attached first or specified using the data option; options like data$latitude will not work. Default: Mahal. |
targetfull |
Target options to be passed to the lwr command if conpar = NULL or the cparlwr command if a list of variables is provided for conpar. Options include NULL, "alldata", or the full output of the maketarget command. The appropriate argument will then be passed on to the lwr or cparlwr command. |
print.summary |
If print.summary=T, prints a summary of the regression results for ey on ex, i.e., the parametric portion of the model. Default: print.summary=T. |
data |
A data frame containing the data. Default: use data in the current working directory |
If conpar = NULL, the function implements Robinson's (1988) semi-parametric estimator for the model y = X β + f(z) + u. In this case, the list of variables in z is taken from nonpar and z can have at most two variables. If a list of variables is provided for conpar, the function implements the semi-parametric estimator for the model y = X β + z_2 λ (z_1) + u. In this case, the list of variables in z_1 is taken from nonpar and the list of variables in z_2 is taken from conpar. z_1 can have at most two variables. There is no limit on the number of variables in z_2.
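For the fully nonparametric case, the reasoning behind Robinson's estimator can be sketched in the notation above. The derivation assumes the standard condition E[u | X, z] = 0; in practice each conditional expectation is replaced by a nonparametric fit.

```latex
\begin{align*}
y &= X\beta + f(z) + u, \qquad E[u \mid X, z] = 0 \\
E[y \mid z] &= E[X \mid z]\,\beta + f(z) \\
y - E[y \mid z] &= \bigl(X - E[X \mid z]\bigr)\beta + u
\end{align*}
```

Because f(z) drops out of the last line, β can be estimated by OLS on the residualized variables, and f(z) can then be recovered from a nonparametric regression of y - X β on z.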
The estimation procedure has the following three steps under either specification:
1. Nonparametric regressions of y on z and each X on z using the lwr function when conpar=NULL and the cparlwr function when a list of variables is provided for conpar. The window or bandwidth for these regressions is set by window1 or bandwidth1.
2. OLS regression of y-fitted(y) on the k-1 variables in X - fitted(X), omitting the intercept. The coefficients from this regression are the estimated values of β.
3. Nonparametric regression of y-X β on z using the lwr function when conpar=NULL and the cparlwr function when a list of variables is provided for conpar. The window or bandwidth for this regression is set by window2 or bandwidth2.
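The three steps can be illustrated with a minimal base R sketch. It is not the semip implementation: stats::loess is swapped in for lwr as an assumption so that the example runs without the package, and the simulated data, the variable names, and the span value are all hypothetical.

```r
# Sketch of the three-step procedure using base R only.
# Assumption: loess() stands in for lwr; semip itself does not call loess.
set.seed(1)
n <- 500
x <- runif(n)
z <- runif(n, 0, 2 * pi)
y <- 1 + 2 * x + sin(z) + rnorm(n, sd = 0.3)   # true beta = 2, f(z) = 1 + sin(z)

span <- 0.25   # plays the role of window1 and window2

# Step 1: nonparametric regressions of y on z and of x on z; keep the residuals
ey <- y - fitted(loess(y ~ z, span = span, degree = 1))
ex <- x - fitted(loess(x ~ z, span = span, degree = 1))

# Step 2: OLS of the y residuals on the x residuals, omitting the intercept
step2 <- lm(ey ~ ex - 1)
beta_hat <- unname(coef(step2))

# Step 3: nonparametric regression of y - x * beta_hat on z estimates f(z)
partial <- y - x * beta_hat
f_hat <- fitted(loess(partial ~ z, span = span, degree = 1))
```

In this sketch the intercept is absorbed into f_hat, which is why the second-stage regression has no constant term.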
The stage-two OLS regression uses k degrees of freedom. The stage-three nonparametric regression uses 2*df1-df2 degrees of freedom, where df1 = tr(L), df2 = tr(L'L), and L is the nxn smoother matrix of the lwr or cparlwr regression, so that the fitted values are L(y - X β). The estimated variance is sig2 = rss/(n-2*df1+df2), where rss = sum((y - X β - f(z))^2). The covariance matrix estimate for β is sig2*((X-fitted(X))'(X-fitted(X)))^(-1). The covariance matrix is stored as vmat.
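The trace-based degrees of freedom can be made concrete with a small base R sketch that builds a smoother matrix L explicitly. It is illustrative only: the local linear fit with a tri-cube kernel and the fixed bandwidth h are assumptions and are not taken from the lwr source. The k parametric coefficients are ignored, so the vector r simply stands in for y - X β.

```r
# Sketch: build an n x n local linear smoother matrix L and compute
# tr(L), tr(L'L), and the residual variance estimate.
# Assumptions: tri-cube kernel, fixed bandwidth h, one variable in z.
set.seed(2)
n <- 200
z <- sort(runif(n, 0, 2 * pi))
r <- sin(z) + rnorm(n, sd = 0.3)   # stands in for y - X %*% beta

tricube <- function(u) ifelse(abs(u) < 1, (1 - abs(u)^3)^3, 0)
h <- 1   # hypothetical bandwidth

L <- matrix(0, n, n)
for (i in seq_len(n)) {
  w <- tricube((z - z[i]) / h)            # kernel weights around z[i]
  Z <- cbind(1, z - z[i])                 # local linear design matrix
  ZtW <- t(Z * w)                         # Z'W, a 2 x n matrix
  L[i, ] <- solve(ZtW %*% Z, ZtW)[1, ]    # weights that produce the fit at z[i]
}

df1 <- sum(diag(L))                 # tr(L)
df2 <- sum(L^2)                     # tr(L'L) equals the sum of squared entries
rss <- sum((r - L %*% r)^2)
sig2 <- rss / (n - 2 * df1 + df2)
```

A smaller h makes the fit more flexible and raises both traces, which lowers the denominator n - 2*df1 + df2.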
The nonparametric regressions are estimated using either the lwr or cparlwr function. See their descriptions for more information.
xcoef |
The estimated coefficients for the parametric part of the model, β. |
vmat |
The covariance matrix for the estimates of β. |
xbhat |
The predicted values of y for the full data set. |
nphat |
The predicted values of f(z) for the full data set. mean(xbhat)+mean(nphat) will be close but not necessarily identical to mean(y). |
nphat.se |
Standard errors for the predicted values of f(z) for the full data set. |
npfit |
The complete set of lwr or cparlwr results from the nonparametric regression of y - X β on Z. |
df1 |
k + tr(L), where k is the number of explanatory variables in X β (including the constant) and L is the nxn matrix used to calculate the final-stage nonparametric or conditionally parametric regression of Y - X β on Z. df1 is one measure of the degrees of freedom used in estimation. |
df2 |
An alternative measure of the degrees of freedom used in estimation, df2 = k + tr(L'L). |
sig2 |
Estimated residual variance, sig2 = rss/(n-2*df1+df2). |
Cleveland, William S. and Susan J. Devlin, "Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting," Journal of the American Statistical Association 83 (1988), 596-610.
Loader, Clive. Local Regression and Likelihood. New York: Springer, 1999.
McMillen, Daniel P., "Issues in Spatial Data Analysis," Journal of Regional Science 50 (2010), 119-141.
McMillen, Daniel P., "Employment Densities, Spatial Autocorrelation, and Subcenters in Large Metropolitan Areas," Journal of Regional Science 44 (2004), 225-243.
McMillen, Daniel P. and Christian Redfearn, "Estimation and Hypothesis Testing for Nonparametric Hedonic House Price Functions," Journal of Regional Science 50 (2010), 712-733.
Pagan, Adrian and Aman Ullah. Nonparametric Econometrics. New York: Cambridge University Press, 1999.
Robinson, Paul M., "Root-N-Consistent Semiparametric Regression," Econometrica 56 (1988), 931-954.
cparlwr
lwr
maketarget
# Single variable in f(z)
par(ask=TRUE)
n = 1000
x <- runif(n,0,2*pi)
x <- sort(x)
z <- runif(n,0,2*pi)
xsq <- x^2
sinx <- sin(x)
cosx <- cos(x)
sin2x <- sin(2*x)
cos2x <- cos(2*x)
ybase1 <- x - .1*xsq + sinx - cosx - .5*sin2x + .5*cos2x
ybase2 <- -z + .1*(z^2) - sin(z) + cos(z) + .5*sin(2*z) - .5*cos(2*z)
ybase <- ybase1+ybase2
sig = sd(ybase)/2
y <- ybase + rnorm(n,0,sig)
# Correct specification for x; z in f(z)
fit <- semip(y~x+xsq+sinx+cosx+sin2x+cos2x,nonpar=~z,window1=.20,window2=.20)
2*fit$df1 - fit$df2
yvect <- c(min(ybase1,fit$xbhat), max(ybase1, fit$xbhat))
xbhat <- fit$xbhat - mean(fit$xbhat) + mean(ybase1)
plot(x,ybase1,type="l",xlab="x",ylab="ybase1",ylim=yvect, main="Predictions for XB")
lines(x, xbhat, col="red")
predse <- sqrt(fit$sig2 + fit$nphat.se^2)
nphat <- fit$nphat - mean(fit$nphat) + mean(ybase2)
lower <- nphat + qnorm(.025)*fit$nphat.se
upper <- nphat + qnorm(.975)*fit$nphat.se
o <- order(z)
yvect <- c(min(lower), max(upper))
plot(z[o], ybase2[o], type="l", xlab="z", ylab="f(z) ",
main="Predictions for f(z) ", ylim=yvect)
lines(z[o], nphat[o], col="red")
lines(z[o], lower[o], col="red", lty="dashed")
lines(z[o], upper[o], col="red", lty="dashed")
## Not run:
# Chicago Housing Sales
data(matchdata)
match05 <- data.frame(matchdata[matchdata$year==2005,])
match05$age <- 2005-match05$yrbuilt
tfit1 <- maketarget(~dcbd,window=.3,data=match05)
tfit2 <- maketarget(~longitude+latitude,window=.5,data=match05)
# nonparametric control for dcbd
fit <- semip(lnprice~lnland+lnbldg+rooms+bedrooms+bathrooms+centair+fireplace+brick+
garage1+garage2+ age+rr, nonpar=~dcbd, data=match05,targetfull=tfit1)
# nonparametric controls for longitude and latitude
fit <- semip(lnprice~lnland+lnbldg+rooms+bedrooms+bathrooms+centair+fireplace+brick+
garage1+garage2+ age+rr+dcbd, nonpar=~longitude+latitude, data=match05, targetfull=tfit2,
distance="Latlong")
# Conditionally parametric model: y = XB + dcbd*lambda(longitude,latitude) + u
fit <- semip(lnprice~lnland+lnbldg+rooms+bedrooms+bathrooms+centair+fireplace+
brick+garage1+garage2+age+rr, nonpar=~longitude+latitude, conpar=~dcbd,
data=match05, distance="Latlong",targetfull=tfit1)
# Conditionally parametric model: y = XB + Z*lambda(longitude,latitude) + u
# Z = (dcbd,lnland,lnbldg,age)
fit <- semip(lnprice~rooms+bedrooms+bathrooms+centair+fireplace+brick+
garage1+garage2+rr, nonpar=~longitude+latitude, conpar=~dcbd+lnland+lnbldg+age,
data=match05, distance="Latlong",targetfull=tfit2)
## End(Not run)