spprobit: Linearized GMM spatial probit
In McSpatial: Nonparametric spatial data analysis


Implements the Klier-McMillen (2008) linearized GMM probit model for a 0-1 dependent variable and an underlying latent variable of the form Y* = ρWY* + Xβ + u.

 
spprobit(form,inst=NULL,winst=NULL,wmat=NULL,shpfile=NULL,blockid=NULL,
         minblock=NULL,maxblock=NULL,data=NULL,silent=FALSE,minp=NULL)

`form`	Model formula
`inst`	List of instruments not to be pre-multiplied by W. Entered as inst=~w1+w2 ... Default: inst=NULL. See details for more information.
`winst`	List of instruments to be pre-multiplied by W before use. Entered as winst=~w1+w2 ... Default: winst=NULL. See details for more information.
`wmat`	Directly enter wmat rather than creating it from a shape file. Default: not specified.
`shpfile`	Shape file to be used for creating the W matrix. The order of the observations in wmat must be the same as the order of observations in data.
`blockid`	A variable identifying groups used to specify a block diagonal structure for the W matrix, e.g., blockid=state or blockid=region. Calculates a separate W matrix for each block. The shpfile option must be specified; wmat is ignored.
`minblock`	Groups with fewer than minblock observations are omitted. Default is the number of explanatory variables, including WXB. This option helps to avoid singularity since the instrumental variables are constructed by a separate regression for each block.
`maxblock`	Groups with more than maxblock observations are omitted. Unlimited by default. This option may be useful for very large data sets as full nblock x nblock matrices must be constructed for each block, where nblock is the number of observations in the block.
`data`	A data frame containing the data. Default: use data in the current working environment.
`silent`	If silent=TRUE, no output is printed.
`minp`	Specifies a limit for the estimated probability. Any estimated probability lower than minp will be set to minp and any probability higher than 1-minp will be set to 1-minp. By default, the estimated probabilities are bounded by 0 and 1.
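
The effect of blockid, minblock and maxblock can be illustrated with a small base R sketch. It is not the McSpatial implementation: the block labels are made up, and the within-block weights (equal weight on every other member of the block) are a placeholder assumption standing in for the weights that spprobit derives from the shape file.

```r
# Hypothetical block labels for nine observations
blockid  <- c("a", "a", "a", "b", "b", "c", "c", "c", "c")
minblock <- 3
maxblock <- 4

# Drop blocks with fewer than minblock or more than maxblock observations
size        <- table(blockid)
keep_blocks <- names(size)[size >= minblock & size <= maxblock]
ids         <- blockid[blockid %in% keep_blocks]
n           <- length(ids)

# Assemble a block diagonal W: one nblock x nblock matrix per retained block
wmat <- matrix(0, n, n)
for (b in unique(ids)) {
  idx <- which(ids == b)
  nb  <- length(idx)
  wb  <- matrix(1 / (nb - 1), nb, nb)  # ASSUMPTION: equal placeholder weights
  diag(wb) <- 0                        # no observation is its own neighbor
  wmat[idx, idx] <- wb
}
```

Block "b" has only two observations and is omitted, so the resulting W covers the seven observations in blocks "a" and "c", with zeros everywhere outside the two diagonal blocks.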

The linearized model is a two-step estimation procedure. Let y be the indicator value: y = 1 when y* > 0 and y = 0 when y* ≤ 0. The first stage is a standard probit of y on X. The probability estimates from this regression are p = Φ(Xβ) and the generalized error is e = (y-p)*φ(Xβ)/(p(1-p)). The second stage of the procedure is standard 2SLS estimation of u = e + gXβ on gX and gWXβ using Z as instruments, where g is the gradient vector, -de/dβ. The covariance matrix (equation 3 in Klier-McMillen, 2008) is estimated using the car package. The final estimates minimize e'Z(Z'Z)^{-1}Z'e with e linearized around β-probit and ρ = 0.
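
The first-stage probit and the 2SLS step described above can be sketched in base R. This is a rough illustration under stated assumptions, not the code inside spprobit: the ring-shaped W is invented for the example, the instrument set is the default Z = (X, WX), and the gradient weight g is taken to be φ(Xβ)^2/(p(1-p)), the expected value of the negative derivative of e with respect to Xβ. The package may compute g differently.

```r
set.seed(1)
n <- 200

# ASSUMPTION: a toy row-normalized W where each point has two ring neighbors
wmat <- matrix(0, n, n)
wmat[cbind(1:n, c(n, 1:(n - 1)))] <- 0.5
wmat[cbind(1:n, c(2:n, 1))]       <- 0.5

# Simulated data from the reduced form with rho = 0.4
x     <- runif(n, 0, 10)
ystar <- as.numeric(solve(diag(n) - 0.4 * wmat) %*% (x - 5 + rnorm(n, 0, 2)))
y     <- as.numeric(ystar > 0)

# Stage 1: standard probit of y on X, then the generalized error
fit1 <- glm(y ~ x, family = binomial(link = "probit"))
xmat <- model.matrix(fit1)
xb   <- as.numeric(xmat %*% coef(fit1))
p    <- pnorm(xb)
e    <- (y - p) * dnorm(xb) / (p * (1 - p))

# Gradient terms evaluated at the probit estimates and rho = 0
g    <- dnorm(xb)^2 / (p * (1 - p))     # ASSUMPTION: expected -de/d(Xb)
gmat <- g * cbind(xmat, WXB = as.numeric(wmat %*% xb))

# Stage 2: 2SLS of u = e + gXb on (gX, gWXb) using Z = (X, WX)
u     <- e + g * xb
zmat  <- cbind(xmat, wx = as.numeric(wmat %*% x))
ghat  <- zmat %*% solve(crossprod(zmat), crossprod(zmat, gmat))
theta <- solve(crossprod(ghat, gmat), crossprod(ghat, u))
```

The last element of theta is the coefficient on WXB, which plays the role of the estimate of ρ in the linearized model. Standard errors are not computed in this sketch.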

spprobit provides flexibility in specifying the list of instruments. By default, the instrument list includes X and WX, where X is the original explanatory variable list and W is the spatial weight matrix. Either wmat or shpfile must be specified if inst and winst are set to their default values.

It is also possible to directly specify the full instrument list or to include only a subset of the X variables in the list that is to be pre-multiplied by W. Let list1 and list2 be user-provided lists of the form list=~z1+z2. The combinations of defaults (NULL) and lists for inst and winst produce the following results for Z:

1. inst = NULL, winst = NULL, and either shpfile or wmat specified: Z = (X, WX)

2. inst = list1, winst = NULL, and either shpfile or wmat specified: Z = list1

3. inst = NULL, winst = list2, and either shpfile or wmat specified: Z = (X, W*list2)

4. inst = list1, winst = list2, and either shpfile or wmat specified: Z = (list1, W*list2)

5. inst = list1, winst = list2, and both shpfile and wmat NOT specified: Z = (list1, list2)

Note that when inst=list1 and winst=NULL it is up to the user to specify at least one variable in list1 that is not also included in X.

The difference between cases (4) and (5) is that the list2 variables are left unaltered in case (5) rather than being pre-multiplied by W. The case (5) option makes it possible to avoid manipulations of large matrices from within spprobit. The idea is that W*list2 should be calculated prior to running spprobit, with the variables implied by W*list2 being provided directly to spprobit using the winst option.
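
The five cases can be summarized with a small helper. The function build_z below is hypothetical and written only for illustration: it takes numeric matrices rather than the one-sided formulas that spprobit accepts, and it accepts only a wmat argument where spprobit could also build W from shpfile.

```r
# Hypothetical helper mirroring the five documented cases; not McSpatial code.
# xmat, inst and winst are numeric matrices; wmat is the weight matrix or NULL.
build_z <- function(xmat, inst = NULL, winst = NULL, wmat = NULL) {
  has_w <- !is.null(wmat)
  if (is.null(inst) && is.null(winst)) {
    if (!has_w) stop("W is required when inst and winst are both NULL")
    return(cbind(xmat, wmat %*% xmat))             # case 1: Z = (X, WX)
  }
  if (is.null(winst)) return(inst)                 # case 2: Z = list1
  if (is.null(inst) && !has_w) stop("combination not covered by the cases")
  wpart <- if (has_w) wmat %*% winst else winst    # cases 3-4 vs. case 5
  left  <- if (is.null(inst)) xmat else inst       # case 3 uses X
  cbind(left, wpart)
}

# Toy inputs: five observations with equal weights on all other observations
set.seed(2)
n    <- 5
wmat <- matrix(1 / (n - 1), n, n)
diag(wmat) <- 0
x  <- matrix(rnorm(n), n, 1, dimnames = list(NULL, "x"))
z1 <- matrix(rnorm(n), n, 1, dimnames = list(NULL, "z1"))
z2 <- matrix(rnorm(n), n, 1, dimnames = list(NULL, "z2"))

z_case4 <- build_z(x, inst = z1, winst = z2, wmat = wmat)  # (list1, W*list2)
z_case5 <- build_z(x, inst = z1, winst = z2)               # (list1, list2)
```

In case (5) the second column is z2 itself, so a user who has already computed W*list2 outside the function can pass the product through winst without supplying W again.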

`coef`	Coefficient estimates.
`se`	Standard error estimates.
`u`	The generalized error term.
`gmat`	The matrix of gradient terms, G.
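
As an illustration of how the returned values fit together, the z-values and p-values in the printed coefficient table can be recomputed from coef and se. The list below is a stand-in that holds the rounded estimates from the example output on this page; in practice the values would come from the object returned by spprobit.

```r
# Stand-in for the returned list, filled with the rounded printed estimates
fit <- list(
  coef = c("(Intercept)" = -2.27695, x = 0.50788, WXB = 0.44670),
  se   = c("(Intercept)" =  0.12206, x = 0.02434, WXB = 0.09812)
)

zval <- fit$coef / fit$se            # z-values
pval <- 2 * pnorm(-abs(zval))        # two-sided normal p-values
tab  <- cbind(Estimate = fit$coef, "Std. Error" = fit$se,
              "z-value" = zval, "Pr(>|z|)" = pval)
print(round(tab, 5))
```

Because the inputs are rounded, the recomputed z-values agree with the printed table only to about three decimal places.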

Klier, Thomas and Daniel P. McMillen, "Clustering of Auto Supplier Plants in the United States: Generalized Method of Moments Spatial Logit for Large Samples," Journal of Business and Economic Statistics 26 (2008), 460-471.

cparlogit

cparprobit

cparmlogit

gmmlogit

gmmprobit

splogit

spprobitml

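library(McSpatial)
set.seed(9947)
# Census tract polygons shipped with McSpatial
cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
  package="McSpatial"))
# Keep the Chicago tracts and drop O'Hare
cmap <- cmap[cmap$CHICAGO==1&cmap$CAREA!="O'Hare",]
wmat <- makew(cmap)$wmat
n <- nrow(wmat)
rho <- 0.4
x <- runif(n,0,10)
# Latent variable from the reduced form (I - rho*W)^{-1}(x + u)
ystar <- as.numeric(solve(diag(n) - rho*wmat)%*%(x + rnorm(n,0,2)))
# Observations above the 40th percentile of ystar are coded as TRUE
y <- ystar>quantile(ystar,.4)
fit <- spprobit(y~x,  wmat=wmat)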

Call:
glm(formula = form, family = binomial(link = "probit"), data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9120  -0.5265   0.1089   0.5291   2.6652  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.99691    0.13396  -14.91   <2e-16 ***
x            0.49558    0.02884   17.19   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1159.41  on 860  degrees of freedom
Residual deviance:  628.63  on 859  degrees of freedom
AIC: 632.63

Number of Fisher Scoring iterations: 6

STANDARD PROBIT ESTIMATES 
LINEARIZED GMM PROBIT ESTIMATES 
            Estimate Std. Error   z-value Pr(>|z|)
(Intercept) -2.27695    0.12206 -18.65500    0e+00
x            0.50788    0.02434  20.86551    0e+00
WXB          0.44670    0.09812   4.55247    1e-05
Number of observations =  861