ksim: Estimates K-density functions with local or global confidence...
In McSpatial: Nonparametric spatial data analysis


Calculates K-density functions for lat-long coordinates. The function computes the distance, d, between every pair of observations and plots the density, f(d_0), at a set of target distances, d_0. It also uses the Duranton-Overman bootstrap method to construct local or global confidence intervals for the density of distances between pairs of observations that would arise if the same number of points were allocated across another set of possible locations.
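The basic calculation described above can be sketched in base R. This is only an illustration under stated assumptions: `haversine_miles` is a hypothetical helper (ksim's own distance calculation may differ), the coordinates are made up, and no boundary correction is applied near d = 0.

```r
# Hypothetical helper: great-circle distance in miles via the haversine formula.
# Pass radius = 6371 to get kilometers instead.
haversine_miles <- function(long1, lat1, long2, lat2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat  <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

set.seed(1)
n    <- 50
long <- runif(n, -87.9, -87.6)   # made-up coordinates
lat  <- runif(n, 41.7, 42.0)

# Distance d for every pair of observations i < j
idx   <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
dvect <- haversine_miles(long[idx[, 1]], lat[idx[, 1]],
                         long[idx[, 2]], lat[idx[, 2]])

# Density f(d_0) at 512 target distances d_0 between 0 and max(d)
kd <- density(dvect, kernel = "gaussian", n = 512, from = 0, to = max(dvect))
```

Here `kd$x` holds the target distances and `kd$y` the estimated densities, which loosely correspond to the `distance` and `dhat` components returned by ksim.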

ksim(long1,lat1,long2,lat2,kilometer=FALSE,noplot=FALSE,
  dmin=0,dmax=0,dlength=512,h=0,kern="gaussian",
  nsim=2000,nsamp=0,pval=.05,cglobal=FALSE)

`long1`	Longitude variable, in degrees.
`lat1`	Latitude variable, in degrees.
`long2`	Longitude variable for counterfactual locations, in degrees.
`lat2`	Latitude variable for counterfactual locations, in degrees.
`kilometer`	If kilometer = T, measurements are in kilometers rather than miles. Default: kilometer = F.
`noplot`	If noplot = T, does not show the K-density graph. Default: noplot = F.
`dmin`	Minimum value for target distances. Default: dmin=0.
`dmax`	Maximum value for target distances. Default: dmax = max(distance).
`dlength`	Number of target values for density calculations. Default: dlength = 512.
`h`	Bandwidth. Default: (.9(quantile(distance,.75)-quantile(distance,.25))/1.34)(n^(-.20)), where n = 2*length(dvect).
`kern`	Kernel. Default: "gaussian". Other options from the density function are also available, including "epanechnikov", "rectangular", "triangular", "biweight", "cosine", "optcosine".
`nsim`	Number of simulations for constructing the confidence intervals. Default: nsim=2000.
`nsamp`	If nsamp>0, uses a random sample of lat-long pairs for calculations rather than the full data set. Takes random draws from the long1, lat1 pairs; the long2, lat2 pairs remain as specified by the user. Can be much faster for large samples. Default: use full sample.
`pval`	Significance level for confidence intervals. Default: pval = .05, i.e., a 95 percent confidence interval.
`cglobal`	If cglobal=T, calculates global confidence intervals. Default: cglobal=F, calculates local confidence intervals.
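The default bandwidth given for `h` can be written out step by step. The sketch below assumes that both `distance` and `dvect` in that formula refer to the vector of pairwise distances; the data are made up for illustration.

```r
set.seed(2)
dvect <- runif(300, 0, 10)   # stand-in for the vector of pairwise distances

# n is twice the number of pairwise distances, as stated in the formula
n   <- 2 * length(dvect)
iqr <- quantile(dvect, .75) - quantile(dvect, .25)

# h = .9 * (interquartile range / 1.34) * n^(-.20)
h <- unname(.9 * (iqr / 1.34) * n^(-.20))
```

Supplying a positive value for `h` in the call overrides this default.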

Let n be the number of observations in the long1, lat1 data set. ksim draws n observations from the long2, lat2 pairs and re-calculates the K-density function using the new, simulated data set. The process is repeated nsim times, producing nsim bootstrap K-density functions. The local confidence interval treats each target distance as a separate observation, and calculates the confidence interval at each distance using the standard bootstrap percentile method. In contrast, the global confidence interval treats the full K-density function as a single observation and shifts the interval outward at each target distance until 95 percent of the density functions (for the default pval = .05) lie entirely within the interval. Large values of nsim, perhaps greater than the default of 2000, are necessary to get accurate global confidence intervals.
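The difference between the two kinds of interval can be illustrated with a matrix of simulated curves. This is a minimal sketch, not ksim's actual code: `sims` is random noise standing in for the nsim bootstrap K-density functions, `coverage` is a hypothetical helper, and widening the band by stepping to more extreme pointwise quantiles is only one plausible way to "shift the interval outward".

```r
set.seed(3)
nsim <- 500; dlength <- 40; pval <- .05

# Stand-in for nsim bootstrap K-density curves, one curve per row
sims <- matrix(rnorm(nsim * dlength), nsim, dlength)

# Local band: percentile interval computed separately at each target distance
local.lo <- apply(sims, 2, quantile, probs = pval / 2)
local.hi <- apply(sims, 2, quantile, probs = 1 - pval / 2)

# Hypothetical helper: share of simulated curves lying entirely inside a band
coverage <- function(lo, hi) {
  inside <- sweep(sims, 2, lo, ">=") & sweep(sims, 2, hi, "<=")
  mean(rowSums(inside) == ncol(sims))
}

# Global band: move the pointwise quantiles outward until at least
# 1 - pval of the whole curves lie inside the band
global.lo <- local.lo
global.hi <- local.hi
for (p in seq(pval / 2, 0, length.out = 200)) {
  global.lo <- apply(sims, 2, quantile, probs = p)
  global.hi <- apply(sims, 2, quantile, probs = 1 - p)
  if (coverage(global.lo, global.hi) >= 1 - pval) break
}
```

Because a curve must stay inside the band at every target distance to count, the global band is at least as wide as the local one, and its outer quantiles depend on the tails of the simulated distribution. That is why a large nsim matters more for global intervals.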

The ksim function is intended for cases where the counterfactual data set has more observations than the base, i.e., n2>n1. In this case, n1 observations are drawn without replacement from the counterfactual data set. When the counterfactual data set has no more observations than the base (i.e., n2<=n1), n1 observations are drawn with replacement from the counterfactual data set.
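The sampling rule above can be expressed in a few lines. `draw_counterfactual` is a hypothetical helper written for illustration, not a function from McSpatial, and the coordinates are made up.

```r
# Hypothetical helper: draw n1 counterfactual locations, switching to
# sampling with replacement when n2 <= n1
draw_counterfactual <- function(long2, lat2, n1) {
  n2  <- length(long2)
  idx <- sample.int(n2, n1, replace = (n2 <= n1))
  list(idx = idx, long = long2[idx], lat = lat2[idx])
}

set.seed(4)
long2 <- runif(400, -87.9, -87.6)   # made-up counterfactual coordinates
lat2  <- runif(400, 41.7, 42.0)

big   <- draw_counterfactual(long2, lat2, n1 = 200)             # n2 > n1
small <- draw_counterfactual(long2[1:50], lat2[1:50], n1 = 200) # n2 <= n1
```

In the first call every drawn location is distinct; in the second, locations necessarily repeat because 200 draws are taken from only 50 candidates.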

Duranton and Overman (2005) proposed this method for constructing global confidence intervals for K-density functions. See Klier and McMillen (2008) for a description of the procedures used here. See the description of the kdensity function for more details on the estimation procedure of the K-density functions.

`distance`	The vector of target distances.
`dhat`	The vector of densities for the target distances.
`h`	The bandwidth.
`local.lo`	The lower bound of the local confidence interval at each target distance, if calculated.
`local.hi`	The upper bound of the local confidence interval at each target distance, if calculated.
`global.lo`	The lower bound of the global confidence interval at each target distance, if calculated.
`global.hi`	The upper bound of the global confidence interval at each target distance, if calculated.
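One way to use these components is to flag the target distances at which the estimated density falls outside the band. The sketch below uses a made-up list with the same component names as the return value, so it runs without the package; the numbers are purely illustrative.

```r
# Stand-in with the same component names as the ksim return value (values made up)
fit <- list(distance = seq(0, 9, length.out = 10),
            dhat     = c(.02, .05, .09, .12, .10, .08, .06, .04, .03, .02),
            local.lo = rep(.03, 10),
            local.hi = rep(.09, 10))

# Target distances where the density lies above or below the local band
above <- fit$dhat > fit$local.hi
below <- fit$dhat < fit$local.lo

fit$distance[above]
fit$distance[below]
```

The same comparison applies to `global.lo` and `global.hi` when the function is called with cglobal = T.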

Duranton, Gilles and Henry G. Overman, "Testing for Localisation using Microgeographic Data," Review of Economic Studies 72 (2005), 1077-1106.

Klier, Thomas and Daniel P. McMillen, "Evolving Agglomeration in the U.S. Auto Industry," Journal of Regional Science 48 (2008), 245-267.

Silverman, B. W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York (1986).

kdensity

data(matchdata)
lmat <- cbind(matchdata$longitude,matchdata$latitude)
lmat1 <- lmat[matchdata$carea=="Rogers Park"|matchdata$carea=="Albany Park",]
lmat2 <- lmat[matchdata$carea!="Rogers Park"&matchdata$carea!="Albany Park",]
# smaller samples to reduce time for examples
set.seed(4941)
obs <- sample(seq(1,nrow(lmat1)),200)
lmat1 <- lmat1[obs,]
obs <- sample(seq(1,nrow(lmat2)),400)
lmat2 <- lmat2[obs,]

fit <- ksim(lmat1[,1],lmat1[,2],lmat2[,1],lmat2[,2],dmax=9,nsim=100,
  nsamp=100,noplot=TRUE,cglobal=FALSE)
ymin <- min(fit$dhat,fit$local.lo)
ymax <- max(fit$dhat,fit$local.hi)
plot(fit$distance, fit$dhat, xlab="Distance", ylab="Density", ylim = c(ymin,ymax), 
  type="l", main="Albany Park & Rogers Park v. Other Areas")
lines(fit$distance, fit$local.lo, col="red")
lines(fit$distance, fit$local.hi, col="red")