proxistat: Proximity Statistic for Each Location and Nearby Points


proxistat R Documentation

Proximity Statistic for Each Location and Nearby Points

Description

Calculates a proximity statistic (score) for each location (e.g., Census block), quantifying the number and proximity of nearby points.

Usage

proxistat(
  frompoints,
  topoints,
  area = 0,
  radius = 5,
  units = "km",
  decay = "1/d",
  wts,
  return.count = FALSE,
  return.nearest = FALSE,
  FIPS,
  pop,
  testing = FALSE,
  dfunc = "sp"
)

Arguments

frompoints

Locations of internal points of Census subunits. A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed. Decimal degrees. Required.

topoints

Locations of nearby points of interest, proximity to which is the basis of each Census unit's score. A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed. Decimal degrees. Required.

area

A number or vector of numbers giving size of each spatial unit with FIPS.pop, in square miles or square kilometers depending on the units parameter. Optional. Default is 0, in which case no adjustment is made for small or even zero distance, which can cause unrealistically large or even infinite/undefined scores. For zero distance if area=0, Inf will be returned for the score.

radius

NOTE: This default is not the same as the default in get.distances! Optional. A number giving the distance that defines nearby, i.e., the search radius, in km or miles depending on the units parameter. Default is 5 (that is, 5 km with the default units='km'). Max is 5200 miles (roughly the distance from Hawaii to Maine).

units

A string that is 'miles' or 'km' for kilometers (default is 'km'), specifying units for distances returned and for radius input.

decay

A string specifying the type of function to use when weighting by distance. Default is '1/d'. For '1/d' decay weighting, the score is the count of points within radius divided by the harmonic mean of their distances (when count>0). Decay weighting can also be '1/d^2' or '1/1', representing decay by the inverse of squared distance or no decay (equal weighting for all points), respectively.

wts

Optional vector of numbers same length as number of topoints. If wts is specified, the score for each of the frompoints will be the weighted sum of influences of topoints. For example, if decay='1/d' (default), proximity score = sum(wts/d) for all the topoints nearby. If decay='1/1', proximity score = sum(wts) for all the topoints nearby.

return.count

Optional, logical, defaults to FALSE. Specifies whether the results returned should include a column with the count of topoints that were within radius, for each of the frompoints.

return.nearest

Optional, logical, defaults to FALSE. Specifies whether the results returned should include a column with the distance to the single nearest of the topoints, for each of the frompoints.

FIPS

NOT USED CURRENTLY - COULD BE USED LATER TO AGGREGATE (rollup) TO BLOCK GROUPS FROM BLOCKS, FOR EXAMPLE. A vector of strings designating places that will be assigned scores where each is the Census FIPS code or other ID. Optional. Might want to have this be a factor not string to be faster, or ensure it is indexed on fips, or have separate FIPS.BG passed to this function.

pop

NOT USED CURRENTLY - COULD BE USED LATER TO AGGREGATE (rollup) TO BLOCK GROUPS FROM BLOCKS, FOR EXAMPLE. A number or vector of numbers giving population count of each spatial unit. Default is 1, which would give the unweighted average.

testing

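Logical, defaults to FALSE. Used for testing while this function is a work in progress. If TRUE, a matrix of fromrow and distance is returned instead of scores (see Value).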

dfunc

Optional character string specifying the distance function: 'hf' for Haversine or 'slc' for spherical law of cosines. If 'sp' (default, fastest), it uses the sp package to find distances more accurately and more quickly.

Details

This uses get.distances with return.crosstab=TRUE. This function returns a vector of proximity scores, one for each location such as a Census block. For example, the proximity score may be used to represent how many hazardous waste sites are near any given neighborhood and how close they are. A proximity score quantifies the proximity and count of nearby points using a specified formula.

Proximity Score = distance-weighted count of points nearby (within the search radius), optionally with another weight for each topoint. If no points are within the radius, the same weighting is applied to the single nearest point instead.

The score is the sum of 1/d, 1/d^2, or 1/1, depending on the decay weighting (or the sum of a user-supplied weight for each topoint in place of the number 1, e.g., wts/d), where d is the distance from the Census unit's internal point to the user-defined point. The default proximity score, using 1/d, is the count of nearby points divided by the harmonic mean of their distances (n/harmean), adjusted when a distance is very small, and using the single nearest point if none are nearby. This is the same as the sum of inverse distances, because the harmonic mean distance (see harmean) is the inverse of the arithmetic mean of the inverses, or n / (sum of inverses).

Note that Inf is returned as the score if 1/d is used and any distance is 0, such as where the to and from points are the same point. The function does not compute the 1/d score from only the points at nonzero distance.

Nearby is defined by the user-specified radius parameter, so only points within that distance are counted, except that if none are nearby, the single nearest point (at any distance) is used.
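The scoring rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: proximity_score_sketch is a hypothetical helper, it takes precomputed distances for a single location, it treats "within radius" as d <= radius, and it ignores the small-distance adjustment.

```r
# Hypothetical helper (not part of the package): score one location
# from a vector of distances d to the topoints.
proximity_score_sketch <- function(d, radius = 5, wts = rep(1, length(d)),
                                   decay = c("1/d", "1/d^2", "1/1")) {
  decay <- match.arg(decay)
  near <- d <= radius   # assumption: points exactly at the radius count as nearby
  # If nothing is within the radius, fall back to the single nearest point
  if (!any(near)) near <- seq_along(d) == which.min(d)
  influence <- switch(decay,
    "1/d"   = 1 / d[near],
    "1/d^2" = 1 / d[near]^2,
    "1/1"   = rep(1, sum(near)))
  sum(wts[near] * influence)
}

d <- c(0.5, 2, 4, 12)                  # km; the last point is outside the 5 km radius
score <- proximity_score_sketch(d)     # 1/0.5 + 1/2 + 1/4 = 2.75

# Same value as the count divided by the harmonic mean of the nearby distances
n <- sum(d <= 5)
harmean_d <- n / sum(1 / d[d <= 5])
all.equal(score, n / harmean_d)        # TRUE

proximity_score_sketch(c(8, 20))       # 0.125: none nearby, so only the nearest point is used
proximity_score_sketch(0)              # Inf: zero distance with 1/d decay
```

The last two calls show the fallback to the nearest point and the Inf result for zero distance that the text describes.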

Default relies on the sp package for the spDistsN1 and SpatialPoints functions. Other values of dfunc parameter are slower.
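For readers unfamiliar with the alternative distance functions, the following is a minimal base R sketch of a Haversine great-circle distance. It is not the package's own code: haversine_km is a hypothetical name, and the mean Earth radius of 6371 km is an assumption.

```r
# Hypothetical helper: great-circle distance in km between points given
# in decimal degrees, using the Haversine formula.
haversine_km <- function(lat1, lon1, lat2, lon2, earth_radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(h) slightly above 1
  2 * earth_radius_km * asin(pmin(1, sqrt(h)))
}

# Distance between the two test.from points used in the Examples (about 0.67 km)
d_km <- haversine_km(38.9567309094, -77.0896572305, 38.9507043428, -77.0896199948)
d_miles <- d_km / 1.609344   # convert km to miles
```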

IMPORTANT:

To create a proximity score for a block group, one can find the score for each block in the block group and then find the population-weighted average of those block scores, for a single block group.
FIPS for blocks can be used to find FIPS for block groups. FIPS for block groups can be used to find FIPS for tracts.
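As a sketch of that rollup, the base R code below uses made-up block data (the FIPS codes, populations, and scores are hypothetical). It relies on the standard structure of Census FIPS codes, in which the first 12 characters of a 15-character block FIPS identify the block group and the first 11 identify the tract.

```r
# Hypothetical block-level data: 15-character block FIPS, population, proximity score
blocks <- data.frame(
  fips  = c("110010001001000", "110010001001001", "110010001002000"),
  pop   = c(100, 300, 50),
  score = c(2.0, 4.0, 1.0),
  stringsAsFactors = FALSE
)

# Derive parent identifiers from the block FIPS
blocks$fips_bg    <- substr(blocks$fips, 1, 12)     # block group
blocks$fips_tract <- substr(blocks$fips_bg, 1, 11)  # tract

# Population-weighted mean of block scores within each block group
bg_scores <- sapply(split(blocks, blocks$fips_bg),
  function(g) weighted.mean(g$score, w = g$pop))
bg_scores
# 110010001001: (100*2 + 300*4) / 400 = 3.5; 110010001002: 1
```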

ADJUSTMENT FOR SMALL DISTANCES:

The adjustment for small distances ensures that each distance represents roughly the distance to the average resident within a spatial unit like a block, rather than just the distance to the center or internal point. The adjustment uses the area of the spatial unit and assumes residents are evenly spread across the unit. If the area of each spatial unit is specified, the distance is set to be no less than 0.9 x the radius of a circle with area equal to the Census unit's area. This approximation treats the unit as if it were a circle and assumes population is evenly distributed within that circle's area, since 0.9r = 0.9 x sqrt(area/pi) is approximately the average distance from a random point (resident) in the circle to another random point in the circle (facility or point of interest).

The approximation assumes distance to the average resident can be estimated as if homes and facilities were on average uniformly distributed within blocks (or whatever units are used) that were roughly circular on average. It relies on the fact that the average distance between two random points in a circle of radius R is about 90 percent of R (Weisstein, Eric W. Disk Line Picking. From MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/DiskLinePicking.html ). This means that if a population is randomly spread over a roughly circular block, a facility inside the block (i.e., very close to the internal point) typically would be 0.9R from the average person. The same math shows that the average point in the circle is 0.67R from the center, and 1.13R from a point on the edge of the circle.

We can describe this relationship using an equation that is a portion of the formula for the distance between two random points in a circle of radius = 1. The formula uses b = the distance of the facility from the center as a fraction of the radius, a = the squared distance of a residence from the center (so Sqrt(a) is its distance), and t = the angle between the residence and the facility as seen from the center. We can solve the equation using http://WolframAlpha.com, for b = 0, 0.5, or 1, representing points at the center, halfway to the edge, and at the edge of the circle. For example, we can use this equation for b = 0.5 to find that the average person, if randomly located in a circle of radius R, is a distance of about 0.8R from a facility that is halfway between the center and edge of the circle.

Note this is not the same as the expected location of a randomly placed facility, which would use b = sqrt(0.5) instead and gives a distance of about 0.9R. The following would be used as the input to WolframAlpha to derive the 0.9 approximation:
Integrate((1/Pi) Sqrt(a + (Sqrt(0.5))^2 - 2(Sqrt(0.5)) Sqrt(a) cos(t)), a, 0, 1, t, 0, pi)
http://bit.ly/1GJ9UID
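The same integral can be checked numerically in base R with nested calls to integrate(), and the resulting 0.9 factor applied as a minimum distance. This is a sketch for illustration only: mean_dist_to_resident and floor_distance are hypothetical helpers, not functions in the package, and the package's own adjustment may differ in detail.

```r
# Mean distance (as a fraction of R) from a facility at distance b from the center
# to a resident placed uniformly at random in a circle of radius 1.
# Here a is the squared distance of the resident from the center and t is the angle.
mean_dist_to_resident <- function(b) {
  inner <- function(a) {
    # integrate() passes a vector of a values, so do the angle integral for each one
    sapply(a, function(ai) {
      integrand <- function(t) {
        # pmax() guards against tiny negative values from floating-point rounding
        sqrt(pmax(0, ai + b^2 - 2 * b * sqrt(ai) * cos(t))) / pi
      }
      integrate(integrand, 0, pi)$value
    })
  }
  integrate(inner, 0, 1)$value
}

# Facility at the center, halfway out, at sqrt(0.5), and on the edge:
# roughly 0.67, 0.79, 0.91, and 1.13 times R
ratios <- sapply(c(center = 0, halfway = 0.5, typical = sqrt(0.5), edge = 1),
                 mean_dist_to_resident)
round(ratios, 2)

# Hypothetical helper: floor each distance at 0.9 x the radius of an equal-area circle
floor_distance <- function(d, area) pmax(d, 0.9 * sqrt(area / pi))
adjusted <- floor_distance(c(0, 0.1, 2), area = 0.2)  # the first two are raised to about 0.227
```

With area = 0 the floor is zero, so distances are left unchanged, which matches the documented default behavior of the area parameter.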

Value

By default, returns a vector of numbers, the proximity scores, one for each of the frompoints (or if testing, a matrix with 2 columns: fromrow and d for distance). Based on km by default, or miles depending on units. Returns +Inf for a unit if that unit's area and distance are both zero.

See Also

get.distances and get.distances.all for distances between points, and get.nearest which finds the distance to the single nearest point within a specified search radius instead of all topoints.

Examples

# Two "from" points and three "to" points, in decimal degrees
test.from <- data.frame(
  lat = c(38.9567309094, 38.9507043428),
  lon = c(-77.0896572305, -77.0896199948),
  row.names = c("6054762", "6054764"))
test.to <- data.frame(
  lat = c(38.9575019287, 38.9507043428, 38.9514152435),
  lon = c(-77.0892818598, -77.0896199948, -77.0972395245),
  row.names = c("6054762", "6054763", "6054764"))

set.seed(999)
t1=testpoints(1)
t10=testpoints(10)  # t1=t10[3,] 
t100=testpoints(100) # t10[2,] <- t100[5,] 
t1k=testpoints(1e3)
t10k=testpoints(1e4)
t100k=testpoints(1e5)
t1m=testpoints(1e6)

proxistat(t1, t10k, radius=50, units='km')

proxistat(t10, t10k)

subunitscores <- proxistat(frompoints=test.from, topoints=test.to,
  area=rep(0.2, nrow(test.from)), radius=50, units='km')
print(subunitscores)
subunitpop <- rep(1000, nrow(test.from))
# Keep each subunit's score next to its population so both can be used in the rollup
subunits <- data.frame(FIPS=substr(rownames(test.from), 1, 5),
  pop=subunitpop, score=subunitscores, stringsAsFactors=FALSE)
# Population-weighted mean score for each parent unit
unitscores <- sapply(split(subunits, subunits$FIPS),
  function(x) Hmisc::wtd.mean(x$score, weights=x$pop, na.rm=TRUE))
print(unitscores)
## Not run: 
# Assumes 'blocks' (with columns lon, lat, area, pop, fips) and 'rmp' (points of interest) already exist
output <- proxistat.chunked(blocks[ , c('lon','lat')], topoints=rmp, fromchunksize=10000,
   area=blocks$area / 1e6, return.count=TRUE, return.nearest=TRUE)
output <- as.data.frame(output)
if (!is.character(blocks$fips)) {blocks$fips <- lead.zeroes(blocks$fips, 15)}
blocks$FIPS.BG <- get.fips.bg(blocks$fips)
# Population-weighted mean of a block-level column within each block group
bg.wtd.mean <- function(x) {
  sapply(split(data.frame(x=x, pop=blocks$pop), blocks$FIPS.BG),
    function(g) Hmisc::wtd.mean(g$x, weights=g$pop, na.rm=TRUE))
}
scores <- bg.wtd.mean(output$scores)
bg.proxi <- data.frame(FIPS.BG=names(scores), scores=scores, stringsAsFactors=FALSE)
# Nearest distance for a block group is the minimum over its blocks
if ('nearestone.d' %in% colnames(output)) { bg.proxi$nearestone.d <- as.vector(tapply(output$nearestone.d, blocks$FIPS.BG, min)) }
if ('count.near' %in% colnames(output))   { bg.proxi$count.near   <- bg.wtd.mean(output$count.near) }

## End(Not run)


ejanalysis/proxistat documentation built on April 2, 2024, 10:13 a.m.