proxistat | R Documentation |
Calculate a proximity statistic for each location,
quantifying the number and proximity of nearby points.
proxistat
returns a proximity statistic (score) for each location (e.g., Census block).
proxistat(
frompoints,
topoints,
area = 0,
radius = 5,
units = "km",
decay = "1/d",
wts,
return.count = FALSE,
return.nearest = FALSE,
FIPS,
pop,
testing = FALSE,
dfunc = "sp"
)
frompoints |
Locations of internal points of Census subunits. A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed. Decimal degrees. Required. |
topoints |
Locations of nearby points of interest, proximity to which is the basis of each Census unit's score. A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed. Decimal degrees. Required. |
area |
A number or vector of numbers giving size of each spatial unit with FIPS.pop,
in square miles or square kilometers depending on the |
radius |
NOTE: This default is not the same as the default in |
units |
A string that is 'miles' or 'km' for kilometers (default is 'km'), specifying units for distances returned and for radius input. |
decay |
A string specifying the type of function to use when weighting by distance. The default is '1/d'. For '1/d' decay weighting (default), the score is the count of points within radius, divided by the harmonic mean of their distances (when count>0). Decay weighting also can be '1/d^2' or '1/1' to represent decay by the inverse of squared distance, or no decay (equal weighting for all points). |
wts |
Optional vector of numbers same length as number of topoints. If wts is specified, the score for each of the frompoints will be the weighted sum of influences of topoints. For example, if decay='1/d' (default), proximity score = sum(wts/d) for all the topoints nearby. If decay='1/1', proximity score = sum(wts) for all the topoints nearby. |
return.count |
Optional, logical, defaults to FALSE. Specifies whether the results returned should include a column with the count of topoints that were within radius, for each of the frompoints. |
return.nearest |
Optional, logical, defaults to FALSE. Specifies whether the results returned should include a column with the distance to the single nearest of the topoints, for each of the frompoints. |
FIPS |
NOT USED CURRENTLY - COULD BE USED LATER TO AGGREGATE (rollup) TO BLOCK GROUPS FROM BLOCKS, FOR EXAMPLE. A vector of strings designating places that will be assigned scores where each is the Census FIPS code or other ID. Optional. Might want to have this be a factor not string to be faster, or ensure it is indexed on fips, or have separate FIPS.BG passed to this function. |
pop |
NOT USED CURRENTLY - COULD BE USED LATER TO AGGREGATE (rollup) TO BLOCK GROUPS FROM BLOCKS, FOR EXAMPLE. A number or vector of numbers giving population count of each spatial unit. Default is 1, which would give the unweighted average. |
testing |
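Logical, used during development (work in progress). If TRUE, a matrix of distances is returned instead of scores (see the description of the return value). |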
dfunc |
Optional character string specifying the distance function: 'hf' for Haversine or 'slc' for spherical law of cosines. If 'sp' (default, fastest), it uses the sp package to find distances more accurately and more quickly. |
This uses get.distances
with return.crosstab=TRUE.
This function returns a vector of proximity scores, one for each location such as a Census block.
For example, the proximity score may be used to represent how many hazardous waste sites are near any given neighborhood and how close they are.
A proximity score quantifies the proximity and count of nearby points using a specified formula.
Proximity Score = distance-weighted count of points nearby (within search radius) (or with another optional weight for each topoint)
(or weighted distance to nearest single point if there are none within the radius).
This is the sum of 1/d or 1/d^2 or 1/1, depending on the decay weighting, (or with another optional weight for each topoint instead of the number 1)
where d is the distance from census unit's internal point to user-defined point.
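The three decay options can be illustrated with a minimal sketch. The helper score.one below is hypothetical and written only for this illustration; it is not part of the package, and it assumes the distances passed in are already limited to the nearby points.

```r
# Hypothetical helper (not part of the package): score one location from
# the distances d to its nearby topoints, with optional weights wts.
score.one <- function(d, decay = "1/d", wts = rep(1, length(d))) {
  switch(decay,
    "1/d"   = sum(wts / d),    # inverse-distance weighting (default)
    "1/d^2" = sum(wts / d^2),  # inverse squared distance
    "1/1"   = sum(wts),        # no decay: a plain (weighted) count
    stop("unknown decay: ", decay)
  )
}

d <- c(0.5, 2, 4)                # illustrative distances, in km
score.one(d)                     # 1/0.5 + 1/2 + 1/4 = 2.75
score.one(d, "1/d^2")            # 4 + 0.25 + 0.0625 = 4.3125
score.one(d, "1/1")              # 3
score.one(d, wts = c(2, 1, 1))   # 2/0.5 + 1/2 + 1/4 = 4.75
```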
The default proximity score, using 1/d, is the count of nearby points divided by the harmonic mean of their distances (n/harmean),
(but adjusted when distance is very small, and using the nearest single one if none are nearby).
This is the same as the sum of inverse distances.
Note that Inf is returned as the score if 1/d is used and any distance is 0, as when a topoint and a frompoint are the same point. The function does not compute a 1/d score from only the points at nonzero distance.
The harmonic mean distance (see harmean
) is the inverse of the arithmetic mean of the inverses, or n / (sum of inverses).
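The identity between n/harmean and the sum of inverse distances can be checked numerically. In this sketch, harmean.sketch is a hypothetical stand-in written for illustration; the package's own harmean may differ in details such as NA handling.

```r
# Hypothetical stand-in for the package's harmean(), for illustration only:
# harmonic mean = n / (sum of inverses)
harmean.sketch <- function(d) length(d) / sum(1 / d)

d <- c(0.5, 2, 4)   # illustrative distances
n <- length(d)

hm <- harmean.sketch(d)          # 3 / 2.75
isTRUE(all.equal(n / hm, sum(1 / d)))   # TRUE: n/harmean equals sum of 1/d
```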
Nearby is defined by the user-specified radius parameter, so only points within that distance are counted, except that if none are nearby,
the single nearest point (at any distance) is used.
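A minimal sketch of this selection rule, for one location, might look like the following. The function name and the use of <= (rather than <) at the radius boundary are assumptions made for illustration, not taken from the package source.

```r
# d.row: distances from one frompoint to every topoint (hypothetical input)
nearby.or.nearest <- function(d.row, radius) {
  near <- d.row[d.row <= radius]            # assumption: boundary counts as nearby
  if (length(near) == 0) near <- min(d.row) # none nearby: use the single nearest
  near
}

nearby.or.nearest(c(1, 3, 8), radius = 5)   # two points are within the radius
nearby.or.nearest(c(7, 9, 8), radius = 5)   # none within 5, so only the nearest (7)
sum(1 / nearby.or.nearest(c(7, 9, 8), radius = 5))  # 1/d score from that one point
```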
Default relies on the sp package for the spDistsN1
and SpatialPoints
functions.
Other values of dfunc parameter are slower.
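For reference, the two non-default distance formulas named by the dfunc parameter can be sketched as below. These are the textbook Haversine and spherical law of cosines formulas, not necessarily the package's own implementation; the function names and the Earth radius of 6371 km are assumptions.

```r
earth.radius.km <- 6371               # assumed mean Earth radius
deg2rad <- function(x) x * pi / 180

# Haversine great-circle distance in km
haversine.km <- function(lat1, lon1, lat2, lon2) {
  p1 <- deg2rad(lat1); p2 <- deg2rad(lat2)
  dphi <- p2 - p1
  dlam <- deg2rad(lon2 - lon1)
  h <- sin(dphi / 2)^2 + cos(p1) * cos(p2) * sin(dlam / 2)^2
  2 * earth.radius.km * asin(pmin(1, sqrt(h)))
}

# Spherical law of cosines distance in km
slc.km <- function(lat1, lon1, lat2, lon2) {
  p1 <- deg2rad(lat1); p2 <- deg2rad(lat2)
  dlam <- deg2rad(lon2 - lon1)
  cosang <- sin(p1) * sin(p2) + cos(p1) * cos(p2) * cos(dlam)
  earth.radius.km * acos(pmax(-1, pmin(1, cosang)))  # clamp against rounding
}

# The two frompoints used in this page's examples
haversine.km(38.9567309094, -77.0896572305, 38.9507043428, -77.0896199948)
slc.km(38.9567309094, -77.0896572305, 38.9507043428, -77.0896199948)
```

Both functions are vectorized, so they accept vectors of coordinates of equal length.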
IMPORTANT:
To create a proximity score for a block group, one can find the score for each block in the block group
and then find the population-weighted average of those block scores, for a single block group.
FIPS for blocks can be used to find FIPS for block groups. FIPS for block groups can be used to find FIPS for tracts.
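The rollup described above can be sketched with base R. The data below are made up for illustration; the sketch relies on the standard Census convention that a 15-character block FIPS begins with the 12-character block group FIPS, which in turn begins with the 11-character tract FIPS.

```r
# Hypothetical block-level data: 15-character block FIPS, score, population
blocks <- data.frame(
  fips  = c("110010001011000", "110010001011001", "110010001012000"),
  score = c(2.0, 4.0, 1.0),
  pop   = c(100, 300, 50),
  stringsAsFactors = FALSE
)

blocks$fips.bg    <- substr(blocks$fips, 1, 12)  # block group FIPS
blocks$fips.tract <- substr(blocks$fips, 1, 11)  # tract FIPS

# Population-weighted average of block scores within each block group
bg.score <- sapply(split(blocks, blocks$fips.bg),
                   function(x) weighted.mean(x$score, x$pop))
bg.score   # first block group: (2*100 + 4*300) / 400 = 3.5
```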
ADJUSTMENT FOR SMALL DISTANCES:
The adjustment for small distances ensures that each distance represents roughly the distance to the average resident within a spatial unit like a block,
rather than just the distance to the center or internal point.
The adjustment uses the area of the spatial unit and assumes residents are evenly spread across the unit.
Distance is adjusted in each place if area of each spatial unit is specified, to ensure it represents roughly distance to average resident in the unit:
The distance is set to be no less than 0.9 x the radius of a circle whose area equals the census unit's area.
This approximation treats unit as if it were a circle and assumes pop is evenly distributed within that circle's area, since
0.9r = 0.9 x sqrt(area/pi) = approx solution to dist from avg point (resident) in circle
to a random point in the circle (facility or point of interest).
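As a minimal sketch of this floor on distance, the hypothetical helper below (not the package's code) raises any distance smaller than 0.9 x sqrt(area/pi) to that minimum. It assumes area is expressed in the square of the distance units, for example square km when d is in km.

```r
# Hypothetical helper: floor each distance at 0.9 * (radius of equal-area circle)
adjust.small.d <- function(d, area) {
  min.d <- 0.9 * sqrt(area / pi)   # 0.9r, where r = sqrt(area / pi)
  pmax(d, min.d)                   # distances below the minimum are raised to it
}

adj <- adjust.small.d(d = c(0.05, 0.1, 2), area = 0.2)
adj                                # the first two are raised; 2 is unchanged
adjust.small.d(d = 0, area = 0)    # with area 0 there is no adjustment, so 1/d is Inf
```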
The use of a minimum distance per areal unit is intended to help approximate the distance from the average resident
rather than from the internal point or center of the areal unit. The approximation assumes distance to the average resident can be estimated
as if homes and facilities were on average uniformly distributed within blocks (or whatever units are used) that were roughly circular on average.
It relies on the fact that the average distance between two random points in a circle of radius R is about 90 percent of R
(Weisstein, Eric W. Disk Line Picking. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/DiskLinePicking.html ).
This means that if a population is randomly spread over a roughly circular block, a facility inside the block (i.e., very close to the internal point)
typically would be about 0.9R from the average person. The same math shows that the average point in the circle is about 0.67R from the center,
and about 1.13R from a point on the edge of the circle. We can describe this relationship using an equation that is a portion of the formula for the
distance between two random points in a circle of radius = 1. The formula uses b = the distance of the facility from the center as a fraction of the radius,
and the integral over a represents distances of residences from the center.
We can solve the equation using http://WolframAlpha.com, for b = 0, 0.5, or 1, representing points at the center, halfway to the edge,
and at the edge of the circle. For example, we can use this equation for b = 0.5 to find that the average person, if randomly located in a circle of radius R,
is a distance of about 0.8 R from a facility that is halfway between the center and edge of the circle.
Note this is not the same as the expected location of a randomly placed facility, which would use b = sqrt(0.5) instead and gives a distance of about 0.9R.
The following would be used as the input to WolframAlpha to derive the 0.9 approximation:
Integrate((1/Pi) Sqrt(a + (Sqrt(0.5))^2 - 2(Sqrt(0.5)) Sqrt(a) cos(t)), a, 0, 1, t, 0, pi)
http://bit.ly/1GJ9UID
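The same integral can be approximated numerically in R as a rough check. This is only a sketch: the function name is hypothetical, a simple midpoint rule is used instead of a symbolic solution, and a is treated as the squared distance of a residence from the center, as the Sqrt(a) terms in the expression above suggest.

```r
# Mean distance from a facility at b (its distance from the center as a
# fraction of the radius) to a uniformly random point in a circle of radius 1.
# Midpoint-rule approximation over a in (0, 1) and t in (0, pi).
mean.dist.from <- function(b, n = 400) {
  a <- (seq_len(n) - 0.5) / n         # midpoints for a
  t <- (seq_len(n) - 0.5) * pi / n    # midpoints for the angle t
  integrand <- outer(a, t, function(a, t) {
    sqrt(pmax(0, a + b^2 - 2 * b * sqrt(a) * cos(t)))
  })
  # (1/pi) * integral is approximately (1/pi) * mean(integrand) * (1 * pi)
  mean(integrand)
}

# b = 0 (center), 0.5 (halfway), sqrt(0.5) (expected random facility), 1 (edge)
sapply(c(0, 0.5, sqrt(0.5), 1), mean.dist.from)
```

The results should be close to the values quoted in the text: roughly 0.67, 0.8, 0.9, and 1.13.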
By default, returns a vector of numbers, the proximity scores, one for each of the frompoints (or if testing, a matrix with 2 columns: fromrow and d for distance). Distances are in km by default, or miles, depending on units. Returns +Inf for a unit if that unit's area and distance are both zero.
get.distances
and get.distances.all
for distances between points, and
get.nearest
which finds the distance to the single nearest point
within a specified search radius instead of all topoints.
test.from <- structure(list(fromlat = c(38.9567309094, 38.9507043428),
fromlon = c(-77.0896572305, -77.0896199948)), .Names = c("lat", "lon"),
row.names = c("6054762", "6054764"), class = "data.frame")
test.to <- structure(list(tolat = c(38.9575019287, 38.9507043428, 38.9514152435),
tolon = c(-77.0892818598, -77.0896199948, -77.0972395245)), .Names = c("lat", "lon"),
class = "data.frame", row.names = c("6054762", "6054763", "6054764"))
set.seed(999)
t1=testpoints(1)
t10=testpoints(10) # t1=t10[3,]
t100=testpoints(100) # t10[2,] <- t100[5,]
t1k=testpoints(1e3)
t10k=testpoints(1e4)
t100k=testpoints(1e5)
t1m=testpoints(1e6)
proxistat(t1, t10k, radius=50, units='km')
proxistat(t10, t10k)
subunitscores = proxistat(frompoints=test.from, topoints=test.to,
area=rep(0.2, length(test.from[,1])), radius=50, units='km')
print(subunitscores)
subunitpop = rep(1000, length(test.from$lat))
# keep each subunit's score alongside its parent FIPS and population
subunits = data.frame(FIPS=substr(rownames(test.from), 1, 5),
pop=subunitpop, score=subunitscores, stringsAsFactors=FALSE )
# population-weighted mean score within each parent unit
unitscores = sapply(split(subunits, subunits$FIPS),
FUN=function(x) {Hmisc::wtd.mean(x$score, weights=x$pop, na.rm=TRUE)}
)
print(unitscores)
## Not run:
output = proxistat.chunked(blocks[ , c('lon','lat')], topoints=rmp, fromchunksize=10000, area=blocks$area / 1e6,
return.count=TRUE, return.nearest=TRUE )
output=as.data.frame(output)
if (class(blocks$fips)!='character') {blocks$fips <- lead.zeroes(blocks$fips, 15)}
blocks$FIPS.BG <- get.fips.bg(blocks$fips)
# Roll up to block groups: population-weighted mean of block results (assumes output has a 'scores' column)
by.bg <- split(data.frame(output, pop=blocks$pop), blocks$FIPS.BG)
bg.proxi <- data.frame(FIPS.BG=names(by.bg), scores=sapply(by.bg, function(x) Hmisc::wtd.mean(x$scores, x$pop)))
if ('nearestone.d' %in% colnames(output)) { bg.proxi$nearestone.d <- sapply(by.bg, function(x) min(x$nearestone.d)) }
if ('count.near' %in% colnames(output)) { bg.proxi$count.near <- sapply(by.bg, function(x) Hmisc::wtd.mean(x$count.near, x$pop)) }
## End(Not run)