get.distances.all: Find all distances between two sets of points (based on...

get.distances.all    R Documentation

Find all distances between two sets of points (based on lat/lon)

Description

Returns the distance from every point in one set of geographic points to every point in another set. The result can be a matrix of distances (one row per frompoint and one column per topoint), a vector, or a data.frame with one row per from-to pair. You can choose the units and whether the result includes row numbers and lat/lon columns. Essentially a wrapper for the spDistsN1 and SpatialPoints functions in the sp package.
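
To make the "every frompoint to every topoint" idea concrete, here is a minimal base R sketch. It is not the package's code: get.distances.all delegates to the sp package, while this sketch assumes a spherical Earth and the haversine formula, so its numbers can differ slightly. The helper name haversine_all and the constants are hypothetical, introduced only for illustration.

```r
# Assumption of this sketch: a spherical Earth with mean radius in km
earth_radius_km <- 6371.0088
km_per_mile <- 1.609344  # international mile, exact by definition

# Hypothetical helper (not part of proxistat): great-circle distance from every
# row of `from` to every row of `to`. Both need columns named 'lat' and 'lon'.
haversine_all <- function(from, to, units = "miles") {
  units <- match.arg(units, c("miles", "km"))
  rad <- pi / 180
  lat1 <- from[, "lat"] * rad
  lon1 <- from[, "lon"] * rad
  lat2 <- to[, "lat"] * rad
  lon2 <- to[, "lon"] * rad
  # outer() builds m x n matrices: one row per frompoint, one column per topoint
  dlat <- outer(lat1, lat2, "-")
  dlon <- outer(lon1, lon2, "-")
  h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(h) slightly above 1
  d_km <- 2 * earth_radius_km * asin(pmin(sqrt(h), 1))
  if (units == "miles") d_km / km_per_mile else d_km
}

from <- data.frame(lat = c(0, 45), lon = c(0, -100))
to   <- data.frame(lat = c(0, 0, 45), lon = c(90, 0, -90))
d_km <- haversine_all(from, to, units = "km")
d_mi <- haversine_all(from, to)  # miles by default
dim(d_km)  # 2 frompoints by 3 topoints
```

The matrix layout here mirrors what return.crosstab=TRUE is described as returning; the long, one-row-per-pair format is the same set of numbers reshaped.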

Usage

get.distances.all(
  frompoints,
  topoints,
  units = "miles",
  return.crosstab = FALSE,
  return.rownums = TRUE,
  return.latlons = TRUE,
  as.df = TRUE
)

Arguments

frompoints

A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed.

topoints

A matrix or data.frame with two cols, 'lat' and 'lon' with datum=WGS84 assumed.

units

A string that is 'miles' by default, or 'km' for kilometers, specifying units for distances returned.

return.crosstab

Logical value, FALSE by default. If TRUE, value returned is a matrix of the distances, with a row per frompoint and col per topoint.

return.rownums

Logical value, TRUE by default. If TRUE, value returned also includes two extra columns: a column of index numbers starting at 1 specifying the frompoint and a similar column specifying the topoint. Ignored if return.crosstab=TRUE.

return.latlons

Logical value, TRUE by default. If TRUE, value returned also includes four extra columns: fromlat, fromlon, tolat, tolon. Ignored if return.crosstab=TRUE.

as.df

Logical, TRUE by default. If TRUE, returns a data.frame; if FALSE, returns a matrix. Has no effect when the result is a vector.

Details

*** Probably slower than it needs to be, partly because it uses data.frame instead of matrix class? Roughly 10-20 Just using get.distances.all is reasonably fast? (30-40 seconds for 100 million distances, but working with results that large is slow.)

Timings with return.crosstab=TRUE and no other processing:

Sys.time(); x=get.distances.all(testpoints(1e5), testpoints(1000), return.crosstab=TRUE); Sys.time()
[1] "2015-03-10 18:59:08 EDT"
[1] "2015-03-10 18:59:31 EDT"

23 seconds for 100 million distances (100k x 1000).

Sys.time(); x=get.distances.all(testpoints(1e6), testpoints(100), return.crosstab=TRUE); Sys.time()
[1] "2015-03-10 21:54:11 EDT"
[1] "2015-03-10 21:54:34 EDT"

23 seconds for 100 million distances (1m x 100).

Sys.time(); x=get.distances.all(testpoints(1e6), testpoints(300), return.crosstab=TRUE); Sys.time()
[1] "2015-03-10 21:56:11 EDT"
[1] "2015-03-10 21:57:18 EDT"

67 seconds for 300 million pairs, plus 20 seconds or so for x[x>100] <- Inf

So 11m blocks to 1k points could take >40 minutes! (You would want to more quickly remove the ones outside some radius.)
>3 minutes per 100 sites?
About 2.6 seconds per site for 11m blocks?

Timings when data.frame etc. is used in get.distances.all to format results and get rownums:

Sys.time(); x=get.distances.all(testpoints(1e5), testpoints(1000), units='miles',return.rownums=TRUE); Sys.time()
[1] "2015-03-09 21:23:04 EDT"
[1] "2015-03-09 21:23:40 EDT"

36 seconds.

Sys.time(); x=get.distances.all(testpoints(1e5), testpoints(1000), units='miles',return.rownums=TRUE)$d; Sys.time()
[1] "2015-03-09 21:18:47 EDT"
[1] "2015-03-09 21:19:26 EDT"

39 seconds by the timestamps shown.
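
The back-of-envelope estimates in these notes can be reproduced with a little arithmetic. The sketch below assumes run time scales linearly with the number of from-to pairs, at the observed crosstab rate of about 23 seconds per 100 million distances. The helper estimate_secs is hypothetical, and the rate reflects one 2015 machine, so treat the output as rough guidance only. The last lines show system.time(), a base R alternative to bracketing a call with Sys.time(), applied to a stand-in computation.

```r
# Observed crosstab rate from the timings: about 23 seconds per 100 million pairs
secs_per_pair <- 23 / 1e8

# Hypothetical helper: estimated seconds for n_from x n_to distances,
# assuming time grows linearly with the number of pairs
estimate_secs <- function(n_from, n_to) n_from * n_to * secs_per_pair

per_site   <- estimate_secs(11e6, 1)          # seconds for 1 site vs 11m blocks
per_100    <- estimate_secs(11e6, 100) / 60   # minutes for 100 sites
per_1000   <- estimate_secs(11e6, 1000) / 60  # minutes for 1,000 sites
check_300m <- estimate_secs(1e6, 300)         # compare with the 67 s observed

round(c(per_site = per_site, per_100 = per_100,
        per_1000 = per_1000, check_300m = check_300m), 1)

# system.time() times a single expression; a stand-in matrix computation is used here
elapsed <- system.time(m <- outer(runif(1000), runif(1000), "-"))["elapsed"]
```

Under these assumptions the estimate for 1 million x 300 points comes out near the 67 seconds recorded, which suggests the linear model is adequate at this scale.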

Value

If return.crosstab=TRUE, returns a matrix of distances, with one row per frompoint and one column per topoint.

If return.crosstab=FALSE (the default) and return.rownums and/or return.latlons is TRUE, returns a data.frame (or a matrix if as.df=FALSE) with one row per from-to pair. Rows are sorted by frompoint: all topoints for the first frompoint, then all topoints for the second frompoint, and so on. The columns depend on the parameters: the distance, plus fromrow and torow if return.rownums=TRUE (each is the row number of the corresponding input, starting at 1), plus fromlat, fromlon, tolat, and tolon if return.latlons=TRUE.

If return.crosstab=FALSE and both return.rownums and return.latlons are FALSE, returns a vector of distances in the same order as the rows described above.
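
The row ordering and its relation to the crosstab can be sketched in base R with made-up numbers. This does not call get.distances.all; the column name d for the distance is an assumption taken from the Examples (x$d), and the distance values are invented so that each one encodes its own position.

```r
m <- 2  # number of frompoints
n <- 3  # number of topoints

# Long format: all topoints for frompoint 1, then all topoints for frompoint 2
fromrow <- rep(seq_len(m), each = n)
torow   <- rep(seq_len(n), times = m)

# Invented "distances": 12 means frompoint 1, topoint 2, and so on
d <- 10 * fromrow + torow
pairs <- data.frame(fromrow = fromrow, torow = torow, d = d)

# Crosstab: one row per frompoint, one column per topoint.
# byrow = TRUE because the long vector cycles through topoints fastest.
crosstab <- matrix(pairs$d, nrow = m, ncol = n, byrow = TRUE)

# The same pair looked up in both layouts
from_crosstab <- crosstab[2, 3]
from_pairs    <- pairs$d[pairs$fromrow == 2 & pairs$torow == 3]
```

Pair k in the long format corresponds to fromrow = (k - 1) %/% n + 1 and torow = (k - 1) %% n + 1, which is how row numbers could be recovered when only the vector is returned.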

See Also

get.distances, which lets you specify a search radius and get distances only within that radius (which can be faster); get.distances.prepaired, for finding distances when data are already formatted as pairs of points; get.nearest, which finds the distance to the single nearest point within a specified search radius instead of all topoints; and proxistat, which calculates a proximity score for each spatial unit based on distances to nearby points.

Examples

set.seed(999)
t1=testpoints(1)
t10=testpoints(10)
t100=testpoints(100, minlat=25,maxlat=48)
t1k=testpoints(1e3)
t10k=testpoints(1e4)
t100k=testpoints(1e5)
t1m=testpoints(1e6)
#t10m=testpoints(1e7)

get.distances.all(t1, t1)
get.distances.all(t1, t10[2, ,drop=FALSE])
x=get.distances.all(t10, t100[1:20 , ], units='km')
 plot(x$tolon, x$tolat,pch='.')
 points(x$fromlon, x$fromlat)
 with(x, linesegments(fromlon, fromlat, tolon, tolat ))
 with(x[x$d<500, ], linesegments(fromlon, fromlat, tolon, tolat ,col='red'))
x=get.distances.all(t10, t1k); head(x);summary(x$d)
x=get.distances.all(t10, t1k, units='km'); head(x);summary(x$d)

## Not run: 
require(UScensus2010blocks) # for the get.blocks() function and dataset
blocks <- get.blocks(fields=c('fips','lat','lon'),charfips = FALSE)


## End(Not run)

   # two frompoints and three topoints, each with columns 'lat' and 'lon'
   test.from <- data.frame(lat = c(38.9567309094, 45),
     lon = c(-77.0896572305, -100))

   test.to <- data.frame(lat = c(38.9575019287, 38.9507043428, 45),
     lon = c(-77.0892818598, -77.2, -90))
 test.to.NA = rbind(c(NA,NA), test.to[2:3,])
 test.from.NA = rbind(test.from[1,], c(NA,NA))

get.distances.all(test.from, test.to)
get.distances.all(test.from, test.to, return.crosstab=TRUE)
get.distances.all(test.from, test.to, return.rownums=FALSE)
get.distances.all(test.from, test.to, return.latlons=FALSE)
get.distances.all(test.from, test.to, return.latlons=FALSE, return.rownums=FALSE)

     # test cases

get.distances.all(test.from,    test.to.NA)
get.distances.all(test.from.NA, test.to)
get.distances.all(test.from.NA, test.to.NA)
get.distances.all(test.from[1,],test.to[1,],return.rownums=FALSE,return.latlons=FALSE)
get.distances.all(test.from[1,],test.to[1,],return.rownums=FALSE,return.latlons=TRUE)
get.distances.all(test.from[1,],test.to[1,],return.rownums=TRUE,return.latlons=FALSE)
get.distances.all(test.from[1,],test.to[1,],return.rownums=TRUE,return.latlons=TRUE)

get.distances.all(test.from[1,],test.to[1:3,],return.rownums=FALSE,return.latlons=FALSE)
get.distances.all(test.from[1,],test.to[1:3,],return.rownums=FALSE,return.latlons=TRUE)
get.distances.all(test.from[1,],test.to[1:3,],return.rownums=TRUE,return.latlons=FALSE)
get.distances.all(test.from[1,],test.to[1:3,],return.rownums=TRUE,return.latlons=TRUE)

get.distances.all(test.from[1:2,],test.to[1,],return.rownums=FALSE,return.latlons=FALSE)
get.distances.all(test.from[1:2,],test.to[1,],return.rownums=FALSE,return.latlons=TRUE)
get.distances.all(test.from[1:2,],test.to[1,],return.rownums=TRUE,return.latlons=FALSE)
get.distances.all(test.from[1:2,],test.to[1,],return.rownums=TRUE,return.latlons=TRUE)

round(get.distances.all(test.from[1:2,],test.to[1:3,],return.rownums=FALSE,return.latlons=FALSE),1)
get.distances.all(test.from[1:2,],test.to[1:3,],return.rownums=FALSE,return.latlons=TRUE)
get.distances.all(test.from[1:2,],test.to[1:3,],return.rownums=TRUE,return.latlons=FALSE)
get.distances.all(test.from[1:2,],test.to[1:3,],return.rownums=TRUE,return.latlons=TRUE)
get.distances.all(test.from[1:2,],test.to[1:3,], return.rownums=TRUE,
  return.latlons=TRUE, units='km')
get.distances.all(test.from[1:2,],test.to[1:3,], return.rownums=TRUE,
  return.latlons=TRUE, units='miles')

get.distances.all(test.from[1,],test.to[1:3, ], return.crosstab=TRUE)
get.distances.all(test.from[1:2,],test.to[1, ], return.crosstab=TRUE)
round(get.distances.all(test.from[1:2,],test.to[1:3, ],return.crosstab=TRUE, units='miles'),2)
round(get.distances.all(test.from[1:2,],test.to[1:3, ],return.crosstab=TRUE, units='km'),2)

ejanalysis/proxistat documentation built on April 2, 2024, 10:13 a.m.