near_strings1 | R Documentation |
Identifies cases that are near each other in space and time
near_strings1(dat, id, x, y, tim, DistThresh, TimeThresh)
dat |
data frame |
id |
string for id variable in data frame (should be unique) |
x |
string for variable that has the x coordinates |
y |
string for variable that has the y coordinates |
tim |
string for variable that has the time stamp (should be numeric or datetime) |
DistThresh |
scalar for distance threshold (in whatever units x/y are in) |
TimeThresh |
scalar for time threshold (in whatever units tim is in) |
This function returns strings of cases that are near one another in space and time. It is useful for near-repeat analysis or for identifying potentially duplicate cases. This particular function is memory safe, but it uses loops and runs in approximately O(n^2) time (more specifically, choose(n,2) pairwise comparisons). In tests on my machine, 5k rows took only ~10 seconds, but ~100k rows took around 12 minutes with this code.
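To make the choose(n,2) cost concrete, here is a minimal base R sketch of a pairwise loop that links cases and labels the resulting components. It is an illustration only, not the package's implementation: the helper name `pairwise_components`, its argument names, and the choice of inclusive (`<=`) thresholds are all assumptions.

```r
# Hypothetical helper (not part of the package): links every pair of cases that
# falls within both thresholds, then labels the connected components.
# Assumption: thresholds are treated as inclusive (<=).
pairwise_components <- function(x, y, tim, dist_thresh, time_thresh) {
  n <- length(x)
  parent <- seq_len(n)                  # union-find parent pointers
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  if (n > 1) {
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {            # choose(n, 2) comparisons in total
        d  <- sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2)
        dt <- abs(as.numeric(tim[i]) - as.numeric(tim[j]))
        if (d <= dist_thresh && dt <= time_thresh) {
          ri <- find_root(i)
          rj <- find_root(j)
          if (ri != rj) parent[max(ri, rj)] <- min(ri, rj)  # merge components
        }
      }
    }
  }
  roots    <- vapply(seq_len(n), find_root, integer(1))
  comp_id  <- match(roots, unique(roots))   # relabel roots as 1..k
  comp_num <- tabulate(comp_id)[comp_id]    # size of each case's component
  data.frame(CompId = comp_id, CompNum = comp_num)
}

# Same toy data as the simplified example: five points on a line, two time groups
toy <- pairwise_components(x = 1:5, y = rep(0, 5), tim = c(0, 0, 0, 4, 4),
                           dist_thresh = 2, time_thresh = 1)
print(toy)
```

Only a vector of length n is held in memory, which is why a loop of this kind stays memory safe while its running time grows quadratically with the number of rows.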
A data frame that contains the ids as row.names, and two columns:
CompId, a unique identifier that lets you collapse original cases together
CompNum, the number of linked cases inside of a component
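As a rough sketch of how the returned columns might be used to collapse cases, the snippet below attaches CompId back to the original data through the row names and then summarises one row per component. The `res` data frame is a hand-typed stand-in shaped like the documented return value, not actual output, and the names `strings`, `first_time`, and `n_cases` are hypothetical.

```r
dat <- data.frame(id = 1:5, x = 1:5, y = 0, ti = c(0, 0, 0, 4, 4))

# Stand-in for the function's output (values typed by hand for illustration):
# ids as row names plus the CompId and CompNum columns
res <- data.frame(CompId = c(1, 1, 1, 2, 2),
                  CompNum = c(3, 3, 3, 2, 2),
                  row.names = dat$id)

# The ids live in the row names, so look them up as character strings
dat$CompId <- res[as.character(dat$id), "CompId"]

# Collapse to one row per component: earliest time stamp and number of cases
first_time <- tapply(dat$ti, dat$CompId, min)
n_cases    <- tapply(dat$id, dat$CompId, length)
strings <- data.frame(CompId     = as.numeric(names(first_time)),
                      first_time = as.vector(first_time),
                      n_cases    = as.vector(n_cases))
print(strings)
```

Matching on row names rather than row position keeps the join correct even if the original data frame has been reordered since the function was called.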
Wheeler, A. P., Riddell, J. R., & Haberman, C. P. (2021). Breaking the chain: How arrests reduce the probability of near repeat crimes. Criminal Justice Review, 46(2), 236-258.
near_strings2(), which uses kdtrees and so should be faster with larger data frames, although it may still run out of memory and is not 100% guaranteed to return all nearby strings.
# Simplified example showing two clusters
s <- c(0,0,0,4,4)
ccheck <- c(1,1,1,2,2)
dat <- data.frame(x=1:5,y=0,
                  ti=s,
                  id=1:5)
res1 <- near_strings1(dat,'id','x','y','ti',2,1)
print(res1)

#Full nyc_shoot data with this function takes ~40 seconds
library(sp)
data(nyc_shoot)
nyc_shoot$id <- 1:nrow(nyc_shoot) #incident ID can have dups
mh <- nyc_shoot[nyc_shoot$BORO == 'MANHATTAN',]
print(Sys.time())
res <- near_strings1(mh@data,id='id',x='X_COORD_CD',y='Y_COORD_CD',
                     tim='OCCUR_DATE',DistThresh=1500,TimeThresh=3)
print(Sys.time()) #3k shootings takes only ~1 second on my machine