cluster_locs (R Documentation)
Description

cluster_locs() accepts as input a dataframe with X/Y coordinates, or an sf object of geometry type POINT. It adds an integer variable that defines cluster membership.

The intention is to detect spatial clusters of groundwater wells; hence the function uses agglomerative clustering with complete linkage, and cuts the cluster tree at a default Euclidean distance (max_dist).
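A minimal sketch of the idea in base R, assuming that stats::hclust() with complete linkage and a tree cut with stats::cutree() capture what cluster_locs() does (this is an illustration, not the package's own code). The coordinates are made up and taken to be in meters:

```r
# Hypothetical coordinates in meters (e.g. Belgian Lambert 72):
# two wells about 1 m apart, and a third well about 100 m away from both.
coords <- data.frame(
  x = c(155763.0, 155764.0, 155863.0),
  y = c(132693.0, 132693.5, 132693.0)
)

# Euclidean distance matrix, complete-linkage tree, cut at a height of 2 m
tree <- hclust(dist(coords), method = "complete")
cluster_id <- cutree(tree, h = 2)

cluster_id
```

The two wells about 1 m apart share cluster 1, while the distant well gets cluster 2 of its own.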
Usage

cluster_locs(
  input,
  max_dist = 2,
  output_var = "cluster_id",
  xvar = "x",
  yvar = "y"
)
Arguments

| Argument | Description |
|---|---|
| input | A dataframe with X/Y coordinates, or an sf object of geometry type POINT. |
| max_dist | The maximum geospatial distance allowed between any two points of the same cluster. The default value is sensible for many use cases, supposing that the unit of the coordinate reference system is the meter, as is the case for the 'Belge 1972 / Belgian Lambert 72' CRS (EPSG 31370). |
| output_var | Name of the new variable to be added to input. |
| xvar | String. The X coordinate variable name; only considered when input is not an sf object. |
| yvar | String. The Y coordinate variable name; only considered when input is not an sf object. |
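The arguments can be mirrored in a small hypothetical function, cluster_locs_sketch(), written here only for illustration. It handles plain dataframes only (not sf objects) and assumes complete-linkage clustering cut at max_dist; it is not the package's implementation.

```r
# Hypothetical, dataframe-only sketch mirroring the documented arguments of
# cluster_locs(); sf input is not supported here.
cluster_locs_sketch <- function(input,
                                max_dist = 2,
                                output_var = "cluster_id",
                                xvar = "x",
                                yvar = "y") {
  stopifnot(is.data.frame(input),
            xvar %in% names(input),
            yvar %in% names(input))
  coords <- as.matrix(input[, c(xvar, yvar)])
  if (nrow(coords) < 2) {
    # hclust() needs at least two observations; a lone location is its own cluster
    input[[output_var]] <- rep(1L, nrow(input))
    return(input)
  }
  # Complete linkage on Euclidean distances, tree cut at max_dist
  tree <- hclust(dist(coords), method = "complete")
  input[[output_var]] <- as.integer(cutree(tree, h = max_dist))
  input
}

# Usage with custom column names for the coordinates and the output variable
wells <- data.frame(
  id = c("A", "B", "C"),
  east = c(0, 1, 50),
  north = c(0, 0, 0)
)
res <- cluster_locs_sketch(wells, output_var = "well_cluster",
                           xvar = "east", yvar = "north")
res
```

Wells A and B (1 m apart) end up in the same cluster and well C in its own; the original columns are kept and the new variable is appended under the requested name.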
Details

The function performs agglomerative clustering with the complete linkage method. Because complete linkage measures the distance between two clusters as the largest distance between their members, cutting the tree at max_dist means that each cluster is a collection of locations in which no two locations are farther apart than the cutoff value. All locations that can be clustered under this condition will be; each location that cannot be clustered receives a unique cluster value.

The function's code was partly inspired by unpublished code from Ivy Jansen.
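Why complete linkage gives this guarantee can be checked on a toy example (hypothetical coordinates, base R only). For comparison, the sketch also runs single linkage, a different method that cluster_locs() is not described as using: single linkage chains neighboring points together, whereas complete linkage keeps every within-cluster distance at or below the cutoff.

```r
# Three hypothetical locations on a line, 1.5 m apart: each neighboring pair
# is within the 2 m cutoff, but the two outer points are 3 m apart.
pts <- cbind(x = c(0, 1.5, 3), y = c(0, 0, 0))
d <- dist(pts)
max_dist <- 2

complete_ids <- cutree(hclust(d, method = "complete"), h = max_dist)
single_ids   <- cutree(hclust(d, method = "single"),   h = max_dist)

length(unique(complete_ids))  # complete linkage: 2 clusters
length(unique(single_ids))    # single linkage chains all points: 1 cluster

# Largest pairwise distance within each complete-linkage cluster
# (a singleton cluster has a diameter of 0)
dm <- as.matrix(d)
diameters <- sapply(split(seq_len(nrow(pts)), complete_ids),
                    function(idx) max(dm[idx, idx]))
all(diameters <= max_dist)
```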
Value

The original object with an extra variable added (by default: cluster_id) to define cluster membership.
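Examples

library(dplyr)
library(watina)
set.seed(123456789)
# Ten random locations scattered a few meters around one point
mydata <-
  tibble(
    a = runif(10),
    x = rnorm(10, 155763, 2),
    y = rnorm(10, 132693, 2)
  )
# Dataframe input, using the default coordinate columns x and y
cluster_locs(mydata) %>%
  arrange(cluster_id)
# sf input: convert to an sf POINT object first
mydata %>%
  as_points(remove = TRUE) %>%
  cluster_locs %>%
  arrange(cluster_id)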
## Not run:
# Connect to the Watina database
watina <- connect_watina()
# Collect the locations with area code "KBR" and cluster them
clusters <-
  get_locs(watina,
           area_codes = "KBR",
           collect = TRUE) %>%
  cluster_locs
# Inspect result:
clusters %>%
  select(loc_code, x, y, cluster_id) %>%
  arrange(cluster_id)
# Frequency of cluster sizes:
clusters %>%
  count(cluster_id) %>%
  pull(n) %>%
  table
# Disconnect:
DBI::dbDisconnect(watina)
## End(Not run)
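Without dplyr, the frequency of cluster sizes from the last example can be sketched with two nested table() calls. The cluster_id vector below is made up and only stands in for clusters$cluster_id:

```r
# Hypothetical cluster_id values, standing in for clusters$cluster_id
cluster_id <- c(1L, 1L, 2L, 3L, 3L, 3L, 4L)

# Number of locations per cluster, then how many clusters have each size
cluster_sizes <- as.vector(table(cluster_id))
size_freq <- table(cluster_sizes)
size_freq
```

Here two clusters contain a single location, one cluster contains two locations and one contains three.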