cluster_locs: Detect (spatial) groundwater well clusters
In inbo/watina: Querying and Processing Data from the INBO Watina Database

Description

cluster_locs() accepts a dataframe with X/Y coordinates, or an sf object of geometry type POINT, and adds an integer variable that defines cluster membership. It is intended for detecting spatial clusters of groundwater wells: it applies complete-linkage agglomerative clustering and cuts the cluster tree at a Euclidean distance given by max_dist (default: 2).

Usage

cluster_locs(
  input,
  max_dist = 2,
  output_var = "cluster_id",
  xvar = "x",
  yvar = "y"
)

Arguments

`input`	A dataframe with X/Y coordinates, or an `sf` object of geometry type `POINT`. A typical input dataframe is the collected output of `get_locs`.
`max_dist`	The maximum distance allowed between any two points of the same cluster. The default value suits many use cases, assuming the coordinate reference system uses metres as its unit, as is the case for the 'Belge 1972 / Belgian Lambert 72' CRS (EPSG 31370).
`output_var`	Name of the new variable to be added to `input`.
`xvar`	String. The X coordinate variable name; only considered when `input` is a dataframe. Defaults to `"x"`.
`yvar`	String. The Y coordinate variable name; only considered when `input` is a dataframe. Defaults to `"y"`.
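
To see why the coordinate unit matters for `max_dist`, here is a minimal base-R sketch. It assumes complete-linkage clustering cut at `max_dist`, as described under Details, and uses a hypothetical helper `sketch_clusters()` that is not part of watina. It is a stand-in for illustration, not the package's implementation.

```r
# Hypothetical helper (NOT part of watina): complete-linkage clustering
# on Euclidean distances, with the tree cut at `max_dist`.
sketch_clusters <- function(x, y, max_dist = 2) {
  tree <- hclust(dist(cbind(x, y)), method = "complete")
  cutree(tree, h = max_dist)
}

# Four made-up wells on a line, coordinates in metres
x_m <- c(0, 1, 3, 10)
y_m <- c(0, 0, 0, 0)

ids_m  <- sketch_clusters(x_m, y_m)                # coordinates in metres
ids_km <- sketch_clusters(x_m / 1000, y_m / 1000)  # same wells, in kilometres

length(unique(ids_m))   # number of clusters with metre coordinates
length(unique(ids_km))  # number of clusters with kilometre coordinates
```

With metre coordinates only the two wells 1 m apart share a cluster; with kilometre coordinates every pairwise distance falls far below 2, so all four wells collapse into a single cluster. If the CRS does not use metres, `max_dist` probably needs rescaling accordingly.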

Details

The function performs agglomerative clustering with the complete linkage method. Cutting the tree at max_dist therefore guarantees that, within each cluster, the distance between any two locations is not larger than the cutoff value. All locations that can be clustered under this condition will be. Locations that cannot be clustered receive a unique cluster value.
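
The guarantee described above can be illustrated with base R's `hclust()` and `cutree()`. This is a sketch on made-up points, not the function's actual code; `max_within()` is a hypothetical helper introduced only for the check. Single linkage is shown for contrast, since it lets clusters grow by chaining neighbours.

```r
# Three made-up locations on a line: neighbours are less than 2 apart,
# but the two end points are 3 apart.
pts <- cbind(x = c(0, 1.4, 3), y = c(0, 0, 0))
d <- dist(pts)  # Euclidean distance is the default

complete_ids <- cutree(hclust(d, method = "complete"), h = 2)
single_ids   <- cutree(hclust(d, method = "single"), h = 2)

# Hypothetical helper: largest pairwise distance found inside any cluster
max_within <- function(ids, d) {
  m <- as.matrix(d)
  max(sapply(unique(ids), function(i) max(m[ids == i, ids == i])))
}

max_within(complete_ids, d) <= 2  # TRUE: no cluster is wider than the cutoff
max_within(single_ids, d) <= 2    # FALSE: chaining joins points 3 apart
```

Complete linkage splits the three points into two clusters, whereas single linkage merges them all even though the end points are farther apart than the cutoff.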

The function's code was partly inspired by unpublished code from Ivy Jansen.

Value

The original object with an extra variable added (by default: cluster_id) to define cluster membership.

Examples

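library(watina)
library(dplyr)
set.seed(123456789)
# Simulate 10 nearby locations
mydata <-
  tibble(
    a = runif(10),
    x = rnorm(10, 155763, 2),
    y = rnorm(10, 132693, 2)
  )
# Dataframe input: coordinates are read from the default columns x and y
cluster_locs(mydata) %>%
  arrange(cluster_id)
# sf input: convert the dataframe to POINT geometry first
mydata %>%
  as_points(remove = TRUE) %>%
  cluster_locs() %>%
  arrange(cluster_id)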

## Not run: 
watina <- connect_watina()

clusters <-
  get_locs(watina,
           area_codes = "KBR",
           collect = TRUE) %>%
  cluster_locs

# inspect result:
clusters %>%
  select(loc_code, x, y, cluster_id) %>%
  arrange(cluster_id)

# frequency of cluster sizes:
clusters %>%
  count(cluster_id) %>%
  pull(n) %>%
  table

# Disconnect:
dbDisconnect(watina)

## End(Not run)

inbo/watina documentation built on Dec. 2, 2024, 4:02 a.m.