addClustering: Add clustering information to a dataframe
In pnwairfire/AirMonitorIngest: Data Ingest of Air Quality Data

Description

Clustering is used to assign individual measurements to deployment locations.

A temporary monitor will be moved around from time to time, sometimes across the country and sometimes across the street. We need to assign unique identifiers to each new "deployment" but not when the monitor is moved a short distance.

We use clustering to find an appropriate number of unique "deployments". The sensitivity of this algorithm can be adjusted with the clusterDiameter argument.

Standard kmeans clustering does not work well when clusters can have widely differing numbers of members. A much better result is achieved with the Partitioning Around Medoids method available in cluster::pam().
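
As an illustration of the method named above, the following minimal sketch runs cluster::pam() on synthetic data with very unequal group sizes. The coordinates, group sizes and variable names are made up for the example and are assumed to be planar positions in meters; they do not come from this package.

```r
library(cluster)

set.seed(1)

# Synthetic positions in meters (assumption): a long deployment with
# 200 measurements and a short one with only 3, about 50 km apart.
longStay  <- cbind(x = rnorm(200, mean = 0,     sd = 30),
                   y = rnorm(200, mean = 0,     sd = 30))
shortStay <- cbind(x = rnorm(3,   mean = 50000, sd = 30),
                   y = rnorm(3,   mean = 0,     sd = 30))
xy <- rbind(longStay, shortStay)

# Partition into two clusters around medoids
fit <- cluster::pam(xy, k = 2)

# Per-cluster summary: size, max_diss, av_diss, diameter, separation
print(fit$clusinfo)

# Each medoid is one of the original observations, not a computed mean
print(fit$medoids)
```

Because each medoid is an actual measurement location, it can serve directly as the representative position of a deployment.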

The value of clusterRadius, which is derived from the clusterDiameter argument, is compared with the output of cluster::pam(...)$clusinfo[,'av_diss'] to determine the number of clusters.
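
The comparison described above might be sketched roughly as follows. This is not the package's implementation: the helper name chooseClusterCount(), its radius parameter, and the use of planar coordinates in meters are all assumptions made for illustration, and the way the real function converts longitude and latitude to meters is not shown here.

```r
library(cluster)

# Hypothetical helper: try k = 1, 2, ... and stop at the first k for which
# every cluster's average dissimilarity ('av_diss') is below `radius`.
# `xy` is a two-column numeric matrix in the same units as `radius`.
chooseClusterCount <- function(xy, radius, maxClusters = 50) {
  stopifnot(nrow(xy) >= 2)
  maxK <- min(maxClusters, nrow(xy) - 1)  # pam() requires k < n
  for (k in seq_len(maxK)) {
    fit <- cluster::pam(xy, k)
    if (max(fit$clusinfo[, "av_diss"]) < radius) {
      break
    }
  }
  fit  # last fit tried, whether or not the threshold was met
}

set.seed(42)

# Two synthetic deployments about 50 km apart (positions in meters)
siteA <- cbind(x = rnorm(20, mean = 0,     sd = 50),
               y = rnorm(20, mean = 0,     sd = 50))
siteB <- cbind(x = rnorm(20, mean = 50000, sd = 50),
               y = rnorm(20, mean = 0,     sd = 50))
xy <- rbind(siteA, siteB)

fit <- chooseClusterCount(xy, radius = 500)

# Number of clusters selected and the cluster assigned to each row
clusterCount <- nrow(fit$medoids)
clusterId <- fit$clustering
```

A smaller radius makes the loop continue to larger k, which splits nearby locations into separate deployments; a larger radius merges them.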

Usage

addClustering(
  tbl,
  clusterDiameter = 1000,
  lonVar = "longitude",
  latVar = "latitude",
  maxClusters = 50,
  flagAndKeep = FALSE
)

Arguments

`tbl`	Tibble with geolocation information (e.g. created by `wrcc_qualityControl()` or `airsis_qualityControl()`).
`clusterDiameter`	Diameter in meters used to determine the number of clusters (see description).
`lonVar`	Name of longitude variable in the incoming tibble.
`latVar`	Name of the latitude variable in the incoming tibble.
`maxClusters`	Maximum number of clusters to try.
`flagAndKeep`	Logical specifying flagging, rather than removal, of bad data during the QC process.