cluster_locid | R Documentation |
Spatial clustering based on correlation or other metrics.
cluster_locid(
x,
varname,
locid = "locid",
time = "UTC",
locid_info = NULL,
weight = NULL,
group = NULL,
k = c(1:20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 500, 1000, 10000),
max_loss = 0.05,
distance = "cor",
cores = 1,
plot = FALSE,
verbose = TRUE,
...
)
x |
'data.frame' (merra subset) with location and time identifiers, and a time-series variable to cluster. |
varname |
name of column with data to be used to cluster locations. |
locid |
name of column of location identifiers. |
time |
name of the column with the time dimension. |
locid_info |
(optional) 'data.frame' or 'sf' object with weights and/or spatial groups (regions) of location identifiers. |
weight |
(optional) name of column with (positive) weights in 'locid_info', used in calculating weighted 'mean' and 'sd' metrics. |
group |
(optional) name of the column with group names of locations (such as regions). If provided, clustering is performed for each group separately. |
k |
(optional) integer vector of the numbers of clusters to test; the default is the vector shown in the usage above. If 'NULL', the clustering process starts from '1', proceeds up to the number of locations, and terminates when the 'max_loss' condition is met. |
max_loss |
maximum loss of variation (standard deviation) of clustered variable, measured as '1 - sd(clustered_variable) / sd(original_variable)'. Default value is '0.05', meaning up to '5' percent of variability of original, non-clustered variable is allowed to be lost by clustering. |
distance |
character name of the distance measure to use in 'TSdist::KMedoids'. The default metric is 'cor', Pearson's correlation between the time series of the variable at different locations. Allowed alternative measures: '"euclidean", "manhattan", "minkowski", "infnorm", "ccor", "sts", "dtw", "keogh_lb", "edr", "erp", "lcss", "fourier", "tquest", "dissimfull", "dissimapprox", "acf", "pacf", "ar.lpc.ceps", "ar.mah", "ar.mah.statistic", "ar.mah.pvalue", "ar.pic", "cdm", "cid", "cor", "cort", "wav", "int.per", "per", "mindist.sax", "ncd", "pred", "spec.glk", "spec.isd", "spec.llr", "pdc", "frechet"'. |
cores |
integer number of processor cores to use, currently ignored. |
verbose |
logical, should the clustering process be reported, TRUE by default. |
... |
additional parameters to pass to 'TSdist::KMedoids', might be required for some distance measures. |
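The 'max_loss' measure and the correlation-based distance described in the arguments can be illustrated with a minimal base R sketch. It is not the function's implementation: the synthetic data, the column name 'value', the dissimilarity '1 - cor', the use of 'hclust' as a stand-in for k-medoids, and the replacement of each location's series by its cluster mean are all assumptions made for illustration.

```r
set.seed(1)

# Synthetic long-format input: 6 locations, 48 time steps (illustrative names)
n_time <- 48
locs <- paste0("loc", 1:6)
base_a <- sin(seq(0, 4 * pi, length.out = n_time))
base_b <- cos(seq(0, 6 * pi, length.out = n_time))
x <- do.call(rbind, lapply(seq_along(locs), function(i) {
  base <- if (i <= 3) base_a else base_b
  data.frame(
    locid = locs[i],
    UTC = seq_len(n_time),
    value = base + rnorm(n_time, sd = 0.1),
    stringsAsFactors = FALSE
  )
}))

# Wide matrix: rows are time steps, columns are locations
wide <- tapply(x$value, list(x$UTC, x$locid), mean)

# Assumed dissimilarity: one minus Pearson's correlation between locations
d <- as.dist(1 - cor(wide))

# Stand-in clustering (hierarchical, not k-medoids), cut into k = 2 clusters
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 2)

# Assumed "clustered variable": each series replaced by its cluster mean
clustered <- wide
for (g in unique(cl)) {
  members <- names(cl)[cl == g]
  clustered[, members] <- rowMeans(wide[, members, drop = FALSE])
}

# Loss of variation as defined for 'max_loss'
loss <- 1 - sd(as.vector(clustered)) / sd(as.vector(wide))
loss
```

Under these assumptions, a candidate 'k' would be acceptable when 'loss' does not exceed 'max_loss'.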
A 'data.frame' with results for each tested number of clusters, with the following columns:
Number of clusters
Total number of time series
location identifier in 'merra2ools' datasets
(if provided) column with locid-groups
cluster number in every 'k'-group
weight of the cluster in the 'k'-group
standard deviation of the whole sample of (N) time-series
standard deviation of clustered time series with 'k' clusters
loss of standard deviation as a result of clustering, for each 'k'
# see "Cluster locations" in "Get started"
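# A minimal sketch of input in the documented shape, followed by a call that is
# not run here. The variable name 'w50m', the location ids, and the random
# values are hypothetical; the call uses only arguments from the usage above
# and assumes the package providing 'cluster_locid' is attached.

```r
set.seed(42)

# 24 hourly time stamps in UTC
utc <- seq(as.POSIXct("2020-01-01 00:00", tz = "UTC"),
           by = "hour", length.out = 24)

# Long-format data: one row per location and time step
x <- data.frame(
  locid = rep(1:4, each = length(utc)),  # hypothetical location identifiers
  UTC = rep(utc, times = 4)
)
x$w50m <- runif(nrow(x), min = 0, max = 15)  # hypothetical variable to cluster

str(x)

## Not run:
# res <- cluster_locid(x, varname = "w50m", locid = "locid", time = "UTC",
#                      k = 1:3, max_loss = 0.05, verbose = FALSE)
# str(res)
## End(Not run)
```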