Note: Weight scaling in cluster analysis
In Ckmeans.1d.dp: Optimal, Fast, and Reproducible Univariate Clustering

The function Ckmeans.1d.dp() performs optimal weighted univariate clustering. Depending on the application, weights can represent sample size, certainty, or signal intensity. The relative values of the weights always affect the clustering result. Their absolute values affect the number of clusters only when that number must be estimated.

The linear scale of weights is consequential when estimating the number of clusters

When the number of clusters $k$ must be estimated, the linear scale of the weights heavily influences the estimate. The reason is that linear scaling of the weights has a nonlinear effect on the Bayesian information criterion (BIC). A larger scale favors more clusters.
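The effect described above can be observed with a small experiment. The following is a minimal sketch, not part of the original note: the data, the unit weights, the scale factors, and the helper name `estimate_k` are all made up for illustration, and the call into the package is skipped if Ckmeans.1d.dp is not installed. No particular output is claimed here; the estimated $k$ for each scale depends on the data.

```r
# Illustrative data: three loose groups on the real line (hypothetical values).
x <- c(0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3, 9.0, 9.2)
w <- rep(1, length(x))          # one observation per data point
scales <- c(0.1, 1, 10, 100)    # linear scale factors to compare

# Total weight implied by each scale; this plays the role of the sample size.
weight_totals <- vapply(scales, function(s) sum(s * w), numeric(1))

# Hypothetical helper: number of clusters chosen when k is estimated in 1..5.
estimate_k <- function(x, w) {
  fit <- Ckmeans.1d.dp::Ckmeans.1d.dp(x, k = c(1, 5), y = w)
  length(fit$centers)
}

if (requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
  k_by_scale <- vapply(scales, function(s) estimate_k(x, s * w), integer(1))
  names(k_by_scale) <- scales
  print(k_by_scale)             # estimated k for each linear scale
}
```

Comparing the printed values across scales shows whether, and how strongly, the scale changes the estimate for a given data set.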

Here is a guideline on how to scale the weights:

If the weights are the numbers of repeated observations at each data point, they should be used as is and not linearly scaled.
If the weights are not related to sample size but are some measure of emphasis, they should be scaled to sum to the observed sample size of the entire data set.
If the weights sum to one, the implied sample size of the data set is one. In this case, the number of clusters may be severely underestimated.
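The second guideline can be sketched in a few lines of R. This is an illustration under stated assumptions, not code from the package: the helper name `scale_weights_to_n` and the example data are hypothetical, and the sample size is assumed to be the number of data points (one observation per point). The call into the package is skipped if Ckmeans.1d.dp is not installed.

```r
# Hypothetical helper: rescale emphasis weights so they sum to the sample size n.
# Relative values are preserved; only the linear scale changes.
scale_weights_to_n <- function(w, n = length(w)) {
  stopifnot(is.numeric(w), all(w >= 0), sum(w) > 0)
  w / sum(w) * n
}

# Illustrative data with emphasis weights that sum to one.
x <- c(0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3, 9.0, 9.2)
emphasis <- c(0.05, 0.10, 0.10, 0.05, 0.15, 0.15, 0.10, 0.10, 0.10, 0.10)

# Assumption: each data point is a single observation, so n = length(x).
w_scaled <- scale_weights_to_n(emphasis, n = length(x))

if (requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
  # Estimate k in the range 1..5 using the rescaled weights.
  fit <- Ckmeans.1d.dp::Ckmeans.1d.dp(x, k = c(1, 5), y = w_scaled)
  print(length(fit$centers))    # estimated number of clusters
}
```

Passing `emphasis` directly would correspond to the third case above, where the implied sample size is one.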

Linear weight scaling is uninfluential when the number of clusters is given

When the user specifies an exact number of clusters $k$, linear weight scaling does not, in theory, influence the cluster analysis. The clustering results are expected to be identical for any linear scaling of the weights. However, very large weights can cause numerical overflow and should therefore be scaled down linearly to a more tractable range.
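As a minimal sketch of scaling down large weights when $k$ is fixed, the example below divides the weights by their maximum. That particular choice of scale, the data, and the variable names are assumptions made for illustration; any positive linear factor preserves the relative weights. The comparison is skipped if Ckmeans.1d.dp is not installed.

```r
# Illustrative data with large weights (hypothetical values).
x <- c(-1.2, -0.9, -1.0, 3.1, 2.9, 3.3, 7.8, 8.1)
w <- c(2e8, 5e8, 1e8, 3e8, 4e8, 2e8, 6e8, 1e8)

# Assumption: dividing by the maximum is one convenient linear rescaling.
w_small <- w / max(w)

if (requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
  # With k fixed at 3, both fits are expected to give the same assignment.
  fit_raw   <- Ckmeans.1d.dp::Ckmeans.1d.dp(x, k = 3, y = w)
  fit_small <- Ckmeans.1d.dp::Ckmeans.1d.dp(x, k = 3, y = w_small)
  print(identical(fit_raw$cluster, fit_small$cluster))
}
```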

Ckmeans.1d.dp documentation built on Aug. 20, 2023, 1:08 a.m.
