bootclustrange | R Documentation |
bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload.
bootclustrange(object, seqdata, seqdist.args = list(method = "LCS"),
R = 100, sample.size = 1000, parallel = FALSE,
progressbar = FALSE, sampling = "clustering",
strata = NULL)
## S3 method for class 'bootclustrange'
plot(x, stat = "noCH", legendpos = "bottomright",
norm = "none", withlegend = TRUE, lwd = 1,
col = NULL, ylab = "Indicators",
xlab = "N clusters", conf.int = 0.95,
ci.method = "perc", ci.alpha = 0.3,
line = "median", ...)
## S3 method for class 'bootclustrange'
print(x, digits = 2, bootstat = c("mean"), ...)
object |
A |
seqdata |
State sequence object of class |
seqdist.args |
List of arguments passed to |
R |
Numeric. The number of subsamples to use. |
sample.size |
Numeric. The size of the subsamples; values between 1000 and 10 000 are recommended. |
parallel |
Logical. Whether to initialize the parallel processing of the |
progressbar |
Logical. Whether to initialize a progressbar using the |
sampling |
Character. The sampling procedure to be used: |
strata |
An optional stratification variable. |
x |
A |
stat |
Character. The list of statistics to plot or "noCH" to plot all statistics except "CH" and "CHsq" or "all" for all statistics. See |
legendpos |
Character. Legend position, see |
norm |
Character. Normalization method for the statistics. One of "none" (no normalization), "range" (given as (value - min)/(max - min)), "zscore" (adjusted by mean and standard deviation) or "zscoremed" (adjusted by median and median of the difference to the median). |
withlegend |
Logical. If |
lwd |
Numeric. Line width, see |
col |
A vector of line colors, see |
xlab |
x axis label. |
ylab |
y axis label. |
conf.int |
Confidence to build the confidence interval (default: 0.95). |
ci.method |
Method used to build the confidence interval (only if bootstrap has been used, see R above). One of "none" (do not plot confidence interval), "norm" (based on normal approximation) or "perc" (default, based on percentile). |
ci.alpha |
Alpha (transparency) value of the color used to plot the interval. |
line |
Which value should be plotted by the line? One of "mean" (average over all bootstraps) or "median" (default, median over all bootstraps). |
digits |
Number of digits to be printed. |
bootstat |
The summary statistic to use |
... |
Additional parameters passed to/from methods. |
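The options of the norm argument can be illustrated with a minimal base R sketch. The helper normalize_stat is hypothetical and is not part of the package; it also assumes that, for "zscoremed", the "difference to the median" is taken in absolute value. The package's internal computation may differ.

```r
# Hypothetical helper (not part of the package) mirroring the documented
# options of the norm argument.
normalize_stat <- function(values,
                           norm = c("none", "range", "zscore", "zscoremed")) {
  norm <- match.arg(norm)
  switch(norm,
    # No normalization: return the values unchanged.
    none = values,
    # Rescale to [0, 1]: (value - min) / (max - min).
    range = (values - min(values)) / (max(values) - min(values)),
    # Center on the mean and scale by the standard deviation.
    zscore = (values - mean(values)) / sd(values),
    # Center on the median and scale by the median of the differences to
    # the median (assumption: absolute differences).
    zscoremed = (values - median(values)) /
      median(abs(values - median(values)))
  )
}

# Made-up index values for 2 to 5 clusters, for illustration only.
index_values <- c(0.21, 0.35, 0.30, 0.28)
normalize_stat(index_values, "range")
```

Normalizing in this way puts indices with different scales on a comparable axis, which is presumably why the option is offered for plotting.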
bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload. It draws R random subsamples of sample.size sequences each from seqdata, using the sampling procedure defined by the sampling argument. In each subsample, a distance matrix is computed from the selected sequences using the seqdist.args arguments, and the cluster quality indices are then estimated using as.clustrange.
The clustering can be specified either as a seqclararange object or a data.frame.
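A minimal sketch of a typical call is given below. It assumes that the TraMineR and WeightedCluster packages are installed, uses the biofam data shipped with TraMineR, and picks deliberately small values of R, sample.size and kvals so that it runs quickly. These values are for illustration only and are far below the recommended subsample sizes.

```r
library(TraMineR)
library(WeightedCluster)

# Illustrative data: the first 100 family-life sequences from biofam.
data(biofam)
biofam.seq <- seqdef(biofam[1:100, 10:25])

# CLARA clustering for 2 to 4 clusters (small settings, illustration only).
bfclara <- seqclararange(biofam.seq, R = 3, sample.size = 50, kvals = 2:4,
                         seqdist.args = list(method = "HAM"),
                         parallel = FALSE)

# Estimate cluster quality on subsamples, using the same distance settings.
bcq <- bootclustrange(bfclara, biofam.seq,
                      seqdist.args = list(method = "HAM"),
                      R = 3, sample.size = 50, parallel = FALSE)

# Summarize and plot the estimated quality indices.
print(bcq, digits = 2)
plot(bcq)
```

Passing the same seqdist.args to both functions keeps the quality indices consistent with the distances used to build the clustering.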
A clustrange object (see as.clustrange) with the bootstrapped values.
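How the bootstrapped values of one index can be summarized into a line and a confidence interval, as controlled by the line, conf.int and ci.method arguments of the plot method, can be sketched in base R. The helper summarize_boot is hypothetical, and the formulas are the usual percentile and normal-approximation intervals; the package's exact computation may differ.

```r
# Hypothetical helper (not part of the package): summarize the bootstrap
# replicates of one quality index for one number of clusters.
summarize_boot <- function(values, conf.int = 0.95,
                           ci.method = c("perc", "norm", "none"),
                           line = c("median", "mean")) {
  ci.method <- match.arg(ci.method)
  line <- match.arg(line)
  alpha <- 1 - conf.int
  probs <- c(alpha / 2, 1 - alpha / 2)
  # Central value drawn as the line.
  center <- if (line == "median") median(values) else mean(values)
  ci <- switch(ci.method,
    # Percentile interval: empirical quantiles of the replicates.
    perc = unname(quantile(values, probs = probs)),
    # Normal approximation: mean plus/minus a normal quantile times the sd.
    norm = mean(values) + qnorm(probs) * sd(values),
    # No interval requested.
    none = c(NA_real_, NA_real_)
  )
  c(center = center, lower = ci[1], upper = ci[2])
}

# Simulated replicates, for illustration only.
set.seed(1)
simulated_index <- rnorm(100, mean = 0.4, sd = 0.05)
summarize_boot(simulated_index, ci.method = "perc")
summarize_boot(simulated_index, ci.method = "norm", line = "mean")
```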
Studer, M., R. Sadeghi and L. Tochon (2024). Sequence Analysis for Large Databases. LIVES Working Papers 104. doi:10.12682/lives.2296-1658.2024.104
See also as.clustrange for the list of cluster quality indices that are computed, and seqclararange for an example of use.