View source: R/bootstrapLatencyClustering.R
Performs classification with kmeans, hc or pam, then estimates uncertainty via bootstrap resampling.
Bootstrap resampling is stratified such that the proportion of short and long latency in each bootstrap remains constant.
bootstrapClusterClass(latency, its = 1000, stratification = "initial",
  algorithm = c("kmeans", "hc", "pam"), emModelNames = "E")
latency: A vector of average latency values.
its: Number of bootstrap iterations.
stratification: By default ("initial"), the proportion of SL / LL in each bootstrap is fixed, based on the original classification (the prior). This helps stabilize the bootstrap fit if the distribution is skewed such that there are relatively few SL or LL samples. We want to avoid producing bootstraps that contain few or no samples from one of the groups. If "probabilistic", classify the data with For example, consider the subjects (A,B,C,D,E) with prior probabilities (0, 0.05, 0.45, 0.95, 1). This contains 3 SL and 2 LL. But if we resample the groups using these probabilities, we will classify A as LL with probability 0, B with probability 0.05, C with probability 0.45, and so on. If "none", the stratification is disabled and the bootstraps are produced by randomly sampling the original data with replacement, without regard to the initial classification.
algorithm: The clustering algorithm. By default, calls
emModelNames: Only used if
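The "probabilistic" example above can be illustrated with a short sketch. This is not the package's own code: it only assumes that each subject is assigned to LL by an independent draw using its prior probability. The names priorLL and isLL are hypothetical.

```r
# Prior probabilities of LL for subjects A-E, taken from the example in the text
priorLL <- c(A = 0, B = 0.05, C = 0.45, D = 0.95, E = 1)

set.seed(1)
# Assumption: one independent Bernoulli draw per subject for this bootstrap;
# TRUE means the subject is treated as LL when stratifying
isLL <- rbinom(length(priorLL), size = 1, prob = priorLL) == 1
names(isLL) <- names(priorLL)

# A is never LL and E is always LL; B, C and D vary from bootstrap to bootstrap
print(isLL)
```

Under this assumption the number of LL subjects can change between bootstraps, unlike the default "initial" mode, where it stays fixed.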
This function is used to classify average latency scores into two groups, the "short latency" (SL) with low stress resilience and the "long latency" (LL) with high stress resilience.
The bootstrap resampling is done by sampling, with replacement, from the SL and LL groups defined by the initial call to the clustering algorithm on the original data. The proportion of SL and LL in each bootstrap remains fixed unless stratification = "probabilistic", in which case an EM algorithm is used to define prior probabilities, and the stratification is recomputed by sampling these priors at each bootstrap.
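A minimal base-R sketch of one stratified bootstrap draw, assuming kmeans as the initial classifier. It illustrates the idea only and is not the function's actual implementation; the variable names are hypothetical.

```r
set.seed(42)
# Simulated latencies: 60 short-latency and 15 long-latency subjects
latency <- c(rnorm(60, 200, 100), rnorm(15, 600, 100))

# Initial two-group classification on the original data
fit <- kmeans(latency, centers = 2)
# Treat the cluster with the larger center as LL
isLL <- fit$cluster == which.max(fit$centers)

slIdx <- which(!isLL)
llIdx <- which(isLL)

# One stratified bootstrap: resample with replacement within each group,
# so the SL / LL proportion matches the initial classification
bootIdx <- c(slIdx[sample.int(length(slIdx), replace = TRUE)],
             llIdx[sample.int(length(llIdx), replace = TRUE)])
bootLatency <- latency[bootIdx]
```

Indexing through sample.int avoids the special case in sample() where a length-one numeric vector is treated as a range.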
The classification boundary for each bootstrap is the mid-point between the two centroids / means. This boundary is then used to classify the original latency data.
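The mid-point rule can be sketched as follows, assuming kmeans and, for brevity, an unstratified resample. The names bootLatency, boundary and classLL are hypothetical and the code is an illustration rather than the package's implementation.

```r
set.seed(7)
latency <- c(rnorm(60, 200, 100), rnorm(15, 600, 100))

# Simplification: a plain (unstratified) bootstrap sample of the data
bootLatency <- sample(latency, replace = TRUE)

# Fit two clusters to the bootstrap sample
fit <- kmeans(bootLatency, centers = 2)

# Boundary: mid-point between the two cluster centers
boundary <- mean(fit$centers)

# Classify the ORIGINAL data against the bootstrap boundary
classLL <- latency > boundary
```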
A list with the following components
bootProbLL: The probability that a subject is classified as LL, defined as the number of times this subject was classified LL over all bootstraps.
its: The number of bootstrap iterations.
latency: The original latency data used for the original classification and resampled for the bootstrap.
boundary: Approximate cluster boundary between SL and LL means (from kmeans or hc) or medoids (from pam) computed on the original data.
boundary_boot: Approximate boundary point from all bootstraps.
centers: Cluster centers or medoids fit to the original data.
class_boot: A matrix containing the classification of sample data at each bootstrap.
clusters: The classification of the original data: 1 (SL), 2 (LL).
The ordering of the per-subject values (such as bootProbLL) is the same as in the latency vector.
The cluster boundaries are defined as halfway between the two cluster means at each bootstrap.
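As an illustration of how a per-subject LL probability follows from per-bootstrap classifications, here is a small sketch. The matrix layout (one row per bootstrap, one column per subject) is an assumption; the orientation of the real class_boot component is not stated here.

```r
# Hypothetical classifications: 4 bootstraps (rows) x 3 subjects (columns),
# coded 1 (SL) and 2 (LL)
classBoot <- matrix(c(1, 2, 2,
                      1, 1, 2,
                      1, 2, 2,
                      1, 2, 2), nrow = 4, byrow = TRUE)

# Fraction of bootstraps in which each subject was classified as LL
probLL <- colMeans(classBoot == 2)
print(probLL)  # 0.00 0.75 1.00
```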
Philip A Cook <cookpa@pennmedicine.upenn.edu>
plotBootstrapClusterClass
kmeans
pam
hc
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
set.seed(20140123)
sl <- rnorm(60, 200, 100)
sl[which(sl < 0)] <- 0
ll <- rnorm(15, 600, 100)
ll[which(ll > 900)] <- 900
boot <- bootstrapClusterClass(c(sl, ll), its = 100, algorithm = "pam")
## Not run: plotBootstrapClusterClass(boot)