View source: R/optimal_kmeans_d.R
optimal_kmeans_d    R Documentation
optimal_kmeans_d applies k-means clustering using the kmeans function with many random starts. The D value is then calculated for the cluster solution at each random start using the d function, and the cluster solution that maximizes D is returned, along with the corresponding value of D. In this way, the optimally etiologically heterogeneous subtype solution can be identified from possibly high-dimensional disease marker data.
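The selection strategy described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: best_of_random_starts and score_fn are hypothetical names, and the toy score used here (the between-cluster share of the total sum of squares) is only a stand-in for D, which the package computes with its d function.

```r
# Minimal sketch of "many random starts, keep the best-scoring solution".
# ASSUMPTION: best_of_random_starts() and score_fn are hypothetical; score_fn
# stands in for the D computation and is NOT the package's d function.
best_of_random_starts <- function(x, M, nstart = 100, seed = NULL, score_fn) {
  if (!is.null(seed)) set.seed(seed)  # reproducible sequence of random starts
  best_fit <- NULL
  best_score <- -Inf
  for (i in seq_len(nstart)) {
    # One random initialisation per run, so each run may converge
    # to a different local solution
    fit <- stats::kmeans(x, centers = M, nstart = 1)
    s <- score_fn(fit)
    if (s > best_score) {  # keep the solution with the largest score
      best_score <- s
      best_fit <- fit
    }
  }
  list(best_score = best_score, cluster = best_fit$cluster)
}

# Toy data: two well-separated groups of 25 points in 2 dimensions
set.seed(1)
x <- rbind(matrix(rnorm(50), ncol = 2),
           matrix(rnorm(50, mean = 5), ncol = 2))

# Stand-in score: between-cluster share of the total sum of squares
res <- best_of_random_starts(x, M = 2, nstart = 20, seed = 42,
                             score_fn = function(fit) fit$betweenss / fit$totss)
res$best_score
table(res$cluster)
```

Each kmeans call above uses a single random start, and the loop does the scoring itself. Calling kmeans(x, centers = M, nstart = 100) instead would let kmeans keep the start with the lowest total within-cluster sum of squares, which is not necessarily the start that maximizes D.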
optimal_kmeans_d(markers, M, factors, case, data, nstart = 100, seed = NULL)
markers: a vector of the names of the disease markers. These markers should be of a type that is suitable for use with kmeans clustering.
M: the number of clusters to identify using kmeans clustering.
factors: a list of the names of the binary or continuous risk factors. For binary risk factors, the lowest level will be used as the reference level, e.g. factors = list("x1", "x2", "x3"), as in the example below.
case: the name of the variable that contains each subject's status as a case or control. This value should be 1 for cases and 0 for controls. The argument must be supplied in quotes, e.g. case = "case".
data: the name of the data frame that contains the relevant variables.
nstart: the number of random starts to use with kmeans clustering. Defaults to 100.
seed: an integer argument passed to set.seed. Defaults to NULL.
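The arguments refer to columns by name. The sketch below uses a hypothetical helper, prepare_case_markers, and a small made-up data frame to check that the case variable is coded 0/1 and to extract the marker matrix for cases only. Clustering only the cases is an assumption, chosen to match the returned labels (subtypes for cases, 0 for controls). The last lines show which level base R treats as the reference for a binary factor; that the package relies on this default is also an assumption.

```r
# Hypothetical helper (not part of the package): validate the case coding
# and pull out the marker columns for cases, the subjects that get subtypes.
prepare_case_markers <- function(markers, case, data) {
  status <- data[[case]]
  if (!all(status %in% c(0, 1))) {
    stop("`case` must be coded 1 for cases and 0 for controls")
  }
  as.matrix(data[status == 1, markers, drop = FALSE])
}

# Made-up example data: 4 cases and 2 controls
toy <- data.frame(
  case = c(1, 1, 0, 1, 0, 1),
  x1   = c(0, 1, 1, 0, 0, 1),
  y1   = c(0.2, 1.5, 0.9, 0.3, 1.2, 1.7),
  y2   = c(1.1, 0.4, 0.8, 1.0, 0.6, 0.2)
)

case_markers <- prepare_case_markers(c("y1", "y2"), "case", toy)
dim(case_markers)  # 4 rows (cases) by 2 markers

# For a binary factor, base R sorts the levels, so the lowest level ("0")
# comes first and is the reference under the default treatment contrasts.
x1_factor <- factor(toy$x1)
levels(x1_factor)[1]
colnames(model.matrix(~ x1_factor))
```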
Returns a list with the following elements:
optimal_d: the D value for the optimal D solution.
optimal_d_data: the original data frame supplied through the data argument, with an added column called optimal_d_label containing the optimal D subtype label. This column holds the subtype assignment for cases and is 0 for all controls.
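The layout of the optimal_d_label column can be illustrated with a short sketch. The helper add_subtype_label below is hypothetical and is not the package's code: given one cluster label per case (in row order), it writes those labels into a new optimal_d_label column and sets every control to 0, mirroring the description of the returned column.

```r
# Hypothetical helper (not the package's code): attach subtype labels for
# cases and 0 for controls, as described for optimal_d_label.
add_subtype_label <- function(data, case, case_labels) {
  is_case <- data[[case]] == 1
  if (sum(is_case) != length(case_labels)) {
    stop("need exactly one label per case")
  }
  data$optimal_d_label <- 0                    # controls keep 0
  data$optimal_d_label[is_case] <- case_labels # cases get their subtype
  data
}

toy <- data.frame(case = c(1, 0, 1, 1, 0))
labelled <- add_subtype_label(toy, "case", case_labels = c(2, 1, 2))
labelled$optimal_d_label  # 2 0 1 2 0
table(labelled$optimal_d_label)
```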
library(riskclustr)

# Cluster 30 disease markers to identify the optimally
# etiologically heterogeneous 3-subtype solution
res <- optimal_kmeans_d(
  markers = paste0("y", 1:30),
  M = 3,
  factors = list("x1", "x2", "x3"),
  case = "case",
  data = subtype_data,
  nstart = 100,
  seed = 81110224
)

# Look at the value of D for the optimal D solution
res[["optimal_d"]]

# Look at a table of the optimal D solution
table(res[["optimal_d_data"]]$optimal_d_label)