analyse_stability: Stability analysis, clustering evaluation and optimal...


View source: R/external_TMixClust.R

Description

analyse_stability performs multiple clustering runs with TMixClust, quantifies the agreement between runs with the Rand index, and returns the clustering solution with the largest likelihood. It also produces a plot of the agreement probability (Rand index) between each run and the run with the maximum likelihood.
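The Rand index of two clusterings is the fraction of object pairs on which they agree, meaning the pair is either grouped together in both or separated in both. The base-R snippet below is a minimal sketch of that computation and of comparing every run against the maximum-likelihood run. It is not TMixClust's internal code. The runs list, its fields loglik and assignment, and their values are hypothetical toy inputs.

```r
# Rand index: fraction of object pairs on which two clusterings agree
# (pair in the same cluster in both, or in different clusters in both).
# Invariant to label permutations, so c(1, 1, 2) and c(2, 2, 1) score 1.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  pairs <- combn(length(a), 2)            # 2 x n_pairs matrix of index pairs
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]
  mean(same_a == same_b)
}

# Hypothetical results of three clustering runs on six time series:
# each run stores a log-likelihood and one cluster label per series.
runs <- list(
  list(loglik = -120.4, assignment = c(1, 1, 2, 2, 3, 3)),
  list(loglik = -118.9, assignment = c(2, 2, 1, 1, 3, 3)),
  list(loglik = -125.0, assignment = c(1, 2, 2, 2, 3, 3))
)

# Keep the run with the largest likelihood ...
best <- which.max(vapply(runs, function(r) r$loglik, numeric(1)))

# ... and measure how well every run agrees with it.
agreement <- vapply(
  runs,
  function(r) rand_index(r$assignment, runs[[best]]$assignment),
  numeric(1)
)
```

Here run 1 is a relabelled copy of the best run (run 2) and scores 1, while run 3 disagrees on 3 of the 15 pairs and scores 0.8. A distribution of such values concentrated near 1 indicates a stable clustering.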

Usage

analyse_stability(time_series_df, time_points = seq_len(ncol(time_series_df)),
  nb_clusters = 2, em_iter_max = 1000, mc_em_iter_max = 10,
  em_ll_convergence = 0.001, nb_clustering_runs = 3, nb_cores = 1)

Arguments

time_series_df

data frame containing the time series. Each row is one time series: the row name is the name of the time series, and the columns hold its values at each time point.

time_points

numeric vector of time points, one per column of time_series_df. Default: seq_len(ncol(time_series_df)).

nb_clusters

desired number of clusters. Default: 2.

em_iter_max

maximum number of iterations for the expectation-maximization (EM) algorithm. Default: 1000.

mc_em_iter_max

maximum number of iterations for the Monte Carlo resampling. Default: 10.

em_ll_convergence

convergence threshold for the likelihood improvement of the EM algorithm. Default: 0.001.

nb_clustering_runs

number of times the clustering procedure is repeated on the input data. Default: 3.

nb_cores

number of cores used to run the separate clustering runs in parallel. Default: 1.
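For reference, the following base-R sketch builds a small input in the layout described for time_series_df: one row per time series, the series name as the row name, and one numeric column per time point. The series names, column names, values and the custom time points are all hypothetical. Requiring one time point per column is an assumption inferred from the default seq_len(ncol(time_series_df)).

```r
# Hypothetical input: 3 time series (rows) measured at 3 time points (columns).
# Row names carry the series names; all columns are numeric values.
time_series_df <- data.frame(
  t1 = c(0.1, 1.2, -0.5),
  t2 = c(0.4, 0.9, -0.2),
  t3 = c(0.8, 0.3, 0.1),
  row.names = c("gene_A", "gene_B", "gene_C")
)

# What time_points defaults to when not supplied: 1, 2, ..., number of columns.
default_time_points <- seq_len(ncol(time_series_df))

# Hypothetical unevenly spaced time points (e.g. hours after treatment);
# assumed to need exactly one value per column of time_series_df.
time_points <- c(0, 2, 8)
stopifnot(length(time_points) == ncol(time_series_df))
```

Such a data frame could then be passed as analyse_stability(time_series_df, time_points = time_points, nb_clusters = 2).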

Value

the TMixClust object from the run with the highest likelihood. As a side effect, renders a plot showing the distribution of the Rand index between each run and the maximum-likelihood run, which allows the user to assess clustering stability.

Author(s)

Monica Golumbeanu, monica.golumbeanu@bsse.ethz.ch

References

Golumbeanu M, Desfarges S, Hernandez C, Quadroni M, Rato S, Mohammadi P, Telenti A, Beerenwinkel N, Ciuffi A. (2017) Dynamics of Proteo-Transcriptomic Response to HIV-1 Infection.

Examples

# Load the TMixClust package and the toy time series data it provides
library(TMixClust)
data(toy_data_df)

# Identify the optimal clustering solution with 3 clusters
# across 4 clustering runs, executed sequentially on 1 core
best_clust_obj = analyse_stability(toy_data_df, nb_clusters = 3,
                                   nb_clustering_runs = 4, nb_cores = 1)

# Plot the time series from each cluster
for (i in seq_len(3)) {
    # Extract the time series assigned to the current cluster and plot them
    c_df = toy_data_df[which(best_clust_obj$em_cluster_assignment == i), ]
    plot_time_series_df(c_df, plot_title = paste("cluster", i))
}

cbg-ethz/TMixClust documentation built on May 30, 2019, 8:28 a.m.