generate_clusters: Create the clusters.

View source: R/generate_clusters.R

Description

This function takes normalized data (TPM/FPKM, feature-scaled) and runs k-means clustering over a series of k values so that a potentially optimal number of clusters can be identified for the dataset. For reproducible clusters, it is highly recommended to set a seed with the set.seed function before generating the clusters.
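
The effect of seeding can be illustrated with base R's stats::kmeans alone. This is a minimal sketch, not GECO code: the toy matrix and the seed values are made up for illustration, and it only shows that resetting the same seed before each k-means call yields the same cluster assignment.

```r
# Toy matrix standing in for normalized, feature-scaled expression data
# (hypothetical data, not produced by GECO)
set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200)

# Resetting the same seed before each call gives the same random
# starting centers, and therefore the same clustering
set.seed(42)
fit_a <- kmeans(toy, centers = 5)
set.seed(42)
fit_b <- kmeans(toy, centers = 5)

identical(fit_a$cluster, fit_b$cluster)
```

In practice, a single set.seed call placed immediately before generate_clusters(df) serves the same purpose.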

Usage

generate_clusters(df, kmin, kmax, ktot, num_iter, km_algo)

Arguments

df

A data frame containing the normalized reads.

kmin

An integer indicating the minimum number of clusters to generate. By default, this is set to 10.

kmax

An integer indicating the maximum number of clusters to generate. By default, this is set to 150.

ktot

An integer indicating how many unique k-values to generate. By default, this is set to 15, which produces 15 values ranging from kmin up to kmax. Increasing this number will significantly increase run time.
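
One plausible way to picture the k-values is as an evenly spaced grid between kmin and kmax. The spacing rule GECO actually uses is not stated here, so the even spacing and the variable name k_values below are assumptions made for illustration only.

```r
# Defaults as documented above
kmin <- 10
kmax <- 150
ktot <- 15

# Assumed rule: ktot evenly spaced integers from kmin to kmax.
# GECO may choose its k-values differently.
k_values <- unique(round(seq(kmin, kmax, length.out = ktot)))
k_values
```

Under this assumption, each additional k-value adds one more full set of k-means runs, which is why a larger ktot costs more time.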

num_iter

An integer indicating the number of clustering iterations to generate. By default, this is set to 10. The same k-means clustering is performed multiple times to account for the stochastic nature of the k-means algorithm, so the mean quality value in the final step is more reliable than one from a single iteration. Lowering this value will negatively affect the GECO quality assessment; raising it will increase run time.
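
The run-to-run variation that num_iter is meant to average out can be seen with stats::kmeans directly. This sketch uses the total within-cluster sum of squares as a stand-in quality measure; it is not the GECO quality score, and the toy data is hypothetical.

```r
# Hypothetical toy data in place of a normalized expression table
set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200)
num_iter <- 10

# Each run starts from different random centers, so the total
# within-cluster sum of squares typically differs between runs
wss <- vapply(seq_len(num_iter), function(i) {
  kmeans(toy, centers = 10)$tot.withinss
}, numeric(1))

range(wss)  # spread across individual runs
mean(wss)   # averaged value, less sensitive to any single run
```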

km_algo

A string indicating which k-means algorithm to use. By default, this is set to 'Hartigan-Wong'.
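
For reference, base R's stats::kmeans accepts four algorithm names: "Hartigan-Wong", "Lloyd", "Forgy" and "MacQueen". The sketch below assumes that km_algo is forwarded to that argument, which this page does not confirm; the toy data and the iter.max value are illustrative choices.

```r
# Hypothetical toy data in place of a normalized expression table
set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200)

# Algorithm names accepted by stats::kmeans
algos <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

# Fit the same data once per algorithm
fits <- lapply(algos, function(a) {
  kmeans(toy, centers = 5, algorithm = a, iter.max = 50)
})
names(fits) <- algos

# Compare the total within-cluster sum of squares per algorithm
vapply(fits, function(f) f$tot.withinss, numeric(1))
```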

Value

A list containing each iteration of the clustering performed. Each iteration holds the kmeans objects, which are used in the second step, e.g. score_clusters(clusters).
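
A nested list of this general shape can be mimicked with base R to get a feel for how to navigate it. The exact nesting and element names in the GECO return value are not documented here, so the names sketch, k_values and the "k5"-style labels are all hypothetical.

```r
# Hypothetical toy data and a deliberately small grid
set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200)
k_values <- c(5, 10, 15)
num_iter <- 2

# Outer list: one element per iteration.
# Inner list: one kmeans object per k-value.
sketch <- lapply(seq_len(num_iter), function(i) {
  fits <- lapply(k_values, function(k) kmeans(toy, centers = k))
  names(fits) <- paste0("k", k_values)
  fits
})

length(sketch)            # number of iterations
class(sketch[[1]]$k5)     # a "kmeans" object
```

For the real return value, str(clusters, max.level = 2) gives a quick view of the structure without printing every object.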

Examples

# Create a pseudo RNA-seq counts table
df <- data.frame(replicate(10, sample(-1:10, 200, replace = TRUE)))
rownames(df) <- paste0("Gene.", 1:200)
# Generate clusters
clusters <- generate_clusters(df)

JasonPBennett/GECO documentation built on Aug. 30, 2021, 4:30 p.m.