timeclust: time course data clustering

View source: R/timeclust.R

timeclust R Documentation

time course data clustering

Description

This function performs clustering analysis of the time course data.

Usage

timeclust(
  x,
  algo,
  k,
  dist = "distance",
  dist.method = "euclidean",
  centers = NULL,
  standardize = TRUE,
  ...
)

Arguments

x

a TCA object returned from timecourseTable or a matrix

algo

a character string giving a clustering method. Options are "km" (kmeans), "pam" (partitioning around medoids), "hc" (hierarchical clustering), "cm" (cmeans).

k

the number of clusters, a numeric value between 1 and n - 1 (n is the number of data points to be clustered).

dist

a character string specifying whether "distance" or "correlation" is used to measure the distance between data points.

dist.method

a character string. If dist is "correlation", it must be one of the correlation methods in the cor function ("pearson", "spearman", "kendall"). If dist is "distance", it must be one of the distance measures in the dist function (for example, "euclidean", "manhattan").

centers

a numeric matrix giving initial centers for kmeans, pam or cmeans. If given, the number of rows of the matrix must be equal to k.

standardize

logical, if TRUE, z-score transformation will be performed on the data before clustering. See 'Details' below.

...

additional arguments passed to kmeans, pam, hclust or cmeans
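
As a rough illustration of the two dist options, the base-R sketch below builds a dissimilarity object both ways for a small toy matrix. Turning a correlation r into a dissimilarity as 1 - r is an assumption made for this example only; this page does not state which conversion timeclust applies internally.

```r
# Toy data: 6 features (rows) measured at 5 time points (columns).
set.seed(1)
x <- matrix(rnorm(30), nrow = 6,
            dimnames = list(paste0("peak", 1:6), 1:5))

# dist = "distance": pairwise distances between rows via stats::dist().
d_euclid <- dist(x, method = "euclidean")

# dist = "correlation": correlations between rows via stats::cor().
# cor() correlates columns, so the matrix is transposed first.
# ASSUMPTION: 1 - r is used as the dissimilarity, for illustration only.
r <- cor(t(x), method = "pearson")
d_cor <- as.dist(1 - r)

# Either object can be handed to hclust() in the same way.
hc_euclid <- hclust(d_euclid)
hc_cor <- hclust(d_cor)
```

With this conversion, two rows with the same shape but different magnitudes get a dissimilarity near 0, whereas their Euclidean distance can still be large.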

Details

Two types of clustering methods are provided: hard clustering (kmeans, pam, hclust) and soft clustering (cmeans). In hard clustering, each data point is allocated to exactly one cluster (for hclust, cutree is used to cut the tree into clusters). In soft clustering (also known as fuzzy clustering), a data point can be assigned to multiple clusters, and membership values indicate the degree to which it belongs to each cluster.
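
The hard clustering case can be sketched with base R alone. The toy data, seed and variable names below are made up for illustration, and the calls go directly to kmeans, hclust and cutree rather than through timeclust.

```r
set.seed(42)
# Two well-separated groups of 10 rows each, 4 time points per row.
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.2), nrow = 10),
           matrix(rnorm(40, mean = 3, sd = 0.2), nrow = 10))
k <- 2

# kmeans: every row receives exactly one cluster label.
km <- kmeans(x, centers = k)

# hclust builds a tree; cutree() cuts it into k hard clusters.
hc <- hclust(dist(x, method = "euclidean"))
hc_labels <- cutree(hc, k = k)

# Cross-tabulate the two labelings to compare the partitions.
print(table(kmeans = km$cluster, hclust = hc_labels))
```

Soft clustering is not shown here because cmeans comes from the e1071 package rather than base R. Its result holds a membership matrix with one row per data point and one column per cluster, instead of a single label per point.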

To better capture differences in temporal patterns rather than in expression levels, a z-score transformation can be applied to convert the expression values to z-scores using the following formula:

z = \frac{x - \mu}{\sigma}

x is the value to be converted (e.g., expression value of a genomic feature in one condition), \mu is the population mean (e.g., average expression value of a genomic feature across different conditions), \sigma is the standard deviation (e.g., standard deviation of the expression values of a genomic feature across different conditions).
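
A small worked example of the formula, applied row by row to a toy matrix. The data and names are hypothetical, and the use of sd (sample standard deviation, n - 1 denominator) is an assumption for this sketch, not a statement about what timeclust does internally.

```r
# Toy expression matrix: 3 features (rows) x 4 conditions (columns).
x <- matrix(c( 1,  2,  3,  4,
              10, 20, 30, 40,
               5,  5,  6,  8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("peak", 1:3), paste0("t", 1:4)))

# Row-wise z-score: subtract each row's mean, divide by each row's sd.
# ASSUMPTION: sd() (sample standard deviation) stands in for sigma.
mu <- rowMeans(x)
sigma <- apply(x, 1, sd)
z <- (x - mu) / sigma

# Same result via scale(), which standardizes columns, hence the transposes.
z_scale <- t(scale(t(x)))
```

peak1 and peak2 differ tenfold in level but follow the same temporal pattern, so their z-scores are identical. This is the effect the transformation is meant to achieve before clustering.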

Value

If x is a TCA object, a TCA object will be returned. If x is a matrix, a clust object will be returned.

Author(s)

Mengjun Wu

See Also

clust, kmeans, pam, hclust, cutree

Examples


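library(TCseq)

# simulate 200 peaks measured at 8 time points
example.mat <- matrix(rnorm(1600, sd = 0.3), nrow = 200,
            dimnames = list(paste0('peak', 1:200), 1:8))
# soft clustering (cmeans) with 4 clusters
clust_res <- timeclust(x = example.mat, algo = 'cm', k = 4)
# returns a clust object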


MengjunWu/TCseq documentation built on May 15, 2023, 9:47 p.m.