ClustIndexes: Clustering the indexes using K-means
In michmich76/ctsGE: Clustering of Time Series Gene Expression data


Description

Clusters each index, as predefined by `PreparingTheIndexes`, using K-means.

Usage

ClustIndexes(x, scaling = TRUE)

Arguments

`x`	a list containing the expression data and their indexes, as returned by `PreparingTheIndexes`
`scaling`	logical; whether the data should be standardized before clustering. Default = TRUE

Details

Clustering is done with K-means. The number of clusters k is chosen with the Elbow method, which looks at the percentage of variance explained as a function of the number of clusters: k should be chosen so that adding another cluster does not give a much better model of the data. First, the ratio of the within-cluster sum of squares (WSS) to the total sum of squares (TSS) is computed for increasing values of k (1, 2, 3, ...). The WSS, also known as the sum of squared errors (SSE), decreases as k grows, and the elbow is the point after which this decrease levels off. Here the elbow is taken as the first k at which the WSS-to-TSS ratio drops below 0.2, that is, the first k at which the clusters explain more than 80% of the total variance.
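The selection rule above can be sketched in base R with `stats::kmeans`. This is a hypothetical helper, `choose_k_elbow`, not the ctsGE implementation: only the 0.2 threshold comes from the text, while the cap `k_max = 10`, `nstart = 25`, and returning `NA` when no k reaches the threshold are assumptions made for the sketch.

```r
# Hypothetical helper (not part of ctsGE): return the first k whose
# WSS/TSS ratio falls below `threshold`, following the rule described above.
choose_k_elbow <- function(m, k_max = 10, threshold = 0.2, nstart = 25) {
  m <- as.matrix(m)
  # Keep k below the number of distinct rows so kmeans() can place its centers
  k_max <- min(k_max, nrow(unique(m)) - 1)
  if (k_max < 1) return(NA_integer_)
  for (k in seq_len(k_max)) {
    fit <- kmeans(m, centers = k, nstart = nstart)
    # tot.withinss is the WSS, totss is the TSS
    if (fit$tot.withinss / fit$totss < threshold) return(k)
  }
  NA_integer_  # assumption: no k in 1..k_max reached the threshold
}

# Toy data: three well-separated groups of 30 points in two dimensions
set.seed(1)
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
toy <- centers[rep(1:3, each = 30), ] + matrix(rnorm(180, sd = 0.5), ncol = 2)
k_toy <- choose_k_elbow(toy)
k_toy
```

On this toy data any two-cluster solution must merge two of the groups, which keeps the ratio well above 0.2 at k = 2, while at k = 3 it falls far below, so the helper returns 3. Stopping at the first k that passes keeps the number of `kmeans()` runs small, which matters when the search is repeated for every index.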

Running K-means and finding the optimal k for every index in the data may take a long time. To save time, the user can skip this step and directly view a specific index and its clusters with either `PlotIndexesClust` or `ctsGEShinyApp`.

By default, the data are standardized before clustering; to cluster the raw counts instead, set `scaling = FALSE`.
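The effect of `scaling` can be illustrated with base R. This sketch assumes that standardization means a per-gene (row-wise) z-score across time points, which is an assumption about what ctsGE does; the gene names and values are made up for illustration.

```r
# Toy expression matrix: genes in rows, time points in columns (made-up values)
expr <- rbind(geneA = c(1, 2, 3),
              geneB = c(10, 20, 30),
              geneC = c(3, 2, 1))
colnames(expr) <- c("0h", "6h", "12h")

# Assumed standardization: z-score each gene across time points.
# scale() works on columns, so transpose, scale, and transpose back.
# Note: a gene with constant expression has sd 0 and would become NaN.
z <- t(scale(t(expr)))

# Euclidean distances between genes before and after scaling
d_raw <- as.matrix(dist(expr))
d_scaled <- as.matrix(dist(z))
d_raw["geneA", c("geneB", "geneC")]
d_scaled["geneA", c("geneB", "geneC")]
```

In the raw values geneA is far closer to geneC, which has the opposite trend, than to geneB, which has the same shape at ten times the magnitude. After scaling, geneA and geneB have identical profiles, so K-means groups genes by the shape of their time course rather than by expression level.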

Value

A list containing `ClusteredIdxTable`, the table of clustered indexes, and `optimalK`, the number of clusters chosen for each index.

See Also

`kmeans`, `PlotIndexesClust`

Examples

library(ctsGE)

data_dir <- system.file("extdata", package = "ctsGE")
files <- dir(path = data_dir, pattern = "\\.xls$")
rts <- readTSGE(files, path = data_dir,
                labels = c("0h", "6h", "12h", "24h", "48h", "72h"), skip = 10625)
prts <- PreparingTheIndexes(rts)

# Cluster every index with K-means (this step can be slow)
tsCI <- ClustIndexes(prts)

head(tsCI$ClusteredIdxTable) # the table with the clustered indexes
head(tsCI$optimalK)          # the number of clusters for each index