clustOptimal: Generate optimal clustering

View source: R/clustOptimal.R

clustOptimalR Documentation

Generate optimal clustering

Description

Takes a clue output and generate the optimal clustering of the time-course data.

Usage

clustOptimal(
  clueObj,
  rep = 5,
  user.maxK = NULL,
  effectiveSize = NULL,
  pvalueCutoff = 0.05,
  visualize = TRUE,
  universe = NULL,
  mfrow = c(1, 1)
)

Arguments

clueObj

the output from runClue.

rep

number of times (default is 5) the clustering is to be repeated to find the best clustering result.

user.maxK

user defined optimal k value for generating optimal clustering. If not provided, the optimal k that is identified by clue will be used.

effectiveSize

the size of kinase-substrate groups to be considered for calculating enrichment. Groups that are too small or too large will be removed from calculating overall enrichment of the clustering.

pvalueCutoff

a pvalue cutoff for determining which kinase-substrate groups to be included in calculating overall enrichment of the clustering.

visualize

a boolean parameter indicating whether to visualize the clustering results.

universe

the universe of genes/proteins/phosphosites etc. that the enrichment is calculated against. The default are the row names of the dataset.

mfrow

control the subplots in graphic window.

Value

return a list containing optimal clustering object and enriched kinases or gene sets.

Examples

# simulate a time-series data with 4 distinctive profile groups and each group with
# a size of 50 phosphorylation sites.
simuData <- temporalSimu(seed=1, groupSize=50, sdd=1, numGroups=4)

# create an artificial annotation database. Generate 20 kinase-substrate groups each
# comprising 10 substrates assigned to a kinase.
# among them, create 4 groups each contains phosphorylation sites defined to have the
# same temporal profile.
kinaseAnno <- list()
groupSize <- 50
for (i in 1:4) {
 kinaseAnno[[i]] <- paste("p", (groupSize*(i-1)+1):(groupSize*(i-1)+10), sep="_")
}

for (i in 5:20) {
 set.seed(i)
 kinaseAnno[[i]] <- paste("p", sample.int(nrow(simuData), size = 10), sep="_")
}
names(kinaseAnno) <- paste("KS", 1:20, sep="_")

# run CLUE with a repeat of 2 times and a range from 2 to 7
set.seed(1)
clueObj <- runClue(Tc=simuData, annotation=kinaseAnno, rep=5, kRange=2:7)

# visualize the evaluation outcome
xl <- "Number of clusters"
yl <- "Enrichment score"
boxplot(clueObj$evlMat, col=rainbow(ncol(clueObj$evlMat)), las=2, xlab=xl, ylab=yl, main="CLUE")
abline(v=(clueObj$maxK-1), col=rgb(1,0,0,.3))

# generate optimal clustering results using the optimal k determined by CLUE
best <- clustOptimal(clueObj, rep=3, mfrow=c(2, 3))

# list enriched clusters
best$enrichList

# obtain the optimal clustering object

  best$clustObj




ClueR documentation built on Nov. 16, 2023, 5:08 p.m.