find_modules: Find modules

Description Usage Arguments Value References Examples

View source: R/find_modules.R

Description

This is a wrapper function for mclust::Mclust(), which uses Gaussian mixture models to find modules (clusters of correlated genes). The Bayesian Information Criterion (BIC) is used to estimate the number of modules. The modules are derived using only one of the phenotype groups, but the corresponding modules are returned for both groups. Later you can use the test_modules() function to test whether the correlation structure within a module differs between the two groups.

Usage

1
2
3
4
5
6
7
8
find_modules(
  X1,
  X2,
  cluster_group = 1,
  num_modules = 1:100,
  modelName = c("EII", "VII"),
  ...
)

Arguments

X1

matrix of genes for phenotype group 1 (rows are genes, columns are samples)

X2

matrix of genes for phenotype group 2 (rows are genes, columns are samples)

cluster_group

which group you want to use for deriving the modules (= 1 or 2)

num_modules

the number of modules to consider. Should be an integer vector specifying the number of possible modules, e.g. 1:50, or you can specify an exact number of modules, e.g. num_modules=20.

modelName

the types of mclust covariance structures to consider. See mclust::Mclust() for details. When clustering a large number of features, we recommend using diagonal covariance structures EII and VII.

...

additional arguments can be passed to mclust::Mclust()

Value

References

Scrucca, L., Fop, M., Murphy, T. B., & Raftery, A. E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R journal, 8(1), 289.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
######## load example dataset
library(multtest)
data(golub)
X1 = golub[,which(golub.cl==0)]
X2 = golub[,which(golub.cl==1)]
rownames(X1) = golub.gnames[,3]
rownames(X2) = golub.gnames[,3]

# use subset of 200 genes for example
set.seed(1234)
ind = sample(1:nrow(X1),200)
X1 = X1[ind,]
X2 = X2[ind,]

######## Derive modules in group 1
modules = find_modules(X1,X2,cluster_group=1)
modules$num_modules # number of modules estimated by BIC (modules with < 3 genes are excluded)
ngm = unlist(lapply(modules$group1_modules, ncol)) # number of genes per module
summary(ngm)


######## test modules for differential co-expression
testmods = test_modules(group1_modules = modules$group1_modules,group2_modules = modules$group2_modules)
# View(testmods$pvalues)
# View(testmods$qvalues)

which(testmods$pvalues$PND6 <= 0.05)
which(testmods$qvalues$PND6 <= 0.05)

# use parallel computing:
# testmods = test_modules(group1_modules = modules$group1_modules,
#                        group2_modules = modules$group2_modules,
#                        parallel=TRUE,
#                        cores=4)


######## plot module 5
heat = corrheatmap(modules$group1_modules[[5]],modules$group2_modules[[5]])
# plot(heat$both_plots)
# plot(heat$group1_plot)
# plot(heat$group2_plot)

arbet003/discoMod documentation built on Dec. 19, 2021, 4:36 a.m.