Normal mixture model estimation and selection for a series of cluster numbers

Description

Perform co-expression and co-abundance analysis of high-throughput sequencing data, with or without data transformation, using a Normal mixture model. The output of NormMixClus is an S3 object of class NormMixClus.

Usage

NormMixClus(y_profiles, K, subset = NULL, parallel = TRUE,
  BPPARAM = bpparam(), ...)

Arguments

y_profiles

(n x q) matrix of observed profiles for n observations and q variables

K

Number of clusters (a single value or a sequence of values).

subset

Optional vector providing the indices of a subset of genes that should be used for the co-expression analysis (i.e., row indices of the data matrix y_profiles).

parallel

If FALSE, no parallelization is used. If TRUE, execution is parallelized using BiocParallel (see the next argument, BPPARAM). A note on running in parallel using BiocParallel: it may be advantageous to remove large, unneeded objects from the current R environment before calling the function, as R may otherwise copy these objects while running on worker nodes.

BPPARAM

Optional parameter object passed internally to bplapply when parallel=TRUE. If not specified, the parameters last registered with register will be used.

...

Additional optional parameters to be passed to NormMixClus_K.
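As an illustration of the parallel and BPPARAM arguments, the following is a minimal sketch of preparing a BiocParallel back-end before a run. It assumes the BiocParallel package is installed (the guard lets the sketch run either way); the object names serial_param, snow_param, and big_tmp are hypothetical, and the choice of two workers is arbitrary.

```r
## Sketch only: set up a back-end that could be supplied as BPPARAM.
## Assumption: BiocParallel is installed; otherwise this part is skipped.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  ## A serial back-end, convenient for debugging
  serial_param <- BiocParallel::SerialParam()

  ## A socket-based back-end with 2 workers (constructed, not yet started)
  snow_param <- BiocParallel::SnowParam(workers = 2)

  ## Registering a back-end makes it the one returned by bpparam(),
  ## which is the default value of the BPPARAM argument
  BiocParallel::register(serial_param)
  current_param <- BiocParallel::bpparam()
}

## Remove large, unneeded objects before a parallel run
## (big_tmp is a hypothetical leftover object)
big_tmp <- matrix(0, nrow = 1000, ncol = 1000)
rm(big_tmp)
invisible(gc())
```

An explicit back-end such as snow_param could then be supplied as BPPARAM = snow_param together with parallel = TRUE; leaving BPPARAM unspecified falls back on the registered default.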

Value

An S3 object of class NormMixClus containing the following:

nbCluster.all

Vector giving the number of clusters for each of the fitted models

loglike.all

Log likelihoods calculated for each of the fitted models

ICL.all

ICL values calculated for each of the fitted models

ICL.results

Object of class NormMixClus giving the results from the model chosen via the ICL criterion

all.results

List of objects of class NormMixClus giving the results for all models
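To show how the components listed above might be inspected together, here is a small sketch that builds a stand-in list with the documented component names and tabulates the per-model criteria. The object mock_run and all of its numbers are hypothetical placeholders, not real output of NormMixClus.

```r
## Hypothetical stand-in for a fitted object; component names follow the
## Value section, but every number here is a placeholder.
mock_run <- structure(
  list(
    nbCluster.all = 2:4,
    loglike.all   = c(-120.5, -110.2, -108.9),
    ICL.all       = c(260.1, 255.7, 263.4),
    ICL.results   = list(),             # placeholder for the selected model
    all.results   = vector("list", 3)   # placeholder, one entry per model
  ),
  class = "NormMixClus"
)

## One row per fitted model, for side-by-side comparison
model_table <- data.frame(
  K       = mock_run$nbCluster.all,
  logLike = mock_run$loglike.all,
  ICL     = mock_run$ICL.all
)
model_table
```

With a real fit, the model chosen via the ICL criterion is already stored in ICL.results, so a table like this is mainly useful for checking how the criterion varies across the values of K.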

Author(s)

Andrea Rau, Cathy Maugis-Rabusseau

Examples

library(coseq)

## Simulate toy data, n = 300 observations
set.seed(12345)
countmat <- matrix(runif(300*4, min=0, max=500), nrow=300, ncol=4)
countmat <- countmat[which(rowSums(countmat) > 0),]
profiles <- transform_RNAseq(countmat, norm="none",
                             transformation="arcsin")$tcounts

conds <- rep(c("A","B","C","D"), each=2)

## Run the Normal mixture model for K = 2 and K = 3
run <- NormMixClus(y_profiles=profiles, K=2:3, iter=5)

## Run the Normal mixture model for K = 2 only
run2 <- NormMixClus_K(y_profiles=profiles, K=2, iter=5)

## Re-estimate mixture parameters for the model with K=2 clusters
param <- NormMixParam(run2, y_profiles=profiles)

## Summary of results
summary(run, y_profiles=profiles)
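
The subset argument takes row indices of the profile matrix. The sketch below shows one way such indices might be built in base R; the matrix y_toy, the nonzero-row-sum filter, and the name keep are hypothetical choices made for illustration only.

```r
## Toy (n x q) matrix: n = 6 observations (rows), q = 4 variables (columns)
set.seed(1)
y_toy <- matrix(runif(6 * 4, min = 0, max = 500), nrow = 6, ncol = 4)
y_toy[2, ] <- 0   # an all-zero row that should be left out

## Row indices of the observations to keep (hypothetical filter)
keep <- which(rowSums(y_toy) > 0)
keep

## K can be a single value or a sequence of values
K_single <- 3
K_range  <- 2:4
```

The vector keep could then be supplied as subset = keep, with either K_single or K_range supplied as K.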