coseq: Co-expression or co-abundance analysis of high-throughput...

Description

This is the primary user interface for the coseq package. Generic S4 methods are implemented to perform co-expression or co-abundance analysis of high-throughput sequencing data, with or without data transformation, using K-means or mixture models. The supported classes are matrix, data.frame, and DESeqDataSet. The output of coseq is an S4 object of class coseqResults.

Usage

coseq(object, ...)

## S4 method for signature 'matrix'
coseq(
  object,
  K,
  subset = NULL,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)

## S4 method for signature 'data.frame'
coseq(
  object,
  K,
  subset = NULL,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)

## S4 method for signature 'DESeqDataSet'
coseq(
  object,
  K,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)
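
The default for modelChoice in the signatures above depends on model. The following minimal sketch copies that default expression into a helper to show how it resolves; the helper name default_model_choice is hypothetical and is not part of coseq.

```r
# Hypothetical helper (not part of coseq): reproduces the default
# expression for modelChoice shown in the usage signatures.
default_model_choice <- function(model) {
  ifelse(model == "kmeans", "DDSE", "ICL")
}

default_model_choice("kmeans")   # "DDSE"
default_model_choice("Normal")   # "ICL"
default_model_choice("Poisson")  # "ICL"
```

Note that K has no default in any of the methods, so it must always be supplied by the caller.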

Arguments

object

Data to be clustered. May be provided as a y (n x q) matrix or data.frame of observed counts for n observations and q variables, or an object of class DESeqDataSet arising from a differential analysis via DESeq2.

...

Additional optional parameters.

K

Number of clusters (a single value or a vector of values)

subset

Optional vector providing the indices of a subset of genes that should be used for the co-expression analysis (i.e., row indices of the data matrix y). For the generic function coseq, the results of a previously run differential analysis may be used to select a subset of genes on which to perform the co-expression analysis. If this is desired, subset can also be an object of class DESeqResults (from the results function in DESeq2).
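
As an illustration of building such an index vector by hand, the sketch below selects row indices from a vector of adjusted p-values. The vector padj and its values are made up for this example; in practice they might come from a previous differential analysis.

```r
# Hypothetical adjusted p-values for six genes (illustrative values only).
padj <- c(0.001, 0.20, NA, 0.03, 0.75, 0.049)

# which() silently drops NA, so only genes that pass the threshold
# contribute their row index.
subset_idx <- which(padj < 0.05)
subset_idx   # 1 4 6
```

The resulting integer vector could then be passed as subset when object is a matrix or data.frame.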

model

Type of mixture model to use (“Poisson” or “Normal”), or alternatively “kmeans” for a K-means algorithm

transformation

Transformation type to be used: “voom”, “logRPKM” (if geneLength is provided by user), “arcsin”, “logit”, “logMedianRef”, “profile”, “logclr”, “clr”, “alr”, “ilr”, or “none”
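
To give an intuition for the compositional options, here is a minimal base R sketch of a profile (each row rescaled to sum to 1) followed by a textbook centered log-ratio. It is an assumption-laden illustration of the general idea only: the exact computations behind “profile”, “clr”, and “logclr” in coseq (for example, any offsets or normalization applied beforehand) may differ.

```r
# Toy counts: 3 genes (rows) x 4 samples (columns), illustrative values.
counts <- matrix(c(10, 20,  30,   40,
                    5,  5,   5,    5,
                    1, 10, 100, 1000), nrow = 3, byrow = TRUE)

# Profiles: divide each row by its total so that every row sums to 1.
prof <- counts / rowSums(counts)

# Textbook centered log-ratio: log proportion minus the row mean of the logs.
clr <- log(prof) - rowMeans(log(prof))

rowSums(prof)        # all equal to 1
round(rowSums(clr), 10)  # all equal to 0
```

A gene with a flat profile, such as the second row here, maps to a row of zeros under this transformation.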

normFactors

The type of estimator to be used to normalize for differences in library size: “TC” for total count, “UQ” for upper quantile, “Med” for median, “DESeq” for the normalization method in the DESeq package, and “TMM” for the TMM normalization method (Robinson and Oshlack, 2010). Can also be a vector (of length q) containing pre-estimated library size estimates for each sample, or “none” if no normalization is required.
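
The simplest of these estimators, total count, can be sketched in base R as follows. The scaling convention used here (library size divided by the mean library size) is an assumption made for illustration and is not necessarily the one coseq applies internally.

```r
# Toy counts: 2 genes (rows) x 4 samples (columns), illustrative values.
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80), nrow = 2, byrow = TRUE)

# Library size per sample is the column total.
lib_size <- colSums(counts)

# Assumed convention: scale factors relative to the mean library size.
size_factors <- lib_size / mean(lib_size)

# Divide each column by its scale factor.
norm_counts <- sweep(counts, 2, size_factors, "/")
colSums(norm_counts)   # identical totals across samples after scaling
```

A vector such as size_factors, with one value per sample, is the kind of pre-estimated input the argument also accepts.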

meanFilterCutoff

Value used to filter low mean normalized counts if desired (by default, set to a value of 50)
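
The effect of a mean-based filter can be sketched in base R as below. The matrix norm_counts and the cutoff are made-up values, and the strict inequality is an assumption; the comparison used inside coseq is not stated here.

```r
# Toy normalized counts: 3 genes (rows) x 4 samples (columns).
norm_counts <- matrix(c(100, 120, 80, 90,
                          2,   0,  5,  1,
                         60,  50, 55, 55), nrow = 3, byrow = TRUE)
cutoff <- 50

# Keep genes whose mean normalized count exceeds the cutoff
# (strict inequality assumed for this sketch).
keep <- rowMeans(norm_counts) > cutoff
filtered <- norm_counts[keep, , drop = FALSE]
nrow(filtered)   # 2 genes remain; the low-count second gene is dropped
```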

modelChoice

Criterion used to select the best model. For Gaussian mixture models, “ICL” (integrated completed likelihood criterion) is currently supported. For Poisson mixture models, “ICL”, “BIC” (Bayesian information criterion), and a non-asymptotic criterion calibrated via the slope heuristics using either the “DDSE” (data-driven slope estimation) or “Djump” (dimension jump) approaches may be used. See the HTSCluster package documentation for more details about the slope heuristics approaches.

parallel

If FALSE, no parallelization. If TRUE, parallel execution using BiocParallel (see next argument BPPARAM). A note on running in parallel using BiocParallel: it may be advantageous to remove large, unneeded objects from the current R environment before calling the function, as it is possible that R's internal garbage collection will copy these objects while running on worker nodes.

BPPARAM

Optional parameter object passed internally to bplapply when parallel=TRUE. If not specified, the parameters last registered with register will be used.

seed

If desired, an integer defining the seed of the random number generator. If NULL, a random seed is used.
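
Why a fixed seed matters for clustering can be seen with base R's stats::kmeans, used here only as a stand-in for a randomly initialized algorithm; this sketch does not call coseq, and the data are simulated for the example.

```r
# Simulated data for the illustration: 100 points in 2 dimensions.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

# Same seed before each run gives the same random initial centers,
# hence the same partition.
set.seed(12345)
fit1 <- kmeans(x, centers = 3)
set.seed(12345)
fit2 <- kmeans(x, centers = 3)

identical(fit1$cluster, fit2$cluster)   # TRUE
```

Without a fixed seed, repeated runs may start from different initial centers and can return different partitions.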

Value

An S4 object of class coseqResults, where conditional probabilities of cluster membership for each gene in each model are stored as a SimpleList of assay data, and the corresponding log likelihood, ICL value, number of clusters, and form of Gaussian model for each model are stored as metadata.
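
Conditional probabilities of this kind are commonly turned into hard cluster labels by taking the most probable cluster for each gene. The base R sketch below shows that step on a made-up probability matrix; the gene and cluster names are hypothetical, and the package's own accessors (such as clusters, used in the examples) should be preferred on real results.

```r
# Hypothetical conditional probabilities: 3 genes (rows) x 3 clusters (columns).
prob <- matrix(c(0.9, 0.1, 0.0,
                 0.2, 0.5, 0.3,
                 0.1, 0.2, 0.7), nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("Cluster_", 1:3)))

# Hard label: index of the largest probability in each row.
labels <- apply(prob, 1, which.max)

# Largest probability per gene, a rough measure of assignment confidence.
max_prob <- apply(prob, 1, max)

labels     # gene1 -> 1, gene2 -> 2, gene3 -> 3
max_prob   # 0.9, 0.5, 0.7
```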

Author(s)

Andrea Rau

Examples

## Simulate toy data, n = 300 observations
set.seed(12345)
countmat <- matrix(runif(300*4, min=0, max=500), nrow=300, ncol=4)
countmat <- countmat[which(rowSums(countmat) > 0),]
conds <- rep(c("A","B","C","D"), each=2)

## Run the Normal mixture model for K = 2,3,4
run_arcsin <- coseq(object=countmat, K=2:4, iter=5, transformation="arcsin",
                    model="Normal", seed=12345)
run_arcsin

## Plot and summarize results
plot(run_arcsin)
summary(run_arcsin)

## Compare ARI values for all models (no plot generated here)
ARI <- compareARI(run_arcsin, plot=FALSE)

## Compare ICL values for models with arcsin and logit transformations
run_logit <- coseq(object=countmat, K=2:4, iter=5, transformation="logit",
                   model="Normal")
compareICL(list(run_arcsin, run_logit))

## Use accessor functions to explore results
clusters(run_arcsin)
likelihood(run_arcsin)
nbCluster(run_arcsin)
ICL(run_arcsin)

## Examine transformed counts and profiles used for graphing
tcounts(run_arcsin)
profiles(run_arcsin)

## Run the K-means algorithm for logclr profiles for K = 2,..., 20
run_kmeans <- coseq(object=countmat, K=2:20, transformation="logclr",
                    model="kmeans")
run_kmeans

coseq documentation built on Nov. 8, 2020, 5:18 p.m.