coseq: Co-expression or co-abundance analysis of high-throughput...
In andreamrau/coseq: Co-Expression Analysis of Sequencing Data


This is the primary user interface for the coseq package. Generic S4 methods are implemented to perform co-expression or co-abundance analysis of high-throughput sequencing data, with or without data transformation, using K-means or mixture models. The supported classes are matrix, data.frame, and DESeqDataSet. The output of coseq is an S4 object of class coseqResults.

coseq(object, ...)

## S4 method for signature 'matrix'
coseq(
  object,
  K,
  subset = NULL,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)

## S4 method for signature 'data.frame'
coseq(
  object,
  K,
  subset = NULL,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)

## S4 method for signature 'DESeqDataSet'
coseq(
  object,
  K,
  model = "kmeans",
  transformation = "logclr",
  normFactors = "TMM",
  meanFilterCutoff = NULL,
  modelChoice = ifelse(model == "kmeans", "DDSE", "ICL"),
  parallel = FALSE,
  BPPARAM = bpparam(),
  seed = NULL,
  ...
)

`object`	Data to be clustered. May be provided as a y (n x q) matrix or data.frame of observed counts for n observations and q variables, or an object of class `DESeqDataSet` arising from a differential analysis via DESeq2.
`...`	Additional optional parameters.
`K`	Number of clusters (a single value or a vector of values)
`subset`	Optional vector providing the indices of a subset of genes that should be used for the co-expression analysis (i.e., row indices of the data matrix `y`). For the generic function `coseq`, the results of a previously run differential analysis may be used to select a subset of genes on which to perform the co-expression analysis. If this is desired, `subset` can also be an object of class DESeqResults (from the `results` function in `DESeq2`).
`model`	Type of mixture model to use (“`Poisson`” or “`Normal`”), or alternatively “`kmeans`” for a K-means algorithm
`transformation`	Transformation type to be used: “`voom`”, “`logRPKM`” (if `geneLength` is provided by user), “`arcsin`”, “`logit`”, “`logMedianRef`”, “`profile`”, “`logclr`”, “`clr`”, “`alr`”, “`ilr`”, or “`none`”
`normFactors`	The type of estimator to be used to normalize for differences in library size: “`TC`” for total count, “`UQ`” for upper quantile, “`Med`” for median, “`DESeq`” for the normalization method in the DESeq package, or “`TMM`” for the TMM normalization method (Robinson and Oshlack, 2010). Can also be a vector (of length q) containing pre-computed library size estimates for each sample, or “`none`” if no normalization is required.
`meanFilterCutoff`	Value used to filter out genes with low mean normalized counts, if desired (default `NULL`, as shown in the usage above)
`modelChoice`	Criterion used to select the best model. For Gaussian mixture models, “`ICL`” (integrated completed likelihood criterion) is currently supported. For Poisson mixture models, “`ICL`”, “`BIC`” (Bayesian information criterion), and a non-asymptotic criterion calibrated via the slope heuristics using either the “`DDSE`” (data-driven slope estimation) or “`Djump`” (dimension jump) approaches may be used. See the `HTSCluster` package documentation for more details about the slope heuristics approaches.
`parallel`	If `FALSE`, no parallelization. If `TRUE`, parallel execution using BiocParallel (see next argument `BPPARAM`). A note on running in parallel using BiocParallel: it may be advantageous to remove large, unneeded objects from the current R environment before calling the function, as R may copy these objects to the worker nodes while running.
`BPPARAM`	Optional parameter object passed internally to `bplapply` when `parallel=TRUE`. If not specified, the parameters last registered with `register` will be used.
`seed`	If desired, an integer defining the seed of the random number generator. If `NULL`, a random seed is used.
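
To make the `transformation` options more concrete, the following base R sketch computes expression profiles (each gene's counts divided by its row total) and their centered log-ratio (CLR). It is an illustration under stated assumptions, not the package's implementation: the helper names `to_profiles` and `clr_transform` are hypothetical, the pseudocount of 1 used to avoid taking the log of zero is an assumption, and library-size normalization is skipped entirely.

```r
## Hypothetical helpers for illustration only; they are not part of coseq.

## Profiles: each row (gene) is divided by its total so that it sums to 1.
to_profiles <- function(counts, pseudocount = 1) {
  shifted <- counts + pseudocount   # assumed zero handling, not coseq's rule
  shifted / rowSums(shifted)
}

## Centered log-ratio: log of each proportion minus the row mean of the logs
## (equivalently, the log of the proportion over the row geometric mean).
clr_transform <- function(p) {
  logp <- log(p)
  logp - rowMeans(logp)
}

set.seed(12345)
toy_counts <- matrix(rpois(5 * 4, lambda = 100), nrow = 5, ncol = 4)
toy_profiles <- to_profiles(toy_counts)
toy_clr <- clr_transform(toy_profiles)

rowSums(toy_profiles)  # each row sums to 1
rowSums(toy_clr)       # each row sums to 0, up to floating-point error
```

Profiles are constrained to sum to 1 within each gene, which is why log-ratio style transformations are a natural choice before fitting a Gaussian mixture or running K-means on them.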

An S4 object of class coseqResults, where the conditional probabilities of cluster membership for each gene in each model are stored as a SimpleList of assay data, and the corresponding log likelihood, ICL value, number of clusters, and form of Gaussian model for each model are stored as metadata.
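
Conditional probabilities of cluster membership can be turned into hard cluster labels by assigning each gene to its most probable cluster (the maximum a posteriori rule). The base R sketch below applies that rule to a made-up probability matrix. `toy_probs` and its values are invented for illustration; for a real `coseqResults` object, the `clusters()` accessor used in the examples is the way to retrieve labels.

```r
## Made-up conditional probabilities for 4 genes and 3 clusters (rows sum to 1).
toy_probs <- matrix(c(0.90, 0.05, 0.05,
                      0.10, 0.70, 0.20,
                      0.30, 0.30, 0.40,
                      0.02, 0.97, 0.01),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("gene", 1:4),
                                    paste0("Cluster_", 1:3)))

## Maximum a posteriori (MAP) rule: pick the column with the largest probability.
map_labels <- apply(toy_probs, 1, which.max)

## The largest probability per gene indicates how clear-cut each assignment is.
max_prob <- apply(toy_probs, 1, max)

map_labels
max_prob
```

Here `gene3` has a maximum probability of only 0.40, so its label is much less certain than that of `gene4`; inspecting these maxima is a simple way to flag ambiguous assignments.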

Andrea Rau

## Simulate toy data, n = 300 observations
set.seed(12345)
countmat <- matrix(runif(300*4, min=0, max=500), nrow=300, ncol=4)
countmat <- countmat[which(rowSums(countmat) > 0),]
conds <- c("A","B","C","D")  # one condition label per column of countmat

## Run the Normal mixture model for K = 2,3,4
run_arcsin <- coseq(object=countmat, K=2:4, iter=5, transformation="arcsin",
                    model="Normal", seed=12345)
run_arcsin

## Plot and summarize results
plot(run_arcsin)
summary(run_arcsin)

## Compare ARI values for all models (no plot generated here)
ARI <- compareARI(run_arcsin, plot=FALSE)

## Compare ICL values for models with arcsin and logit transformations
run_logit <- coseq(object=countmat, K=2:4, iter=5, transformation="logit",
                   model="Normal")
compareICL(list(run_arcsin, run_logit))

## Use accessor functions to explore results
clusters(run_arcsin)
likelihood(run_arcsin)
nbCluster(run_arcsin)
ICL(run_arcsin)

## Examine transformed counts and profiles used for graphing
tcounts(run_arcsin)
profiles(run_arcsin)

## Run the K-means algorithm for logclr profiles for K = 2,..., 20
run_kmeans <- coseq(object=countmat, K=2:20, transformation="logclr",
                    model="kmeans")
run_kmeans