pagoda.gene.clusters: Determine de-novo gene clusters and associated overdispersion...


View source: R/functions.R

Description

Determine de-novo gene clusters, their weighted PCA lambda1 values, and random matrix expectation.
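
To make the idea of comparing a cluster's lambda1 with a random matrix expectation concrete, here is a minimal base-R sketch. It is not the scde implementation: it uses plain (unweighted) PCA where scde uses weighted PCA, and the helper `lambda1()`, the toy matrices, and the simple max-based comparison are assumptions made for illustration.

```r
set.seed(1)

# Largest eigenvalue of the gene-gene covariance of a genes x cells matrix.
# Plain PCA is an assumption here; scde uses a weighted PCA.
lambda1 <- function(m) {
  centered <- m - rowMeans(m)
  cov.mat <- tcrossprod(centered) / (ncol(m) - 1)
  max(eigen(cov.mat, symmetric = TRUE, only.values = TRUE)$values)
}

n.genes <- 20
n.cells <- 100

# Hypothetical coherent cluster: every gene follows one shared cell-level signal.
signal <- rnorm(n.cells)
cluster <- t(replicate(n.genes, signal + rnorm(n.cells)))

# Background: lambda1 of pure-noise matrices of the same size and per-gene variance.
random.lambda1 <- replicate(60, {
  lambda1(matrix(rnorm(n.genes * n.cells, sd = sqrt(2)), n.genes, n.cells))
})

observed <- lambda1(cluster)
# How far the cluster's lambda1 exceeds the largest value seen in the background.
excess <- observed - max(random.lambda1)
```

A cluster whose genes share a pattern across cells concentrates variance in the first component, so its lambda1 sits well above the background values obtained from noise.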

Usage

pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)

Arguments

varinfo

adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())

trim

additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). Use higher values (5-15) if the resulting clusters group outlier patterns

n.clusters

number of clusters to be determined (recommended range is 100-200)

n.samples

number of randomly generated matrix samples to test the background distribution of lambda1 on

cor.method

correlation method ("pearson", "spearman") to be used as a distance measure for clustering

n.internal.shuffles

number of internal shuffles to perform; only needed to estimate set coherence, which is quite high for clusters by definition. Disabled by default; set to 10-30 shuffles to estimate it

n.starts

number of wPCA EM algorithm starts at each iteration

n.cores

number of cores to use

verbose

verbosity level

plot

whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)

show.random

whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation

n.components

number of PC to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns - rarely the case)

method

clustering method to be used in determining gene clusters

secondary.correlation

whether clustering should be performed on the correlation of the correlation matrix instead

n.cells

number of cells to use for the randomly generated cluster lambda1 model

old.results

optionally, pass old results just to plot the model without recalculating the stats
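
As a rough illustration of how cor.method, method, n.clusters and secondary.correlation could fit together, the sketch below clusters genes with base R only. It assumes a correlation-based distance of the form 1 - correlation followed by hierarchical clustering and a tree cut; the toy matrix and all variable names are hypothetical, and the actual scde internals may differ.

```r
set.seed(1)
n.cells <- 50

# Toy genes x cells matrix with two hypothetical co-varying gene groups.
a <- rnorm(n.cells)
b <- rnorm(n.cells)
mat <- rbind(t(replicate(10, a + rnorm(n.cells, sd = 0.3))),
             t(replicate(10, b + rnorm(n.cells, sd = 0.3))))
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

# Gene-gene correlation (the role played by cor.method).
gene.cor <- cor(t(mat), method = "pearson")

# Optional second pass: correlate the correlation profiles themselves
# (the role played by secondary.correlation).
secondary.correlation <- FALSE
if (secondary.correlation) gene.cor <- cor(gene.cor, method = "pearson")

# Turn correlation into a distance (assumed form) and cluster hierarchically.
gene.dist <- as.dist(1 - gene.cor)
tree <- hclust(gene.dist, method = "ward.D")   # the role played by method

# Cut the tree into a fixed number of clusters (the role played by n.clusters).
membership <- cutree(tree, k = 2)
clusters <- split(names(membership), membership)
```

With real data the tree would be cut into many more clusters, in line with the recommended range of 100-200.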

Value

a list containing the following fields:

Examples

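library(scde)
data(pollen)
# Filter out poorly covered genes and cells
cd <- clean.counts(pollen)

# Fit k-nearest-neighbor error models for each cell
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
# Normalize gene expression variance relative to the transcriptome-wide expectation
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# Determine de-novo gene clusters, using a larger Winsorization trim than the default
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)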
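
The call above passes trim as a value divided by the number of cells. One way to read such a fraction is as the share of cells clamped at each extreme of every gene, so that 7.1/ncol corresponds to about 7 cells per tail. The base-R sketch below illustrates Winsorization under that reading; `winsorize.rows()` is a hypothetical helper written for this example, not the routine scde uses internally.

```r
# Clamp the k lowest and k highest values of each gene (row) to the
# nearest retained value, where k = floor(trim * number of cells).
winsorize.rows <- function(m, trim) {
  k <- floor(trim * ncol(m))   # cells clamped at each extreme (assumed reading of trim)
  if (k < 1) return(m)
  t(apply(m, 1, function(x) {
    s <- sort(x)
    pmin(pmax(x, s[k + 1]), s[length(x) - k])
  }))
}

set.seed(1)
m <- matrix(rnorm(5 * 40), nrow = 5)   # toy matrix: 5 genes x 40 cells
m[1, 3] <- 50                          # a single-cell outlier in gene 1

w <- winsorize.rows(m, trim = 7.1 / ncol(m))
```

After clamping, the outlier in gene 1 no longer dominates that gene's correlation with other genes, which is the effect the trim argument is described as providing.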

hms-dbmi/scde documentation built on March 29, 2018, 1:23 p.m.