Pagoda2: Pagoda2 R6 class

Pagoda2R Documentation

Pagoda2 R6 class

Description

The class encompasses gene count matrices, providing methods for normalization, calculating embeddings, and differential expression.

Public fields

counts

Gene count matrix, normalized on total counts (default=NULL)

modelType

string Model used to normalize count matrices. Only supported values are 'raw', 'plain', and 'linearObs'. – 'plain': Normalize by regressing out on the non-zero observations of each gene (default). – 'raw': Use the raw count matrices, without normalization. The expression matrix taken "as is" without normalization, although log.scale still applies. – 'linearObs': Fit a linear model of pooled counts across all genes against depth. This approach isn't recommened, as the depth dependency is not completely normalized out.

clusters

Results of clustering (default=list())

graphs

Graph representations of the dataset (default=list())

reductions

Results of reductions, e.g. PCA (default=list())

embeddings

Results of visualization algorithms, t-SNE or largeVis (default=list())

diffgenes

Lists of differentially expressed genes (default=list())

n.cores

number of cores (default=1)

misc

list with additional info (default=list())

batch

Batch factor for the dataset (default=NULL)

genegraphs

Slot to store graphical representations in gene space (i.e. gene kNN graphs) (default=list())

depth

Number of molecules measured per cell (default=NULL)

Methods

Public methods


Method new()

Initialize Pagoda2 class

Usage
Pagoda2$new(
  x,
  modelType = "plain",
  n.cores = parallel::detectCores(logical = FALSE),
  verbose = TRUE,
  min.cells.per.gene = 0,
  trim = round(min.cells.per.gene/2),
  min.transcripts.per.cell = 10,
  batch = NULL,
  lib.sizes = NULL,
  log.scale = TRUE,
  keep.genes = NULL
)
Arguments
x

input count matrix

modelType

Model used to normalize count matrices (default='plain'). Only supported values are 'raw', 'plain', and 'linearObs'.

n.cores

numeric Number of cores to use (default=1)

verbose

boolean Whether to give verbose output (default=TRUE)

min.cells.per.gene

integer Minimum number of cells per gene, used to subset counts for coverage (default=0)

trim

numeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.

min.transcripts.per.cell

integer Minimum number of transcripts per cells, used to subset counts for coverage (default=10)

batch

fctor Batch factor for the dataset (default=NULL)

lib.sizes

character vector of library sizes (default=NULL)

log.scale

boolean If TRUE, scale counts by log() (default=TRUE)

keep.genes

list of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)

Returns

new Pagoda2 object

Examples
\donttest{ 
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts, log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
}


Method setCountMatrix()

Provide the initial count matrix, and estimate deviance residual matrix (correcting for depth and batch)

Usage
Pagoda2$setCountMatrix(
  countMatrix,
  depthScale = 1000,
  min.cells.per.gene = 0,
  trim = round(min.cells.per.gene/2),
  min.transcripts.per.cell = 10,
  lib.sizes = NULL,
  log.scale = FALSE,
  keep.genes = NULL,
  verbose = TRUE
)
Arguments
countMatrix

input count matrix

depthScale

numeric Scaling factor for normalizing counts (defaul=1e3). If 'plain', counts are scaled by counts = counts/as.numeric(depth/depthScale).

min.cells.per.gene

integer Minimum number of cells per gene, used to subset counts for coverage (default=0)

trim

numeric Parameter used for winsorizing count data (default=round(min.cells.per.gene/2)). If value>0, will winsorize counts in normalized space in the hopes of getting a more stable depth estimates. If value<=0, ignored.

min.transcripts.per.cell

integer Minimum number of transcripts per cells, used to subset counts for coverage (default=10)

lib.sizes

character vector of library sizes (default=NULL)

log.scale

boolean If TRUE, scale counts by log() (default=TRUE)

keep.genes

list of genes to keep in count matrix after filtering out by coverage but before normalization (default=NULL)

verbose

boolean Whether to give verbose output (default=TRUE)

Returns

normalized count matrix (or if modelTye='raw', the unnormalized count matrix)


Method adjustVariance()

Adjust variance of the residual matrix, determine overdispersed sites This is done to normalize the extent to which genes with (very) different expression magnitudes will contribute to the downstream anlaysis.

Usage
Pagoda2$adjustVariance(
  gam.k = 5,
  alpha = 0.05,
  plot = FALSE,
  use.raw.variance = FALSE,
  use.unadjusted.pvals = FALSE,
  do.par = TRUE,
  max.adjusted.variance = 1000,
  min.adjusted.variance = 0.001,
  cells = NULL,
  verbose = TRUE,
  min.gene.cells = 0,
  persist = is.null(cells),
  n.cores = self$n.cores
)
Arguments
gam.k

integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=5). If gam.k<2, linear regression is used 'lm(v ~ m)'.

alpha

numeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.

plot

boolean Whether to output the plot (default=FALSE)

use.raw.variance

(default=FALSE). If modelType='raw', then this conditional will be used as TRUE.

use.unadjusted.pvals

boolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).

do.par

boolean Whether to put multiple graphs into a signle plot with par() (default=TRUE)

max.adjusted.variance

numeric Maximum adjusted variance (default=1e3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))

min.adjusted.variance

numeric Minimum adjusted variance (default=1e-3). The gene scale factor is defined as sqrt(pmax(min.adjusted.variance,pmin(max.adjusted.variance,df$qv))/exp(df$v))

cells

character vector Subset of cells upon which to perform variance normalization with adjustVariance() (default=NULL)

verbose

boolean Whether to give verbose output (default=TRUE)

min.gene.cells

integer Minimum number of genes per cells (default=0). This parameter is used to filter counts.

persist

boolean Whether to save results (default=TRUE, i.e. is.null(cells)).

n.cores

numeric Number of cores to use (default=1)

Returns

residual matrix with adjusted variance

Examples
\donttest{
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
}


Method makeKnnGraph()

Create k-nearest neighbor graph

Usage
Pagoda2$makeKnnGraph(
  k = 30,
  nrand = 1000,
  type = "counts",
  weight.type = "1m",
  odgenes = NULL,
  n.cores = self$n.cores,
  distance = "cosine",
  center = TRUE,
  x = NULL,
  p = NULL,
  var.scale = (type == "counts"),
  verbose = TRUE
)
Arguments
k

integer Number of k clusters for k-NN (default=30)

nrand

numeric Number of randomizations i.e. the gene sets (of the same size) to be evaluated in parallel with each gene set (default=1e3)

type

string Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions.

weight.type

string 'cauchy', 'normal', 'constant', '1m' (default='1m')

odgenes

character vector Overdispersed genes to retrieve (default=NULL)

n.cores

numeric Number of cores to use (default=1)

distance

string Distance metric used: 'cosine', 'L2', 'L1', 'cauchy', 'euclidean' (default='cosine')

center

boolean Whether to use centering when distance='cosine' (default=TRUE). The parameter is ignored otherwise.

x

counts or reduction to use (default=NULL). If NULL, uses counts. Otherwise, checks for the reduction in self$reductions[[type]]

p

(default=NULL)

var.scale

boolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.

verbose

boolean Whether to give verbose output (default=TRUE)

Returns

kNN graph, stored in self$graphs

Examples
\donttest{
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=300)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object   
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2')
}


Method getKnnClusters()

Calculate clusters based on the kNN graph

Usage
Pagoda2$getKnnClusters(
  type = "counts",
  method = igraph::multilevel.community,
  name = "community",
  n.cores = self$n.cores,
  g = NULL,
  min.cluster.size = 1,
  persist = TRUE,
  ...
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

method

Method to use (default=igraph::multilevel.community). Accepted methods are either 'igraph::infomap.community' or 'igraph::multilevel.community'. If NULL, if the number of vertices of the graph is greater than or equal to 2000, 'igraph::multilevel.community' will be used. Otherwise, 'igraph::infomap.community' will be used.

name

string Name of the community structure calculated from 'method' (default='community')

n.cores

numeric Number of cores to use (default=1)

g

Input graph (default=NULL). If NULL, access graph from self$graphs[[type]].

min.cluster.size

Minimum size of clusters (default=1). This parameter is primarily used to remove very small clusters.

persist

boolean Whether to save the clusters and community structure (default=TRUE)

...

Additional parameters to pass to 'method'

Returns

the community structure calculated from 'method'


Method geneKnnbyPCA()

Deprecated function. Use makeGeneKnnGraph() instead.

Usage
Pagoda2$geneKnnbyPCA()

Method getHierarchicalDiffExpressionAspects()

Take a given clustering and generate a hierarchical clustering

Usage
Pagoda2$getHierarchicalDiffExpressionAspects(
  type = "counts",
  groups = NULL,
  clusterName = NULL,
  method = "ward.D",
  dist = "pearson",
  persist = TRUE,
  z.threshold = 2,
  n.cores = self$n.cores,
  min.set.size = 5,
  verbose = TRUE
)
Arguments
type

string Data type of the reduction (default='counts'). If type='counts', this will access the raw counts. Otherwise, 'type' must be name of the reductions.

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

clusterName

string Cluster name to access (default=NULL)

method

string The agglomeration method to be used in stats::hcust(method=method) (default='ward.D'). Accepted values are: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). For more information, see stats::hclust().

dist

string 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')

persist

boolean Whether to save the clusters and community structure (default=TRUE)

z.threshold

numeric Threshold of z-scores to filter, >=z.threshold are kept (default=2)

n.cores

numeric Number of cores to use (default=1)

min.set.size

integer Minimum threshold of sets to keep (default=5)

verbose

boolean Whether to give verbose output (default=TRUE)

Returns

hierarchical clustering


Method makeGeneKnnGraph()

Calculates gene Knn network for gene similarity

Usage
Pagoda2$makeGeneKnnGraph(
  nPcs = 100,
  center = TRUE,
  fastpath = TRUE,
  maxit = 1000,
  k = 30,
  n.cores = self$n.cores,
  verbose = TRUE
)
Arguments
nPcs

integer Number of principal components (default=100). This is the parameter 'nv' in irlba::irlba(), the number of right singular vectors to estimate.

center

boolean Whether to center the PCA (default=TRUE)

fastpath

boolean Whether to try a (fast) C algorithm implementation if possible (default=TRUE). This parameter is equivalent to 'fastpath' in irlba::irlba().

maxit

integer Maximum number of iterations (default=1000). This parameter is equivalent to 'maxit' in irlba::irlba().

k

integer Number of k clusters for calculating k-NN on the resulting principal components (default=30).

n.cores

numeric Number of cores to use (default=1)

verbose

boolean Whether to give verbose output (default=TRUE)

Returns

graph with gene similarity


Method getDensityClusters()

Calculate density-based clusters

Usage
Pagoda2$getDensityClusters(
  type = "counts",
  embeddingType = NULL,
  name = "density",
  eps = 0.5,
  v = 0.7,
  s = 1,
  verbose = TRUE,
  ...
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

embeddingType

The type of embedding used when calculating with ‘getEmbedding()' (default=NULL). Accepted values are: ’largeVis', 'tSNE', 'FR', 'UMAP', 'UMAP_graph'

name

string Name fo the clustering (default='density').

eps

numeric value of the eps parameter, fed into dbscan::dbscan(x=emb, eps=eps, ...)

v

numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().

s

numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().

verbose

boolean Whether to give verbose output (default=TRUE)

...

Additional parameters passed to dbscan::dbscan(emb, ...)

Returns

density-based clusters


Method getDifferentialGenes()

Determine differentially expressed genes, comparing each group against all others using Wilcoxon rank sum test

Usage
Pagoda2$getDifferentialGenes(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  name = "customClustering",
  z.threshold = 3,
  upregulated.only = FALSE,
  verbose = FALSE,
  append.specificity.metrics = TRUE,
  append.auc = FALSE
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

clusterType

Optional cluster type to use as a group-defining factor (default=NULL)

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

name

string Slot to store the results in (default='customClustering')

z.threshold

numeric Minimal absolute Z score (adjusted) to report (default=3)

upregulated.only

boolean Whether to report only genes that are expressed significantly higher in each group (default=FALSE)

verbose

boolean Whether to give verbose output (default=FALSE)

append.specificity.metrics

boolean Whether to append specifity metrics (default=TRUE). Uses the function sccore::appendSpecificityMetricsToDE().

append.auc

boolean If TRUE, append AUC values (default=FALSE). Parameter ignored if append.specificity.metrics is FALSE.

Returns

List with each element of the list corresponding to a cell group in the provided/used factor (i.e. factor levels) Each element of a list is a data frame listing the differentially epxressed genes (row names), with the following columns: Z - adjusted Z score, with positive values indicating higher expression in a given group compare to the rest M - log2 fold change highest- a boolean flag indicating whether the expression of a given gene in a given vcell group was on average higher than in every other cell group fe - fraction of cells in a given group having non-zero expression level of a given gene


Method plotDiffGeneHeatmap()

Plot heatmap of DE results

Usage
Pagoda2$plotDiffGeneHeatmap(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  n.genes = 100,
  z.score = 2,
  gradient.range.quantile = 0.95,
  inner.clustering = FALSE,
  gradientPalette = NULL,
  v = 0.8,
  s = 1,
  box = TRUE,
  drawGroupNames = FALSE,
  ...
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

clusterType

Optional cluster type to use as a group-defining factor (default=NULL)

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

n.genes

integer Number of genes to plot (default=100)

z.score

numeric Threshold of z-scores to filter (default=2). Only greater than or equal to this value are kept.

gradient.range.quantile

numeric Trimming quantile (default=0.95)

inner.clustering

boolean Whether to cluster cells within each cluster (default=FALSE)

gradientPalette

palette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'

v

numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().

s

numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().

box

boolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)

drawGroupNames

boolean Whether to draw group names (default=FALSE)

...

Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()

Returns

heatmap of DE results


Method getRefinedLibSizes()

Recalculate library sizes using robust regression within clusters

Usage
Pagoda2$getRefinedLibSizes(
  clusterType = NULL,
  groups = NULL,
  type = "counts",
  n.cores = self$n.cores
)
Arguments
clusterType

Name of cluster to access (default=NULL). If NULL, takes the most recently generated clustering. Parameter ignored if groups is not NULL.

groups

factor named with cell names specifying the clusters of cells (default=NULL)

type

string Either 'counts' or the name of a stored embedding, names(self$embeddings) (default=NULL)

n.cores

numeric Number of cores to use (default=self$n.cores=1)

Returns

recalculated library sizes


Method plotGeneHeatmap()

Plot heatmap for a given set of genes

Usage
Pagoda2$plotGeneHeatmap(
  genes,
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  gradient.range.quantile = 0.95,
  cluster.genes = FALSE,
  inner.clustering = FALSE,
  gradientPalette = NULL,
  v = 0.8,
  s = 1,
  box = TRUE,
  drawGroupNames = FALSE,
  useRaster = TRUE,
  smooth.span = max(1, round(nrow(self$counts)/1024)),
  ...
)
Arguments
genes

character vector Gene names

type

string Data type (default='counts'). Currently only 'counts' supported.

clusterType

Optional cluster type to use as a group-defining factor (default=NULL)

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

gradient.range.quantile

numeric Trimming quantile (default=0.95)

cluster.genes

boolean Whether to cluster genes within each cluster using stats::hclust() (default=FALSE)

inner.clustering

boolean Whether to cluster cells within each cluster (default=FALSE)

gradientPalette

palette of colors to use (default=NULL). If NULL, uses 'colorRampPalette(c('gray90','red'), space = "Lab")(1024)'

v

numeric The “value” to be used to complete the HSV color descriptions (default=0.7). Equivalent to the 'v' parameter in grDevices::rainbow().

s

numeric The “saturation” to be used to complete the HSV color descriptions (default=1). Equivalent to the 's' parameter in grDevices::rainbow().

box

boolean Whether to draw a box around the current plot in the given color and linetype (default=TRUE)

drawGroupNames

boolean Whether to draw group names (default=FALSE)

useRaster

boolean If TRUE a bitmap raster is used to plot the image instead of polygons (default=TRUE). The grid must be regular in that case, otherwise an error is raised. For more information, see graphics::image().

smooth.span

(default=max(1,round(nrow(self$counts)/1024)))

...

Additional parameters passed to internal function used for heatmap plotting, my.heatmap2()

Returns

plot of gene heatmap


Method plotEmbedding()

Show embedding

Usage
Pagoda2$plotEmbedding(
  type = NULL,
  embeddingType = NULL,
  clusterType = NULL,
  groups = NULL,
  colors = NULL,
  gene = NULL,
  plot.theme = ggplot2::theme_bw(),
  ...
)
Arguments
type

string Either 'counts' or the name of a stored embedding, names(self$embeddings) (default=NULL)

embeddingType

string Embedding type (default=NULL). If NULL, takes the most recently generated embedding.

clusterType

Name of cluster to access (default=NULL). If NULL, takes the most recently generated clustering. Parameter ignored if groups is not NULL.

groups

factor named with cell names specifying the clusters of cells (default=NULL)

colors

character vector List of gene names (default=NULL)

gene

(default=NULL)

plot.theme

(default=ggplot2::theme_bw())

...

Additional parameters passed to sccore::embeddingPlot()

Returns

plot of the embedding


Method getOdGenes()

Get overdispersed genes

Usage
Pagoda2$getOdGenes(
  n.odgenes = NULL,
  alpha = 0.05,
  use.unadjusted.pvals = FALSE
)
Arguments
n.odgenes

integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.

alpha

numeric The Type I error probability or the significance level (default=5e-2). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.

use.unadjusted.pvals

boolean Whether to use Benjamini-Hochberg adjusted p-values (default=FALSE).

Returns

vector of overdispersed genes


Method getNormalizedExpressionMatrix()

Return variance-normalized matrix for specified genes or a number of OD genes

Usage
Pagoda2$getNormalizedExpressionMatrix(genes = NULL, n.odgenes = NULL)
Arguments
genes

vector of gene names to explicitly return (default=NULL)

n.odgenes

integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.

Returns

cell by gene matrix


Method calculatePcaReduction()

Calculate PCA reduction of the data

Usage
Pagoda2$calculatePcaReduction(
  nPcs = 20,
  type = "counts",
  name = "PCA",
  use.odgenes = TRUE,
  n.odgenes = NULL,
  odgenes = NULL,
  center = TRUE,
  cells = NULL,
  fastpath = TRUE,
  maxit = 100,
  verbose = TRUE,
  var.scale = (type == "counts"),
  ...
)
Arguments
nPcs

numeric Number of principal components (PCs) (default=20)

type

string Dataset view to reduce (counts by default, but can specify a name of an existing reduction) (default='counts')

name

string Name for the PCA reduction to be created (default='PCA')

use.odgenes

boolean Whether pre-calculated set of overdispersed genes should be used (default=TRUE)

n.odgenes

integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.

odgenes

Explicitly specify a set of overdispersed genes to use for the reduction (default=NULL)

center

boolean Whether data should be centered prior to PCA (default=TRUE)

cells

optional subset of cells on which PCA should be run (default=NULL)

fastpath

boolean Use C implementation for speedup (default=TRUE)

maxit

numeric Maximum number of iterations (default=100). For more information, see 'maxit' parameter in irlba::irlba().

verbose

boolean Whether to give verbose output (default=TRUE)

var.scale

boolean Apply scaling if using raw counts (default=TRUE). If type="counts", var.scale is TRUE by default.

...

additional arguments forwarded to irlba::irlba

Returns

Invisible PCA result (the reduction itself is saved in self$reductions[[name]])"


Method expandOdGenes()

Reset overdispersed genes 'odgenes' to be a superset of the standard odgene selection (guided by n.odgenes or alpha), and a set of recursively determined odgenes based on a given group (or a cluster info)

Usage
Pagoda2$expandOdGenes(
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  min.group.size = 30,
  od.alpha = 0.1,
  use.odgenes = FALSE,
  n.odgenes = NULL,
  odgenes = NULL,
  n.odgene.multiplier = 1,
  gam.k = 10,
  verbose = FALSE,
  n.cores = self$n.cores,
  min.odgenes = 10,
  max.odgenes = Inf,
  recursive = TRUE
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

clusterType

Optional cluster type to use as a group-defining factor (default=NULL)

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

min.group.size

integer Number of minimum cells for filtering out group size (default=30)

od.alpha

numeric The Type I error probability or the significance level for calculating overdispersed genes (default=1e-1). This is the criterion used to measure statistical significance, i.e. if the p-value < alpha, then it is statistically significant.

use.odgenes

boolean Whether pre-calculated set of overdispersed genes should be used (default=FALSE)

n.odgenes

integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.

odgenes

Explicitly specify a set of overdispersed genes to use for the reduction (default=NULL) #' @param odgenes (default=NULL)

n.odgene.multiplier

numeric (default=1)

gam.k

integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.

verbose

boolean Whether to give verbose output (default=TRUE)

n.cores

numeric Number of cores to use (default=1)

min.odgenes

integer Minimum number of overdispersed genes to use (default=10)

max.odgenes

integer Maximum number of overdispersed genes to use (default=Inf)

recursive

boolean Whether to determine groups for which variance normalization will be rerun (default=TRUE)

Returns

List of overdispersed genes


Method localPcaKnn()

local PCA implementation

Usage
Pagoda2$localPcaKnn(
  nPcs = 5,
  type = "counts",
  clusterType = NULL,
  groups = NULL,
  k = 30,
  b = 1,
  a = 1,
  min.group.size = 30,
  name = "localPCA",
  od.alpha = 0.1,
  n.odgenes = NULL,
  gam.k = 10,
  verbose = FALSE,
  n.cores = self$n.cores,
  min.odgenes = 5,
  take.top.odgenes = FALSE,
  recursive = TRUE,
  euclidean = FALSE,
  perplexity = k,
  return.pca = FALSE,
  skip.pca = FALSE
)
Arguments
nPcs

integer Number of principal components (default=5)

type

string Data type (default='counts'). Currently only 'counts' supported.

clusterType

Optional cluster type to use as a group-defining factor (default=NULL)

groups

factor named with cell names specifying the clusters of cells to be compared (one against all) (default=NULL). To compare two cell clusters against each other, simply pass a factor containing only two levels.

k

integer Number of components for kNN graph (default=30)

b

numeric Constant within exp(-b*(ncid/cldsd)^2), used for calculating cell relevance per cluster (default=1)

a

numeric Constant within "(1-exp(-a*(dsq)/(p$pcs$trsd^2)))*(pk /outerproduct pk)" (default=1)

min.group.size

integer Number of minimum cells for filtering out group size (default=30)

name

string Title (default='localPCA')

od.alpha

numeric Significance level for calculating overdispersed genes (default=1e-1). P-values will be filtered by <log(od.alpha).

n.odgenes

integer Number of overdispersed genes to retrieve (default=NULL). If NULL, will return all.

gam.k

integer The k used for the generalized additive model 'v ~ s(m, k =gam.k)' (default=10). If gam.k<2, linear regression is used 'lm(v ~ m)'.

verbose

boolean Whether to give verbose output (default=TRUE)

n.cores

numeric Number of cores to use (default=1)

min.odgenes

integer Minimum number of overdispersed genes to use (default=5)

take.top.odgenes

boolean Take top overdispersed genes in decreasing order (default=FALSE)

recursive

boolean Whether to recursively determine groups for which variance normalization will be rerun (default=FALSE)

euclidean

boolean Whether to applied euclidean-based distance similarity during variance normalization (default=FALSE)

perplexity

integer Perplexity parameter within Rtsne::Rtsne() (default=k). Please see Rtsne for more details.

return.pca

boolean Whether to return the PCs (default=FALSE)

skip.pca

boolean If TRUE and return.pca=TRUE, will return a list of scale factors, cells, and overdispersed genes, i.e. list(sf=sf, cells=cells, odgenes=odgenes) (default=FALSE). Otherwise, ignored.

Returns

localPcaKnn return here


Method testPathwayOverdispersion()

Test pathway overdispersion Note: this is a compressed version of the PAGODA1 approach in SCDE <https://hms-dbmi.github.io/scde/>

Usage
Pagoda2$testPathwayOverdispersion(
  setenv,
  type = "counts",
  max.pathway.size = 1000,
  min.pathway.size = 10,
  n.randomizations = 5,
  verbose = FALSE,
  n.cores = self$n.cores,
  score.alpha = 0.05,
  plot = FALSE,
  cells = NULL,
  adjusted.pvalues = TRUE,
  z.score = qnorm(0.05/2, lower.tail = FALSE),
  use.oe.scale = FALSE,
  return.table = FALSE,
  name = "pathwayPCA",
  correlation.distance.threshold = 0.2,
  loading.distance.threshold = 0.01,
  top.aspects = Inf,
  recalculate.pca = FALSE,
  save.pca = TRUE
)
Arguments
setenv

Specific environment for pathway analysis

type

string Data type (default='counts'). Currently only 'counts' supported.

max.pathway.size

integer Maximum number of observed genes in a valid gene set (default=1e3)

min.pathway.size

integer Minimum number of observed genes that should be contained in a valid gene set (default=10)

n.randomizations

numeric Number of random gene sets (of the same size) to be evaluated in parallel with each gene set (default=5). (This can be kept at 5 or 10, but should be increased to 50-100 if the significance of pathway overdispersion will be determined relative to random gene set models.)

verbose

boolean Whether to give verbose output (default=TRUE)

n.cores

numeric Number of cores to use (default=1)

score.alpha

numeric Significance level of the confidence interval for determining upper/lower bounds (default=0.05)

plot

boolean Whether to output the plot (default=FALSE)

cells

character vector Specific cells to investigate (default=NULL)

adjusted.pvalues

boolean Whether to use adjusted p-values (default=TRUE)

z.score

numeric Z-score to be used as a cutoff for statistically significant patterns (default=qnorm(0.05/2, lower.tail = FALSE))

use.oe.scale

boolean Whether the variance of the returned aspect patterns should be normalized using observed/expected value instead of the default chi-squared derived variance corresponding to overdispersion Z-score (default=FALSE)

return.table

boolean Whether to return a text table with results (default=FALSE)

name

string Title (default='pathwayPCA')

correlation.distance.threshold

numeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.redundancy() (default=0.2)

loading.distance.threshold

numeric Similarity threshold for grouping interdependent aspects in pagoda.reduce.loading.redundancy() (default=0.2)

top.aspects

Restrict output to the top N aspects of heterogeneity (default=Inf)

recalculate.pca

boolean Whether to recalculate PCA (default=FALSE)

save.pca

boolean Whether to save the PCA results (default=TRUE). If TRUE, caches them in self$misc[['pwpca']].

Returns

pathway output


Method getEmbedding()

Return embedding

Usage
Pagoda2$getEmbedding(
  type = "counts",
  embeddingType = "largeVis",
  name = NULL,
  dims = 2,
  M = 1,
  gamma = 1/M,
  perplexity = 50,
  verbose = TRUE,
  sgd_batches = NULL,
  diffusion.steps = 0,
  diffusion.power = 0.5,
  distance = "pearson",
  n.cores = self$n.cores,
  n.sgd.cores = n.cores,
  ...
)
Arguments
type

string Data type (default='counts'). Currently only 'counts' supported.

embeddingType

string Type of embedding to construct (default='largeVis'). Possible values are: 'largeVis', 'tSNE', 'FR' (Fruchterman–Reingold), 'UMAP', 'UMAP_graph'

name

string Name of the embedding (default=NULL). If NULL, the name = embeddingType.

dims

integer Parameter 'dims' Matrix::sparseMatrix(); a non-negative, integer, dimensions vector of length 2 (default=2). See Matrix package documentation for more details.

M

numeric (largeVis) The number of negative edges to sample for each positive edge (default=5). Parameter only used if embeddingType is 'largeVis'.

gamma

numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=7). Parameter only used if embeddingType is 'largeVis'.

perplexity

numeric Parameter 'perplexity' within largeVis::buildWijMatrix() (default=50). Please see the largeVis documentation for more details.

verbose

boolean Whether to give verbose output (default=TRUE)

sgd_batches

numeric The number of edges to process during SGD (default=NULL). Passed to projectKNNs(). Defaults to a value set based on the size of the dataset. If the parameter given is between 0 and 1, the default value will be multiplied by the parameter.

diffusion.steps

integer Iteration steps to use. If 0, no steps are run. (default=0)

diffusion.power

numeric Factor to be used when calculating diffusion, (default=0.5)

distance

string 'pearson', 'spearman', 'euclidean', 'L2', 'JS' (default='pearson')

n.cores

numeric Number of cores to use (default=1)

n.sgd.cores

numeric Number of cores to use (default=n.cores)

...

Additional parameters passed to embedding functions, Rtsne::Rtsne() if 'L2', uwot::umap() if 'UMAP', embedKnnGraphUmap() if 'UMAP_graph'

Returns

embedding stored in self$embedding


Method clone()

The objects of this class are cloneable with this method.

Usage
Pagoda2$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

Simon Steiger

Examples


## ------------------------------------------------
## Method `Pagoda2$new`
## ------------------------------------------------

 
## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts, log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 



## ------------------------------------------------
## Method `Pagoda2$adjustVariance`
## ------------------------------------------------


## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=500)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object 
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)



## ------------------------------------------------
## Method `Pagoda2$makeKnnGraph`
## ------------------------------------------------


## Load pre-generated a dataset of 50 bone marrow cells as matrix
cm <- readRDS(system.file("extdata", "sample_BM1_50.rds", package="pagoda2"))
## Perform QC, i.e. filter any cells that
counts <- gene.vs.molecule.cell.filter(cm, min.cell.size=300)
rownames(counts) <- make.unique(rownames(counts))
## Generate Pagoda2 object   
p2_object <- Pagoda2$new(counts,log.scale=TRUE, min.cells.per.gene=10, n.cores=1) 
## Normalize gene expression variance
p2_object$adjustVariance(plot=TRUE, gam.k=10)
## Generate a kNN graph of cells that will allow us to identify clusters of cells
p2_object$makeKnnGraph(k=20, center=FALSE, distance='L2')



pagoda2 documentation built on June 24, 2024, 9:09 a.m.