Conos | R Documentation |
The class encompasses sample collections, providing methods for calculating and visualizing joint graph and communities.
samples
list of samples (Pagoda2 or Seurat objects)
pairs
pairwise alignment results
graph
alignment graph
clusters
list of clustering results named by clustering type
expression.adj
adjusted expression values
embeddings
list of joint embeddings
embedding
joint embedding
n.cores
number of cores
misc
list with unstructured additional info
override.conos.plot.theme
boolean Whether to override the conos plot theme
new()
initialize Conos class
Conos$new( x, ..., n.cores = parallel::detectCores(logical = FALSE), verbose = TRUE, override.conos.plot.theme = FALSE )
x
a named list of pagoda2 or Seurat objects (one per sample)
...
additional parameters upon initializing Conos
n.cores
numeric Number of cores to use (default=parallel::detectCores(logical=FALSE))
verbose
boolean Whether to provide verbose output (default=TRUE)
override.conos.plot.theme
boolean Whether to reset plot settings to the ggplot2 default (default=FALSE)
a new 'Conos' object
con <- Conos$new(small_panel.preprocessed, n.cores=1)
addSamples()
Initialize or add a set of samples to the conos panel. Note: this will simply add samples, but will not update graph, clustering, etc.
Conos$addSamples(x, replace = FALSE, verbose = FALSE)
x
a named list of pagoda2 or Seurat objects (one per sample)
replace
boolean Whether the existing samples should be purged before adding new ones (default=FALSE)
verbose
boolean Whether to provide verbose output (default=FALSE)
invisible view of the full sample list
buildGraph()
Build the joint graph that encompasses all the samples, establishing weighted inter-sample cell-to-cell links
Conos$buildGraph( k = 15, k.self = 10, k.self.weight = 0.1, alignment.strength = NULL, space = "PCA", matching.method = "mNN", metric = "angular", k1 = k, data.type = "counts", l2.sigma = 1e+05, var.scale = TRUE, ncomps = 40, n.odgenes = 2000, matching.mask = NULL, exclude.samples = NULL, common.centering = TRUE, verbose = TRUE, base.groups = NULL, append.global.axes = TRUE, append.decoys = TRUE, decoy.threshold = 1, n.decoys = k * 2, score.component.variance = FALSE, snn = FALSE, snn.quantile = 0.9, min.snn.jaccard = 0, min.snn.weight = 0, snn.k.self = k.self, balance.edge.weights = FALSE, balancing.factor.per.cell = NULL, same.factor.downweight = 1, k.same.factor = k, balancing.factor.per.sample = NULL )
k
integer integer Size of the inter-sample neighborhood (default=15)
k.self
integer Size of the with-sample neighborhoods (default=10).
k.self.weight
numeric Weight multiplier on the intra-sample edges relative to inter-sample edges (default=0.1)
alignment.strength
numeric Alignment strength (default=NULL will result in alignment.strength=0)
space
character Reduced expression space used to establish putative alignments between pairs of samples (default='PCA'). Currently supported spaces are: — "CPCA" Common principal component analysis — "JNMF" Joint NMF — "genes" Gene expression space (log2 transformed) — "PCA" Principal component analysis — "CCA" Canonical correlation analysis — "PMA" (Penalized Multivariate Analysis <https://cran.r-project.org/web/packages/PMA/index.html>)
matching.method
character Matching method (default='mNN'). Currently supported methods are "NN" (nearest neighbors) or "mNN" (mututal nearest neighbors).
metric
character Distance metric to measure similarity (default='angular'). Currenlty supported metrics are "angular" and "L2".
k1
numeric Neighborhood radius for identifying mutually-matching neighbors (default=k). Note that k1 must be greater than or equal to k, i.e. k1>=k. Increasing k1 beyond k will lead to more aggressive alignment of distinct subpopulations (i.e. increased alignment strengths).
data.type
character Type of data type in the input pagoda2 objects within r.n (default='counts').
l2.sigma
numeric L2 distances get transformed as exp(-d/sigma) using this value (default=1e5)
var.scale
boolean Whether to use common variance scaling (default=TRUE). If TRUE, use geometric means for variance, as we're trying to focus on the common variance components. See scaledMatricesP2() code.
ncomps
integer Number of components (default=40)
n.odgenes
integer Number of overdispersed genes to be used in each pairwise alignment (default=2000)
matching.mask
an optional matrix explicitly specifying which pairs of samples should be compared (a symmetrical matrix of logical values with row and column names corresponding to sample names). (default=NULL). By default, comparisons between all paris are allowed. The argument can be used to exclude comparisons across certain pairs of samples (e.g. techincal replicates, which are expected to show very high similarity).
exclude.samples
optional list of sample names that should be excluded from the alignment and the resulting graph (default=NULL)
common.centering
boolean When calculating reduced expression space for a given sample pair, whether the expression of genes should be centered using the mean from both samples (TRUE) or using the mean within each sample (FALSE) (default=TRUE)
verbose
boolean Whether to provide verbose output (default=TRUE)
base.groups
an optional factor on cells specifying previously-obtained cell grouping to be used for adjusting the sample alignment (default: NULL). Specifically, cell clusters specfiieid by the base.groups can be used to i) calculate global expression axes which are appended to the overall set of eigenvectors, ii) adding decoy cells.
append.global.axes
boolean Whether to project samples on global expression axes, as defined by pre-defined (typically crude) set of cell subpopulations as specified by the base.gruops parameter (default=TRUE, but works only if base.groups is specified)
append.decoys
boolean Whether to use pre-defined cell groups (specified by base.groups) to append decoy cells to the samples which are otherwise lacking any of the pre-specified cell groups (default=TRUE, but works only if base.groups is specified). The decoy cells can reduce the number of erroneous matches in highly heterogeneous sample collections, where some of the samples lack entire cell subpopulations which are found in other samples. The approach only works if the base.groups (typically a crude clustering of top-level cell types) can be established with a reasonable confidence.
decoy.threshold
integer Minimal number of cells of a given cell type that should exist in a given sample (according to base.groups) to avoid addition of decoy cells to that sample for the purposes of alignment (default=1)
n.decoys
integer Number of decoy cells that should be added to a sample that had less than decoy.threshold cells of a given cell type (default=k*2)
score.component.variance
boolean Whether to score the amount of total variance explained by different components (default=FALSE as it takes extra time to calculate)
snn
boolean Whether to transform the joint graph by computing a shared nearest neighborhood graph (analogous to Seurat 3), further weighting the edges between two matched cells based on the similarity (measured by Jaccard coefficient) of all of their predicted neighbors (across all of the samples) (default: FALSE)
snn.quantile
numeric Specifies how the shared neighborhood graph transformation will determine final edge weights. If snn.quantile=NULL, the edge weight will be simply equal to the Jaccard coefficient of the neighborhoods. If snn.quantile is a vector of two numeric values (p1, p2), they will be treated as quantile probabilities, and quantile values (q1,q2) on the set of all Jaccard coefficients (for all edges) will be determiend. The edge weights will then be reset, so that edges with Jaccard coefficients below or equal to q1 will be set to 0, and those with coefficients >=q2 will be set to 1. The rest of the weights will be mapped uniformly from [q1,q2]->[0,1] range. If a single numeric value is supplied, it will be treated as a symmetric quantile probability (i.e. snn.quantile=0.8 is equivalent to specifying snn.quantile=c(1-0.8,0.8)). (default: 0.9)
min.snn.jaccard
numeric Minimum Jaccard coefficient required for a shared neighborhood graph edge (default: 0). The edges with Jaccard coefficients below this threshold will be removed (i.e. weight set to 0)
min.snn.weight
numeric Shared nearest neighbor procedure will adjust the weights of the edges, and even eliminate some of the edges (by setting their weight to zero). The min.snn.weight parameter allows to set a minimal adjusted edge weight, so that the edge weight is never reduced beyond this level (and hence never deleted) (default: 0 - no adjustments)
snn.k.self
integer Size of the within-sample neighorhood to be used in shared nearest neighbor calculations (default=k.self)
balance.edge.weights
boolean Whether to balance edge weights to control for a cell- or sample- specific factor (default=FALSE)
balancing.factor.per.cell
A per-cell factor (discrete factor, named with cell names) specifying a design difference should be controlled for by adjusting edge weights in the joint graph (default=NULL)
same.factor.downweight
numeric Optional weighting factor for edges connecting cells with the same cell factor level per cell balancing (default=1.0)
k.same.factor
integer An neighborhood size that should be used when aligning samples of the same balancing.factor.per.sample level. Setting a value smaller than k will lead to reduction of alingment strenth within the sample batches (default=k)
balancing.factor.per.sample
A covariate factor per sample that should be controlled for by adjusting edge weights in the joint graph (default=NULL)
joint graph to be used for downstream analysis
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE)
getDifferentialGenes()
Calculate genes differentially expressed between cell clusters. Estimates base mean, z-score, p-values, specificity, precision, expressionFraction, AUC (if append.auc=TRUE)
Conos$getDifferentialGenes( clustering = NULL, groups = NULL, z.threshold = 3, upregulated.only = FALSE, verbose = TRUE, append.specificity.metrics = TRUE, append.auc = TRUE )
clustering
character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)
groups
a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)
z.threshold
numeric Minimum absolute value of a Z score for which the genes should be reported (default=3.0).
upregulated.only
boolean If TRUE, will report only genes significantly upregulated in each cluster; otherwise both up- and down-regulated genes will be reported (default=FALSE)
verbose
boolean Whether to provide verbose output (default=TRUE)
append.specificity.metrics
boolean Whether to append specificity metrics (default=TRUE)
append.auc
boolean Whether to append AUC scores (default=TRUE)
list of DE results; each is a data frame with rows corresponding to the differentially expressed genes, and columns listing log2 fold change (M), signed Z scores (both raw and adjusted for mulitple hypothesis using BH correction), optional specificty/sensitivity and AUC metrics.
findCommunities()
Find cell clusters (as communities on the joint graph)
Conos$findCommunities( method = leiden.community, min.group.size = 0, name = NULL, test.stability = FALSE, stability.subsampling.fraction = 0.95, stability.subsamples = 100, verbose = TRUE, cls = NULL, sr = NULL, ... )
method
community detection method (igraph syntax) (default=leiden.community)
min.group.size
numeric Minimal allowed community size (default=0)
name
character Optional name of the clustering result (will default to the algorithm name) (default=NULL will try to obtain the name from the community detection method, or will use 'community' as a default)
test.stability
boolean Whether to test stability of community detection (default=FALSE)
stability.subsampling.fraction
numeric Fraction of clusters to subset (default=0.95). Must be within range [0, 1].
stability.subsamples
integer Number of subsampling iterations (default=100)
verbose
boolean Whether to provide verbose output (default=TRUE)
cls
optional pre-calculated community result (may be useful for stability testing) (default: NULL)
sr
optional pre-calculated subsampled community results (useful for stability testing) (default: NULL)
...
extra parameters are passed to the specified community detection method
invisible list containing identified communities (groups) and the full community detection result (result); The results are stored in $clusters$name slot in the conos object. Each such slot contains an object with elements: $results which stores the raw output of the community detection method, and $groups which is a factor on cells describing the resulting clustering. The later can be used, for instance, in plotting: con$plotGraph(groups=con$clusters$leiden$groups). If test.stability==TRUE, then the result object will also contain a $stability slot.
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN', metric='angular', score.component.variance=TRUE, verbose=TRUE) con$findCommunities(method = igraph::walktrap.community, steps=5)
plotPanel()
Plot panel of individual embeddings per sample with joint coloring
Conos$plotPanel( clustering = NULL, groups = NULL, colors = NULL, gene = NULL, use.local.clusters = FALSE, plot.theme = NULL, use.common.embedding = FALSE, embedding = NULL, adj.list = NULL, ... )
clustering
character Name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default=NULL - if groups are not specified, the first clustering will be used)
groups
a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default=NULL, see clustering)
colors
a color factor (named with cell names) use for cell coloring
gene
show expression of a gene
use.local.clusters
boolean Whether clusters should be taken from the individual samples; otherwise joint clusters in the conos object will be used (see clustering) (default=FALSE).
plot.theme
string Theme for the plot, passed to plotSamples() (default=NULL)
use.common.embedding
boolean Whether a joint embedding in the conos object should be used (or embeddings determined for the individual samples) (default=FALSE)
embedding
(default=NULL) If a character value is passed, it is interpreted as an embedding name (a name of a joint embedding in conos when use.commmon.embedding=TRUE, or a name of an embedding within the individual objects when use.common.embedding=FALSE). If a matrix is passed, it is interpreted as an actual embedding (then first two columns are interpreted as x/y coordinates, row names must be cell names). If NULL, the default embedding will be used.
adj.list
an optional list of additional ggplot2 directions to apply (default=NULL)
...
Additional parameters passed to plotSamples(), plotEmbeddings(), sccore::embeddingPlot().
cowplot grid object with the panel of plots
embedGraph()
Generate an embedding of a joint graph
Conos$embedGraph( method = "largeVis", embedding.name = method, M = 1, gamma = 1, alpha = 0.1, perplexity = NA, sgd_batches = 1e+08, seed = 1, verbose = TRUE, target.dims = 2, ... )
method
Embedding method (default='largeVis'). Currently 'largeVis' and 'UMAP' are supported.
embedding.name
character Optional name of the name of the embedding set by user to store multiple embeddings (default: method name)
M
numeric (largeVis) The number of negative edges to sample for each positive edge to be used (default=1)
gamma
numeric (largeVis) The strength of the force pushing non-neighbor nodes apart (default=1)
alpha
numeric (largeVis) Hyperparameter used in the default distance function, 1 / (1 + \alpha \dot ||y_i - y_j||^2)
(default=0.1). The function relates the distance
between points in the low-dimensional projection to the likelihood that the two points are nearest neighbors. Increasing \alpha
tends
to push nodes and their neighbors closer together; decreasing \alpha
produces a broader distribution. Setting \alpha
to zero
enables the alternative distance function. \alpha
below zero is meaningless.
perplexity
(largeVis) The perplexity passed to largeVis (default=NA)
sgd_batches
(largeVis) The number of edges to process during SGD (default=1e8). Defaults to a value set based on the size of the dataset. If the parameter given is
between 0
and 1
, the default value will be multiplied by the parameter.
seed
numeric Random seed for the largeVis algorithm (default=1)
verbose
boolean Whether to provide verbose output (default=TRUE)
target.dims
numeric Number of dimensions for the reduction (default=2). Higher dimensions can be used to generate embeddings for subsequent reductions by other methods, such as tSNE
...
additional arguments, passed to UMAP embedding (run ?conos:::embedGraphUmap for more info)
plotClusterStability()
Plot cluster stability statistics.
Conos$plotClusterStability(clustering = NULL, what = "all")
clustering
string Name of the clustering result to show (default=NULL)
what
string Show a specific plot (ari - adjusted rand index, fjc - flat Jaccard, hjc - hierarchical Jaccard, dend - cluster dendrogram, all - everything except 'dend') (default='all')
cluster stability statistics
plotGraph()
Plot joint graph
Conos$plotGraph( color.by = "cluster", clustering = NULL, embedding = NULL, groups = NULL, colors = NULL, gene = NULL, plot.theme = NULL, subset = NULL, ... )
color.by
character A shortcut to color the plot by 'cluster' or by 'sample' (default: 'cluster'). If any other string is input, an error is thrown.
clustering
a character name of the clustering to use (see names(con$clusters)) for the value of the groups factor (default: NULL - if groups are not specified, the first clustering will be used)
embedding
A character name of an embedding, or a matrix of the actual embedding (rownames should correspond to cells, first to columns to x/y coordinates). If NULL (default: NULL), the latest generated embedding will be used
groups
a cell factor (a factor named with cell names) specifying clusters of cells to be compared (one against all). To compare two cell clusters against each other, simply pass a factor containing only two levels (default: NULL, see clustering)
colors
a color factor (named with cell names) use for cell coloring (default=NULL)
gene
Show expression of a gene (default=NULL)
plot.theme
Theme for the plot, passed to sccore::embeddingPlot() (default=NULL)
subset
A subset of cells to show (default: NULL - shows all the cells)
...
Additional parameters passed to sccore::embeddingPlot()
ggplot2 plot of joint graph
correctGenes()
Smooth expression of genes to minimize the batch effect between samples Use diffusion of expression on graph with the equation dv = exp(-a * (v + b))
Conos$correctGenes( genes = NULL, n.od.genes = 500, fading = 10, fading.const = 0.5, max.iters = 15, tol = 0.005, name = "diffusion", verbose = TRUE, count.matrix = NULL, normalize = TRUE )
genes
List of genes to be smooothed smoothing (default=NULL will smooth top n.od.genes overdispersed genes)
n.od.genes
numeric If 'genes' is NULL, top n.od.genes of overdispersed genes are taken across all samples (default=500)
fading
numeric Level of fading of expression change from distance on the graph (parameter 'a' of the equation) (default=10)
fading.const
numeric Minimal penalty for each new edge during diffusion (parameter 'b' of the equation) (default=0.5)
max.iters
numeric Maximal number of diffusion iterations (default=15)
tol
numeric Tolerance after which the diffusion stops (default=5e-3)
name
string Name to save the correction (default='diffusion')
verbose
boolean Verbose mode (default=TRUE)
count.matrix
Alternative gene count matrix to correct (rows: genes, columns: cells; has to be dense matrix). Default: joint count matrix for all datasets.
normalize
boolean Whether to normalize values (default=TRUE)
smoothed expression of the input genes
propagateLabels()
Estimate labeling distribution for each vertex, based on a partial labeling of the cells. There are two methods used for the propagation to calculate the distribution of labels: "solver" and "diffusion". * "diffusion" (default) will estimate the labeling distribution for each vertex, based on provided labels using a random walk. * "solver" will propagate labels using the algorithm described by Zhu, Ghahramani, Lafferty (2003) <http://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf> Confidence values are then calculated by taking the maximum value from this distribution of labels, for each cell.
Conos$propagateLabels(labels, method = "diffusion", ...)
labels
Input labels
method
type of propagation. Either 'diffusion' or 'solver'. 'solver' gives better result but has bad asymptotics, so is inappropriate for datasets > 20k cells. (default='diffusion')
...
additional arguments for conos:::propagateLabels* functions
list with three fields: * labels = matrix with distribution of label probabilities for each vertex by rows. * uncertainty = 1 - confidence values * label.distribution = the distribution of labels calculated using either the methods "diffusion" or "solver"
getClusterCountMatrices()
Calculate pseudo-bulk expression matrices for clusters (by adding up, for each gene, all of the molecules detected for all cells in a given cluster in a given sample)
Conos$getClusterCountMatrices( clustering = NULL, groups = NULL, common.genes = TRUE, omit.na.cells = TRUE )
clustering
string Name of the clustering to use
groups
a factor on cells to use for coloring
common.genes
boolean Whether to bring individual sample matrices to a common gene list (default=TRUE)
omit.na.cells
boolean If set to FALSE, the resulting matrices will include a first column named 'NA' that will report total molecule counts for all of the cells that were not covered by the provided factor. (default=TRUE)
a list of per-sample uniform dense matrices with rows being genes, and columns being clusters
getDatasetPerCell()
applies 'getCellNames()' on all samples
Conos$getDatasetPerCell()
list of cellnames for all samples
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getDatasetPerCell()
getJointCountMatrix()
Retrieve joint count matrices
Conos$getJointCountMatrix(raw = FALSE)
raw
boolean If TRUE, return merged "raw" count matrices, using function getRawCountMatrix(). Otherwise, return the merged count matrices, using getCountMatrix(). (default=FALSE)
list of merged count matrices
con <- Conos$new(small_panel.preprocessed, n.cores=1) con$getJointCountMatrix()
clone()
The objects of this class are cloneable with this method.
Conos$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `Conos$new`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
## ------------------------------------------------
## Method `Conos$buildGraph`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
## ------------------------------------------------
## Method `Conos$findCommunities`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$buildGraph(k=10, k.self=5, space='PCA', ncomps=10, n.odgenes=20, matching.method='mNN',
metric='angular', score.component.variance=TRUE, verbose=TRUE)
con$findCommunities(method = igraph::walktrap.community, steps=5)
## ------------------------------------------------
## Method `Conos$getDatasetPerCell`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getDatasetPerCell()
## ------------------------------------------------
## Method `Conos$getJointCountMatrix`
## ------------------------------------------------
con <- Conos$new(small_panel.preprocessed, n.cores=1)
con$getJointCountMatrix()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.