| cluster_assessment | R Documentation |
Tool for the assessment and comparison of cluster partitions based on different filtering, feature selection, normalization, batch correction, imputation, and clustering algorithms.
cluster_assessment( assessment_list = NULL, seuratobject = NULL, seurat_assay = "RNA", seurat_lib_size = F, do.features = T, var_feat_len = NULL, RaceIDobject = NULL, RaceID_cl_table = NULL, ScanpyobjectFullpath = NULL, scanpy_clust = "leiden", scanpyscalefactor = 10000, rawdata = NULL, ndata = NULL, norm = T, givepart = NULL, givefeatures = NULL, minexpr = 5, CGenes = NULL, ccor = 0.65, fselectRace = F, fselectSeurat = F, givebatch = NULL, individualbatch = NULL, gene.domain = F, PCA_QA = F, PCAnum = 10, run_cutoff = T, f1Z = F, cutoff = "mean", cutoffmax = F, clustsize = 10, binaclassi = "F1Score", Entro_tresh = T, Entro_med = T, run_enriched = T, give2ndfiff = T, diffexp = "nbino", vfit = NULL, gooutlier = T, individualfit = F, outminc = 5, probthr = 0.01, diptest = T, bwidth = T, critmass = T, mintotal = 3000, unifrac = 0.1, logmodetest = F, b_bw = 25, n_bw = 128, b_ACR = 100, n_ACR = 1024, batch_entropy = F, set.name = NULL, rawdata_null = T )
assessment_list |
list, with named objects for different assessments, to which new assessment is added. Default is |
seuratobject |
Seurat object as input for assessment: derives UMI count object, normalized count object, cluster partition and variable features from Seurat Object. Default = |
seurat_assay |
if |
seurat_lib_size |
logical. If |
do.features |
logical. If |
var_feat_len |
number of top variable genes used for cluster assessment, if |
RaceIDobject |
RaceID object as input for assessment: derives UMI count data of cells passing filtering criteria, normalized data, cluster partition, feature genes, background noise model describing the expression variance of genes as a function of their mean and RaceID filtering criteria. Default = |
RaceID_cl_table |
metadata data frame for a RaceID object in similar form as meta.data object of a Seurat object with rows as cells and columns as e.g. different cluster partitions. Default = |
ScanpyobjectFullpath |
full path to scanpy object in h5ad format, which is converted to Seurat object from which UMI counts, cluster partition and feature genes are derived. Using UMI count data and scale factor, library size normalization is performed and scaled using the scale factor. |
scanpy_clust |
either “leiden” or “louvain”; derives the cluster partition of either Leiden or Louvain clustering. Default = “leiden”. |
scanpyscalefactor |
integer number with which relative cell counts are scaled to equal transcript counts. Default = 10,000. |
rawdata |
UMI count expression data with genes as rows and cells as columns. Default = |
ndata |
normalized expression data with genes as rows and cells as columns. Default = |
norm |
performs library size normalization on provided rawdata argument. Default = |
givepart |
clustering partition. Either a vector of integer cluster number for each cell in the same order as UMI count table or normalized count table for RaceIDobject; or a character string representing a column name of Seurat metadata data frame of a Seurat object or similar metadata frame, |
givefeatures |
gene vector to perform assessment. Default = |
minexpr |
minimum required transcript count of a gene across evaluated cells. Genes not passing criteria are filtered out. Default 5. If |
CGenes |
gene vector for genes to exclude from feature selection. Only relevant if |
ccor |
numeric value of correlation coefficient used as threshold for determining genes correlated to genes in |
fselectRace |
logical. If |
fselectSeurat |
logical. If |
givebatch |
vector indicating batch information for cells; must have the same length and order as cluster partition. Default = |
individualbatch |
individual batch name, element of |
gene.domain |
logical. If |
PCA_QA |
logical. If |
PCAnum |
integer value, number of genes to be derived with top highest and top lowest loadings for the first two principal components. Default = 10. |
run_cutoff |
logical. If |
cutoff |
either “mean” or “median”; uses either the per-gene average expression within clusters or the per-gene median expression within clusters to calculate the true label cutoff. The cutoff is calculated per gene by selecting the cluster with the highest average or median expression and averaging that value with the mean or median of the remaining clusters. |
cutoffmax |
logical. If |
clustsize |
integer value, threshold of minimum number of cells a cluster should have to be included in the assessment. |
binaclassi |
either “F1Score”, “Cohenkappa”, “MCC” or NULL. Statistic used for binary classification: F1 score, Cohen's kappa or Matthews correlation coefficient (MCC). If |
Entro_tresh |
logical. If |
Entro_med |
logical. If |
run_enriched |
logical. If |
give2ndfiff |
logical. If |
diffexp |
either “nbino” or “wilcox”. Performs differential expression analysis between cells of clusters with highest number of co-enriched genes for these co-enriched genes based on Wilcoxon test or negative binomial distribution test utilizing global gene mean-variance dependence. Default = “nbino”. |
vfit |
function of the background noise model describing the expression variance as a function of the mean expression. Input can be utilized for differential expression analysis between co-enriched genes and identification of outlier gene-expression within cluster in outlier analysis. Default = |
gooutlier |
logical. If |
individualfit |
logical. If |
outminc |
integer value, minimal transcript count of a gene to be included in the background fit. |
probthr |
numeric value, outlier probability threshold for genes to exhibit outlier expression within a cluster. Probability is computed from a negative binomial background model of expression in a cluster. |
diptest |
logical. If |
bwidth |
logical. If |
critmass |
logical. If |
mintotal |
minimal number of transcripts cells are expected to have, to calculate expression cutoff. Default = 3000 |
unifrac |
fraction of cluster required to exhibit at least scaled |
logmodetest |
logical. If |
b_bw |
number of replicates used for Silverman’s critical bandwidth test. Default = 25. |
n_bw |
number of equally spaced points at which density is estimated for Silverman’s critical bandwidth test. Default = 128. |
b_ACR |
number of replicates used for Ameijeiras-Alonso’s unimodality test. Default = 100. |
n_ACR |
number of equally spaced points at which density is estimated for Ameijeiras-Alonso’s unimodality test. Default = 1024. |
set.name |
set name for individual assessment within output of list of assessments. Default = |
rawdata_null |
logical. If |
batch_entropy |
logical. If |
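The `cutoff` and `binaclassi` arguments above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy matrix `expr`, the partition `part`, and the helpers `gene_cutoff()` and `gene_f1()` are hypothetical names. The sketch assumes that "the remaining clusters" are summarized by the mean of their per-cluster values, and that the F1 score compares cells above the cutoff with membership in the highest-expressing cluster.

```r
# Toy data (hypothetical): 3 genes x 9 cells, three clusters of three cells each.
expr <- matrix(
  c(10, 12, 11,  1,  0,  2,  1,  1,  0,   # geneA: high in cluster 1
     0,  1,  0,  8,  9, 10,  0,  1,  1,   # geneB: high in cluster 2
     3,  3,  3,  3,  3,  3,  3,  3,  3),  # geneC: flat across clusters
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:9))
)
part <- rep(1:3, each = 3)

# Per-gene cutoff: summary (mean or median) of the top cluster, averaged with
# the mean of the summaries of the remaining clusters (an assumption).
gene_cutoff <- function(x, part, summary_fun = mean) {
  cl_summary <- tapply(x, part, summary_fun)
  top <- unname(which.max(cl_summary))
  cutoff <- mean(c(cl_summary[[top]], mean(cl_summary[-top])))
  list(max_cl = names(cl_summary)[top], cutoff = cutoff)
}

# F1 score of "expression above cutoff" against "cell is in the top cluster".
gene_f1 <- function(x, part, summary_fun = mean) {
  co <- gene_cutoff(x, part, summary_fun)
  in_top <- as.character(part) == co$max_cl
  above <- x > co$cutoff
  tp <- sum(above & in_top)
  fp <- sum(above & !in_top)
  fn <- sum(!above & in_top)
  if (tp == 0) return(0)
  2 * tp / (2 * tp + fp + fn)
}

cutoffs <- apply(expr, 1, function(x) gene_cutoff(x, part)$cutoff)
f1 <- apply(expr, 1, gene_f1, part = part)
```

In this toy example the two cluster-specific genes separate perfectly at their cutoffs, while the flat gene has no cell above its cutoff and receives a score of 0.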
List of assessments, with a named object per assessment. Individual assessments represent a list with the following objects:
rawdata |
Raw expression data matrix/UMI count matrix derived from input objects, with cells as columns and genes as rows in sparse matrix format. |
rowmean |
mean expression of assessed features. |
part |
vector containing cluster partition derived from input objects. |
clustsize |
threshold of minimum number of cells in a cluster used for assessment. |
features |
vector of feature genes derived from object, used to compute its cluster partition. |
assessed_features |
vector of features assessed through assess me function, can differ from |
PCA |
data.frame with 4 columns, indicating top PCAnum genes with: highest loadings for PC1, lowest loadings for PC1, highest loadings for PC2 and lowest loadings for PC2. |
max_cl |
vector indicating for assessed features which cluster exhibits highest mean expression. |
cutoff |
vector indicating calculated numeric cutoff for assessed features. |
f1_score |
vector indicating f1_score or alternative statistical analysis for binary classification, for the assessed features. |
Entropy_tresh |
vector indicating Entropy per assessed feature, calculated based on the per gene cutoff. |
Entropy_median |
vector indicating Entropy per assessed feature, calculated based on the per gene median expression per cluster, with each cluster's median expressed as a fraction of the summed medians across clusters. |
cluster |
vector indicating assessed clusters. |
enriched_features |
number of enriched features per cluster. |
enriched_feature_list |
list with a vector per cluster of enriched features. |
unique_features |
number of uniquely enriched features per cluster. |
unique_feature_list |
list with a vector per cluster of uniquely enriched features. |
second_cluster |
data.frame with rows representing a cluster and its closest clusters based on co-enriched genes, and columns representing: “frac_shared_to_clos_cluster”: number of co-enriched genes; “rel_frac_shared_to_clos”: fraction of co-enriched genes of enriched genes; “frac_diff_of_shared_features”: number of differential genes of co-enriched genes; “rel_frac_diff_of_shared_to_clos”: fraction of differential genes of co-enriched genes. |
list_2ndShared |
list with data.frame for every cluster with rows as enriched genes of a cluster and columns representing binary classification for enrichment (1= enriched, 0 = not enriched) of a cluster and its most similar clusters based on co-enriched genes. |
shared2ndgenes |
list with vector for every cluster of enriched genes with co-enrichment in closest clusters. |
list_2nd_diff |
list with vector for every cluster of co-enriched genes with differential expression to co-enriched clusters. |
outliertab |
data.frame indicating number of outlier cells per cluster with 1, 2 or 3 outlier genes. Rows representing cluster and columns representing number of cells with 1, 2 or 3 outlier genes. |
outlier_genes |
list with vector for every cluster indicating outlier genes. |
nonunimodal_list |
list with data.frame per cluster with rows representing enriched gene per cluster and columns p.value of dip.test and p.value after multiple testing correction with Bonferroni and BH method. |
nonunimodaltab |
data.frame indicating number of genes per cluster with non-unimodal expression before and after multiple-testing correction. |
bandwidth_list |
list with vector for every cluster indicating genes with non-unimodal expression derived from Silverman’s critical bandwidth test. |
masstest_list |
list with vectors for every cluster indicating genes with non-unimodal expression based on Ameijeiras-Alonso’s method to test for unimodality. |
batch_entropy |
entropy of batches across clusters |
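The entropy outputs listed above (`Entropy_median` and `batch_entropy`) can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the helper names `shannon_entropy()` and `median_entropy()`, the toy vectors, and the choice of a base-2 logarithm are all hypothetical, since the documentation does not state which logarithm base or exact formula is used.

```r
# Shannon entropy of a probability vector (base 2 is an assumption).
shannon_entropy <- function(p) {
  p <- p[p > 0]                 # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Entropy of a gene's per-cluster medians, each taken as a fraction of the
# summed medians across clusters.
median_entropy <- function(x, part) {
  med <- tapply(x, part, median)
  if (sum(med) == 0) return(NA_real_)
  shannon_entropy(med / sum(med))
}

part <- rep(1:3, each = 3)                        # hypothetical partition
specific <- c(10, 12, 11, 0, 0, 0, 0, 0, 0)       # expressed in one cluster only
uniform  <- rep(4, 9)                             # same expression everywhere

h_specific <- median_entropy(specific, part)      # low: cluster-specific gene
h_uniform  <- median_entropy(uniform, part)       # high: evenly spread gene

# Entropy of the batch composition within each cluster (hypothetical batches).
batch <- c("a", "a", "b",  "a", "a", "a",  "b", "b", "b")
batch_entropy_per_cluster <- tapply(
  batch, part,
  function(b) shannon_entropy(prop.table(table(b)))
)
```

Under these assumptions a gene confined to one cluster has entropy 0, a uniformly expressed gene reaches the maximum of `log2(number of clusters)`, and a cluster made up of a single batch has batch entropy 0.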
entero <- CreateSeuratObject(counts = x, project = "10x", min.cells = 3, min.features = 200)
entero <- NormalizeData(entero, normalization.method = "RC", scale.factor = 10000)
entero <- FindVariableFeatures(entero, selection.method = "vst", nfeatures = 3000)
features <- Seurat::VariableFeatures(entero)
entero <- ScaleData(entero, features = features)
entero <- RunPCA(entero, features = features, npcs = 100)
entero <- FindNeighbors(entero, dims = 1:100)
resolution <- 1:10
# Cluster at each resolution; every run adds a partition column to the metadata
for (i in resolution) {
  entero <- FindClusters(entero, resolution = i)
}
res <- colnames(entero[[]])[c(4, 6:length(colnames(entero[[]])))]
# Assess every partition: the first call creates the assessment list,
# later calls add to it through assessment_list
for (i in seq_along(res)) {
  if (i == 1) {
    assess_seuratRC <- cluster_assessment(
      seuratobject = entero, givepart = res[i],
      give2ndfiff = FALSE, Entro_med = FALSE, diptest = FALSE, run_enriched = TRUE,
      bwidth = FALSE, critmass = FALSE, gooutlier = TRUE
    )
  } else {
    assess_seuratRC <- cluster_assessment(
      assessment_list = assess_seuratRC,
      seuratobject = entero, givepart = res[i],
      give2ndfiff = FALSE, Entro_med = FALSE, diptest = FALSE, run_enriched = TRUE,
      bwidth = FALSE, critmass = FALSE, gooutlier = TRUE
    )
  }
}