cluster_assessment: AssessME - a cluster assessment tool for preprocessing and...
In PatZeis/AssessMe: A cluster assessment tool for preprocessing and clustering optimization

cluster_assessment

R Documentation

AssessME - a cluster assessment tool for preprocessing and clustering optimisation

Description

Tool for the assessment and comparison of cluster partitions based on different filtering, feature selection, normalization, batch correction, imputation and clustering algorithms.

Usage

cluster_assessment(
  assessment_list = NULL,
  seuratobject = NULL,
  seurat_assay = "RNA",
  seurat_lib_size = F,
  do.features = T,
  var_feat_len = NULL,
  RaceIDobject = NULL,
  RaceID_cl_table = NULL,
  ScanpyobjectFullpath = NULL,
  scanpy_clust = "leiden",
  scanpyscalefactor = 10000,
  rawdata = NULL,
  ndata = NULL,
  norm = T,
  givepart = NULL,
  givefeatures = NULL,
  minexpr = 5,
  CGenes = NULL,
  ccor = 0.65,
  fselectRace = F,
  fselectSeurat = F,
  givebatch = NULL,
  individualbatch = NULL,
  gene.domain = F,
  PCA_QA = F,
  PCAnum = 10,
  run_cutoff = T,
  f1Z = F,
  cutoff = "mean",
  cutoffmax = F,
  clustsize = 10,
  binaclassi = "F1Score",
  Entro_tresh = T,
  Entro_med = T,
  run_enriched = T,
  give2ndfiff = T,
  diffexp = "nbino",
  vfit = NULL,
  gooutlier = T,
  individualfit = F,
  outminc = 5,
  probthr = 0.01,
  diptest = T,
  bwidth = T,
  critmass = T,
  mintotal = 3000,
  unifrac = 0.1,
  logmodetest = F,
  b_bw = 25,
  n_bw = 128,
  b_ACR = 100,
  n_ACR = 1024,
  batch_entropy = F,
  set.name = NULL,
  rawdata_null = T
)

Arguments

`assessment_list`	list, with named objects for different assessments, to which new assessment is added. Default is `NULL`.
`seuratobject`	Seurat object as input for assessment: derives UMI count object, normalized count object, cluster partition and variable features from Seurat Object. Default = `NULL`.
`seurat_assay`	if `seuratobject` is given, name of the Seurat assay from which the required objects are retrieved. Default = "RNA".
`seurat_lib_size`	logical. If `FALSE` performs library size normalization of UMI counts object of `seuratobject` and overwrites normalized data object within assessment object. Default = `FALSE`.
`do.features`	logical. If `TRUE` performs feature selection and derives `var_feat_len` number of top variable genes. Default = `TRUE`.
`var_feat_len`	number of top variable genes used for cluster assessment. If `var_feat_len` does not equal the length of the "var.features" object of `seuratobject`, the top `var_feat_len` feature genes are derived using Seurat's variance stabilization method. Requires `seuratobject` and `do.features = TRUE`. Default = `NULL`.
`RaceIDobject`	RaceID object as input for assessment: derives UMI count data of cells passing filtering criteria, normalized data, cluster partition, feature genes, background noise model describing the expression variance of genes as a function of their mean and RaceID filtering criteria. Default = `NULL`.
`RaceID_cl_table`	metadata data frame for a RaceID object in similar form as meta.data object of a Seurat object with rows as cells and columns as e.g. different cluster partitions. Default = `NULL`.
`ScanpyobjectFullpath`	full path to scanpy object in h5ad format, which is converted to Seurat object from which UMI counts, cluster partition and feature genes are derived. Using UMI count data and scale factor, library size normalization is performed and scaled using the scale factor.
`scanpy_clust`	either "leiden" or "louvain"; derives the cluster partition of either Leiden or Louvain clustering. Default = "leiden".
`scanpyscalefactor`	integer number with which relative cell counts are scaled to equal transcript counts. Default = 10,000.
`rawdata`	UMI count expression data with genes as rows and cells as columns. Default = `NULL`.
`ndata`	normalized expression data with genes as rows and cells as columns. Default = `NULL`.
`norm`	performs library size normalization on provided rawdata argument. Default = `TRUE`.
`givepart`	clustering partition. Either a vector of integer cluster numbers, one per cell, in the same order as the UMI count table or normalized count table for `RaceIDobject`; or a character string naming a column of the metadata data frame of a Seurat object, or of a similar metadata frame, `RaceID_cl_table`, for a RaceID object. Default = `NULL`.
`givefeatures`	gene vector to perform assessment. Default = `NULL`.
`minexpr`	minimum required transcript count of a gene across evaluated cells. Genes not passing this criterion are filtered out. Default = 5. If `RaceIDobject` is given, `minexpr` is derived from `RaceIDobject`. Relevant for deriving feature genes if `gene.domain` is `TRUE` and for calculating the fit of the dependency of the variance on the mean.
`CGenes`	vector of genes to exclude from feature selection. Only relevant if `seuratobject`, `RaceIDobject` and `ScanpyobjectFullpath` are all `NULL` and `rawdata` is given. Default = `NULL`.
`ccor`	numeric value of the correlation coefficient used as threshold for determining genes correlated to genes in `CGenes`. Only genes correlating less than `ccor` to all genes in `CGenes` are retained for analysis. Default = 0.65.
`fselectRace`	logical. If `TRUE`, performs RaceID feature selection, only if `seuratobject`, `RaceIDobject`, `ScanpyobjectFullpath` and `givefeatures` are all `NULL`. Default = `FALSE`.
`fselectSeurat`	logical. If `TRUE`, performs Seurat variance stabilization feature selection and derives the top `var_feat_len` variable genes, only if `seuratobject`, `RaceIDobject`, `ScanpyobjectFullpath` and `givefeatures` are all `NULL`. Default = `FALSE`.
`givebatch`	vector indicating batch information for cells; must have the same length and order as cluster partition. Default = `NULL`.
`individualbatch`	individual batch name, element of `givebatch`, to perform assessment on. Default = `NULL`.
`gene.domain`	logical. If `TRUE`, assess all genes with at least `minexpr` in one cell.
`PCA_QA`	logical. If `TRUE`, derives the first two principal components and the top `PCAnum` genes with the highest and lowest loadings. Default = `FALSE`.
`PCAnum`	integer value, number of genes to be derived with top highest and top lowest loadings for the first two principal components. Default = 10.
`run_cutoff`	logical. If `TRUE`, calculates a per-gene cutoff that defines the true label used for the F1 score, entropy and per-cluster gene enrichment calculations. Default = `T`.
`cutoff`	either "mean" or "median"; uses either the per-gene average expression or the per-gene median expression within clusters to calculate the true-label cutoff. The cutoff is calculated per gene by selecting the cluster with the highest average or median expression and averaging this value with the mean or median of the remaining clusters. Default = "mean".
`cutoffmax`	logical. If `TRUE`, the per-gene cutoff is the average expression of the cluster with the highest average expression across clusters. Default = `FALSE`.
`clustsize`	integer value, threshold of minimum number of cells a cluster should have to be included in the assessment.
`binaclassi`	either "F1Score", "Cohenkappa", "MCC" or `NULL`. Statistic used for binary classification: F1 score, Cohen's kappa or Matthews correlation coefficient (MCC). If `NULL`, the computation is skipped. Default = "F1Score".
`Entro_tresh`	logical. If `TRUE`, calculates per-gene entropy, using the derived per-gene cutoff as true label, to assess the label distribution across clusters. Default = `TRUE`; requires `run_cutoff`.
`Entro_med`	logical. If `TRUE`, calculates the per-gene median expression per cluster and the fraction of each individual median relative to the summed medians across clusters, which is used to calculate per-gene entropy. Default = `T`; requires `run_cutoff`.
`run_enriched`	logical. If `TRUE`, runs enrichment analysis using fisher.test. Using the cutoff, expression per gene is binarized across cells, so that each cell has a value of either 1 or 0. The binarized expression is summed within clusters, and enrichment per cluster is calculated for each gene using fisher.test. If a cluster is enriched for a gene (p-value < 0.05), the value of that gene for the cluster is set to 1. To speed up computation, for each gene the clusters are ordered by decreasing fraction of positive cells and enrichment is tested iteratively along that order. If the enrichment p-value is not significant for 3 clusters (flag count), the remaining clusters are assumed not to be enriched. Clusters with fewer cells than the average number of cells per cluster do not increase the flag count.
`give2ndfiff`	logical. If `TRUE`, run differential expression analysis between every cluster and its closest cluster(s) based on highest number of co-enriched genes, for genes which are shared enriched in these clusters. If more than one cluster share the same number of co-enriched genes, differential expression of co-enriched genes is performed for all co-enriched clusters. Default = `T`. Co-enriched clusters can represent cell states of the same cell types.
`diffexp`	either "nbino" or "wilcox". Performs differential expression analysis between cells of clusters with the highest number of co-enriched genes, for these co-enriched genes, based on the Wilcoxon test or a negative binomial distribution test utilizing the global gene mean-variance dependence. Default = "nbino".
`vfit`	function of the background noise model describing the expression variance as a function of the mean expression. Input can be utilized for differential expression analysis between co-enriched genes and identification of outlier gene-expression within cluster in outlier analysis. Default = `NULL`.
`gooutlier`	logical. If `TRUE`, performs outlier identification based on cluster partition and identifies outlier gene expression within clusters.
`individualfit`	logical. If `TRUE`, background noise model, required to infer outlier expression, is fitted for each cluster separately, default = `F`.
`outminc`	integer value, minimal transcript count of a gene to be included in the background fit.
`probthr`	numeric value, outlier probability threshold for genes to exhibit outlier expression within a cluster. The probability is computed from a negative binomial background model of expression in a cluster.
`diptest`	logical. If `T`, uses the dip.test function from the diptest package to test for unimodality of gene expression (enriched genes) within clusters by computing Hartigans' dip statistic per gene. The calculation is performed only on expression values of at least `minexpr`. As the calculation is performed on library size normalized and rescaled data, `minexpr` is rescaled based on scalefactor divided by `mintotal`. Expression is only tested if a given fraction of a cluster, `unifrac`, exhibits a minimal expression of rescaled `minexpr` or the sample size equals at least `clustsize`. Default = `T`.
`bwidth`	logical. If `T`, uses Silverman's critical bandwidth method to test for unimodality of gene expression (enriched genes) within clusters. The calculation is performed only on expression values of at least `minexpr`. As the calculation is performed on library size normalized and rescaled data, `minexpr` is rescaled based on scalefactor divided by `mintotal`. Expression is only tested if a given fraction of a cluster, `unifrac`, exhibits a minimal expression of rescaled `minexpr` or the sample size equals at least `clustsize`. Default = `T`.
`critmass`	logical. If `T`, uses the method of Ameijeiras-Alonso et al. to test for unimodality of gene expression (enriched genes) within clusters. The calculation is performed only on expression values of at least `minexpr`. As the calculation is performed on library size normalized and rescaled data, `minexpr` is rescaled based on scalefactor divided by `mintotal`. Expression is only tested if a given fraction of a cluster, `unifrac`, exhibits a minimal expression of rescaled `minexpr` or the sample size equals at least `clustsize`. Default = `T`.
`mintotal`	minimal number of transcripts cells are expected to have, to calculate expression cutoff. Default = 3000
`unifrac`	fraction of cluster required to exhibit at least scaled `minexpr` that gene is tested for unimodality. Default = 0.1.
`logmodetest`	logical. If `T`, performs log transformation before testing unimodality of gene expression. Default = `F`.
`b_bw`	number of replicates used for Silverman's critical bandwidth test. Default = 25.
`n_bw`	number of equally spaced points at which the density is estimated for Silverman's critical bandwidth test. Default = 128.
`b_ACR`	number of replicates used for the unimodality test of Ameijeiras-Alonso et al. Default = 100.
`n_ACR`	number of equally spaced points at which the density is estimated for the unimodality test of Ameijeiras-Alonso et al. Default = 1024.
`set.name`	set name for an individual assessment within the output list of assessments. Default = `NULL`, in which case the name is assigned as follows: if `seuratobject` is given, the name is selected from the metadata column equal to Idents(), or is the character string given as input for `givepart`, or the object name of the numeric cluster partition. If `RaceIDobject` is given, the name is the character string given as input for `givepart`, the object name of the numeric cluster partition, or "Vdefault".
`rawdata_null`	logical. If `TRUE`, do not store UMI count table in output of assessment, default = T
`f1Z`	logical. If `TRUE`, then the cutoff for the true label is x > 0. Default = `FALSE`.
`batch_entropy`	logical. If `T`, calculates the entropy of batches across clusters. Default = `F`.
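
The per-gene cutoff and F1 score described for `run_cutoff`, `cutoff` and `binaclassi` can be illustrated with a small base R sketch. This is not the package's implementation: the toy values, the variable names, the reading of "mean of the remaining clusters" as the mean of the remaining per-cluster means, and the use of membership in the highest-expressing cluster as the predicted label are all assumptions made for illustration.

```r
# Toy data (hypothetical): one gene measured in 9 cells from 3 clusters.
expr <- c(8, 9, 10, 1, 0, 2, 0, 1, 7)
part <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# Per-cluster mean expression and the cluster with the highest mean.
cl_mean <- vapply(split(expr, part), mean, numeric(1))
max_cl  <- names(cl_mean)[which.max(cl_mean)]

# Cutoff: average of the top cluster mean and the mean of the remaining
# cluster means (assumed reading of cutoff = "mean").
rest_mean <- mean(cl_mean[names(cl_mean) != max_cl])
cutoff    <- mean(c(cl_mean[[max_cl]], rest_mean))

# True label: expression above the cutoff (assumption).
# Predicted label: the cell belongs to the highest-expressing cluster.
truth <- expr > cutoff
pred  <- as.character(part) == max_cl

# F1 score from the confusion counts.
tp <- sum(truth & pred)
fp <- sum(!truth & pred)
fn <- sum(truth & !pred)
f1 <- 2 * tp / (2 * tp + fp + fn)
```

In this toy case the single high-expressing cell in cluster 3 counts as a false negative, so the gene scores below 1 even though cluster 1 is otherwise cleanly separated.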

Value

List of assessments, with a named object per assessment. Individual assessments represent a list with the following objects:

`rawdata`	Raw expression data matrix/UMI count matrix derived from input objects, with cells as columns and genes as rows in sparse matrix format.
`rowmean`	mean expression of assessed features.
`part`	vector containing cluster partition derived from input objects.
`clustsize`	threshold of minimum number of cells in a cluster used for assessment.
`features`	vector of feature genes derived from object, used to compute its cluster partition.
`assessed_features`	vector of features assessed by the AssessME function. Can differ from `features` when the `var_feat_len` argument differs from the length of the object-derived features or when a different set of genes is given with the `givefeatures` argument.
`PCA`	data.frame with 4 columns, indicating top PCAnum genes with: highest loadings for PC1, lowest loadings for PC1, highest loadings for PC2 and lowest loadings for PC2.
`max_cl`	vector indicating for assessed features which cluster exhibits highest mean expression.
`cutoff`	vector indicating calculated numeric cutoff for assessed features.
`f1_score`	vector indicating f1_score or alternative statistical analysis for binary classification, for the assessed features.
`Entropy_tresh`	vector indicating Entropy per assessed feature, calculated based on the per gene cutoff.
`Entropy_median`	vector indicating Entropy per assessed feature, calculated based on the per gene median expression per cluster and the fraction of individual medians relative to the summed medians across clusters.
`cluster`	vector indicating assessed clusters.
`enriched_features`	number of enriched features per cluster.
`enriched_feature_list`	list with a vector per cluster of enriched features.
`unique_features`	number of uniquely enriched features per cluster.
`unique_feature_list`	list with a vector per cluster of uniquely enriched features.
`second_cluster`	data.frame with rows representing a cluster and its closest clusters based on co-enriched genes, and columns representing: "frac_shared_to_clos_cluster": number of co-enriched genes; "rel_frac_shared_to_clos": fraction of co-enriched genes among enriched genes; "frac_diff_of_shared_features": number of differential genes among co-enriched genes; "rel_frac_diff_of_shared_to_clos": fraction of differential genes among co-enriched genes.
`list_2ndShared`	list with data.frame for every cluster with rows as enriched genes of a cluster and columns representing binary classification for enrichment (1= enriched, 0 = not enriched) of a cluster and its most similar clusters based on co-enriched genes.
`shared2ndgenes`	list with vector for every cluster of enriched genes with co-enrichment in closest clusters.
`list_2nd_diff`	list with vector for every cluster of co-enriched genes with differential expression to co-enriched clusters.
`outliertab`	data.frame indicating number of outlier cells per cluster with 1, 2 or 3 outlier genes. Rows representing cluster and columns representing number of cells with 1, 2 or 3 outlier genes.
`outlier_genes`	list with a vector for every cluster indicating outlier genes.
`nonunimodal_list`	list with data.frame per cluster with rows representing enriched gene per cluster and columns p.value of dip.test and p.value after multiple testing correction with Bonferroni and BH method.
`nonunimodaltab`	data.frame indicating number of genes per cluster with non-unimodal expression before and after multiple-testing correction.
`bandwidth_list`	list with a vector for every cluster indicating genes with non-unimodal expression derived from Silverman's critical bandwidth test.
`masstest_list`	list with a vector for every cluster indicating genes with non-unimodal expression based on the unimodality test of Ameijeiras-Alonso et al.
`batch_entropy`	entropy of batches across clusters
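
How per-cluster enrichment values such as those summarized in `enriched_features` could arise is sketched below in base R for a single gene. It only illustrates the binarize-then-fisher.test idea described for `run_enriched`; the toy data, the variable names and the one-sided `alternative = "greater"` setting are assumptions, and the iterative flag-count shortcut is omitted.

```r
# Toy data (hypothetical): binarized expression of one gene
# (1 = positive cell) in 30 cells from 3 clusters of 10 cells each.
positive <- c(rep(1, 9), 0, rep(0, 9), 1, rep(0, 10))
part     <- rep(1:3, each = 10)
clusters <- sort(unique(part))

enriched <- vapply(clusters, function(cl) {
  in_cl <- part == cl
  # 2x2 table: positive/negative cells inside versus outside the cluster.
  tab <- matrix(c(sum(positive[in_cl] == 1), sum(positive[in_cl] == 0),
                  sum(positive[!in_cl] == 1), sum(positive[!in_cl] == 0)),
                nrow = 2)
  # One-sided test for over-representation (assumed choice).
  p <- fisher.test(tab, alternative = "greater")$p.value
  as.integer(p < 0.05)
}, integer(1))
names(enriched) <- clusters
```

Repeating this over all assessed genes and summing the resulting 0/1 values per cluster would give a count comparable in spirit to `enriched_features`.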

Examples

entero <- CreateSeuratObject(counts = x, project = "10x", min.cells = 3, min.features = 200)
entero <- NormalizeData(entero, normalization.method = "RC", scale.factor = 10000)
entero <- FindVariableFeatures(entero, selection.method = "vst", nfeatures = 3000)
features <- Seurat::VariableFeatures(entero)
entero <- ScaleData(entero, features = features)
entero <- RunPCA(entero, features = features, npcs = 100)
entero <- FindNeighbors(entero, dims = 1:100)
resolution <- c(1:10)
for (i in resolution)  { entero  <- FindClusters(entero , resolution = i) }
res <- colnames(entero[[]])[c(4,6:length(colnames(entero[[]])))]
for (i in seq_along(res)) {
  # The first call creates the assessment list; later calls append to it.
  assess_seuratRC <- cluster_assessment(
    assessment_list = if (i == 1) NULL else assess_seuratRC,
    seuratobject = entero, givepart = res[i],
    give2ndfiff = F, Entro_med = F, diptest = F, run_enriched = T,
    bwidth = F, critmass = F, gooutlier = T
  )
}
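
The median-based entropy described for `Entro_med` can also be illustrated without the package. The helper below, `median_entropy`, is hypothetical and not part of AssessMe; the base-2 logarithm and the handling of genes whose cluster medians are all zero are assumptions.

```r
# Hypothetical helper: entropy of one gene from its per-cluster medians.
median_entropy <- function(expr, part) {
  cl_med <- vapply(split(expr, part), median, numeric(1))
  total  <- sum(cl_med)
  # No cluster has a non-zero median: entropy is undefined (assumption).
  if (total == 0) return(NA_real_)
  # Fraction of each cluster median relative to the summed medians.
  frac <- cl_med / total
  frac <- frac[frac > 0]
  -sum(frac * log2(frac))
}

part <- c(1, 1, 2, 2, 3, 3)
# Gene expressed in a single cluster: lowest possible entropy.
specific <- median_entropy(c(5, 5, 0, 0, 0, 0), part)
# Gene expressed equally in all clusters: highest possible entropy.
uniform  <- median_entropy(c(2, 2, 2, 2, 2, 2), part)
```

Under these assumptions a cluster-specific gene scores 0, while a gene spread evenly over k clusters scores log2(k).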

PatZeis/AssessMe documentation built on Nov. 19, 2022, 6:03 a.m.