CalcSCV: Create sCVdata object with calculation results for a cluster

View source: R/deTest.R

CalcSCVR Documentation

Create sCVdata object with calculation results for a cluster

Description

Creates and populates a new sCVdata object for a single clustering result on the input data. This is the building block of the input to runShiny, the scClustViz interactive visualization of scRNAseq data.

Usage

CalcSCV(
  inD,
  cl,
  assayType = "",
  assaySlot = "",
  DRforClust = "pca",
  exponent = 2,
  pseudocount = 1,
  DRthresh = 0.1,
  calcSil = T,
  calcDEvsRest = T,
  calcDEcombn = T
)

Arguments

inD

The input dataset. An object of class seurat or SingleCellExperiment. Other data classes are not currently supported. Please submit requests for other data objects here!

cl

a factor where each value is the cluster assignment for a cell (column) in the input gene expression matrix.

assayType

Default = "" (for Seurat v1/2). A length-one character vector representing the assay object in which the expression data is stored in the input object. This is not required for Seurat v1 or v2 objects. For Seurat v3 objects, this is often "RNA". For SingleCellExperiment objects, this is often "logcounts". See getExpr for details.

assaySlot

An optional length-one character vector representing the slot of the Seurat v3 Assay object to use. Not used for other single-cell data objects. The default is to use the normalized data in the "data" slot, but you can also use the SCTransform-corrected counts by setting assayType = "SCT" and assaySlot = "counts". This is recommended, as it will speed up differential expression calculations. See getExpr for details.

DRforClust

Default = "pca".A length-one character vector representing the dimensionality reduction method used as the input for clustering. This is commonly PCA, and should correspond to the slot name of the cell embedding in your input data - either the type argument in reducedDim(x,type) or the reduction.type argument in GetDimReduction(object,reduction.type) (v2) or reduction in Embeddings(object,reduction).

exponent

Default = 2. A length-one numeric vector representing the base of the log-normalized gene expression data to be processed. Generally gene expression data is transformed into log2 space when normalizing (set this to 2), though Seurat uses the natural log (set this to exp(1)). If you are using data that has not been log-transformed (for example, corrected counts from SCTransform), set this to NA.

pseudocount

Default = 1. A length-one numeric vector representing the pseudocount added to all log-normalized values in your input data. Most methods use a pseudocount of 1 to eliminate log(0) errors. If you are using data that has not been log-transformed (for example, corrected counts from SCTransform), set this to NA.

DRthresh

Default = 0.1. A length-one numeric vector between 0 and 1 representing the detection rate threshold for inclusion of a gene in the differential expression testing. A gene will be included if it is detected in at least this proportion of cells in at least one of the clusters being compared.

calcSil

Default = TRUE. A logical vector of length 1. If TRUE, silhouette widths (a cluster cohesion/separation metric) will be calculated for all cells. This calculation is performed using the function CalcSilhouette, which is a wrapper to silhouette with distance calculated using the same reduced dimensional cell embedding as was used for clustering, as indicated in the DRforClust argument. If the package cluster is not installed, this calculation is skipped.

calcDEvsRest

Default = TRUE. A logical vector of length 1. If TRUE, differential expression tests will be performed comparing each cluster to the remaining cells in the data using a Wilcoxon rank-sum test and reporting false discovery rates. This calculation is performed using the function CalcDEvsRest. If set to FALSE, it is suggested that you perform DE testing on the same set of comparisons using a statistical method of your choice. This can be passed into your sCVdata objects in the list returned by CalcAllSCV using the function CalcDEvsRest. See function documentation for details.

calcDEcombn

Default = TRUE. A logical vector of length 1. If TRUE, differential expression tests will be performed comparing all pairwise combinations of clusters using a Wilcoxon rank-sum test and reporting false discovery rates. This calculation is performed using the function calcDEcombn. If set to FALSE, it is suggested that you perform DE testing on the same set of comparisons using a statistical method of your choice. This can be passed into your sCVdata objects in the list returned by CalcAllSCV using the function calcDEcombn. See function documentation for details.

Details

By default, CalcSCV populates all slots of the object, calculating cluster separation metrics and differential gene expression results. At a minimum, it calculates the basic gene statistics required by scClustViz for visualization. This can take some time. To help track its progress, this function uses progress bars from pbapply. To disable these, set pboptions(type="none"). To re-enable, set pboptions(type="timer"). To view the results using runShiny, the resulting sCVdata object(s) must be stored as a named list of cluster solutions and saved to an .RData file along with the input data object. See example for details.

Value

The function returns an sCVdata object with all slots populated by default, and at least the Clusters, ClustGeneStats, and params slots populated. The resulting object can be added to a list as part of the input to runShiny for visualization of the cluster results.

See Also

sCVdata for information on the output data class. CalcAllSCV to generate a list of sCVdata objects for all cluster solutions in the input data. runShiny starts the interactive Shiny GUI to view the results of this testing.

Examples

## Not run: 
## This example shows integration of scClustViz with Seurat clustering ##

DE_bw_clust <- TRUE
seurat_resolution <- 0
sCVdata_list <- list()

while(DE_bw_clust) {
  seurat_resolution <- seurat_resolution + 0.2
  # ^ iteratively incrementing resolution parameter

  your_seurat_obj <- Seurat::FindClusters(your_seurat_obj,
                                          resolution=seurat_resolution)

  curr_sCVdata <- CalcSCV(inD=your_seurat_obj,
                          cl=Indents(your_seurat_obj),
                          assayType="RNA",
                          DRforClust="pca",
                          exponent=exp(1),
                          pseudocount=1,
                          DRthresh=0.1,
                          calcSil=T,
                          calcDEvsRest=T,
                          calcDEcombn=T)

  DE_bw_NN <- sapply(DEneighb(curr_sCVdata,0.05),length)
  # ^ counts # of DE genes between neighbouring clusters at 5% FDR

  if (min(DE_bw_NN) < 1) { DE_bw_clust <- FALSE }
  # ^ If no DE genes between nearest neighbours, don't loop again.

  sCVdata_list[[paste0("res.",seurat_resolution)]] <- curr_sCVdata
}

save(your_seurat_obj,sCVdata_list,
     file="for_scClustViz.RData")

runShiny(filePath="for_scClustViz.RData")
# ^ see ?runShiny for detailed argument list

## End(Not run)


BaderLab/scClustViz documentation built on Sept. 10, 2023, 11:51 p.m.