CalcAllSCV: Prepare all cluster solutions for visualization with...

View source: R/deTest.R

CalcAllSCV    R Documentation

Prepare all cluster solutions for visualization with scClustViz

Description

An all-in-one function to prepare your data for viewing in the interactive Shiny app. See example for the basic usage of scClustViz.

Usage

CalcAllSCV(
  inD,
  clusterDF,
  assayType = "",
  assaySlot = "",
  DRforClust = "pca",
  exponent = 2,
  pseudocount = 1,
  DRthresh = 0.1,
  testAll = TRUE,
  FDRthresh = 0.05,
  calcSil = T,
  calcDEvsRest = T,
  calcDEcombn = T
)

Arguments

inD

The input dataset. An object of class seurat or SingleCellExperiment. Other data classes are not currently supported. To request support for other data objects, submit a request on the scClustViz GitHub repository (BaderLab/scClustViz).

clusterDF

A data frame of cluster assignments for all cells in the dataset. Variables (columns) are cluster solutions with different parameters, and rows should correspond to cells of the input gene expression matrix.
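As a minimal sketch of the expected shape, the following builds a small cluster assignment data frame by hand. The cell names, column names, and cluster labels are hypothetical; in practice they would come from the metadata of your own data object.

```r
# Hypothetical cell names; in practice these match the cells of the input object.
cell_names <- paste0("cell", 1:6)

# One column per cluster solution, one row per cell.
clusterDF <- data.frame(
  res.0.4 = factor(c(1, 1, 1, 2, 2, 2)),
  res.0.8 = factor(c(1, 1, 2, 2, 3, 3)),
  row.names = cell_names
)

# Number of clusters found in each solution.
n_clusters <- sapply(clusterDF, nlevels)
```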

assayType

Default = "" (for Seurat v1/2). A length-one character vector representing the assay object in which the expression data is stored in the input object. This is not required for Seurat v1 or v2 objects. For Seurat v3 objects, this is often "RNA". For SingleCellExperiment objects, this is often "logcounts". See getExpr for details.

assaySlot

An optional length-one character vector representing the slot of the Seurat v3 Assay object to use. Not used for other single-cell data objects. The default is to use the normalized data in the "data" slot, but you can also use the SCTransform-corrected counts by setting assayType = "SCT" and assaySlot = "counts". This is recommended, as it will speed up differential expression calculations. See getExpr for details.

DRforClust

Default = "pca". A length-one character vector representing the dimensionality reduction method used as the input for clustering. This is commonly PCA, and should correspond to the slot name of the cell embedding in your input data: the type argument in reducedDim(x,type) (SingleCellExperiment), the reduction.type argument in GetDimReduction(object,reduction.type) (Seurat v2), or the reduction argument in Embeddings(object,reduction) (Seurat v3).

exponent

Default = 2. A length-one numeric vector representing the base of the log-normalized gene expression data to be processed. Generally gene expression data is transformed into log2 space when normalizing (set this to 2), though Seurat uses the natural log (set this to exp(1)). If you are using data that has not been log-transformed (for example, corrected counts from SCTransform), set this to NA.

pseudocount

Default = 1. A length-one numeric vector representing the pseudocount added to all log-normalized values in your input data. Most methods use a pseudocount of 1 to eliminate log(0) errors. If you are using data that has not been log-transformed (for example, corrected counts from SCTransform), set this to NA.
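To illustrate how the exponent and pseudocount arguments relate to the data, here is a sketch of a log transform and its inverse. It assumes the generic relationship log_expr = log(x + pseudocount, base = exponent); the values are made up, and this is not necessarily the exact computation performed inside the package.

```r
# Hypothetical normalized counts for one gene across four cells.
norm_counts <- c(0, 1, 3, 7)

exponent <- 2
pseudocount <- 1

# Forward transform: log-normalize after adding the pseudocount.
log_expr <- log(norm_counts + pseudocount, base = exponent)

# Inverse transform: recover the linear-scale values from the log-scale data.
recovered <- exponent^log_expr - pseudocount
```

For Seurat's natural-log normalization the same inverse would use exponent <- exp(1).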

DRthresh

Default = 0.1. A length-one numeric vector between 0 and 1 representing the detection rate threshold for inclusion of a gene in the differential expression testing. A gene will be included if it is detected in at least this proportion of cells in at least one of the clusters being compared.
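The filter described above can be sketched in base R as follows. The expression matrix and cluster labels are hypothetical, and treating a value greater than zero as "detected" is an assumption made for this illustration.

```r
# Hypothetical data: 3 genes (rows) by 40 cells (columns), two clusters of 20 cells.
cluster <- rep(c("c1", "c2"), each = 20)
expr <- rbind(
  geneA = rep(0, 40),               # never detected
  geneB = c(1, rep(0, 39)),         # detected in 1 of 20 cells of c1 (5%)
  geneC = c(rep(2, 4), rep(0, 36))  # detected in 4 of 20 cells of c1 (20%)
)

DRthresh <- 0.1

# Detection rate: proportion of cells per cluster with expression above zero.
det_rate <- sapply(c("c1", "c2"), function(cl) {
  rowMeans(expr[, cluster == cl, drop = FALSE] > 0)
})

# Keep a gene if it passes the threshold in at least one of the clusters.
keep <- apply(det_rate, 1, max) >= DRthresh
genes_tested <- names(keep)[keep]
```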

testAll

Default = TRUE. Logical value indicating whether to test all cluster solutions (TRUE) or to stop testing once a cluster solution is found in which at least one pair of nearest neighbouring clusters has no differentially expressed genes between them (FALSE). If set to FALSE, this function will test cluster solutions in ascending order of number of clusters found, and only tested cluster solutions will appear in the scClustViz Shiny app.
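The stopping behaviour for testAll = FALSE can be sketched as a loop over solutions sorted by cluster count. Everything here is illustrative: the cluster solutions and the min_DE vector (the smallest number of differentially expressed genes between any pair of nearest neighbouring clusters) are hypothetical stand-ins for values the real function would compute.

```r
# Hypothetical cluster solutions, deliberately not in order of cluster count.
clusterDF <- data.frame(
  res.1.2 = factor(c(1, 2, 3, 3, 4, 4)),
  res.0.4 = factor(c(1, 1, 1, 2, 2, 2)),
  res.0.8 = factor(c(1, 1, 2, 2, 3, 3))
)

# Test in ascending order of number of clusters.
n_clust <- sapply(clusterDF, nlevels)
test_order <- names(sort(n_clust))

# Hypothetical minimum DE gene counts between nearest neighbouring clusters.
min_DE <- c(res.0.4 = 25, res.0.8 = 0, res.1.2 = 0)

# Stop after the first solution with a neighbouring pair lacking DE genes.
tested <- character(0)
for (res in test_order) {
  tested <- c(tested, res)
  if (min_DE[[res]] == 0) break
}
```

In this sketch res.1.2 is never tested, so it would not appear among the results.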

FDRthresh

Default = 0.05. A length-one numeric vector representing the targeted false discovery rate used to determine the number of differentially expressed genes between nearest neighbouring clusters when testAll is FALSE. If testAll is TRUE, this argument is unused.
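As an illustration of counting differentially expressed genes at a target false discovery rate, the sketch below applies the Benjamini-Hochberg correction from base R to hypothetical p-values. Whether the package compares with <= or < is an assumption here.

```r
# Hypothetical raw p-values from a comparison between two clusters.
pvals <- c(geneA = 0.001, geneB = 0.01, geneC = 0.04, geneD = 0.5)

FDRthresh <- 0.05

# Benjamini-Hochberg adjusted p-values (false discovery rates).
fdr <- p.adjust(pvals, method = "fdr")

# Number of genes called differentially expressed at the target FDR.
n_DE <- sum(fdr <= FDRthresh)
```

Note that geneC has a raw p-value below 0.05 but does not pass after correction.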

calcSil

Default = TRUE. A logical vector of length 1. If TRUE, silhouette widths (a cluster cohesion/separation metric) will be calculated for all cells. This calculation is performed using the function CalcSilhouette, which is a wrapper to silhouette with distance calculated using the same reduced dimensional cell embedding as was used for clustering, as indicated in the DRforClust argument. If the package cluster is not installed, this calculation is skipped.
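The underlying calculation can be sketched directly with silhouette from the cluster package. The two-dimensional embedding and cluster labels below are hypothetical, and Euclidean distance via dist is an assumption of this example.

```r
library(cluster)

# Hypothetical 2-D cell embedding: two tight, well-separated groups of cells.
embedding <- rbind(
  matrix(c(0, 0, 0.1, 0, 0, 0.1), ncol = 2, byrow = TRUE),
  matrix(c(5, 5, 5.1, 5, 5, 5.1), ncol = 2, byrow = TRUE)
)
clusters <- c(1L, 1L, 1L, 2L, 2L, 2L)

# Silhouette widths from pairwise distances in the embedding.
sil <- silhouette(clusters, dist(embedding))
sil_width <- sil[, "sil_width"]
```

Widths near 1 indicate cells that sit well inside their own cluster; widths near 0 or below indicate cells close to a neighbouring cluster.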

calcDEvsRest

Default = TRUE. A logical vector of length 1. If TRUE, differential expression tests will be performed comparing each cluster to the remaining cells in the data using a Wilcoxon rank-sum test and reporting false discovery rates. This calculation is performed using the function CalcDEvsRest. If set to FALSE, it is suggested that you perform DE testing on the same set of comparisons using a statistical method of your choice. This can be passed into your sCVdata objects in the list returned by CalcAllSCV using the function CalcDEvsRest. See function documentation for details.
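A cluster-versus-rest comparison of this kind can be sketched with wilcox.test and p.adjust from base R. The expression values are hypothetical, and the sketch shows only the statistical idea for a single cluster, not the package's implementation or output format.

```r
# Hypothetical data: 2 genes (rows) by 15 cells (columns), three clusters of 5 cells.
cluster <- rep(c("c1", "c2", "c3"), each = 5)
expr <- rbind(
  geneA = c(5, 6, 7, 8, 9,  0, 1, 0, 1, 0,  1, 0, 1, 0, 1),  # high in c1
  geneB = c(1, 2, 3, 4, 5,  1, 2, 3, 4, 5,  1, 2, 3, 4, 5)   # no difference
)

# Compare cluster c1 against all remaining cells, gene by gene.
in_c1 <- cluster == "c1"
pvals <- apply(expr, 1, function(g) {
  # exact = FALSE uses the normal approximation, which tolerates ties.
  wilcox.test(g[in_c1], g[!in_c1], exact = FALSE)$p.value
})

# Correct for multiple testing across genes.
fdr <- p.adjust(pvals, method = "fdr")
```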

calcDEcombn

Default = TRUE. A logical vector of length 1. If TRUE, differential expression tests will be performed comparing all pairwise combinations of clusters using a Wilcoxon rank-sum test and reporting false discovery rates. This calculation is performed using the function CalcDEcombn. If set to FALSE, it is suggested that you perform DE testing on the same set of comparisons using a statistical method of your choice. This can be passed into your sCVdata objects in the list returned by CalcAllSCV using the function CalcDEcombn. See function documentation for details.
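The set of pairwise comparisons grows quickly with the number of clusters. A short sketch using combn from base R enumerates them; the cluster names and the "-" separator in the comparison names are hypothetical choices for this example.

```r
# Hypothetical cluster names from one cluster solution.
clusters <- c("c1", "c2", "c3", "c4")

# All unordered pairs of clusters, one pair per column.
pairs <- combn(clusters, 2)

# A readable label for each comparison.
pair_names <- apply(pairs, 2, paste, collapse = "-")

# n clusters give n * (n - 1) / 2 comparisons.
n_comparisons <- ncol(pairs)
```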

Details

This is a wrapper function that runs CalcSCV over each cluster resolution in the input and outputs a list of sCVdata objects, which should be saved along with the input data. The resulting file is ready to be read by runShiny for viewing. For each cluster solution provided, this function calculates summary statistics per gene per cluster, differential gene expression, and cluster separation metrics. This may take a while to run, depending on the number of cluster solutions tested. Set the testAll argument to FALSE to prevent testing of overfitted cluster solutions. To help track its progress, this function uses progress bars from pbapply. To disable these, set pboptions(type="none"). To re-enable, set pboptions(type="timer").

Value

The function returns a list containing one sCVdata object for each cluster resolution (column) in the clusterDF data frame. The output object and the inD object should be saved together as an .RData file. That file is the input for runShiny, the interactive scClustViz Shiny visualization app. See example. For details of the calculations performed and stored by this function, see sCVdata.

See Also

sCVdata for information on the output data class. CalcSCV to generate an sCVdata object for a single cluster solution. runShiny starts the interactive Shiny GUI to view the results of this testing.

Examples

## Not run: 
your_cluster_columns <- grepl("res[.0-9]+$",
                              names(getMD(your_scRNAseq_data_object)))
# ^ Finds the cluster columns of the metadata in a Seurat object.

your_cluster_results <- getMD(your_scRNAseq_data_object)[your_cluster_columns]

sCVdata_list <- CalcAllSCV(inD=your_scRNAseq_data_object,
                           clusterDF=your_cluster_results,
                           assayType="RNA",
                           DRforClust="pca",
                           exponent=exp(1),
                           pseudocount=1,
                           DRthresh=0.1,
                           testAll=FALSE,
                           FDRthresh=0.05,
                           calcSil=TRUE,
                           calcDEvsRest=TRUE,
                           calcDEcombn=TRUE)

save(your_scRNAseq_data_object,sCVdata_list,
     file="for_scClustViz.RData")

runShiny(filePath="for_scClustViz.RData")
# ^ see ?runShiny for detailed argument list

## End(Not run)


BaderLab/scClustViz documentation built on Sept. 10, 2023, 11:51 p.m.