SHARP: Run SHARP for single-cell RNA data clustering


View source: R/SHARP.R

Description

SHARP: Single-cell RNA-Seq Hyper-fast and Accurate clustering via ensemble Random Projection.

Usage

SHARP(
  scExp,
  exp.type,
  ensize.K,
  reduced.ndim,
  base.ncells,
  partition.ncells,
  hmethod,
  N.cluster = NULL,
  enpN.cluster = NULL,
  indN.cluster = NULL,
  minN.cluster,
  maxN.cluster,
  sil.thre,
  height.Ntimes,
  flashmark = FALSE,
  logflag,
  sncells,
  n.cores,
  forview = TRUE,
  prep,
  rM,
  rN.seed
)

Arguments

scExp

input single-cell expression matrix, where each column represents a cell and each row represents a gene.

exp.type

the data type of the single-cell expression matrix. Common types include 'count', 'UMI', 'CPM', 'TPM', 'FPKM' and 'RPKM'. If missing, SHARP treats scExp as an already normalized expression matrix.

ensize.K

number of applications of random projection for ensemble. The default value is 15.

reduced.ndim

the dimension to be reduced to. If missing, the value will be estimated by an equation associated with number of cells (see our paper and supplementary materials for details).
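As a rough illustration of what ensize.K and reduced.ndim control, the sketch below applies an ensemble of random projections to a toy genes-by-cells matrix. It assumes dense Gaussian random matrices and an arbitrary reduced.ndim of 20; both are illustrative choices, not necessarily the construction or the estimate SHARP uses internally, and the names rand_mats and projected are hypothetical.

```r
# Toy expression matrix: 200 genes (rows) x 50 cells (columns).
set.seed(10)
ngenes <- 200
ncells <- 50
scExp <- matrix(rpois(ngenes * ncells, lambda = 2), nrow = ngenes)

ensize.K <- 15      # documented default number of random projections
reduced.ndim <- 20  # assumed value for illustration only

# One Gaussian random matrix per ensemble member (an assumption of this sketch).
rand_mats <- lapply(seq_len(ensize.K), function(k) {
  matrix(rnorm(reduced.ndim * ngenes), nrow = reduced.ndim) / sqrt(reduced.ndim)
})

# Project every cell from ngenes dimensions down to reduced.ndim dimensions.
projected <- lapply(rand_mats, function(R) R %*% scExp)

length(projected)     # one projected matrix per ensemble member
dim(projected[[1]])   # reduced.ndim rows, one column per cell
```

Each projected matrix would then be clustered separately, and the individual results combined into the ensemble.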

base.ncells

a base threshold of number of cells. The default value is 5000. When the number of cells of a dataset is smaller than this threshold, we use SHARP_small function; otherwise, we use SHARP_large.
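The threshold rule described for base.ncells can be sketched as follows. The helper choose_sharp_route is hypothetical and not part of the SHARP package; it only mirrors the documented comparison.

```r
# Hypothetical helper (not part of SHARP): mirrors the documented rule that
# datasets with fewer cells than base.ncells go to SHARP_small and all
# others go to SHARP_large.
choose_sharp_route <- function(ncells, base.ncells = 5000) {
  if (ncells < base.ncells) "SHARP_small" else "SHARP_large"
}

choose_sharp_route(1200)    # below the default threshold
choose_sharp_route(20000)   # at or above the default threshold
```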

partition.ncells

number of cells for each partition when using SHARP_large. The default value is 2000.

N.cluster

number of clusters for the final clustering results. The default is NULL, in which case SHARP automatically determines the optimal number of clusters. If given, SHARP clusters according to the given number of clusters.

enpN.cluster

number of clusters for the weighted ensemble meta-clustering; used only by SHARP_large. The default is NULL, in which case SHARP automatically determines the optimal number of clusters. If given, SHARP clusters according to the given number of clusters.

indN.cluster

number of clusters for the individual RP-based hierarchical clustering. The default is NULL, in which case SHARP automatically determines the optimal number of clusters. If given, SHARP clusters according to the given number of clusters.

minN.cluster

the minimum number of clusters that SHARP will try when determining the optimal number of clusters.

maxN.cluster

the maximum number of clusters that SHARP will try when determining the optimal number of clusters.

sil.thre

the Silhouette-index threshold that decides whether SHARP uses the Silhouette index to determine the optimal number of clusters. If the maximum Silhouette index is larger than sil.thre, SHARP uses the Silhouette index to determine the number of clusters; otherwise, SHARP uses the other indices (i.e., CH index and/or hierarchical heights) to determine it.
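The decision rule for sil.thre can be illustrated with base R and the cluster package. This is a minimal sketch, not SHARP's internal code: the toy data, the "ward.D" linkage, the candidate range 2 to 5 and the threshold value 0.5 are all assumptions chosen for illustration.

```r
library(cluster)  # provides silhouette()

# Toy data: two well-separated groups of 20 points each.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
d <- dist(x)
hc <- hclust(d, method = "ward.D")  # linkage choice is an assumption

ks <- 2:5          # stands in for minN.cluster:maxN.cluster
sil.thre <- 0.5    # assumed threshold, for illustration only

# Average silhouette width for each candidate number of clusters.
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)
  mean(sil[, "sil_width"])
})

if (max(avg_sil) > sil.thre) {
  # Silhouette index is trusted: take the k that maximizes it.
  best_k <- ks[which.max(avg_sil)]
} else {
  # Otherwise another index (e.g. CH index or tree heights) would decide.
  best_k <- NA
}
best_k
```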

height.Ntimes

the ratio threshold between a height and the immediately next height in the hierarchical clustering. SHARP uses this parameter to decide where to cut the hierarchical tree: with the heights sorted in descending order, if the current height is more than height.Ntimes times the immediately next height, SHARP cuts the tree at the median of these two heights.
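The height-ratio rule can be sketched with base R's hclust and cutree. The helper cut_by_height_ratio is hypothetical, and the toy data, the "average" linkage and the value height.Ntimes = 2 are assumptions for illustration; SHARP's own implementation may differ in detail.

```r
# Hypothetical helper (not part of SHARP): scan merge heights in descending
# order and return the median of the first adjacent pair whose ratio
# exceeds height.Ntimes, or NA if no such pair exists.
cut_by_height_ratio <- function(hc, height.Ntimes) {
  h <- sort(hc$height, decreasing = TRUE)
  ratio <- h[-length(h)] / h[-1]          # h[i] / h[i + 1]
  i <- which(ratio > height.Ntimes)[1]
  if (is.na(i)) return(NA_real_)
  median(c(h[i], h[i + 1]))
}

# Toy data: two well-separated groups of 20 points each.
set.seed(2)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 8), ncol = 2))
hc <- hclust(dist(x), method = "average")  # linkage choice is an assumption

cut_height <- cut_by_height_ratio(hc, height.Ntimes = 2)
labels <- cutree(hc, h = cut_height)
table(labels)
```

Cutting between the two heights with the largest relative gap separates the merges that join distinct groups from the merges that happen within a group.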

flashmark

a logical to determine whether flashClust is used. By default, flashmark = FALSE, i.e., traditional hclust is used.

logflag

a logical to determine whether to check a log-transform of the input expression matrix. By default, logflag = TRUE, i.e., SHARP will check the log-transform operation.

sncells

number of cells randomly selected for checking whether a log-transform is necessary. By default, sncells = 100.

n.cores

number of cores to be used. The default is (n-1) cores, where n is the number of cores in your local computer or server.

forview

a logical to indicate whether those feature-vectors for data visualization should be saved or not. By default, it is TRUE.

prep

a logical to determine whether preprocessing (e.g., removing all-zero rows and replacing negative values with 0) is employed or not. By default, prep = TRUE only when the number of single cells is smaller than 10,000.

rM

if provided, it should be a list of random matrices for random projection; otherwise, it will be calculated by SHARP_large.

rN.seed

a number used to set the random seeds so that SHARP produces reproducible results.

Details

This is the main interface for SHARP to process and analyze different kinds of single-cell RNA-Seq data. Only one parameter is mandatory: scExp, the single-cell expression matrix. Most of the other parameters are determined automatically or have been optimized, so users usually do not need to tune them. For cases where tuning is needed, SHARP exposes algorithm-related parameters, hierarchical-clustering-related parameters, parallel-computing parameters and parameters for obtaining reproducible results.

Value

a list containing the SHARP clustering results, the distribution of the clustering results, the predicted optimal number of clusters, the time SHARP consumes for clustering, some intermediate results (including the clustering results of each random-projection-based hierarchical clustering) and other related statistical information, including the number of cells, genes, reduced dimensions and applications of random projection.

Author(s)

Shibiao Wan <shibiaowan.work@gmail.com>

Examples

enresults = SHARP(scExp)
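A slightly fuller, self-contained variant is sketched below. The simulated count matrix is hypothetical toy data with no real cluster structure, and the argument values (exp.type = "count", n.cores = 1, rN.seed = 10) are illustrative choices rather than recommendations. The SHARP call is guarded so that the snippet still runs when the package is not installed.

```r
# Simulated count matrix: 500 genes (rows) x 60 cells (columns).
set.seed(42)
scExp <- matrix(
  rpois(500 * 60, lambda = 3),
  nrow = 500,
  dimnames = list(paste0("gene", 1:500), paste0("cell", 1:60))
)

# Only call SHARP when the package is available.
if (requireNamespace("SHARP", quietly = TRUE)) {
  enresults <- SHARP::SHARP(
    scExp,
    exp.type = "count",  # tell SHARP the matrix holds raw counts
    n.cores = 1,         # single core keeps the toy run lightweight
    rN.seed = 10         # fixed seed for reproducible results
  )
  str(enresults, max.level = 1)  # inspect the top-level elements of the list
}
```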

shibiaowan/SHARP documentation built on April 28, 2021, 1:56 p.m.