Prepare the SCESet object for SC3 clustering

Description

This function prepares an object of 'SCESet' class for SC3 clustering. It creates and populates the following items of the object@sc3 slot:

  • processed_dataset - contains the expression matrix to be used for SC3 clustering.

  • kmeans_iter_max - contains a value of iter.max parameter to be used in kmeans clustering.

  • rand_seed - contains a random seed to be used by SC3.

  • kmeans_nstart - contains a value of nstart parameter to be used in kmeans clustering.

  • n_dim - contains values of the number of eigenvectors to be used in kmeans clustering.

  • svm_train_inds - if SVM is used this item contains indexes of the training cells to be used for SC3 clustering and further SVM prediction.

  • svm_study_inds - if SVM is used this item contains indexes of the cells to be predicted by SVM.

  • n_cores - contains a value of the number of available cores on the user's machine.

  • rselenium - defines whether RSelenium is installed on the user's machine.

Usage

sc3_prepare.SCESet(object, exprs_values = "exprs", gene.filter = FALSE,
  gene.filter.fraction = 0.06, gene.reads.rare = 2, gene.reads.ubiq = 0,
  log.scale = FALSE, d.region.min = 0.04, d.region.max = 0.07,
  svm.num.cells = NULL, svm.train.inds = NULL, svm.max = 5000,
  n.cores = NULL, k.means.nstart = NULL, k.means.iter.max = 1e+09,
  biology = TRUE, seed = 1)

## S4 method for signature 'SCESet'
sc3_prepare(object, exprs_values = "exprs",
  gene.filter = FALSE, gene.filter.fraction = 0.06, gene.reads.rare = 2,
  gene.reads.ubiq = 0, log.scale = FALSE, d.region.min = 0.04,
  d.region.max = 0.07, svm.num.cells = NULL, svm.train.inds = NULL,
  svm.max = 5000, n.cores = NULL, k.means.nstart = NULL,
  k.means.iter.max = 1e+09, biology = TRUE, seed = 1)

Arguments

object

an object of 'SCESet' class

exprs_values

character string indicating which values should be used as the expression values for SC3 clustering. Valid arguments are 'exprs' (default; whatever is in the 'exprs' slot of the SCESet object), 'tpm' (transcripts per million), 'norm_tpm' (normalised TPM values), 'fpkm' (FPKM values), 'norm_fpkm' (normalised FPKM values), 'counts' (counts for each feature), 'norm_counts' (normalised counts), 'cpm' (counts-per-million), 'norm_cpm' (normalised counts-per-million), 'norm_exprs' (normalised expression values), 'stand_exprs' (standardised expression values), or any other named element of the assayData slot of the SCESet object that can be accessed with the get_exprs function.

gene.filter

a boolean variable which defines whether to perform gene filtering before SC3 clustering. Default is FALSE. The gene filter removes genes/transcripts that are either expressed (expression value is more than gene.reads.rare) in less than X% of cells (rare genes/transcripts) or expressed (expression value is more than gene.reads.ubiq) in at least (100-X)% of cells (ubiquitous genes/transcripts), where X is gene.filter.fraction*100. The motivation for the gene filter is that ubiquitous and rare genes are usually not informative for the clustering.

gene.filter.fraction

fraction of cells. Default is 0.06.

gene.reads.rare

expression value threshold for rare genes. Default is 2.

gene.reads.ubiq

expression value threshold for ubiquitous genes. Default is 0.
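The interaction of gene.filter.fraction, gene.reads.rare and gene.reads.ubiq may be easier to follow in code. The following base R sketch is one reading of the rule: a gene is dropped if it exceeds gene.reads.rare in fewer than a fraction X of the cells, or exceeds gene.reads.ubiq in at least a fraction (1 - X) of the cells. The helper gene_filter_keep and the toy matrix are hypothetical and are not part of SC3; the package's own implementation may differ in detail.

```r
# Hypothetical helper (not an SC3 function): TRUE for genes that pass the filter.
# `expr` is assumed to be a numeric matrix with genes in rows and cells in columns.
gene_filter_keep <- function(expr, fraction = 0.06,
                             reads.rare = 2, reads.ubiq = 0) {
  n_cells <- ncol(expr)
  # Number of cells in which each gene exceeds the respective threshold
  n_above_rare <- rowSums(expr > reads.rare)
  n_above_ubiq <- rowSums(expr > reads.ubiq)
  rare <- n_above_rare < fraction * n_cells
  ubiquitous <- n_above_ubiq >= (1 - fraction) * n_cells
  !(rare | ubiquitous)
}

# Toy matrix: 3 genes x 10 cells
expr <- rbind(
  rare_gene   = c(5, rep(0, 9)),        # expressed in a single cell
  ubiq_gene   = rep(10, 10),            # expressed in every cell
  informative = c(rep(8, 5), rep(0, 5)) # expressed in half of the cells
)

# A larger fraction than the default is used so the toy data shows both cases
keep <- gene_filter_keep(expr, fraction = 0.2)
filtered <- expr[keep, , drop = FALSE]
```

In this toy example only the gene expressed in half of the cells survives the filter.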

log.scale

a boolean variable which defines whether to perform log2 scaling before SC3 clustering. Default is FALSE.

d.region.min

defines the minimum number of eigenvectors used for kmeans clustering as a fraction of the total number of cells. Default is 0.04.

d.region.max

defines the maximum number of eigenvectors used for kmeans clustering as a fraction of the total number of cells. Default is 0.07.
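As a rough illustration of how d.region.min and d.region.max translate into the n_dim item, the base R sketch below turns the two fractions into a range of eigenvector counts for a given number of cells. The helper n_dim_range is hypothetical, and the rounding (floor for the lower bound, ceiling for the upper bound) is an assumption rather than something stated on this page.

```r
# Hypothetical helper (not an SC3 function): range of eigenvector counts
# implied by the two fractions. The rounding rule is an assumption.
n_dim_range <- function(n_cells, d.region.min = 0.04, d.region.max = 0.07) {
  lower <- max(1, floor(d.region.min * n_cells))
  upper <- max(lower, ceiling(d.region.max * n_cells))
  seq(lower, upper)
}

# With 200 cells the range covers roughly 4% to 7% of the cell count
n_dim <- n_dim_range(200)
```

Each value in the resulting vector would correspond to one kmeans run on that many eigenvectors.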

svm.num.cells

number of randomly selected training cells to be used for SVM prediction. The default is NULL.

svm.train.inds

a numeric vector defining indices of training cells that should be used for SVM training. The default is NULL.

svm.max

defines the maximum number of cells below which SVM is not run. Default is 5000.
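The relationship between svm.num.cells, svm.train.inds and the svm_train_inds / svm_study_inds items can be sketched in base R as follows. The helper svm_split is hypothetical; it only shows the idea that the training cells are either given explicitly or drawn at random, and that the remaining cells are the ones to be predicted. The svm.max threshold is deliberately not modelled here, and the real function may select cells differently.

```r
# Hypothetical helper (not an SC3 function): split cell indices into
# training cells and cells to be predicted by the SVM.
svm_split <- function(n_cells, svm.num.cells = NULL,
                      svm.train.inds = NULL, seed = 1) {
  if (!is.null(svm.train.inds)) {
    # Explicit training indices take priority in this sketch (an assumption)
    train <- sort(unique(svm.train.inds))
  } else if (!is.null(svm.num.cells)) {
    # Otherwise draw the requested number of training cells at random
    set.seed(seed)
    train <- sort(sample(n_cells, svm.num.cells))
  } else {
    # No SVM requested: every cell is clustered directly
    return(list(train = seq_len(n_cells), study = integer(0)))
  }
  list(train = train, study = setdiff(seq_len(n_cells), train))
}

split <- svm_split(100, svm.num.cells = 20)
```

Here split$train holds 20 cell indices and split$study holds the other 80.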

n.cores

defines the number of cores to be used on the user's machine.

k.means.nstart

nstart parameter passed to the kmeans function. The default is NULL, in which case 1000 is used for up to 2000 cells and 50 for more than 2000 cells.

k.means.iter.max

iter.max parameter passed to kmeans function. Default is 1e+09.
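To show where these two values end up, here is a minimal base R sketch that applies the documented nstart rule and passes both settings to stats::kmeans. The helper default_nstart and the toy data are hypothetical; only the kmeans call itself is standard R.

```r
# Hypothetical helper (not an SC3 function) encoding the documented rule:
# 1000 starts for up to 2000 cells, 50 starts above that.
default_nstart <- function(n_cells) {
  if (n_cells > 2000) 50 else 1000
}

set.seed(1)
# Toy data: two well-separated groups of 20 points in 2 dimensions
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))

fit <- kmeans(x, centers = 2,
              iter.max = 1e+09,                  # upper bound on iterations
              nstart = default_nstart(nrow(x)))  # number of random starts
```

A large iter.max is only an upper bound; kmeans stops as soon as the assignment converges.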

biology

boolean variable, defines whether to compute DE genes, marker genes and cell outliers. Default is TRUE.

seed

sets the seed for the random number generator. Can be used to check the stability of clustering results: if the results are the same after changing the seed several times, then the clustering solution is stable.
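The stability check described for seed can be illustrated with plain kmeans instead of the full SC3 pipeline. Cluster labels may be permuted between runs, so the sketch compares which cells are grouped together rather than the labels themselves. The helper same_partition and the toy data are hypothetical.

```r
# Hypothetical helper: TRUE if two label vectors describe the same grouping,
# regardless of how the clusters are numbered.
same_partition <- function(a, b) {
  identical(outer(a, a, "=="), outer(b, b, "=="))
}

# Toy data: two well-separated groups of 15 points in 2 dimensions
set.seed(42)
x <- rbind(matrix(rnorm(30, mean = 0), ncol = 2),
           matrix(rnorm(30, mean = 10), ncol = 2))

# Cluster the same data under several seeds
runs <- lapply(1:3, function(s) {
  set.seed(s)
  kmeans(x, centers = 2, nstart = 5)$cluster
})

# The solution is considered stable if every run matches the first one
stable <- all(vapply(runs[-1], same_partition, logical(1), b = runs[[1]]))
```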

Value

an object of 'SCESet' class
