reclassify: Reclassification of cells

View source: R/FateID_functions.R

Description

This function attempts to reassign cells outside the target clusters to one of the target clusters.

Usage

reclassify(
  x,
  y,
  tar,
  z = NULL,
  clthr = 0.75,
  nbfactor = 5,
  use.dist = FALSE,
  seed = NULL,
  nbtree = NULL,
  q = 0.9,
  ...
)

Arguments

x

expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis.

y

clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x.

tar

vector of integers representing target cluster numbers. Each element of tar corresponds to a cluster of cells committed towards a particular mature state. One cluster must be given for each cell lineage; it is used as the starting point for learning the differentiation trajectory.
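A minimal base-R sketch of how tar splits the cells, assuming (consistent with the Details section) that cells in the target clusters act as labelled training data and all remaining cells are candidates for reassignment. The toy data and the names in_tar, train, labels and query are hypothetical and not FateID internals.

```r
# Toy data (invented): 3 genes x 4 cells, partition y aligned with the columns
x <- matrix(1:12, nrow = 3,
            dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))
y <- c(6L, 2L, 9L, 3L)
tar <- c(6, 9)

# Cells in a target cluster form the labelled training set ...
in_tar <- y %in% tar
train  <- t(x[, in_tar, drop = FALSE])   # cells as rows, features as columns
labels <- factor(y[in_tar])              # class labels = target cluster numbers

# ... and all remaining cells are candidates for reclassification
query <- t(x[, !in_tar, drop = FALSE])
rownames(query)   # "c2" "c4"
```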

z

matrix containing cell-to-cell distances, used as the feature matrix for the random forest if use.dist = TRUE. Default is NULL. In this case, a correlation-based distance is computed from x by 1 - cor(x).

clthr

real number between zero and one. This is the threshold for the fraction of random forest votes required to assign a cell not contained within the target clusters to one of these clusters. The value of this parameter should be sufficiently high to only reclassify cells with a high-confidence assignment. Default value is 0.75.

nbfactor

positive integer. Determines the number of trees grown for each random forest. The number of trees is given by the number of columns of the training set multiplied by nbfactor. Default value is 5.

use.dist

logical value. If TRUE, the distance matrix is used as the feature matrix (i.e., z if it is not NULL and 1 - cor(x) otherwise). If FALSE, the gene expression values in x are used. Default is FALSE.
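The choice of feature matrix described for z and use.dist can be restated as a small base-R sketch. The helper choose_features is hypothetical and only mirrors the documented rule; it is not part of FateID. Note that cor() correlates columns, so 1 - cor(x) on a genes-by-cells table yields a cells-by-cells distance matrix.

```r
# Hypothetical helper (not part of FateID) mirroring the documented choice
# of feature matrix for the random forest.
choose_features <- function(x, z = NULL, use.dist = FALSE) {
  if (!use.dist) return(as.matrix(x))   # gene expression values
  if (is.null(z)) return(1 - cor(x))    # correlation distance between cells
  z                                     # user-supplied distance matrix
}

# Toy expression table (invented): 4 genes x 3 cells
x <- data.frame(c1 = c(1, 2, 3, 4),
                c2 = c(2, 4, 6, 8),
                c3 = c(4, 3, 2, 1),
                row.names = paste0("g", 1:4))

d <- choose_features(x, use.dist = TRUE)
round(d, 3)   # c1/c2 perfectly correlated -> 0; c1/c3 anti-correlated -> 2
```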

seed

integer seed for initialization. If equal to NULL, each run will yield slightly different results due to the randomness of the random forest algorithm. Default is NULL.

nbtree

integer value. If given, it explicitly specifies the number of trees for each random forest instead of deriving it from nbfactor. Default is NULL.
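The interplay of nbfactor and nbtree is simple arithmetic; the sketch below restates it with a hypothetical helper n_trees (not FateID code). Which dimension of the training set counts as its "columns" depends on how FateID orients the matrix it hands to randomForest, which this page does not specify, so the toy matrix is only illustrative.

```r
# Hypothetical helper (not FateID code) restating the documented rule:
# an explicit nbtree wins; otherwise trees = ncol(training set) * nbfactor.
n_trees <- function(train, nbfactor = 5, nbtree = NULL) {
  if (!is.null(nbtree)) return(as.integer(nbtree))
  as.integer(ncol(train) * nbfactor)
}

train <- matrix(0, nrow = 10, ncol = 4)   # toy training set with 4 columns
n_trees(train)                  # 4 * 5 = 20 trees
n_trees(train, nbfactor = 10)   # 40 trees
n_trees(train, nbtree = 500)    # explicit value: 500 trees
```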

q

real value between zero and one. This number specifies a threshold used for feature selection based on variable importance. A reduced expression table is generated containing only features with an importance larger than the q-quantile for at least one of the classes (i.e., target clusters). Default value is 0.9.
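A minimal sketch of the q-quantile filter, assuming an importance matrix with one column per target cluster. For classification with importance = TRUE, the importance matrix of a randomForest object has one column per class followed by MeanDecreaseAccuracy and MeanDecreaseGini; the sketch uses only per-class columns with invented values. Whether FateID uses R's default quantile type is an assumption.

```r
# Invented importance matrix: rows = features, one column per target cluster
imp <- matrix(c(0.90, 0.10, 0.20, 0.05, 0.30,    # cluster 6
                0.10, 0.80, 0.20, 0.05, 0.10),   # cluster 9
              ncol = 2,
              dimnames = list(paste0("g", 1:5), c("6", "9")))
q <- 0.75

# Per-class threshold: the q-quantile of that class's importance values
thr <- apply(imp, 2, quantile, probs = q)

# Keep a feature if it exceeds the threshold for at least one class
keep <- apply(sweep(imp, 2, thr, ">"), 1, any)

# Reduced expression table (toy values), analogous to the documented xf
x  <- matrix(1:15, nrow = 5, dimnames = list(paste0("g", 1:5), paste0("c", 1:3)))
xf <- x[keep, , drop = FALSE]
rownames(xf)   # "g1" "g2"
```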

...

additional arguments to be passed to the underlying low-level function randomForest.

Details

The function uses random forest-based supervised learning to assign cells not contained in the target clusters to one of these clusters. Every cell outside the target clusters that receives a fraction of votes larger than clthr for one of the target clusters is reassigned to that cluster. Since this function is designed to reclassify cells only if they can be assigned with high confidence, a high value of clthr (e.g., > 0.75) should be applied.
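The reassignment rule above can be sketched in base R once per-class vote fractions are available for the non-target cells, for example the class probabilities (vote fractions) that predict() returns with type = "prob" for a randomForest classifier. The vote matrix and partition below are invented, and this is an illustration of the documented rule rather than FateID's implementation.

```r
# Invented vote fractions: rows = cells outside the target clusters,
# columns = target clusters; each row sums to 1
votes <- matrix(c(0.90, 0.05, 0.05,
                  0.40, 0.35, 0.25,
                  0.10, 0.80, 0.10),
                ncol = 3, byrow = TRUE,
                dimnames = list(c("cellA", "cellB", "cellC"), c("6", "9", "13")))
y <- c(cellA = 2L, cellB = 3L, cellC = 4L)   # current, non-target clusters
clthr <- 0.75

best   <- apply(votes, 1, max)                                   # top vote fraction
winner <- as.integer(colnames(votes)[apply(votes, 1, which.max)]) # its cluster

# Reassign only cells whose top vote fraction is strictly above clthr
reassign <- best > clthr
y[reassign] <- winner[reassign]
y   # cellA -> 6, cellB stays 3, cellC -> 9
```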

Value

A list with the following three components:

part

A vector with the revised cluster assignment for each cell in the same order as in the input argument y.

rf

The random forest object generated for the reclassification, computed with variable importance enabled (see randomForest).

xf

A filtered expression table retaining only the features whose importance is larger than the q-quantile for at least one of the classes.
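Because part keeps the cell order of y, one way to inspect the result (in practice rc$part from the Examples section) is an element-wise comparison of the two vectors. The vectors below are invented placeholders so the snippet runs without FateID.

```r
# Invented partitions: y is the input, part stands in for the returned part
y    <- c(1L, 2L, 6L, 3L, 9L)
part <- c(6L, 2L, 6L, 9L, 9L)

changed <- which(y != part)
changed                                        # cells 1 and 4 were reassigned
table(from = y[changed], to = part[changed])   # old vs new cluster
```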

Examples

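library(FateID)
# example data set shipped with FateID
x <- intestine$x
y <- intestine$y
# target clusters, one per lineage
tar <- c(6, 9, 13)
rc <- reclassify(x, y, tar, z = NULL, nbfactor = 5, use.dist = FALSE, seed = NULL, nbtree = NULL, q = 0.9)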

dgrun/FateID documentation built on June 20, 2022, 12:57 p.m.