R/ClassifyManual.R

Defines functions ClassifyManual

Documented in ClassifyManual

#' Manual cell type classification by applying gates
#'
#' This function performs a cell type classification by gating.
#'
#' Based on cell scores (sign_x, which can be signatures from ScoreSignatures, genetic similarities to reference genotypes, normalized hashtag oligo (HTO) values, gene expression, QC parameters etc), cell types (celltype_a, celltype_b...) are defined based on gates (value_xy cutoffs) of the form:
#'
#' "celltype_a = sign_a > value_aa & sign_b < value_ab,
#'  celltype_b = sign_b > value_ba & sign_a < value_ab [...etc...]"
#'
#' The gating can be done iteratively by restricting to one or several cell types from a given identity (restrict.to and restrict.ident respectively). The gates can also keep existing celltype names and add a suffix or prefix. 
#'
#' Scores are taken from the metadata in priority, otherwise from assays. If found in several assays, define which assay to use in priority with the assay parameter, or to combine several assays prefix the feature name with the name of the assay separated by underscore, as in "assay_feature".
#'
#' Since single cell data tends to be noisy and have a significant amount of dropouts, imputation is usually a great combination with gating strategies, in a similar fashion to gaussian filtering needed before thresholding in most image analysis workflows. For this, see Impute() and CrossImpute().
#'
#' @param object Seurat object. Must have metadata columns containing signature scores, such as those generated by ScoreSignatures, with names matching any gating filter supplied.
#' @param gates charater(1). Gate definitions, with gates for various cell types delimited by commas (,). For each gate, the name of the cell type is given first, followed by equal (=) and a series of gates separated by AND symbol (&). Example: "celltype_a = sign_a > value_aa & sign_b < value_ab, celltype_b = sign_b > value_ba & sign_a < value_ab" where sign_a and sign_b are the names of metadata columns and value_xy are numeric values.
#' @param metadata.name The name of the new metadata column where cell type annotations are stored (Default: celltype)
#' @param assay character(1). If some signatures used for classification are stored as assay features, which assay should be used in priority (Default: DefaultAssay(object)).
#' @param layer character(1). If some signatures used for classification are stored as assay features, which layer should be used (Default: "data").
#' @param restrict.ident character(1). The name of a metadata column containing celltype annotations, which will be used to define the subset of cells that should be classified. Default: current Idents(object). 
#' @param restrict.to character(n). Which celltypes (as defined in the restrict.ident metadata column) should be classified. Default: All cells. 
#' @param unclassified.name character(1). Which name should be given to the cells which do not pass any gate. Can be set to "keep" to keep the existing names for cells which don't pass any gate. Default: "None". 
#' @param naming.mode character(1). Should the cell type names be added as a suffix or prefix to existing cell names instead of replacing them? Possible values: "replace", "prefix", "suffix". Default: "replace". When adding as a suffix or prefix, unclassified.name typically needs to be set to empty string "". 
#' @return A Seurat object with an additional metadata column containing the cell type annotations and Idents() set to these annotations. Cells which do not validate any gate defined are attributed the celltype "None" and cells which pass several gates are labelled "Multiplet".
#' @keywords Cell type classification celltype Classifier manual gates gating
#' @export
#' @examples
#' # MySeuratObject should contain an RNA layer with log-normalized data assay.
#' # In this example cells are classified as T cells (TC) or Epithelial cells (EP) based on manual signatures.
#' # Note that cells that validate no gate will have cell type "None", and cells which validate several will be classified as "Multiplets".
#' # It's recommended to first do a signature scatter plot in order to choose threshold values for the gates.
#' MySeuratObject <- Impute(MySeuratObject)
#' signature.list <- list(TCsign=c("CD3D","CD3E","CD3G","CD4","CD8A","CD8B","GZMA","IFNG"),
#'                        EPsign=c("EPCAM","TFF3","CDX2","CLDN3","KRT8","KRT19","KRT18","OLFM4"))
#' MySeuratObject <- ScoreSignatures(MySeuratObject,signature.list)
#' gates <- "TC = TCsign > 1.2 & EPsign < 0.2,
#'           EP = EPsign > 1.1 & TCsign < 0.3"
#' MySeuratObject <- ClassifyManual(MySeuratObject,gates)
#' DimPlot(MySeuratObject) # Plot the current Idents, which were set to "celltype".
#' # Subclassify T cells, and store the results in the metadata column "TC_subcelltype" (Could also be used to overwrite the idents of TC in an existing metadata column without altering the identities of other cells).
#' gates <- "CD8 = CD8A > 0.5 & CD4 < 0.5, CD4 = CD4 > 0.5 & CD8A < 0.5, DoublePositive = CD8A > 0.5 & CD4 > 0.5"
#' MySeuratObject <- ClassifyManual(MySeuratObject, gates, "TC_subcelltype", restrict.ident="celltype", restrict.to="TC")
#' # Mark dividing cells by adding a suffix to their cell names:
#' MySeuratObject <- ClassifyManual(MySeuratObject, gates="Dividing = PCNA > 1 & MKI67 > 1", naming.mode = "suffix", unclassified.name="")

ClassifyManual <- function(object, gates, metadata.name="celltype", assay=DefaultAssay(object), layer="data", restrict.ident="Default", restrict.to=as.character(unique(Idents(object))), unclassified.name="None", naming.mode="replace"){
  
  # Check that the requested Idents to use are correct
  if(!length(restrict.ident)){
    warning("Ident to use for restriction not found. Aborting.")
    return(object)
  }else{
    if(!restrict.ident %in% c("Default",colnames(object@meta.data))){
      warning("Ident to use for restriction not found. Aborting.")
      return(object)
    }
  }
  
  # Define which cells should be classified
  if(!length(restrict.to)){
    warning("The restrict.to argument is empty, which means no cells should be classified.")
    return(object)
  }else{
    if(restrict.ident == "Default"){
      cells.use <- colnames(object)[Idents(object) %in% restrict.to]
    }else{
      cells.use <- colnames(object)[object@meta.data[,restrict.ident,drop=T] %in% restrict.to]
    }
    n.cells <- length(cells.use)
    if(!n.cells){
      warning("There are no cells in the requested restriction. Check the celltypes required as restrict.to exist within the restrict.ident.")
      return(object)
    }
  }
  
  # If the metadata column does not exist, create it, filled with "undefined"
  if(!metadata.name %in% colnames(object@meta.data)){
    object@meta.data[,metadata.name] <- "undefined"
  }
  
  # Split by cell types:
  gates <- gsub(" ", "", gates)
  gates <- gsub("\n", "", gates)
  gates <- unlist(strsplit(gates,","))
  n_celltypes <- length(gates)
  
  # Split the cell type name from the gates:
  tmp <- strsplit(gates,"=")
  celltype_names <- unlist(lapply(tmp,"[",1))
  gates2 <- unlist(lapply(tmp,"[",2))
  
  # Split the gates:
  subgates <- strsplit(gates2,"&")
  
  # Create a df with the signatures and features to be used:
  unique.signatures.needed <- unique(unlist(lapply(strsplit(unlist(subgates),"<|>"),"[",1)))
  df <- GatherFeatures(object, unique.signatures.needed, assay=assay, layer=layer)
  df <- df[cells.use,,drop=F]
  if(!ncol(df)){
    warning("None of the requested features/signatures were found. Did you forget to run Impute and/or ScoreSignatures?")
    return(object)
  }
  
  # Initialize the cell type filter (start from keep all, to then restrict iteratively for each condition):
  filters <- matrix(data = TRUE, nrow = n.cells, ncol = length(celltype_names), dimnames = list(cells.use,celltype_names))
  
  # Go through the gates for one cell type at a time, to iteratively restrict the filters:
  for(i in 1:n_celltypes){
    for(j in 1:length(subgates[[i]])){
      gate_tmp <- subgates[[i]][j]
      is_greater <- length(grep(">",gate_tmp))
      if(is_greater){
        sign_use <- strsplit(gate_tmp,">")[[1]][1]
        cutoff <- as.numeric(strsplit(gate_tmp,">")[[1]][2])
        filters[,celltype_names[i]] <- filters[,celltype_names[i]] & df[,sign_use] > cutoff
      }else{
        sign_use <- strsplit(gate_tmp,"<")[[1]][1]
        cutoff <- as.numeric(strsplit(gate_tmp,"<")[[1]][2])
        filters[,celltype_names[i]] <- filters[,celltype_names[i]] & df[,sign_use] < cutoff
      }
    }
  }
  
  # Combine the filters to define unique cell types vs None vs Multiplets:
  n_cell_types <- rowSums(filters)
  if(unclassified.name=="keep"){
    celltype <- object@meta.data[cells.use, metadata.name, drop=T]
  }else{
    celltype <- as.vector(matrix(unclassified.name,nrow = n.cells,ncol = 1))
  }
  for(i in 1:ncol(filters)){
    celltype[as.vector(filters[,i])] <- celltype_names[i]
  }
  celltype[n_cell_types>1] <- "Multiplet"
  
  if(naming.mode=="replace"){
    object@meta.data[cells.use, metadata.name] <- celltype
  }else if(naming.mode=="suffix"){
    object@meta.data[cells.use, metadata.name] <- paste0(object@meta.data[cells.use, metadata.name],celltype)
  }else if(naming.mode=="prefix"){
    object@meta.data[cells.use, metadata.name] <- paste0(celltype,object@meta.data[cells.use, metadata.name])
  }else{
    stop("The parameter naming.mode should be set to replace, suffix, or prefix.")
  }
  
  # Make it the default Idents and return
  Idents(object) <- object@meta.data[,metadata.name]
  return(object)
}
nbroguiere/burgertools documentation built on Jan. 30, 2024, 3:48 a.m.