R/10-scDD.R

Defines functions scDD_simulation scDD_estimation

Documented in scDD_estimation scDD_simulation

#' Estimate Parameters From Real Datasets by scDD
#'
#' This function is used to estimate useful parameters from a real dataset by
#' using `scDDEstimate` function in Splatter package.
#'
#' @param ref_data A count matrix. Each row represents a gene and each column
#' represents a cell.
#' @param verbose Logical.
#' @param seed An integer of a random seed.
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs are usually customed, so before simulating a dataset you must point it out.
#' See `Details` below for more information.
#' @importFrom splatter scDDEstimate
#' @return A list contains the estimated parameters and the results of execution
#' detection.
#' @export
#' @details
#' When you use scDD to estimate parameters from a real dataset, you must input
#' a numeric vector to specify the groups or plates that each cell comes from,
#' like `other_prior = list(group.condition = the numeric vector)`. See `Examples`
#' and learn from it.
#' @references
#' Korthauer K D, Chu L F, Newton M A, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome biology, 2016, 17(1): 1-15. <https://doi.org/10.1186/s13059-016-1077-y>
#'
#' Bioconductor URL: <https://www.bioconductor.org/packages/release/bioc/html/scDD.html>
#'
#' Github URL: <https://github.com/kdkorthauer/scDD>
#' @examples
#' \dontrun{
#' ref_data <- SingleCellExperiment::counts(scater::mockSCE())
#' ## group information
#' set.seed(111)
#' group_condition <- sample(c(1, 2), 200, replace = TRUE)
#' other_prior <- list(group.condition = group_condition)
#' ## estimation
#' estimate_result <- simmethods::scDD_estimation(ref_data = ref_data,
#'                                                other_prior = other_prior,
#'                                                verbose = TRUE,
#'                                                seed = 111)
#' }
#'
scDD_estimation <- function(ref_data,
                            verbose = FALSE,
                            other_prior = NULL,
                            seed
){
  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  if(!is.matrix(ref_data)){
    ref_data <- as.matrix(ref_data)
  }
  if(is.null(other_prior[["group.condition"]])){
    stop("Please input the conditions that each cell belongs to")
  }
  ##############################################################################
  ####                            Estimation                                 ###
  ##############################################################################
  if(verbose){
    message("Estimating parameters using scDD")
  }
  # Seed
  set.seed(seed)
  # Estimation
  estimate_detection <- peakRAM::peakRAM(
    estimate_result <- splatter::scDDEstimate(ref_data,
                                              condition = other_prior[["group.condition"]],
                                              verbose = verbose)
  )
  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  estimate_output <- list(estimate_result = estimate_result,
                          estimate_detection = estimate_detection)
  return(estimate_output)
}



#' Simulate Datasets by scDD
#'
#' This function is used to simulate datasets from learned parameters by `scDDSimulate`
#' function in Splatter package.
#'
#' @param parameters A object generated by [splatter::scDDEstimate()]
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs are usually customed, so before simulating a dataset you must point it out.
#' See `Details` below for more information.
#' @param return_format A character. Alternatives choices: list, SingleCellExperiment,
#' Seurat, h5ad. If you select `h5ad`, you will get a path where the .h5ad file saves to.
#' @param verbose Logical. Whether to return messages or not.
#' @param seed A random seed.
#' @importFrom splatter scDDSimulate
#' @importFrom SingleCellExperiment rowData colData
#' @export
#' @details
#' In scDD, users can only set `nCells` to specify the number of cells because
#' the genes are already fixed after estimation step. See `Examples`.
#' @references
#' Korthauer K D, Chu L F, Newton M A, et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome biology, 2016, 17(1): 1-15. <https://doi.org/10.1186/s13059-016-1077-y>
#'
#' Bioconductor URL: <https://www.bioconductor.org/packages/release/bioc/html/scDD.html>
#'
#' Github URL: <https://github.com/kdkorthauer/scDD>
#'
#' @examples
#' \dontrun{
#' ref_data <- SingleCellExperiment::counts(scater::mockSCE())
#' ## group information
#' set.seed(111)
#' group_condition <- sample(c(1, 2), 200, replace = TRUE)
#' other_prior <- list(group.condition = group_condition)
#' ## estimation
#' estimate_result <- simmethods::scDD_estimation(ref_data = ref_data,
#'                                                other_prior = other_prior,
#'                                                verbose = TRUE,
#'                                                seed = 111)
#'
#' ## Simulate 1000 cells
#' simulate_result <- simmethods::scDD_simulation(parameters = estimate_result[["estimate_result"]],
#'                                                other_prior = list(nCells = 1000),
#'                                                return_format = "list",
#'                                                verbose = TRUE,
#'                                                seed = 111)
#' ## counts
#' counts <- simulate_result[["simulate_result"]][["count_data"]]
#' dim(counts)
#' ## cell information
#' col_data <- simulate_result[["simulate_result"]][["col_meta"]]
#' table(col_data$group)
#' ## gene information
#' row_data <- simulate_result[["simulate_result"]][["row_meta"]]
#' head(row_data)
#' }
#'
scDD_simulation <- function(parameters,
                            other_prior = NULL,
                            return_format,
                            verbose = FALSE,
                            seed
){
  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  ## nCells
  nCells <- ifelse(other_prior[["nCells"]]%%2 == 0,
                   other_prior[["nCells"]]/2,
                   (other_prior[["nCells"]]-1)/2)
  if(!is.null(other_prior[["nCells"]])){
    parameters <- splatter::setParam(parameters, "nCells", nCells)
  }
  assertthat::assert_that(class(parameters) == "SCDDParams")
  # Change parameters
  if(!is.null(other_prior[["nDE"]])){
    parameters <- splatter::setParams(parameters, nDE = other_prior[["nDE"]])
  }
  if(!is.null(other_prior[["nDP"]])){
    parameters <- splatter::setParams(parameters, nDP = other_prior[["nDP"]])
  }
  if(!is.null(other_prior[["nDM"]])){
    parameters <- splatter::setParams(parameters, nDM = other_prior[["nDM"]])
  }
  if(!is.null(other_prior[["nDB"]])){
    parameters <- splatter::setParams(parameters, nDB = other_prior[["nDB"]])
  }
  if(!is.null(other_prior[["nEE"]])){
    parameters <- splatter::setParams(parameters, nEE = other_prior[["nEE"]])
  }
  if(!is.null(other_prior[["nEP"]])){
    parameters <- splatter::setParams(parameters, nEP = other_prior[["nEP"]])
  }

  # Get params to check
  params_check <- splatter::getParams(parameters, c("nCells",
                                                    "nGenes"))
  nDE <- splatter::getParams(parameters, "nDE") %>% unlist()
  nDP <- splatter::getParams(parameters, "nDP") %>% unlist()
  nDM <- splatter::getParams(parameters, "nDM") %>% unlist()
  nDB <- splatter::getParams(parameters, "nDB") %>% unlist()
  nEE <- splatter::getParams(parameters, "nEE") %>% unlist()
  nEP <- splatter::getParams(parameters, "nEP") %>% unlist()
  de.prob <- sum(nDE, nDP, nDM, nDB)/sum(nDE, nDP, nDM, nDB, nEE, nEP)
  # Return to users
  message(paste0("nCells: ", params_check[['nCells']] * 2))
  message(paste0("nGenes: ", params_check[['nGenes']]))
  message("nGroups: 2")
  message(paste0("de.prob: ", de.prob))
  ##############################################################################
  ####                            Simulation                                 ###
  ##############################################################################
  if(verbose){
    message("Simulating datasets using scDD")
  }
  # Seed
  parameters <- splatter::setParam(parameters, name = "seed", value = seed)

  # Estimation
  simulate_detection <- peakRAM::peakRAM(
    simulate_result <- splatter::scDDSimulate(parameters,
                                              verbose = verbose)
  )
  ## counts
  counts <- SingleCellExperiment::counts(simulate_result)
  ## col_data
  col_data <- as.data.frame(SingleCellExperiment::colData(simulate_result))
  col_data$Condition <- paste0("Group", col_data$Condition)
  colnames(col_data) <- c("cell_name", "group")
  ## row_data
  row_data <- as.data.frame(SingleCellExperiment::rowData(simulate_result))
  row_data$FoldChange <- ifelse(is.na(row_data$FoldChange), 0, row_data$FoldChange)
  colnames(row_data) <- c("gene_name", "DEstatus", "fc_gene")

  # Establish SingleCellExperiment
  simulate_result <- SingleCellExperiment::SingleCellExperiment(list(counts = counts),
                                                                colData = col_data,
                                                                rowData = row_data)
  ##############################################################################
  ####                        Format Conversion                              ###
  ##############################################################################
  simulate_result <- simutils::data_conversion(SCE_object = simulate_result,
                                               return_format = return_format)

  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  simulate_output <- list(simulate_result = simulate_result,
                          simulate_detection = simulate_detection)
  return(simulate_output)
}
duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.