R/31-scMultiSim.R

Defines functions scMultiSim_simulation scMultiSim_estimation

Documented in scMultiSim_estimation scMultiSim_simulation

#' Estimate Parameters From Real Datasets by scMultiSim
#'
#' This function is used to estimate useful parameters from a real dataset by
#' using \code{make_trees} function in simutils package.
#'
#' @param ref_data A count matrix. Each row represents a gene and each column
#' represents a cell.
#' @param verbose Logical.
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs are usually customed, so before simulating a dataset you must point it out.
#' See `Details` below for more information.
#' @param seed An integer of a random seed.
#' @importFrom simutils make_trees
#' @return A list contains the estimated parameters and the results of execution
#' detection.
#' @export
#' @details
#' In scMultiSim, users can input cell group information by `other_prior = list(group.condition = xxx)`.
#' If this information is not available, we will detect the potential groups in reference data automatically.
#' For more information about scMultiSim, see `Examples` and `References`.
#'
#' @references
#' Li H, Zhang Z, Squires M, et al. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. bioRxiv, 2022: 2022.10. 15.512320. <https://doi.org/10.1101/2022.10.15.512320>
#'
#' Github URL: <https://github.com/ZhangLabGT/scMultiSim>
#'
#'
#' @examples
#' \dontrun{
#' ref_data <- simmethods::data
#' ## estimation
#' estimate_result <- simmethods::scMultiSim_estimation(
#'   ref_data = ref_data,
#'   other_prior = NULL,
#'   verbose = TRUE,
#'   seed = 111
#' )
#' }
#'
scMultiSim_estimation <- function(ref_data,
                                  other_prior = NULL,
                                  verbose = FALSE,
                                  seed){

  ##############################################################################
  ####                            Environment                                ###
  ##############################################################################
  if(!requireNamespace("simutils", quietly = TRUE)){
    message("Splatter is not installed on your device")
    message("Installing simutils...")
    devtools::install_github("duohongrui/simutils")
  }
  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  if(!is.matrix(ref_data)){
    ref_data <- as.matrix(ref_data)
  }
  if(!is.null(other_prior[["group.condition"]])){
    group <- other_prior[["group.condition"]]
  }else{
    group <- NULL
  }
  ##############################################################################
  ####                            Estimation                                 ###
  ##############################################################################
  if(verbose){
    message("Estimating parameters using scMultiSim")
  }
  # Seed
  set.seed(seed)
  # Estimation
  estimate_detection <- peakRAM::peakRAM(
    estimate_result <- simutils::make_trees(ref_data,
                                            group = group,
                                            is_Newick = FALSE,
                                            is_parenthetic = TRUE)
  )
  estimate_result <- list(phylo = estimate_result[[1]],
                          data_dim = dim(ref_data))
  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  estimate_output <- list(estimate_result = estimate_result,
                          estimate_detection = estimate_detection)
  return(estimate_output)
}



#' Simulate Datasets by scMultiSim
#'
#' @param parameters A object generated by [simutils::make_trees()]
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs are usually customed, so before simulating a dataset you must point it out.
#' See `Details` below for more information.
#' @param return_format A character. Alternative choices: list, SingleCellExperiment,
#' Seurat, h5ad. If you select `h5ad`, you will get a path where the .h5ad file saves to.
#' @param verbose Logical. Whether to return messages or not.
#' @param seed A random seed.
#' @importFrom stringr str_split str_count str_extract_all str_replace
#' @importFrom dplyr select
#' @importFrom scMultiSim sim_true_counts add_expr_noise divide_batches
#' @export
#' @details
#' In scMultiSim, `nCells` and `nGenes` are usually customed and users can set
#' `other_prior = list(nCells = 1000, nGenes = 2000)` to simulate 1000 cells and 5000 genes.
#' In addition, `nBatches` can be customed for simulating the cell batches.
#'
#'
#' @references
#' Li H, Zhang Z, Squires M, et al. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. bioRxiv, 2022: 2022.10. 15.512320. <https://doi.org/10.1101/2022.10.15.512320>
#'
#' Github URL: <https://github.com/ZhangLabGT/scMultiSim>
#'
#' @examples
#' ref_data <- simmethods::data
#' ## estimation
#' estimate_result <- simmethods::scMultiSim_estimation(
#'   ref_data = ref_data,
#'   other_prior = NULL,
#'   verbose = TRUE,
#'   seed = 111
#' )
#'
#' # 1) Simulate with default parameters
#' simulate_result <- simmethods::scMultiSim_simulation(
#'   parameters = estimate_result[["estimate_result"]],
#'   other_prior = NULL,
#'   return_format = "list",
#'   verbose = TRUE,
#'   seed = 111
#' )
#' ## counts
#' counts <- simulate_result[["simulate_result"]][["count_data"]]
#' dim(counts)
#'
scMultiSim_simulation <- function(parameters,
                                  other_prior = NULL,
                                  return_format,
                                  verbose = FALSE,
                                  seed
){

  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  # nGenes
  if(!is.null(other_prior[["nGenes"]])){
    other_prior[["num.genes"]] <- other_prior[["nGenes"]]
    other_prior <- other_prior[-which(names(other_prior) == "nGenes")]
  }else{
    other_prior[["num.genes"]] <- parameters$data_dim[1]
  }
  # nCells
  if(!is.null(other_prior[["nCells"]])){
    other_prior[["num.cells"]] <- other_prior[["nCells"]]
    other_prior <- other_prior[-which(names(other_prior) == "nCells")]
  }else{
    other_prior[["num.cells"]] <- parameters$data_dim[2]
  }
  other_prior[["tree"]] <- parameters$phylo
  other_prior[["rand.seed"]] <- seed
  other_prior[["GRN"]] <- NA

  ### trajectory
  if(is.null(other_prior[["traj"]])){
    other_prior[["discrete.cif"]] <- TRUE
  }else{
    cat("Simulating datasets with trajectory.../n")
    other_prior[["discrete.cif"]] <- FALSE
  }

  ### group
  other_prior[["discrete.min.pop.size"]] <- 3
  if(is.null(other_prior[["prob.group"]])){
    other_prior[["discrete.pop.size"]] <- NA
  }else{
    other_prior[["discrete.pop.size"]] <- simutils::proportionate(number = other_prior[["num.cells"]],
                                                                  result_sum_strict = other_prior[["num.cells"]],
                                                                  prop = other_prior[["prob.group"]],
                                                                  prop_sum_strict = 1,
                                                                  digits = 0)
    other_prior[["discrete.pop.size"]] <- as.integer(other_prior[["discrete.pop.size"]])
  }

  ### batch
  if(is.null(other_prior[["nBatches"]])){
    nBatches <- 1
  }else{
    nBatches <- other_prior[["nBatches"]]
    other_prior <- other_prior[-which(names(other_prior) == "nBatches")]
  }

  # Return to users
  message(paste0("nCells: ", other_prior[["num.cells"]]))
  message(paste0("nGenes: ", other_prior[["num.genes"]]))
  message(paste0("nGroups: ", nrow(parameters$phylo$edge)))
  message(paste0("nBatches: ", nBatches))
  ##############################################################################
  ####                            Simulation                                 ###
  ##############################################################################
  if(verbose){
    message("Simulating datasets using scMultiSim")
  }
  # Estimation
  set.seed(seed)
  if(nBatches == 1){
    try_error <- try(
      simulate_detection <- peakRAM::peakRAM(
        simulate_result <- excution_function(options = other_prior,
                                             seed = seed)
      )
    )
  }else{
    try_error <- try(
      simulate_detection <- peakRAM::peakRAM(
        simulate_result <- excution_batch_function(options = other_prior,
                                                   seed = seed,
                                                   nbatch = nBatches)
      )
    )
  }
  ##############################################################################
  ####                        Format Conversion                              ###
  ##############################################################################
  ## col_data
  cell_meta <- simulate_result$cell_meta
  col_data <- cell_meta %>%
    mutate(
      group = paste0("group", rep(c(1:length(table(cell_meta$pop))), table(cell_meta$pop)))
    ) %>%
    dplyr::select("cell_id", "group")
  if(nBatches > 1){
    col_data$"batch" <- paste0("Batch", cell_meta$batch)
    counts <- simulate_result$counts_with_batches
    rownames(counts) <- rownames(simulate_result$counts)
    colnames(counts) <- colnames(simulate_result$counts)
  }else{
    counts <- simulate_result$counts
  }
  colnames(col_data)[1] <- "cell_name"
  ## row_data
  row_data <- data.frame("gene_name" = rownames(counts))
  # Establish SingleCellExperiment
  simulate_result <- SingleCellExperiment::SingleCellExperiment(list(counts = counts),
                                                                colData = col_data,
                                                                rowData = row_data)

  simulate_result <- simutils::data_conversion(SCE_object = simulate_result,
                                               return_format = return_format)

  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  simulate_output <- list(simulate_result = simulate_result,
                          simulate_detection = simulate_detection)
  return(simulate_output)
}
duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.