R/09-POWSC.R

Defines functions POWSC_simulation POWSC_estimation

Documented in POWSC_estimation POWSC_simulation

#' Estimate Parameters From Real Datasets by POWSC
#'
#' This function is used to estimate useful parameters from a real dataset by
#' using `Est2Phase` function in POWSC package.
#'
#' @param ref_data A count matrix. Each row represents a gene and each column
#' represents a cell.
#' @param verbose Logical.
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs and other variables are usually customed, so before simulating a dataset
#' you must point it out.
#' @param seed An integer of a random seed.
#' @details
#' When you use POWSC to estimate parameters from a real dataset, the default settings
#' are recommended.
#' @importFrom peakRAM peakRAM
#' @return A list contains the estimated parameters and the results of execution
#' detection.
#' @export
#' @examples
#' \dontrun{
#' ref_data <- SingleCellExperiment::counts(scater::mockSCE())
#'
#' ## estimation
#' estimate_result <- simmethods::POWSC_estimation(ref_data = ref_data,
#'                                                 other_prior = NULL,
#'                                                 verbose = TRUE,
#'                                                 seed = 111)
#' }
#'
POWSC_estimation <- function(ref_data,
                             verbose = FALSE,
                             other_prior = NULL,
                             seed
){
  ##############################################################################
  ####                            Environment                                ###
  ##############################################################################
  if(!requireNamespace("POWSC", quietly = TRUE)){
    message("Splatter is not installed on your device...")
    message("Installing POWSC...")
    BiocManager::install("POWSC")
  }
  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  if(!is.matrix(ref_data)){
    ref_data <- as.matrix(ref_data)
  }
  if(is.null(other_prior[["low.prob"]])){
    low.prob <- 0.99
  }else{
    low.prob <- other_prior[["low.prob"]]
  }
  ##############################################################################
  ####                            Estimation                                 ###
  ##############################################################################
  if(verbose){
    message("Estimating parameters using POWSC")
  }
  # Seed
  set.seed(seed)
  # Estimation
  estimate_detection <- peakRAM::peakRAM(
    estimate_result <- POWSC::Est2Phase(ref_data, low.prob = low.prob)
  )
  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  estimate_output <- list(estimate_result = estimate_result,
                          estimate_detection = estimate_detection)
  return(estimate_output)
}



#' Simulate Datasets by POWSC
#'
#' This function is used to simulate datasets from learned parameters by `Simulate2SCE`
#' function in POWSC package.
#'
#' @param parameters A object generated by [POWSC::Est2Phase()]
#' @param other_prior A list with names of certain parameters. Some methods need
#' extra parameters to execute the estimation step, so you must input them. In
#' simulation step, the number of cells, genes, groups, batches, the percent of
#' DEGs and other variables are usually customed, so before simulating a dataset
#' you must point it out.
#' @param return_format A character. Alternative choices: list, SingleCellExperiment,
#' Seurat, h5ad. If you select `h5ad`, you will get a path where the .h5ad file saves to.
#' @param verbose Logical. Whether to return messages or not.
#' @param seed A random seed.
#' @details
#' In addtion to simulate datasets with default parameters, users want to simulate
#' other kinds of datasets, e.g. a counts matrix with 2 or more cell groups. In
#' POWSC, you can set extra parameters to simulate datasets.
#'
#' The customed parameters you can set are below:
#' 1. nCells. In POWSC, you can set nCells directly by `other_prior = list(nCells = 1000)`.
#' 2. nGroups. POWSC can only simulate **two** groups.
#' 3. de.prob. You can directly set `other_prior = list(de.prob = 0.2)` to simulate DEGs that account for 20 percent of all genes.
#' @importFrom SingleCellExperiment counts colData rowData
#' @importFrom stringr str_replace
#' @export
#' @examples
#' \dontrun{
#' ref_data <- SingleCellExperiment::counts(scater::mockSCE())
#'
#' ## estimation
#' estimate_result <- simmethods::POWSC_estimation(ref_data = ref_data,
#'                                                 other_prior = NULL,
#'                                                 verbose = TRUE,
#'                                                 seed = 111)
#'
#' ## default setting
#' simulate_result <- simmethods::POWSC_simulation(
#'   parameters = estimate_result[["estimate_result"]],
#'   other_prior = NULL,
#'   return_format = "list",
#'   verbose = TRUE,
#'   seed = 111
#' )
#' ## counts
#' counts <- simulate_result[["simulate_result"]][["count_data"]]
#' dim(counts)
#' ## cell information
#' col_data <- simulate_result[["simulate_result"]][["col_meta"]]
#' table(col_data$group)
#' ## gene information
#' row_data <- simulate_result[["simulate_result"]][["row_meta"]]
#' table(row_data$de_gene)
#'
#'
#' ## Simulate 1000 cells (de.prob = 0.2)
#' simulate_result <- simmethods::POWSC_simulation(
#'   parameters = estimate_result[["estimate_result"]],
#'   other_prior = list(nCells = 1000,
#'                      de.prob = 0.2),
#'   return_format = "list",
#'   verbose = TRUE,
#'   seed = 111
#' )
#' ## counts
#' counts <- simulate_result[["simulate_result"]][["count_data"]]
#' dim(counts)
#' ## cell information
#' col_data <- simulate_result[["simulate_result"]][["col_meta"]]
#' table(col_data$group)
#' ## gene information
#' row_data <- simulate_result[["simulate_result"]][["row_meta"]]
#' table(row_data$de_gene)
#' }
#'
POWSC_simulation <- function(parameters,
                             other_prior = NULL,
                             return_format,
                             verbose = FALSE,
                             seed
){
  ##############################################################################
  ####                            Environment                                ###
  ##############################################################################
  if(!requireNamespace("POWSC", quietly = TRUE)){
    message("Splatter is not installed on your device...")
    message("Installing POWSC...")
    BiocManager::install("POWSC")
  }
  ##############################################################################
  ####                               Check                                   ###
  ##############################################################################
  if(is.null(other_prior[["nCells"]])){
    n <- ncol(parameters[["exprs"]])
  }else{
    n <- other_prior[["nCells"]]
  }
  if(is.null(other_prior[["de.prob"]])){
    perDE <- 0.05
  }else{
    perDE <- other_prior[["de.prob"]]/2
  }
  message(paste0("nCells: ", n))
  message(paste0("nGenes: ", dim(parameters[['exprs']])[1]))
  message("nGroups: 2")
  message(paste0("de.prob: ", perDE *2))
  ##############################################################################
  ####                            Simulation                                 ###
  ##############################################################################
  if(verbose){
    message("Simulating datasets using POWSC")
  }
  # Seed
  set.seed(seed)
  # Simulation
  simulate_detection <- peakRAM::peakRAM(
    simulate_result <- POWSC::Simulate2SCE(n = n,
                                           perDE = perDE,
                                           estParas1 = parameters,
                                           estParas2 = parameters))
  ##############################################################################
  ####                        Format Conversion                              ###
  ##############################################################################
  counts <- SingleCellExperiment::counts(simulate_result[["sce"]])
  colnames(counts) <- paste0("Cell", c(1:ncol(counts)))
  rownames(counts) <- paste0("Gene", c(1:nrow(counts)))
  ## cell information
  col_data <- data.frame("cell_name" = colnames(counts),
                         "group" = ifelse(colData(simulate_result[["sce"]])[,1] == "celltype1", "Group1", "Group2"))
  ## gene information
  de_gene <- as.numeric(str_extract_all(simulate_result[["DEGs"]],
                                        "[0-9]+",
                                        simplify = TRUE))
  row_data <- data.frame("gene_name" = rownames(counts),
                         "de_gene" = "no")
  row_data$de_gene[de_gene] <- "yes"
  # Establish SingleCellExperiment
  simulate_result <- SingleCellExperiment::SingleCellExperiment(list(counts = counts),
                                                                colData = col_data,
                                                                rowData = row_data)
  simulate_result <- simutils::data_conversion(SCE_object = simulate_result,
                                               return_format = return_format)

  ##############################################################################
  ####                           Ouput                                       ###
  ##############################################################################
  simulate_output <- list(simulate_result = simulate_result,
                          simulate_detection = simulate_detection)
  return(simulate_output)
}
duohongrui/simmethods documentation built on June 17, 2024, 10:49 a.m.