prepseudobulk: Synthesize pseudo-bulk RNA-seq data from scRNA-seq data

View source: R/reference.R

prepseudobulk R Documentation

Synthesize pseudo-bulk RNA-seq data from scRNA-seq data

Description

Synthesize pseudo-bulk RNA-seq data for RNA reference generation.

Usage

prepseudobulk(
  Seuratobj,
  targetcelltypes = NULL,
  celltypecolname = "annotation",
  pseudobulknum = 10,
  samplebalance = FALSE,
  pseudobulkpercent = 0.9,
  threads = 1,
  savefile = FALSE
)

Arguments

Seuratobj

An object of class Seurat, generated from scRNA-seq data with the Seurat R package. It should contain read count data, normalized data, and cell meta data. The meta data should contain a column recording the cell type name of each cell.

targetcelltypes

The cell types in Seuratobj whose content needs to be deconvolved with the scDeconv package. If NULL, all the cell types in Seuratobj will be included. Default is NULL.

celltypecolname

The name of the column in the "meta.data" slot of Seuratobj that records the cell type of each cell. Default is "annotation".

pseudobulknum

The cells in Seuratobj are sampled to generate pseudo-bulk RNA-seq samples for each cell type. This parameter defines how many pseudo-bulk RNA-seq samples are generated per cell type. Default is 10.

samplebalance

When the pseudo-bulk RNA-seq data are generated, the number of single cells available for sampling usually differs between cell types. To correct this bias and use the same number of single cells for every cell type, set this parameter to TRUE. Cell types with too many candidate cells are then down-sampled, and cell types with too few cells are over-sampled. Down-sampling is performed with bootstrapping, and over-sampling with SMOTE (Synthetic Minority Over-sampling Technique). This step is time-consuming. Default is FALSE.
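The balancing idea described for samplebalance can be illustrated with a minimal base R sketch. It is not the scDeconv implementation: the helper name balance_cells_sketch, its target and k arguments, and the use of Euclidean distance are assumptions made for illustration only. Cell sets that are too large are bootstrapped (sampled with replacement), and cell sets that are too small receive SMOTE-style synthetic cells interpolated between a cell and one of its nearest neighbors.

```r
# Hypothetical helper, not part of scDeconv: bring one cell type's cell set
# to a target size. `expr` is a genes x cells matrix for a single cell type.
balance_cells_sketch <- function(expr, target, k = 5) {
  n <- ncol(expr)
  if (n >= target) {
    # Too many cells: bootstrap, i.e. sample `target` cells with replacement.
    keep <- sample.int(n, target, replace = TRUE)
    return(expr[, keep, drop = FALSE])
  }
  stopifnot(n >= 2)               # interpolation needs at least two cells
  # Too few cells: SMOTE-style synthesis between a cell and a near neighbor.
  d <- as.matrix(dist(t(expr)))   # cell-to-cell Euclidean distances (assumption)
  diag(d) <- Inf                  # a cell is not its own neighbor
  k <- min(k, n - 1)
  synth <- vapply(seq_len(target - n), function(i) {
    base <- sample.int(n, 1)                    # random seed cell
    neighbors <- order(d[base, ])[seq_len(k)]   # its k nearest neighbors
    nb <- neighbors[sample.int(k, 1)]           # pick one neighbor
    gap <- runif(1)                             # position between base and nb
    expr[, base] + gap * (expr[, nb] - expr[, base])
  }, numeric(nrow(expr)))
  cbind(expr, matrix(synth, nrow = nrow(expr)))
}

set.seed(1)
rare   <- matrix(rpois(4 * 6,  lambda = 5), nrow = 4)  # 6 cells, over-sampled
common <- matrix(rpois(4 * 50, lambda = 5), nrow = 4)  # 50 cells, down-sampled
rare_bal   <- balance_cells_sketch(rare,   target = 20)
common_bal <- balance_cells_sketch(common, target = 20)
c(ncol(rare_bal), ncol(common_bal))  # both cell sets now hold 20 cells
```

Note that interpolated cells carry non-integer values, so how the package turns them back into read counts is not covered by this sketch.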

pseudobulkpercent

If samplebalance is FALSE, a fraction of the single cells of each cell type is randomly sampled to make each pseudo-bulk sample. This parameter sets that fraction and should be a number between 0 and 1. If samplebalance is TRUE, bootstrapping and SMOTE are used for the sampling instead and this parameter is ignored. Default is 0.9.
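How pseudobulknum and pseudobulkpercent interact can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper make_pseudobulk_sketch is hypothetical, it works on a plain genes x cells count matrix rather than a Seurat object, and summing the counts of the sampled cells is an assumed aggregation rule.

```r
# Hypothetical helper, not part of scDeconv: for each cell type, draw a
# fraction of its cells `pseudobulknum` times and sum their read counts.
make_pseudobulk_sketch <- function(counts, celltypes,
                                   pseudobulknum = 10,
                                   pseudobulkpercent = 0.9) {
  stopifnot(ncol(counts) == length(celltypes),
            pseudobulkpercent > 0, pseudobulkpercent <= 1)
  profiles <- list()
  for (ct in unique(celltypes)) {
    cells <- which(celltypes == ct)
    # Number of cells drawn per pseudo-bulk sample (at least one).
    n_draw <- max(1, floor(length(cells) * pseudobulkpercent))
    for (i in seq_len(pseudobulknum)) {
      picked <- cells[sample.int(length(cells), n_draw)]
      # Summing counts is an assumption about the aggregation step.
      profiles[[paste(ct, i, sep = "_")]] <-
        rowSums(counts[, picked, drop = FALSE])
    }
  }
  do.call(cbind, profiles)  # genes as rows, pseudo-bulk samples as columns
}

set.seed(1)
counts <- matrix(rpois(5 * 40, lambda = 3), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:40)))
celltypes <- rep(c("Tcell", "Bcell"), times = c(30, 10))
pb <- make_pseudobulk_sketch(counts, celltypes, pseudobulknum = 3)
dim(pb)  # 5 genes by 6 pseudo-bulk samples (3 per cell type)
```

Because each draw takes a different random subset of cells, the pseudo-bulk samples of one cell type differ slightly from each other, which gives the downstream reference some within-type variation.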

threads

Number of threads to use. Default is 1.

savefile

Whether to automatically save the generated pseudo-bulk matrix as an rds file in the working directory. Default is FALSE.

Value

A pseudo-bulk RNA-seq matrix with pseudo-bulk samples as columns and genes as rows (features). The gene values in this matrix are pseudo-bulk RNA-seq read counts. This matrix can be passed to the functions scRef, scDeconv, or epDeconv through their parameter pseudobulkdat, so that they skip their own pseudo-bulk data synthesis step and use this matrix directly to generate the RNA deconvolution reference. Generating the pseudo-bulk data can be time-consuming when the scRNA-seq dataset to be converted to the RNA reference is large. If the same scRNA-seq data will be used to deconvolve several datasets, this function can synthesize and save the pseudo-bulk data in advance, so that the data can be reused and the synthesis step skipped each time.
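The save-once, reuse-many workflow described above can be sketched with base R's saveRDS and readRDS. The matrix below is a mock stand-in for the value returned by prepseudobulk, and the file name is hypothetical; the name of the file that savefile = TRUE writes is not stated here.

```r
# Mock pseudo-bulk matrix standing in for the return value of prepseudobulk().
pb <- matrix(1:12, nrow = 4,
             dimnames = list(paste0("gene", 1:4), paste0("Tcell_", 1:3)))

# Hypothetical file name; a temporary directory keeps the sketch side-effect free.
rds_path <- file.path(tempdir(), "pseudobulk_sketch.rds")
saveRDS(pb, rds_path)

# In a later session: reload the matrix, then hand it to scRef, scDeconv, or
# epDeconv through their `pseudobulkdat` parameter (other arguments not shown).
pb_reloaded <- readRDS(rds_path)
identical(pb, pb_reloaded)  # the round trip preserves values and dimnames
```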


yuabrahamliu/scDeconv documentation built on March 28, 2024, 3:15 p.m.