scRef: Make cell deconvolution reference from scRNA-seq data
In yuabrahamliu/scDeconv: scDeconv is an R Package to Deconvolve Bulk DNA Methylation Data with scRNA-seq Data in a Multi-omics Manner.

scRef

R Documentation

Make cell deconvolution reference from scRNA-seq data

Description

Make cell deconvolution reference matrix from scRNA-seq data.

Usage

scRef(
  Seuratobj,
  targetcelltypes = NULL,
  celltypecolname = "annotation",
  pseudobulknum = 10,
  samplebalance = FALSE,
  pseudobulkpercent = 0.9,
  pseudobulkdat = NULL,
  geneversion = "hg19",
  genekey = "SYMBOL",
  targetdat = NULL,
  targetlogged = FALSE,
  manualmarkerlist = NULL,
  markerremovecutoff = 0.6,
  minrefgenenum = 500,
  savefile = FALSE,
  threads = 1,
  cutoff = 0.95,
  adjustcutoff = 0.4
)

Arguments

`Seuratobj`	An object of class Seurat, generated from scRNA-seq data with the `Seurat` R package. It should contain read count data, normalized data, and cell meta data, and the meta data should include a column recording the cell type name of each cell.
`targetcelltypes`	The cell types whose contents need to be deconvolved. If NULL, all the cell types in `Seuratobj` will be used. Default is NULL.
`celltypecolname`	The name of the column in the "meta.data" slot of `Seuratobj` that records the cell type of each cell. Default is "annotation".
`pseudobulknum`	At the start of building the cell reference matrix, the scRNA-seq cell counts in `Seuratobj` are sampled to generate pseudo-bulk RNA-seq samples for each cell type. This parameter sets how many pseudo-bulk RNA-seq samples are generated per cell type. Default is 10.
`samplebalance`	When generating the pseudo-bulk RNA-seq data, the number of single cells available for sampling usually differs between cell types. To correct this imbalance and use the same number of single cells for every cell type, set this parameter to TRUE. Cell types with too many candidate cells will then be down-sampled, and those with far fewer cells will be over-sampled. Down-sampling is performed with bootstrapping and over-sampling with SMOTE (Synthetic Minority Over-sampling Technique). This step is time-consuming, and the default is FALSE.
`pseudobulkpercent`	If `samplebalance` is FALSE, each pseudo-bulk sample for a cell type is made from a randomly sampled fraction of that cell type's single cells. This parameter sets that fraction and should be a number between 0 and 1. If `samplebalance` is TRUE, bootstrapping and SMOTE are used for the sampling instead and this parameter is ignored. Default is 0.9.
`pseudobulkdat`	If the scRNA-seq data in `Seuratobj` is large, generating the pseudo-bulk RNA-seq data becomes time-consuming. When the same scRNA-seq data is used repeatedly to deconvolve different bulk datasets, it is recommended to generate and save the pseudo-bulk RNA-seq data once with the function `prepseudobulk` and then pass it to this parameter, so that `scRef` skips its own pseudo-bulk generation step and builds the final RNA deconvolution reference directly from this data. Default is NULL, in which case `scRef` synthesizes the pseudo-bulk data itself.
`geneversion`	Calculating TPM values for the genes in the reference matrix requires their effective lengths. This parameter defines the genome version from which the effective gene lengths are extracted. Use "hg19" or "hg38" for human genes and "mm10" for mouse genes. Default is "hg19".
`genekey`	The type of gene ID used in `Seuratobj`. It is "SYMBOL" in most cases, which is also the default, but it can be "ENTREZID", "ENSEMBL", or another type.
`targetdat`	The gene expression data of the target cell mixture to be deconvolved. It should be a matrix with one column per sample and one row per gene, with gene IDs as row names and sample IDs as column names. The gene ID type should match the one given to the parameter `genekey`. Default is NULL, in which case the reference matrix is generated from the scRNA-seq data in `Seuratobj` alone. If a matrix is provided, both the reference matrix and this cell mixture matrix are processed further: the 2 matrices are combined to remove their batch difference, more genes are selected into the reference matrix based on their correlation with the selected marker genes in the cell mixture, etc. Providing the cell mixture matrix is recommended, especially when the cell mixture comes from RNA microarray rather than RNA-seq data, so that the combination process can reduce the platform difference between RNA microarray and scRNA-seq.
`targetlogged`	Whether the gene expression values in `targetdat` are log2-transformed. Default is FALSE.
`manualmarkerlist`	When making the reference matrix, the genes specifically and highly expressed in each cell type are treated as its markers and used to generate the reference. This selection does not guarantee that known classical markers are chosen. To make sure such markers are considered, pass a list to this parameter, with the cell type names as its element names and vectors of the classical marker gene IDs as its elements. Note that before the final reference is determined, all marker genes go through several filter steps that improve the reference quality, such as the removal of extremely highly expressed genes and of genes contributing to collinearity. The classical markers provided here are therefore always used for reference generation, but they may still be filtered out before the final reference is returned. Default is NULL.
`markerremovecutoff`	When a gene expression matrix is provided to the parameter `targetdat`, its expression values are used to calculate the correlation between each gene and the markers selected from the scRNA-seq data, within this cell mixture matrix. Genes with a high Pearson correlation to the first principal component of these marker genes are also used to make the reference. This parameter sets the cutoff of the Pearson correlation coefficient. Default is 0.6.
`minrefgenenum`	The genes used to generate the reference matrix go through several filter steps, and in some cases only a small number of them pass every filter. The reference then contains very few genes, which affects the downstream deconvolution. To avoid this extreme case, this parameter sets a lower limit on the number of reference genes: once filtering has reduced the gene number to this level, the filter process stops. Default is 500.
`savefile`	Whether to automatically save the final reference matrix, and the adjusted cell mixture matrix (if one is provided to `targetdat`), as rds file(s) in the working directory. Default is FALSE.
`threads`	Number of threads to use for the computation. Default is 1.
`cutoff`	To improve the robustness of the deconvolution result, some extremely highly expressed genes in the reference are filtered out because of their large variance. This parameter sets the fraction of genes kept in the reference; genes with a higher expression level are removed. Default is 0.95, meaning the top 5% most highly expressed genes are removed from the reference.
`adjustcutoff`	The gene expression profiles of similar cell types are highly correlated in the reference matrix, which makes the downstream deconvolution difficult. To relieve this problem, for each pair of similar cell types, the genes contributing most to their correlation are identified and removed, so that the correlation in the reference is reduced. This parameter sets the cutoff of the cell pair correlation: if a cell pair has a Pearson correlation coefficient greater than this value, the contributing gene filter is applied until the coefficient falls below it. Default is 0.4.
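
The sampling controlled by `pseudobulknum` and `pseudobulkpercent`, and the expression filter controlled by `cutoff`, can be pictured with a small base R sketch. This is only an illustration on toy data, not the code `scRef` runs: the count matrix, the helper `make_pseudobulk`, and the choice to sum counts across the sampled cells are all assumptions made for the example.

```r
set.seed(42)

# Toy count matrix (assumption): 200 genes x 60 cells, one label per cell.
counts <- matrix(
  rpois(200 * 60, lambda = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:60))
)
celltypes <- rep(c("A", "B", "C"), times = c(30, 20, 10))

pseudobulknum <- 10      # pseudo-bulk samples per cell type
pseudobulkpercent <- 0.9 # fraction of cells drawn for each sample

# Hypothetical helper: for one cell type, repeatedly draw a fraction of its
# cells without replacement and sum their counts into a pseudo-bulk profile.
make_pseudobulk <- function(celltype) {
  cells <- which(celltypes == celltype)
  n <- max(1, floor(length(cells) * pseudobulkpercent))
  replicate(pseudobulknum, rowSums(counts[, sample(cells, n), drop = FALSE]))
}

pseudobulks <- lapply(unique(celltypes), make_pseudobulk)
names(pseudobulks) <- unique(celltypes)

# Expression filter: keep genes whose mean expression is at or below the
# `cutoff` quantile, which drops roughly the top 5% most expressed genes.
cutoff <- 0.95
meanexpr <- rowMeans(do.call(cbind, pseudobulks))
keep <- meanexpr <= quantile(meanexpr, probs = cutoff)
filtered <- lapply(pseudobulks, function(m) m[keep, , drop = FALSE])
```

Each element of `pseudobulks` is a gene-by-sample matrix with `pseudobulknum` columns, and `filtered` holds the same matrices after the most highly expressed genes are dropped.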

Value

A list with the final reference matrix as an element. If the cell mixture data matrix to be deconvolved is provided to the parameter `targetdat`, an adjusted version of it is also returned as an element of this list. The gene values in this adjusted matrix are not log-transformed.
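
As a rough sketch of how the plain R inputs could be shaped and the result inspected, the code below builds a toy `targetdat` matrix and a `manualmarkerlist`, then calls `scRef` only if the package is installed and a Seurat object is available. The object name `seuratobj`, the toy expression values, and the cell type names "T cell" and "B cell" (assumed to match entries in the annotation column) are hypothetical.

```r
set.seed(1)

# Toy bulk expression matrix (assumption): rows are genes, columns are
# samples, with gene symbols as row names and sample IDs as column names.
genes <- c("CD3E", "CD19", "CD14", "NKG7")
targetdat <- matrix(
  rpois(length(genes) * 3, lambda = 50),
  nrow = length(genes),
  dimnames = list(genes, paste0("sample", 1:3))
)

# Named list: element names are cell types, elements are marker gene IDs.
manualmarkerlist <- list(
  "T cell" = c("CD3E"),
  "B cell" = c("CD19")
)

# Guarded call: runs only when scDeconv is installed and a Seurat object
# named `seuratobj` (hypothetical) exists in the session.
if (requireNamespace("scDeconv", quietly = TRUE) && exists("seuratobj")) {
  ref <- scDeconv::scRef(
    Seuratobj = seuratobj,
    targetdat = targetdat,
    targetlogged = FALSE,   # the toy values above are not log2-transformed
    manualmarkerlist = manualmarkerlist
  )
  # Inspect the returned list without assuming its element names.
  str(ref, max.level = 1)
}
```

Using `str()` on the result avoids relying on particular element names, which this page does not specify.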

yuabrahamliu/scDeconv documentation built on March 28, 2024, 3:15 p.m.
