| scRef | R Documentation |
Make a cell deconvolution reference matrix from scRNA-seq data.
scRef(
Seuratobj,
targetcelltypes = NULL,
celltypecolname = "annotation",
pseudobulknum = 10,
samplebalance = FALSE,
pseudobulkpercent = 0.9,
pseudobulkdat = NULL,
geneversion = "hg19",
genekey = "SYMBOL",
targetdat = NULL,
targetlogged = FALSE,
manualmarkerlist = NULL,
markerremovecutoff = 0.6,
minrefgenenum = 500,
savefile = FALSE,
threads = 1,
cutoff = 0.95,
adjustcutoff = 0.4
)
Seuratobj |
An object of class Seurat generated with the |
targetcelltypes |
The cell types whose content needs to be deconvolved.
If NULL, all the cell types included in |
celltypecolname |
In the "meta.data" slot of |
pseudobulknum |
At the beginning of making the cell reference matrix,
the scRNA-seq cell counts contained in |
samplebalance |
When generating the pseudo-bulk RNA-seq data, the number of single cells available for sampling usually differs between cell types. To correct this bias and use the same number of single cells for every cell type, set this parameter to TRUE. Cell types with too many candidate cells will then be down-sampled, while those with far fewer cells will be over-sampled. The down-sampling is performed using bootstrapping, and the over-sampling is conducted with SMOTE (Synthetic Minority Over-sampling Technique). This is a time-consuming step, and the default value of this parameter is FALSE. |
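The balancing idea described for samplebalance can be illustrated with a minimal base R sketch. This is not the package's implementation: the matrices, the target cell number, the helper names bootstrap_down and smote_up, and the choice of k = 3 neighbours are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical expression matrices (cells in rows, genes in columns)
majority <- matrix(rnorm(50 * 4, mean = 5), nrow = 50)  # abundant cell type
minority <- matrix(rnorm(6 * 4, mean = 5), nrow = 6)    # rare cell type
targetnum <- 20  # hypothetical common cell number per cell type

# Down-sampling by bootstrapping: draw cells with replacement
bootstrap_down <- function(cells, n) {
  cells[sample(nrow(cells), n, replace = TRUE), , drop = FALSE]
}

# SMOTE-style over-sampling: each synthetic cell is an interpolation
# between a real cell and one of its k nearest neighbours
smote_up <- function(cells, n, k = 3) {
  d <- as.matrix(dist(cells))  # pairwise Euclidean distances
  diag(d) <- Inf               # a cell is never its own neighbour
  synthetic <- t(vapply(seq_len(n - nrow(cells)), function(i) {
    idx <- sample(nrow(cells), 1)
    neighbours <- order(d[idx, ])[seq_len(k)]
    nb <- neighbours[sample(length(neighbours), 1)]
    gap <- runif(1)            # random position between the two cells
    cells[idx, ] + gap * (cells[nb, ] - cells[idx, ])
  }, numeric(ncol(cells))))
  rbind(cells, synthetic)
}

balanced_major <- bootstrap_down(majority, targetnum)
balanced_minor <- smote_up(minority, targetnum)
```

After this step both cell types contribute the same number of cells, so neither dominates the pseudo-bulk samples drawn from them.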
pseudobulkpercent |
If the parameter |
pseudobulkdat |
If the scRNA-seq data transferred via |
geneversion |
To calculate the TPM values of the genes in the reference matrix, the effective lengths of the genes are needed. This parameter defines the genome version from which the effective gene lengths will be extracted. For human genes, "hg19" or "hg38" can be used; for mouse genes, "mm10" can be used. Default is "hg19". |
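To show why effective gene lengths are needed, here is a minimal base R sketch of the standard TPM calculation. The count matrix, the gene lengths, and the helper name counts_to_tpm are hypothetical; the package's internal code may differ.

```r
# Hypothetical raw counts: 3 genes (rows) x 2 samples (columns)
counts <- matrix(c(100, 200, 300,
                   50, 50, 100),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2")))

# Hypothetical effective gene lengths in base pairs
efflength <- c(geneA = 1000, geneB = 2000, geneC = 3000)

counts_to_tpm <- function(counts, efflength) {
  # Reads per kilobase: divide each gene's counts by its length in kb
  rpk <- counts / (efflength[rownames(counts)] / 1000)
  # Scale each sample so that its values sum to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

tpm <- counts_to_tpm(counts, efflength)
```

In sample s1 the counts are proportional to the gene lengths, so all three genes receive the same TPM value.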
genekey |
The type of the gene IDs used in the |
targetdat |
The target cell mixture gene expression data to be
deconvolved. Should be a matrix with each column representing one sample
and each row representing one gene. The gene ID type here should be the
same as that transferred to the parameter |
targetlogged |
Whether the gene expression values in |
manualmarkerlist |
When making the reference matrix, the genes specifically and highly expressed in each cell type are deemed its markers and are used to generate the reference. However, there is no guarantee that known classical markers will be selected. To make sure these markers are used, provide a list to this parameter, with the cell type names as its element names and vectors of the gene IDs of the classical markers as its elements. Note that before the final reference is determined, all marker genes go through several filter steps, such as the removal of extremely highly expressed genes and of genes contributing to collinearity, to improve the reference quality. The classical markers provided via this parameter will therefore always enter the reference generation, but may still be filtered out before the final reference is returned. The default value of this parameter is NULL. |
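A list of the shape described for manualmarkerlist might look like the following sketch. The cell type names and the annotation vector are hypothetical, and the gene symbols are only example markers, which assumes genekey = "SYMBOL"; the names would have to match the cell type labels actually present in the Seurat object.

```r
# Element names are cell type names, elements are vectors of marker gene IDs
manualmarkerlist <- list(
  "T cell" = c("CD3D", "CD3E"),
  "B cell" = c("CD79A", "MS4A1")
)

# Hypothetical cell type labels, standing in for the annotation column
annotation <- c("T cell", "B cell", "Monocyte")

# Check that every list name is a known cell type and
# that every element is a character vector of gene IDs
namesok <- all(names(manualmarkerlist) %in% annotation)
typesok <- all(vapply(manualmarkerlist, is.character, logical(1)))
```

Checking the names before the call helps catch marker lists that would never match a cell type because of a spelling difference.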
markerremovecutoff |
When a gene expression matrix is provided to the
parameter |
minrefgenenum |
The genes used to generate the reference matrix must pass several filter steps, and in some cases only a small number of them fulfill all the filter conditions. The resulting reference then contains very few genes, which impairs the subsequent deconvolution. To avoid this extreme case, this parameter sets a lower limit for the reference gene number: once filtering has reduced the reference to this number of genes, the filter process stops. The default value is 500. |
savefile |
Whether to save the final reference matrix,
and the adjusted cell mixture matrix (if provided to |
threads |
Number of threads to use for the computation. The default value is 1. |
cutoff |
To improve the robustness of the deconvolution result, some extremely highly expressed genes in the reference need to be filtered out because of their large variance. This cutoff sets the fraction of genes that are kept in the reference, while the genes with a higher expression level are filtered out. The default value is 0.95, meaning the top 5% most highly expressed genes will be removed from the reference. |
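The effect of cutoff can be sketched in base R with a quantile filter. The reference matrix here is simulated, and ranking genes by their maximum expression across cell types is an assumption; the documentation does not state which expression summary the package uses.

```r
set.seed(1)

# Hypothetical reference matrix: 200 genes (rows) x 3 cell types (columns)
ref <- matrix(rexp(200 * 3), nrow = 200,
              dimnames = list(paste0("gene", 1:200),
                              c("typeA", "typeB", "typeC")))

cutoff <- 0.95

# Assumption: summarize each gene by its maximum value across cell types
genemax <- apply(ref, 1, max)

# Keep genes at or below the cutoff quantile, drop the top 5%
threshold <- quantile(genemax, probs = cutoff)
filteredref <- ref[genemax <= threshold, , drop = FALSE]
```

With 200 simulated genes and cutoff = 0.95, the 10 most highly expressed genes are dropped and 190 remain.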
adjustcutoff |
For some similar cell types, the gene expression profiles in
the reference matrix are highly correlated, which makes the downstream
deconvolution difficult. To relieve this problem, for each pair of similar
cell types, the genes contributing most to their correlation are identified
and removed, so that their correlation in the reference is reduced.
This parameter |
A list with the final reference matrix as an element. If the
cell mixture data matrix to be deconvolved is provided to the parameter
targetdat, an adjusted version of it is also returned as an element of
this list. The gene values in this adjusted matrix are not
log-transformed.