epDeconv: Deconvolve bulk DNA methylation data using RNA reference
In yuabrahamliu/scDeconv: scDeconv is an R Package to Deconvolve Bulk DNA Methylation Data with scRNA-seq Data in a Multi-omics Manner.

epDeconv

R Documentation

Deconvolve bulk DNA methylation data using RNA reference

Description

Deconvolve bulk DNA methylation data using an RNA reference together with a paired bulk RNA-bulk DNA methylation dataset.

Usage

epDeconv(
  rnaref = NULL,
  Seuratobj = NULL,
  targetcelltypes = NULL,
  celltypecolname = "annotation",
  samplebalance = FALSE,
  pseudobulkdat = NULL,
  geneversion = "hg19",
  genekey = "SYMBOL",
  manualmarkerlist = NULL,
  rnamat,
  methylmat,
  learnernum = 10,
  rnamatlogged,
  resscale = FALSE,
  threads = 1,
  lassoerrortype = "min",
  targetmethyldat = NULL,
  plot = FALSE,
  pddat = NULL,
  targetmethylpddat = NULL
)
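
A minimal sketch of how these arguments might be combined when a reference matrix is already available. The wrapper name `run_epdeconv` and the chosen argument values are illustrative assumptions; only the argument names come from the signature above. The wrapper is defined but not executed here, because running it needs the scDeconv package and a real paired dataset.

```r
# Hypothetical convenience wrapper (not part of scDeconv).
# It only forwards arguments that appear in the Usage block above.
run_epdeconv <- function(rnaref, rnamat, methylmat, rnamatlogged = FALSE) {
  if (!requireNamespace("scDeconv", quietly = TRUE)) {
    stop("The scDeconv package is required for this sketch.")
  }
  scDeconv::epDeconv(
    rnaref = rnaref,              # gene x cell type TPM reference
    rnamat = rnamat,              # paired bulk RNA data (gene x sample)
    methylmat = methylmat,        # paired bulk methylation data (feature x sample)
    rnamatlogged = rnamatlogged,  # no default in the signature, so set it explicitly
    learnernum = 10,
    resscale = TRUE,
    lassoerrortype = "min",
    threads = 1,
    plot = FALSE
  )
}
```

Note that `rnamat`, `methylmat`, and `rnamatlogged` have no defaults in the signature, so a call has to supply all three.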

Arguments

`rnaref`	The RNA reference recording the signature of each cell type. Each row is one gene and each column is one cell type; each entry should be a gene TPM value. Column names are cell type names and row names are gene names. The default is NULL, in which case the reference can be synthesized from the scRNA-seq data passed to the parameter `Seuratobj`.
`Seuratobj`	An object of class Seurat, generated with the `Seurat` R package from scRNA-seq data. It should contain read count data, normalized data, and cell meta data, and the meta data should contain a column recording the cell type name of each cell. When `rnaref` is NULL and this parameter is provided, it will be used to make the RNA reference for the downstream deconvolution.
`targetcelltypes`	When `Seuratobj` is used to make the RNA reference, this parameter defines the cell types that should be covered by the reference. If NULL, all the cell types present in `Seuratobj` will be included. Default is NULL.
`celltypecolname`	When `Seuratobj` is used to make the RNA reference, this parameter gives the name of the column in its "meta.data" slot that records the cell type of each cell. Default is "annotation".
`samplebalance`	When `Seuratobj` is used to make the RNA reference, the scRNA-seq cell count data in `Seuratobj` are first sampled to make 100 pseudo-bulk RNA-seq samples for each cell type. During this synthesis, the number of single cells available for sampling generally differs between cell types. To adjust for this bias and use the same number of single cells for every cell type when generating the pseudo-bulk RNA-seq data, set this parameter to TRUE. Cell types with too many candidate cells will then be down-sampled, while those with far fewer cells will be over-sampled. The down-sampling is performed with bootstrapping, and the over-sampling uses SMOTE (Synthetic Minority Over-sampling Technique). This is a time-consuming step, so the default is FALSE, meaning no such adjustment is performed when generating the pseudo-bulk samples.
`pseudobulkdat`	If the scRNA-seq data passed via `Seuratobj` are large, the pseudo-bulk RNA-seq data generation step becomes time-consuming. If the same scRNA-seq data need to be used repeatedly to deconvolve different bulk datasets, it is recommended to generate and save the pseudo-bulk RNA-seq data once with the function `prepseudobulk`, and then pass the saved data to this parameter, so that `epDeconv` skips its own pseudo-bulk data generation step and uses these data to generate the final RNA deconvolution reference. The default is NULL, in which case the synthesis step is not skipped.
`geneversion`	To calculate gene TPM values when generating the reference matrix, the effective gene lengths are needed. This parameter defines the genome version from which the effective gene lengths are extracted. For human genes, "hg19" or "hg38" can be used; for mouse genes, "mm10" can be used. Default is "hg19".
`genekey`	The type of gene ID used in `Seuratobj`. It is "SYMBOL" in most cases, which is also the default, but it may also be "ENTREZID", "ENSEMBL", or another type.
`manualmarkerlist`	When the reference matrix is made from scRNA-seq data, the genes specifically and highly expressed in each cell type are deemed markers and used to generate the reference. This does not guarantee that known classical markers are selected. To make sure such markers are considered, pass a list to this parameter, with the cell type names as its element names and vectors of the classical marker gene IDs as its elements. Note that before the final reference is determined, all marker genes go through several filter steps to improve reference quality, such as the removal of extremely highly expressed genes and of genes contributing to collinearity. The classical markers provided here are therefore always entered into reference generation, but may still be filtered out before the final reference is made. Default is NULL.
`rnamat`	The RNA data of the paired bulk RNA-bulk methylation dataset. The cell contents of its samples are first deconvolved with the RNA reference provided via `rnaref` or generated from `Seuratobj`, and the downstream steps then use this result to deconvolve the bulk methylation data. Should be a matrix with each column representing one sample and each row representing one gene. Row names are gene names and column names are sample IDs. If the reference matrix was generated with the function `scRef`, and both the scRNA-seq data and this paired RNA dataset were passed to it, the resulting reference matrix can be passed to `rnaref` and the adjusted paired RNA data returned by `scRef` can be passed to this parameter.
`methylmat`	The DNA methylation data of the paired bulk RNA-bulk DNA methylation dataset. Should be a matrix with each column representing one sample and each row representing one feature. Row names are feature names and column names are sample IDs. The sample IDs should be the same as the ones in `rnamat`, because the two matrices hold data for paired samples.
`rnamatlogged`	A logical value indicating whether the gene values in `rnamat` are log2 transformed or not.
`resscale`	Whether the cell content results of each sample should be scaled so that its cell type proportions sum to 1. Default is FALSE.
`threads`	Number of threads to use for the computation. Default is 1.
`lassoerrortype`	The base learners of the bagging model to deconvolve the DNA methylation data are LASSO models and the lambda value for each of them (regularization coefficient) is selected from a grid search. This parameter is used to determine whether the lambda value should be the one giving the minimum cross-validation error (set it as "min"), or the one giving an error within 1 standard error of the minimum (set it as "1se"). Default is "min".
`targetmethyldat`	The target cell mixture methylation data to be deconvolved. Should be a matrix with each column representing one sample and each row representing one feature. Row names are feature names and column names are sample IDs. It is recommended to adjust the batch difference between this dataset and `methylmat` with `ComBat` in advance, using `methylmat` as the reference batch, so that the cell deconvolution model trained on `methylmat` can be transferred to these data with minimal influence from batch differences. The default is NULL. Leaving it as NULL does not affect the training of the deconvolution model, and the model returned by this function can still be applied to other cell mixture data via the function `methylpredict`.
`plot`	Whether to generate box plots, heatmaps, and scatter plots of the deconvolution results for the paired RNA data, paired methylation data, and target methylation data. Default is FALSE.
`pddat`	If `plot` is TRUE, this parameter can be used to provide the sample group information of the paired bulk RNA-bulk DNA methylation data, so that the box plots also compare the groups for each cell type and heatmaps with this comparison are also generated. It should be a data frame recording the sample groups, and must include 2 columns: "sampleid", recording the sample IDs, which should be the same as the column names of `rnamat` and `methylmat`, and "Samplegroup", recording the group to which each sample belongs. It can also be NULL, meaning all the samples are from the same group.
`targetmethylpddat`	If `plot` is TRUE and `targetmethyldat` is also provided, this parameter can be used to provide the sample group information of the target DNA methylation data. Its format requirements and effect are the same as those of `pddat` for the paired dataset.
`learnernum`	The number of base learners in the bagging model used to deconvolve the DNA methylation data. Default is 10.
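
The `rnaref` and `geneversion` entries refer to TPM values computed from read counts and effective gene lengths. The standard TPM definition can be sketched as follows. The function name `counts_to_tpm` and the toy numbers are assumptions for illustration, and the package may differ in detail.

```r
# Standard TPM: divide counts by effective length, then rescale each
# column (cell type or sample) so that it sums to one million.
counts_to_tpm <- function(counts, eff_length) {
  stopifnot(nrow(counts) == length(eff_length), all(eff_length > 0))
  rate <- counts / eff_length          # row i is divided by eff_length[i]
  t(t(rate) / colSums(rate)) * 1e6     # column j is divided by its own total
}

# Toy gene x cell type count matrix (hypothetical values).
counts <- matrix(
  c(100, 200, 300,
     50,  50, 400),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("Tcell", "Bcell"))
)
eff_length <- c(GeneA = 1000, GeneB = 2000, GeneC = 1500)

tpm <- counts_to_tpm(counts, eff_length)
print(round(tpm))
```

The result has the layout expected for `rnaref`: genes in rows, cell types in columns, with row and column names preserved.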
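
The list format described for `manualmarkerlist` can be sketched as below. The cell type names are hypothetical and would have to match the labels in the cell type column of the Seurat object; the gene symbols are commonly used T cell and B cell markers.

```r
# Named list: element names are cell types, elements are marker gene IDs.
manualmarkerlist <- list(
  "T cell" = c("CD3D", "CD3E"),
  "B cell" = c("CD79A", "MS4A1")
)

# Hypothetical cell type labels taken from the annotation column.
celltypes <- c("T cell", "B cell", "Monocyte")

# Every list name should correspond to an annotated cell type.
stopifnot(all(names(manualmarkerlist) %in% celltypes))
str(manualmarkerlist)
```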
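
Because `rnamat` and `methylmat` hold paired samples, it can help to align their columns before the call. The sketch below uses simulated toy matrices with hypothetical gene, feature, and sample names. Putting the columns in the same order is a precaution assumed here, not a documented requirement.

```r
set.seed(1)

# Toy paired data: genes x samples and methylation features x samples.
rnamat <- matrix(rexp(12), nrow = 4,
                 dimnames = list(paste0("Gene", 1:4), c("S1", "S2", "S3")))
methylmat <- matrix(runif(15), nrow = 5,
                    dimnames = list(paste0("cg", 1:5), c("S3", "S1", "S2")))

# Keep only the samples present in both matrices, in one common order.
shared <- intersect(colnames(rnamat), colnames(methylmat))
rnamat <- rnamat[, shared, drop = FALSE]
methylmat <- methylmat[, shared, drop = FALSE]

stopifnot(identical(colnames(rnamat), colnames(methylmat)))
```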
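
The effect of `resscale = TRUE` can be pictured with a toy result matrix. The layout assumed here (samples in rows, cell types in columns) and the values are illustrative only, since the orientation of the returned matrices is not stated on this page.

```r
# Hypothetical unscaled cell content estimates (samples x cell types).
cellconts <- matrix(
  c(0.30, 0.20, 0.10,
    0.50, 0.25, 0.25),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("Tcell", "Bcell", "Monocyte"))
)

# Divide every row by its own total so each sample sums to 1.
scaled <- cellconts / rowSums(cellconts)
print(rowSums(scaled))
```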
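
The two choices for `lassoerrortype` can be illustrated on a toy cross-validation curve. The helper `pick_lambda` and the numbers are assumptions. For "1se" the sketch follows the common one-standard-error rule (the convention behind `lambda.1se` in glmnet), taking the largest lambda whose error is within one standard error of the minimum; whether `epDeconv` resolves ties in exactly this way is not stated on this page.

```r
# Select lambda from a cross-validation curve.
# cvm: mean CV error per lambda; cvsd: its standard error.
pick_lambda <- function(lambda, cvm, cvsd, type = c("min", "1se")) {
  type <- match.arg(type)
  best <- which.min(cvm)
  if (type == "min") {
    return(lambda[best])
  }
  # Largest (most regularized) lambda within 1 SE of the minimum error.
  max(lambda[cvm <= cvm[best] + cvsd[best]])
}

lambda <- c(1.00, 0.50, 0.10, 0.05, 0.01)
cvm    <- c(0.90, 0.52, 0.50, 0.55, 0.60)
cvsd   <- c(0.05, 0.04, 0.03, 0.03, 0.04)

pick_lambda(lambda, cvm, cvsd, "min")  # 0.1
pick_lambda(lambda, cvm, cvsd, "1se")  # 0.5
```

The "1se" choice gives a sparser, more strongly regularized model at the cost of a slightly higher cross-validation error.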
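
A data frame in the format described for `pddat` (and, by extension, `targetmethylpddat`) can be built as follows. The sample IDs and group labels are hypothetical; only the two column names come from the description above.

```r
# Sample IDs should match the column names of rnamat and methylmat.
sample_ids <- c("S1", "S2", "S3", "S4")

pddat <- data.frame(
  sampleid = sample_ids,
  Samplegroup = c("Control", "Control", "Case", "Case"),
  stringsAsFactors = FALSE
)

# Both required columns must be present, with one row per sample.
stopifnot(all(c("sampleid", "Samplegroup") %in% colnames(pddat)))
stopifnot(!anyDuplicated(pddat$sampleid))
print(pddat)
```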

Value

A list containing several slots. These record the deconvolution results for the paired RNA and paired DNA methylation data (slots "rnacellconts" and "methylcellconts"), the base learners of the cell deconvolution model (slots "modellist" and "modelcoeflist"), the weights of the base learners (slots "normweights" and "weights"), the gene subsets used by each RNA data deconvolution base learner (slot "rnageneidxlist"), and the correlation between the paired RNA and methylation sample cell contents deconvolved by each base learner, expressed as R square (slot "rnamethylsqrs"). If target DNA methylation data are provided to the parameter `targetmethyldat`, a slot recording their cell contents predicted by the model is also returned (slot "methyltargetcellcounts").
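
The sketch below shows how two of these slots might be compared, using a mock list in place of a real `epDeconv` result. The matrix layout (samples in rows, cell types in columns) and all values are assumptions; only the slot names come from the description above.

```r
# Mock result carrying two of the documented slot names (toy values).
res <- list(
  rnacellconts = matrix(
    c(0.60, 0.40,
      0.30, 0.70,
      0.50, 0.50),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("S1", "S2", "S3"), c("Tcell", "Bcell"))
  ),
  methylcellconts = matrix(
    c(0.55, 0.45,
      0.35, 0.65,
      0.50, 0.50),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("S1", "S2", "S3"), c("Tcell", "Bcell"))
  )
)

# Per cell type R square between RNA-based and methylation-based estimates.
rsq <- sapply(colnames(res$rnacellconts), function(ct) {
  cor(res$rnacellconts[, ct], res$methylcellconts[, ct])^2
})
print(rsq)
```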

yuabrahamliu/scDeconv documentation built on March 28, 2024, 3:15 p.m.
