treecor_deg | R Documentation |
Constructs a sample-level normalized pseudobulk gene expression matrix and uses limma to identify differentially expressed genes (DEGs) for each cell cluster.
treecor_deg(
expr,
hierarchy_list,
cell_meta,
sample_meta,
response_variable,
separate = T,
weight = NULL,
formula = NULL,
coef = 2,
fdr_cutoff = 0.05,
filter_prop = 0.1,
pseudobulk_list = NULL,
ncores = parallel::detectCores(),
save_as_csv = T,
verbose = T
)
expr |
A raw count gene expression matrix with genes on rows and cells on columns. Cell barcodes must use ':' to separate the sample name from the barcode (i.e. "sample:barcode"). |
hierarchy_list |
A hierarchy list by running |
cell_meta |
Cell-level metadata, where each row is a cell. Must contain these columns: barcode, celltype and sample. |
sample_meta |
Sample-level metadata, where each row is a sample. Must contain 'sample' column and additional variables to be used in the analysis, such as covariates or outcomes of interest. |
response_variable |
A vector of response variables. |
separate |
A TRUE (default) or FALSE indicator, specifying how to evaluate multivariate outcomes.
|
weight |
A weight matrix used to combine multivariate phenotypes. Its dimension should be number_phenotype * 1. If none is provided, PC1 is used as a joint univariate phenotype. |
formula |
An object of class 'formula': a symbolic description of the adjustment formula (i.e. it includes only covariates, not the response variable(s)). |
coef |
A column number or column name specifying which coefficient to extract (default: 2). |
fdr_cutoff |
Cutoff value for FDR. Only genes with an FDR below this cutoff are listed. Default is 0.05. |
filter_prop |
A number between 0 and 1 used to filter lowly expressed genes across samples (default: 0.1). A gene is retained if at least this proportion of samples has a log2-normalized count greater than 0.01. |
pseudobulk_list |
A list of sample-level (adjusted) pseudobulk for each node. Default is NULL. Users can provide their processed pseudobulk list (e.g. after covariate adjustment) via this parameter. Note that the names of list shall be matched with |
ncores |
Number of cores to use. If ncores > 1, the computation runs in parallel. |
save_as_csv |
An indicator of whether to save the identified DEGs in csv files. DEGs for each cell cluster are saved as 'responsevariable_celltype_DEG.csv', and a summary file combining DEGs from all cell clusters is saved as 'responsevariable_combinedDEG.csv'. |
verbose |
Show progress |
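As a rough illustration of the `weight` argument, the sketch below combines two phenotypes into one joint univariate phenotype, first with a user-supplied number_phenotype * 1 weight matrix and then with PC1 when no weights are given. The column names `severity` and `age`, the toy values, the weights, and the choice to center and scale before combining are assumptions made for illustration; the package's internal computation may differ.

```r
# Toy sample-level metadata with two phenotypes (hypothetical values).
sample_meta <- data.frame(
  sample   = paste0("S", 1:6),
  severity = c(0, 1, 1, 2, 3, 3),
  age      = c(30, 42, 55, 47, 68, 71),
  stringsAsFactors = FALSE
)
response_variable <- c("severity", "age")

# Phenotype matrix: samples on rows, phenotypes on columns.
pheno <- as.matrix(sample_meta[, response_variable])
rownames(pheno) <- sample_meta$sample

# Option 1: user-supplied weights as a number_phenotype x 1 matrix.
weight <- matrix(c(0.7, 0.3), ncol = 1)
stopifnot(nrow(weight) == length(response_variable))
joint_weighted <- drop(scale(pheno) %*% weight)

# Option 2: no weights supplied, so take the first principal component
# (assumption: phenotypes are centered and scaled first).
pca <- prcomp(pheno, center = TRUE, scale. = TRUE)
joint_pc1 <- pca$x[, "PC1"]
```

Either vector has one value per sample and could then play the role of a single outcome in a downstream linear model.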
A list of three elements:
dge.summary: A summary table of the number of DEGs for each tree node.
dge.ls: A comprehensive list of outcome(s)-associated DEGs for each tree node. Use `result$dge.ls$response_variable[[celltype]]` to extract DEGs for a specific cell type.
pseudobulk.ls: A list of sample-level pseudobulk gene expression matrices, one for each cell cluster. Use `result$pseudobulk.ls[[celltype]]` to extract the matrix for a specific cell type.
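To make "sample-level pseudobulk" and the `filter_prop` rule more concrete, here is a minimal base-R sketch on toy data. It assumes pseudobulk is formed by summing raw counts over the cells of each sample and that normalization is log2 of counts per million plus one; this page does not specify the package's normalization, so treat both as assumptions. All gene names, barcodes, and counts are made up.

```r
# Toy raw counts: 4 genes x 6 cells from 3 samples ("sample:barcode" names).
expr <- matrix(
  c(5, 3, 0, 0, 2, 4,
    0, 0, 0, 0, 0, 1,
    1, 2, 3, 1, 0, 2,
    0, 0, 0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("gene", 1:4),
    c("S1:AAA", "S1:CCC", "S2:GGG", "S2:TTT", "S3:ACG", "S3:TGC")
  )
)

# Recover each cell's sample from the part of the name before ':'.
cell_sample <- sub(":.*$", "", colnames(expr))

# Pseudobulk: sum raw counts over the cells of each sample.
# rowsum() groups rows, so transpose to group cells, then transpose back.
pseudobulk <- t(rowsum(t(expr), group = cell_sample))

# Assumed normalization: counts per million, then log2(x + 1).
lib_size <- colSums(pseudobulk)
log_norm <- log2(t(t(pseudobulk) / lib_size) * 1e6 + 1)

# Filter rule from the filter_prop description: keep genes whose
# log2-normalized value exceeds 0.01 in at least filter_prop of samples.
filter_prop <- 0.5
keep <- rowMeans(log_norm > 0.01) >= filter_prop
filtered <- log_norm[keep, , drop = FALSE]
```

With these toy counts, gene1 and gene3 pass the filter, while gene2 (expressed in one of three samples) and gene4 (all zeros) are dropped.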
Boyang Zhang <bzhang34@jhu.edu>, Hongkai Ji
# default setting
result <- treecor_deg(expr, hierarchy_list, cell_meta, sample_meta, response_variable = 'severity')
# obtain summary table
result$dge.summary
# extract DEGs of severity in all cell types
result$dge.ls$severity
# extract DEGs for celltype 'T'
result$dge.ls$severity[['T']]
# extract sample-level pseudobulk for all cell clusters
result$pseudobulk.ls
# extract sample-level pseudobulk for celltype 'T'
result$pseudobulk.ls[['T']]
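The objects passed to `treecor_deg` in the examples above are not defined on this page. The sketch below builds toy versions of `expr`, `cell_meta`, and `sample_meta` that follow the documented formats and checks that they are mutually consistent. All values are hypothetical, and storing the full "sample:barcode" string in the `barcode` column is an assumption. `hierarchy_list` is omitted because its structure is not described here.

```r
set.seed(1)

# Hypothetical sample names and per-sample barcodes.
samples  <- c("S1", "S2", "S3")
barcodes <- c("AAA", "CCC")

# One row per cell; required columns: barcode, celltype, sample.
# Assumption: 'barcode' holds the full "sample:barcode" string.
cell_meta <- data.frame(
  barcode  = paste(rep(samples, each = 2), barcodes, sep = ":"),
  celltype = rep(c("T", "B"), times = 3),
  sample   = rep(samples, each = 2),
  stringsAsFactors = FALSE
)

# Raw count matrix: genes on rows, cells on columns.
expr <- matrix(
  rpois(4 * nrow(cell_meta), lambda = 3), nrow = 4,
  dimnames = list(paste0("gene", 1:4), cell_meta$barcode)
)

# One row per sample: 'sample' column plus an outcome and a covariate.
sample_meta <- data.frame(
  sample   = samples,
  severity = c(0, 1, 2),
  age      = c(35, 52, 67),
  stringsAsFactors = FALSE
)

# Consistency checks worth running before the real call.
stopifnot(
  all(c("barcode", "celltype", "sample") %in% colnames(cell_meta)),
  all(grepl(":", colnames(expr), fixed = TRUE)),
  all(sub(":.*$", "", colnames(expr)) %in% sample_meta$sample),
  setequal(colnames(expr), cell_meta$barcode)
)
```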