deconvolute_cdseq: CDSeq Deconvolution
In PelzKo/immunedeconv2: Second generation methods for immune cell deconvolution

Description

Calculates cell type proportions in bulk samples using CDSeq deconvolution. IMPORTANT: No separately built model is needed; everything is done inside this function. IMPORTANT: The result does not necessarily contain all cell types from the input single-cell data, because cell types are assigned to the clusters found in the bulk data. See cellTypeAssignSCRNA for more information.
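To make the second note concrete, here is a minimal base R sketch of a correlation-based assignment on made-up numbers. It is not the procedure used by cellTypeAssignSCRNA; the matrices `ref_gep` and `est_gep` and the threshold value are hypothetical, and only the idea of a correlation cutoff (compare the `corr_threshold` argument) is taken from this page. The point is that an annotated cell type with no matching estimated profile simply does not appear in the result.

```r
# Hypothetical reference profiles: 5 genes x 3 annotated cell types.
# matrix() fills column by column, so each row of numbers below is one cell type.
ref_gep <- matrix(
  c(10, 1, 1, 5, 2,   # "T cell"
     1, 9, 2, 1, 6,   # "B cell"
     2, 2, 8, 7, 1),  # "Monocyte"
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("T cell", "B cell", "Monocyte"))
)

# Hypothetical estimated profiles: only two cell types were found in the bulk data.
est_gep <- matrix(
  c(9, 2, 1, 6, 2,    # resembles "T cell"
    2, 1, 9, 6, 2),   # resembles "Monocyte"
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), c("CDSeq_1", "CDSeq_2"))
)

corr_threshold <- 0.5  # illustrative value, not the function default

# Pearson correlation between every estimated and every reference profile.
corr <- cor(est_gep, ref_gep)

# For each estimated profile, take the best-correlated annotated cell type.
best <- apply(corr, 1, which.max)
best_corr <- corr[cbind(seq_len(nrow(corr)), best)]

# Matches below the threshold are treated as "no match".
assignment <- ifelse(best_corr >= corr_threshold, colnames(ref_gep)[best], NA_character_)
names(assignment) <- rownames(corr)

# Annotated cell types that no estimated profile was assigned to.
missing_types <- setdiff(colnames(ref_gep), assignment)

print(assignment)
print(missing_types)
```

In this toy case one annotated cell type ends up in `missing_types`, which mirrors the situation described above.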

Usage

deconvolute_cdseq(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  batch_ids,
  beta = 0.5,
  alpha = 5,
  cell_type_number = NULL,
  mcmc_iterations = 700,
  dilution_factor = 1,
  gene_subset_size = NULL,
  block_number = 1,
  no_cores = NULL,
  gene_length = NULL,
  reference_gep = NULL,
  print_progress_msg_to_file = 0,
  cdseq_gep_sample_specific = NULL,
  batch_correction = 1,
  harmony_iter = 10,
  harmony_cluster = 20,
  nb_size = NULL,
  nb_mu = NULL,
  corr_threshold = 0,
  breaksList = seq(0, 1, 0.01),
  pseudo_cell_count = 1,
  seurat_count_threshold = 0,
  seurat_scale_factor = 10000,
  seurat_norm_method = "LogNormalize",
  seurat_select_method = "vst",
  seurat_nfeatures = 1000,
  seurat_npcs = 30,
  seurat_dims = 1:30,
  seurat_reduction = "pca",
  seurat_resolution = 0.8,
  seurat_find_marker = FALSE,
  seurat_DE_test = "wilcox",
  seurat_DE_logfc = 0.25,
  seurat_top_n_markers = 10,
  sc_pt_size = 1,
  cdseq_pt_size = 3,
  plot_umap = 0,
  plot_tsne = 0,
  plot_per_sample = 0,
  fig_save = 0,
  fig_path = getwd(),
  fig_name = "CDSeqCellTypeAssignSCRNA",
  fig_format = "jpeg",
  fig_dpi = 100,
  corr_heatmap_fontsize = 10,
  verbose = FALSE
)
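The following is a hedged sketch of how the four required inputs might be assembled. All data are simulated, the object sizes are arbitrary, and two of the consistency checks encode assumptions that this page does not state explicitly: that `batch_ids` has one entry per single-cell column, and that the bulk and single-cell matrices share gene names. The call itself is guarded by a flag (`run_cdseq`, a name introduced here) because it requires the immunedeconv2 package and can take a long time.

```r
set.seed(1)

n_genes <- 50
n_cells <- 30
n_bulk  <- 4
genes   <- paste0("gene", seq_len(n_genes))

# Simulated single-cell counts: genes in rows, cells in columns.
single_cell_object <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(genes, paste0("cell", seq_len(n_cells)))
)

# One annotation per single-cell column, in column order.
cell_type_annotations <- rep(c("T cell", "B cell", "Monocyte"), each = 10)

# Assumption: one batch id (for example a donor) per single-cell column.
batch_ids <- rep(c("donor1", "donor2"), times = 15)

# Simulated bulk counts: genes in rows, samples in columns.
bulk_gene_expression <- matrix(
  rpois(n_genes * n_bulk, lambda = 100),
  nrow = n_genes,
  dimnames = list(genes, paste0("sample", seq_len(n_bulk)))
)

# Basic consistency checks before calling the function.
stopifnot(
  length(cell_type_annotations) == ncol(single_cell_object),
  length(batch_ids) == ncol(single_cell_object),                          # assumption
  identical(rownames(bulk_gene_expression), rownames(single_cell_object)) # assumption
)

run_cdseq <- FALSE  # set to TRUE only if immunedeconv2 is installed
if (run_cdseq && requireNamespace("immunedeconv2", quietly = TRUE)) {
  result <- immunedeconv2::deconvolute_cdseq(
    bulk_gene_expression  = bulk_gene_expression,
    single_cell_object    = single_cell_object,
    cell_type_annotations = cell_type_annotations,
    batch_ids             = batch_ids,
    cell_type_number      = 3,   # illustrative choice; the default is NULL
    no_cores              = 1
  )
  str(result$cdseq_prop_merged)
}
```

All other arguments are left at the defaults listed in the signature above.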

Arguments

`bulk_gene_expression`	A matrix or data frame with the bulk data. Rows are genes, columns are samples.
`single_cell_object`	A matrix with the single-cell data. Rows are genes and columns are samples (cells).
`cell_type_annotations`	A vector of the cell type annotations. Has to be in the same order as the samples in single_cell_object.
`batch_ids`	A vector of the ids of the samples or individuals.
`beta`	A scalar or a vector of length G, where G is the number of genes; default value is 0.5. When beta = NULL, CDSeq uses reference_gep to estimate beta.
`alpha`	A scalar or a vector of length cell_type_number, where cell_type_number is the number of cell types; default value is 5.
`cell_type_number`	Number of cell types. Can be a single integer or a vector of different integers. If a vector is provided, e.g. cell_type_number <- 2:30, CDSeq will estimate the number of cell types from the candidates in that vector.
`mcmc_iterations`	Number of iterations for the Gibbs sampler; default value is 700.
`dilution_factor`	A scalar to dilute the read counts for speeding up; default value is 1. CDSeq will use bulk_data/dilution_factor.
`gene_subset_size`	Number of genes randomly sampled for each block. Default is NULL.
`block_number`	Number of blocks. Each block contains gene_subset_size randomly sampled genes. Default is 1.
`no_cores`	Number of CPU cores that can be used for parallel computing. Default is NULL, in which case CDSeq detects the number of available cores on the device and uses all of them minus one.
`gene_length`	A vector of the effective length (gene length - read length + 1) of each gene; Default is NULL.
`reference_gep`	A reference gene expression profile can be used to determine the cell type and/or estimate beta; Default is NULL.
`print_progress_msg_to_file`	Whether to print progress messages to a text file. Set to 1 to print progress messages to a file and to 0 to disable this. Default is 0.
`cdseq_gep_sample_specific`	CDSeq-estimated sample-specific cell type gene expression, in the form of read counts. It is a 3-dimensional array, i.e. gene by sample by cell type. The element cdseq_gep_sample_specific[i,j,k] represents the reads mapped to gene i from cell type k in sample j.
`batch_correction`	Set to 1 to perform Harmony batch correction.
`harmony_iter`	Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.
`harmony_cluster`	Maximum number of rounds to run clustering at each round of Harmony.
`nb_size`	size parameter for negative binomial distribution, check rnbinom for details.
`nb_mu`	mu parameter for negative binomial distribution, check rnbinom for details.
`corr_threshold`	if the correlation between a CDSeq-estimated GEP and a scRNAseq GEP is below this value, the two cell types are considered not to match.
`breaksList`	parameter for pheatmap controlling the color scale. See the pheatmap function for details.
`pseudo_cell_count`	an integer indicating how many pseudo cells will be generated from the CDSeq-estimated cell-type-specific gene expression profiles. Default value is 1.
`seurat_count_threshold`	this parameter will be passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count is less than this threshold.
`seurat_scale_factor`	this parameter will be passed to scale.factor in Seurat function NormalizeData.
`seurat_norm_method`	this parameter will be passed to normalization.method in Seurat function NormalizeData.
`seurat_select_method`	this parameter will be passed to selection.method in Seurat function FindVariableFeatures.
`seurat_nfeatures`	this parameter will be passed to nfeatures in Seurat function FindVariableFeatures.
`seurat_npcs`	this parameter will be passed to npcs in Seurat function RunPCA.
`seurat_dims`	this parameter will be passed to dims in Seurat function FindNeighbors.
`seurat_reduction`	this parameter will be passed to reduction in Seurat function FindNeighbors.
`seurat_resolution`	this parameter will be passed to resolution in Seurat function FindClusters.
`seurat_find_marker`	this parameter controls whether Seurat marker detection is run; default is FALSE.
`seurat_DE_test`	this parameter will be passed to test.use in Seurat function FindAllMarkers.
`seurat_DE_logfc`	this parameter will be passed to logfc.threshold in Seurat function FindAllMarkers.
`seurat_top_n_markers`	the number of top DE markers saved from Seurat output.
`sc_pt_size`	point size of single-cell data in the umap and tsne plots.
`cdseq_pt_size`	point size of CDSeq-estimated cell types in the umap and tsne plots.
`plot_umap`	set to 1 to plot a umap figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.
`plot_tsne`	set to 1 to plot a tsne figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.
`plot_per_sample`	currently disabled for debugging.
`fig_save`	1 or 0. 1 means figures are saved to the local disk and 0 means they are not saved.
`fig_path`	the location where the heatmap figure is saved.
`fig_name`	the name of umap and tsne figures. Umap figure will have the name of fig_name_umap_date and tsne figure will be named fig_name_tsne_date.
`fig_format`	"pdf", "jpeg", or "png".
`fig_dpi`	figure dpi
`corr_heatmap_fontsize`	font size of the correlation heatmap between scRNAseq GEP and CDSeq-estimated GEPs.
`verbose`	Whether to produce output on the console.
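Two of the arguments above involve simple arithmetic that can be sketched in base R: the effective length expected by `gene_length` (gene length - read length + 1) and the scaling applied by `dilution_factor` (bulk data divided by the factor). The gene lengths, read length and counts below are made up for illustration, and whether CDSeq applies any further processing to the diluted counts is not stated on this page.

```r
# Hypothetical gene lengths in base pairs and a hypothetical read length.
gene_length_bp <- c(geneA = 1500, geneB = 800, geneC = 2300)
read_length <- 100

# Effective length as defined for the gene_length argument:
# gene length - read length + 1.
effective_length <- gene_length_bp - read_length + 1
print(effective_length)

# Hypothetical bulk read counts: genes in rows, samples in columns.
bulk_counts <- matrix(
  c(1000, 250, 40,
     800, 300, 60),
  nrow = 3,
  dimnames = list(names(gene_length_bp), c("sample1", "sample2"))
)

# dilution_factor scales the read counts down to speed up the sampler;
# the documentation describes this as bulk_data / dilution_factor.
dilution_factor <- 10
diluted_counts <- bulk_counts / dilution_factor
print(diluted_counts)

# A vector of candidate values for cell_type_number asks CDSeq to estimate
# the number of cell types instead of fixing it.
cell_type_number <- 2:30
```

A vector like `effective_length` would be passed as `gene_length`, with one entry per gene in the bulk matrix.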

Value

A list including:

`fig_path`	The same as the input fig_path.
`fig_name`	The same as the input fig_name.
`cdseq_synth_scRNA`	The synthetic scRNAseq data generated using CDSeq-estimated GEPs.
`cdseq_scRNA_umap`	A ggplot figure of the umap outcome.
`cdseq_scRNA_tsne`	A ggplot figure of the tsne outcome.
`cdseq_synth_scRNA_seurat`	A Seurat object containing the scRNAseq combined with CDSeq-estimated cell types. Cell id for CDSeq-estimated cell types start with "CDSeq".
`seurat_cluster_purity`	For all cells in Seurat cluster i, the ith value in seurat_cluster_purity is the proportion of the most frequent cell annotation from sc_annotation. For example, if Seurat clustering puts 100 cells in cluster 1 and 90 of them are annotated as cell type A in sc_annotation, then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.
`seurat_unique_clusters`	Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.
`seurat_cluster_gold_label`	The cell type annotations for each unique Seurat cluster based on sc_annotation.
`seurat_markers`	DE genes for each Seurat cluster.
`seurat_top_markers`	Top seurat_top_n_markers DE genes for each Seurat cluster.
`CDSeq_cell_type_assignment_df`	The cell type assignment for CDSeq-estimated cell types.
`CDSeq_cell_type_assignment_confidence`	The cell type assignment confidence matrix, only available when pseudo_cell_count > 1.
`CDSeq_cell_type_assignment_df_all`	The cell type assignment for CDSeq-estimated cell types, only available when pseudo_cell_count > 1.
`cdseq_prop_merged`	CDSeq-estimated cell type proportions with cell type annotations (annotated using clustering with scRNAseq).
`cdseq_gep_sample_specific_merged`	Sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.
`input_list`	The values of the input parameters.
`cdseq_sc_comb_umap_df`	The dataframe for umap plot.
`cdseq_sc_comb_tsne_df`	The dataframe for tsne plot.
`cdseq_prop_merged_byCorr`	CDSeq-estimated cell type proportions with cell type annotations (annotated using correlation with scRNAseq).
`cdseq_gep_merged_byCorr`	CDSeq-estimated cell-type-specific GEPs with cell type annotations (annotated using correlation with scRNAseq).
`cdseq_annotation_byCorr`	CDSeq-estimated cell type annotations (annotated using correlation with scRNAseq).
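The definition of `seurat_cluster_purity` given above can be reproduced on toy data. This is a base R sketch of the stated definition, not the package's implementation; the vectors `cluster_ids` and `sc_annotation` are made up so that cluster 1 matches the worked example (100 cells, 90 of them annotated as cell type A).

```r
# Hypothetical Seurat cluster membership for 150 cells.
cluster_ids <- c(rep(1, 100), rep(2, 50))

# Hypothetical annotations: cluster 1 is 90 x "A" and 10 x "B",
# cluster 2 is 40 x "B" and 10 x "C".
sc_annotation <- c(rep("A", 90), rep("B", 10), rep("B", 40), rep("C", 10))

# Purity: share of the most frequent annotation within each cluster.
purity <- tapply(
  sc_annotation, cluster_ids,
  function(x) max(table(x)) / length(x)
)

# The most frequent annotation per cluster, comparable in spirit to
# seurat_cluster_gold_label.
gold_label <- tapply(
  sc_annotation, cluster_ids,
  function(x) names(which.max(table(x)))
)

print(purity)
print(gold_label)
```

A purity close to 1 for every cluster suggests that the Seurat clustering agrees well with the supplied annotations.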