deconvolute_cdseq: CDSeq Deconvolution

View source: R/CDSeq.R

Description

Computes cell type proportions using CDSeq deconvolution.

IMPORTANT: No separate model-building step is needed. Everything is done inside this function.

IMPORTANT: The result does not necessarily contain all cell types from the input single-cell data, because cell types are assigned to clusters found in the bulk data. See cellTypeAssignSCRNA for more information.

Usage

deconvolute_cdseq(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  batch_ids,
  beta = 0.5,
  alpha = 5,
  cell_type_number = NULL,
  mcmc_iterations = 700,
  dilution_factor = 1,
  gene_subset_size = NULL,
  block_number = 1,
  no_cores = NULL,
  gene_length = NULL,
  reference_gep = NULL,
  print_progress_msg_to_file = 0,
  cdseq_gep_sample_specific = NULL,
  batch_correction = 1,
  harmony_iter = 10,
  harmony_cluster = 20,
  nb_size = NULL,
  nb_mu = NULL,
  corr_threshold = 0,
  breaksList = seq(0, 1, 0.01),
  pseudo_cell_count = 1,
  seurat_count_threshold = 0,
  seurat_scale_factor = 10000,
  seurat_norm_method = "LogNormalize",
  seurat_select_method = "vst",
  seurat_nfeatures = 1000,
  seurat_npcs = 30,
  seurat_dims = 1:30,
  seurat_reduction = "pca",
  seurat_resolution = 0.8,
  seurat_find_marker = FALSE,
  seurat_DE_test = "wilcox",
  seurat_DE_logfc = 0.25,
  seurat_top_n_markers = 10,
  sc_pt_size = 1,
  cdseq_pt_size = 3,
  plot_umap = 0,
  plot_tsne = 0,
  plot_per_sample = 0,
  fig_save = 0,
  fig_path = getwd(),
  fig_name = "CDSeqCellTypeAssignSCRNA",
  fig_format = "jpeg",
  fig_dpi = 100,
  corr_heatmap_fontsize = 10,
  verbose = FALSE
)

Arguments

bulk_gene_expression

A matrix or dataframe with the bulk data. Rows are genes, columns are samples.

single_cell_object

A matrix with the single-cell data. Rows are genes and columns are samples (individual cells).

cell_type_annotations

A vector of the cell type annotations. It has to be in the same order as the samples in single_cell_object.

batch_ids

A vector of the ids of the samples or individuals.
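
The data inputs have to agree in shape. A minimal sketch in base R, using made-up toy data (the gene, sample, cell and donor names are illustrative assumptions, as are the exact checks), that builds inputs with the layout described above and verifies their consistency before they are passed to deconvolute_cdseq:

```r
set.seed(1)
genes <- paste0("gene", 1:50)

# Bulk data: rows are genes, columns are samples.
bulk_gene_expression <- matrix(
  rpois(50 * 4, lambda = 100), nrow = 50,
  dimnames = list(genes, paste0("sample", 1:4))
)

# Single-cell data: rows are genes, columns are samples (here: cells).
single_cell_object <- matrix(
  rpois(50 * 20, lambda = 5), nrow = 50,
  dimnames = list(genes, paste0("cell", 1:20))
)

# One annotation and one batch id per single-cell column, in column order.
# Assumption: batch_ids has one entry per column of single_cell_object.
cell_type_annotations <- rep(c("T cell", "B cell"), each = 10)
batch_ids <- rep(c("donor1", "donor2"), times = 10)

# Assumed sanity checks; the function itself may enforce different rules.
inputs_ok <- length(cell_type_annotations) == ncol(single_cell_object) &&
  length(batch_ids) == ncol(single_cell_object) &&
  length(intersect(rownames(bulk_gene_expression),
                   rownames(single_cell_object))) > 0
print(inputs_ok)
```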

beta

A scalar or a vector of length G, where G is the number of genes. Default value is 0.5. When beta = NULL, CDSeq uses reference_gep to estimate beta.

alpha

A scalar or a vector of length cell_type_number, where cell_type_number is the number of cell types. Default value is 5.

cell_type_number

Number of cell types. cell_type_number can be a single integer or a vector of distinct integers. To have CDSeq estimate the number of cell types, provide a vector of candidate values, e.g. cell_type_number = 2:30.

mcmc_iterations

Number of iterations for the Gibbs sampler; default value is 700.

dilution_factor

A scalar by which the read counts are divided to speed up the computation; default value is 1. CDSeq will use bulk_gene_expression / dilution_factor.
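
As a small illustration of the dilution step, the following base R sketch divides a toy bulk matrix by a dilution factor of 10. The matrix values are made up, and whether the function rounds the diluted counts internally is not stated here, so no rounding is applied:

```r
# Toy bulk matrix: rows are genes, columns are samples.
bulk <- matrix(c(1000, 250, 40, 8), nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

dilution_factor <- 10

# Element-wise division; larger factors mean fewer reads to process.
diluted <- bulk / dilution_factor
print(diluted)
```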

gene_subset_size

Number of genes randomly sampled for each block. Default is NULL.

block_number

Number of blocks. Each block contains gene_subset_size randomly sampled genes. Default is 1.

no_cores

Number of CPU cores that can be used for parallel computing. Default is NULL, in which case CDSeq detects the number of cores available on the device and uses all but one of them.
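
The default behaviour described for no_cores can be approximated with the base parallel package. This is a sketch of the rule "all cores minus one", not the function's actual internal code; the fallback to 1 is an added assumption for machines where detection fails or only one core exists:

```r
library(parallel)

# detectCores() may return NA on some platforms.
available <- detectCores()

# Use all cores but one, and never fewer than one.
no_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
print(no_cores)
```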

gene_length

A vector of the effective length (gene length - read length + 1) of each gene; Default is NULL.
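
A worked example of the effective length formula in base R. The gene lengths and the read length of 100 are made-up values for illustration:

```r
# Hypothetical gene lengths in base pairs.
gene_length_bp <- c(geneA = 1500, geneB = 800, geneC = 2300)
read_length <- 100

# Effective length = gene length - read length + 1.
effective_length <- gene_length_bp - read_length + 1
print(effective_length)
```

The resulting vector has one entry per gene and could be supplied as gene_length, provided it is ordered like the rows of the bulk data.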

reference_gep

A reference gene expression profile can be used to determine the cell type and/or estimate beta; Default is NULL.

print_progress_msg_to_file

Whether to print progress messages to a text file. Set to 1 to print progress messages to a file and to 0 to disable printing. Default is 0.

cdseq_gep_sample_specific

CDSeq-estimated sample-specific cell type gene expression, in the form of read counts. It is a three-dimensional array, i.e. gene by sample by cell type. The element cdseq_gep_sample_specific[i,j,k] represents the reads mapped to gene i from cell type k in sample j.
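
To make the gene by sample by cell type layout concrete, here is a toy array in base R. The dimension names and the values are placeholders, not CDSeq output:

```r
n_genes <- 3
n_samples <- 2
n_cell_types <- 2

# Toy array filled with 1..12; dimensions are gene, sample, cell type.
gep <- array(
  seq_len(n_genes * n_samples * n_cell_types),
  dim = c(n_genes, n_samples, n_cell_types),
  dimnames = list(paste0("gene", 1:3),
                  paste0("sample", 1:2),
                  c("typeA", "typeB"))
)

# Reads mapped to gene 2 from cell type "typeB" in sample 1: gep[i, j, k].
reads_ijk <- gep[2, 1, 2]

# Summing over the cell type dimension yields a gene-by-sample matrix.
per_sample_total <- apply(gep, c(1, 2), sum)
print(per_sample_total)
```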

batch_correction

Performs Harmony batch correction if set to 1.

harmony_iter

Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.

harmony_cluster

Maximum number of rounds to run clustering at each round of Harmony.

nb_size

Size parameter for the negative binomial distribution; see rnbinom for details.

nb_mu

Mu parameter for the negative binomial distribution; see rnbinom for details.
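
For reference, this is how the size and mu parameterisation of rnbinom behaves in base R. The values 5 and 20 are arbitrary, and how deconvolute_cdseq uses such draws internally is not shown here:

```r
set.seed(42)
nb_size <- 5
nb_mu <- 20

# Draw counts from a negative binomial with mean mu and dispersion size.
draws <- rnbinom(n = 1000, size = nb_size, mu = nb_mu)

# In this parameterisation the variance is mu + mu^2 / size.
theoretical_var <- nb_mu + nb_mu^2 / nb_size

print(c(sample_mean = mean(draws),
        sample_var = var(draws),
        theoretical_var = theoretical_var))
```

Smaller values of size give more overdispersed counts for the same mean.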

corr_threshold

If the correlation between a CDSeq-estimated GEP and a scRNAseq GEP is below this value, the two cell types are considered not to match.
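
The thresholding idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: Pearson correlation is assumed, the profiles are simulated, and the names est1, est2 and the threshold of 0.5 are made up:

```r
set.seed(7)
genes <- paste0("gene", 1:100)

# Simulated reference GEPs from scRNAseq: genes by cell type.
sc_gep <- matrix(rexp(200, rate = 0.1), nrow = 100,
                 dimnames = list(genes, c("T cell", "B cell")))

# Simulated estimated GEPs: est1 is a noisy copy of "T cell",
# est2 is unrelated to either reference profile.
cdseq_gep <- cbind(
  est1 = sc_gep[, "T cell"] + rnorm(100, sd = 0.5),
  est2 = rexp(100, rate = 0.1)
)

# Rows: estimated cell types; columns: reference cell types.
corr <- cor(cdseq_gep, sc_gep)

corr_threshold <- 0.5
best_corr <- apply(corr, 1, max)
best_match <- colnames(corr)[apply(corr, 1, which.max)]

# Estimated types whose best correlation is below the threshold stay unmatched.
best_match[best_corr < corr_threshold] <- NA
names(best_match) <- rownames(corr)
print(best_match)
```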

breaksList

Parameter for pheatmap controlling the color scale. See the pheatmap function for details.

pseudo_cell_count

An integer indicating how many pseudo cells will be generated from the CDSeq-estimated cell-type-specific gene expression profiles. Default value is 1.

seurat_count_threshold

This parameter will be passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count does not exceed this threshold.
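
The effect of this filter can be mimicked without Seurat. In the base R sketch below, colSums stands in for Seurat's nCount_RNA metadata column; the count matrix and the threshold of 5 are toy values:

```r
# Toy count matrix: rows are genes, columns are cells.
counts <- matrix(c(5, 0, 1,
                   0, 0, 1,
                   20, 30, 10), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Total counts per cell, analogous to nCount_RNA.
n_count <- colSums(counts)

seurat_count_threshold <- 5

# Keep only cells whose total count is strictly above the threshold.
kept <- counts[, n_count > seurat_count_threshold, drop = FALSE]
print(colnames(kept))
```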

seurat_scale_factor

this parameter will be passed to scale.factor in Seurat function NormalizeData.

seurat_norm_method

this parameter will be passed to normalization.method in Seurat function NormalizeData.

seurat_select_method

this parameter will be passed to selection.method in Seurat function FindVariableFeatures.

seurat_nfeatures

this parameter will be passed to nfeatures in Seurat function FindVariableFeatures.

seurat_npcs

this parameter will be passed to npcs in Seurat function RunPCA.

seurat_dims

this parameter will be passed to dims in Seurat function FindNeighbors.

seurat_reduction

this parameter will be passed to reduction in Seurat function FindNeighbors.

seurat_resolution

this parameter will be passed to resolution in Seurat function FindClusters.

seurat_find_marker

this parameter controls whether the Seurat function FindAllMarkers is run; default is FALSE.

seurat_DE_test

this parameter will be passed to test.use in Seurat function FindAllMarkers.

seurat_DE_logfc

this parameter will be passed to logfc.threshold in Seurat function FindAllMarkers.

seurat_top_n_markers

the number of top DE markers saved from Seurat output.

sc_pt_size

Point size of single-cell data in UMAP and t-SNE plots.

cdseq_pt_size

Point size of CDSeq-estimated cell types in UMAP and t-SNE plots.

plot_umap

Set to 1 to plot a UMAP figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_tsne

Set to 1 to plot a t-SNE figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_per_sample

currently disabled for debugging

fig_save

1 or 0. 1 saves the figures to local disk; 0 does not save them.

fig_path

the location where the heatmap figure is saved.

fig_name

The name of the UMAP and t-SNE figures. The UMAP figure will be named fig_name_umap_date and the t-SNE figure will be named fig_name_tsne_date.

fig_format

"pdf", "jpeg", or "png".

fig_dpi

Figure resolution in DPI (dots per inch).

corr_heatmap_fontsize

font size of the correlation heatmap between scRNAseq GEP and CDSeq-estimated GEPs.

verbose

Whether to print output to the console.

Value

A list including:

fig_path

The same as the input fig_path.

fig_name

The same as the input fig_name.

cdseq_synth_scRNA

The synthetic scRNAseq data generated using CDSeq-estimated GEPs.

cdseq_scRNA_umap

A ggplot figure of the umap outcome.

cdseq_scRNA_tsne

A ggplot figure of the tsne outcome.

cdseq_synth_scRNA_seurat

A Seurat object containing the scRNAseq combined with CDSeq-estimated cell types. Cell id for CDSeq-estimated cell types start with "CDSeq".

seurat_cluster_purity

For all cells in a Seurat cluster i, the ith value in seurat_cluster_purity is the proportion of the most frequent cell annotation from sc_annotation. For example, suppose that after Seurat clustering there are 100 cells in cluster 1 and 90 of them are annotated as cell type A in sc_annotation; then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.

seurat_unique_clusters

Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.

seurat_cluster_gold_label

The cell type annotations for each unique Seurat cluster based on sc_annotation.
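
The purity and gold-label outputs described above can be reproduced on toy data in base R. This sketch is an assumption about the computation, built only from the description; the cluster ids and annotations are made up so that cluster 1 matches the 100-cell example:

```r
# Toy clustering: 100 cells in cluster 1, 50 cells in cluster 2.
cluster <- c(rep(1, 100), rep(2, 50))

# Toy annotations: cluster 1 is 90 A + 10 B, cluster 2 is 30 B + 20 C.
annotation <- c(rep("A", 90), rep("B", 10), rep("B", 30), rep("C", 20))

# Purity: share of the most frequent annotation within each cluster.
purity <- tapply(annotation, cluster,
                 function(a) max(table(a)) / length(a))

# Gold label: the most frequent annotation within each cluster.
gold_label <- tapply(annotation, cluster,
                     function(a) names(which.max(table(a))))

# Unique cluster ids, in the same order as purity and gold_label.
unique_clusters <- sort(unique(cluster))

print(data.frame(cluster = unique_clusters,
                 purity = as.numeric(purity),
                 gold_label = as.character(gold_label)))
```

A purity close to 1 indicates that a cluster consists almost entirely of one annotated cell type.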

seurat_markers

DE genes for each Seurat cluster.

seurat_top_markers

Top seurat_top_n_markers DE genes for each Seurat cluster.

CDSeq_cell_type_assignment_df

The cell type assignment for CDSeq-estimated cell types.

CDSeq_cell_type_assignment_confidence

The cell type assignment confidence matrix, only available when pseudo_cell_count > 1.

CDSeq_cell_type_assignment_df_all

The cell type assignment for CDSeq-estimated cell types, only available when pseudo_cell_count > 1.

cdseq_prop_merged

CDSeq-estimated cell type proportions with cell type annotations (annotated using clustering with scRNAseq).

cdseq_gep_sample_specific_merged

Sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.

input_list

The values of the input parameters.

cdseq_sc_comb_umap_df

The dataframe for umap plot.

cdseq_sc_comb_tsne_df

The dataframe for tsne plot.

cdseq_prop_merged_byCorr

CDSeq-estimated cell type proportions with cell type annotations (annotated using correlation with scRNAseq).

cdseq_gep_merged_byCorr

CDSeq-estimated cell-type-specific GEPs with cell type annotations (annotated using correlation with scRNAseq).

cdseq_annotation_byCorr

CDSeq-estimated cell type annotations (annotated using correlation with scRNAseq).


PelzKo/immunedeconv2 documentation built on Feb. 12, 2025, 4:16 p.m.