cellTypeAssignSCRNA: 'cellTypeAssignSCRNA' assigns CDSeq-identified cell types...


View source: R/cellTypeAssignSCRNA.R

Description

cellTypeAssignSCRNA assigns CDSeq-identified cell types using single cell RNAseq data.

Usage

cellTypeAssignSCRNA(
  cdseq_gep = NULL,
  cdseq_prop = NULL,
  cdseq_gep_sample_specific = NULL,
  sc_gep = NULL,
  sc_annotation = NULL,
  sc_batch = NULL,
  batch_correction = 1,
  harmony_iter = 10,
  harmony_cluster = 20,
  nb_size = NULL,
  nb_mu = NULL,
  corr_threshold = 0,
  breaksList = seq(0, 1, 0.01),
  pseudo_cell_count = 1,
  seurat_count_threshold = 0,
  seurat_scale_factor = 10000,
  seurat_norm_method = "LogNormalize",
  seurat_select_method = "vst",
  seurat_nfeatures = 1000,
  seurat_npcs = 30,
  seurat_dims = 1:30,
  seurat_reduction = "pca",
  seurat_resolution = 0.8,
  seurat_find_marker = FALSE,
  seurat_DE_test = "wilcox",
  seurat_DE_logfc = 0.25,
  seurat_top_n_markers = 10,
  sc_pt_size = 1,
  cdseq_pt_size = 3,
  plot_umap = 1,
  plot_tsne = 1,
  plot_per_sample = 0,
  fig_save = 0,
  fig_path = getwd(),
  fig_name = "CDSeqCellTypeAssignSCRNA",
  fig_format = "jpeg",
  fig_dpi = 100,
  corr_heatmap_fontsize = 10,
  verbose = TRUE
)

Arguments

cdseq_gep

CDSeq-estimated gene expression profile matrix with G rows (genes) and T columns (cell types).

cdseq_prop

CDSeq-estimated sample-specific cell-type proportions, a matrix with T rows (cell types) and M columns (samples).

cdseq_gep_sample_specific

CDSeq-estimated sample-specific cell-type gene expression, in the form of read counts. It is a three-dimensional array, i.e. gene by sample by cell type. The element cdseq_gep_sample_specific[i,j,k] represents the reads mapped to gene i from cell type k in sample j.
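
The gene by sample by cell type layout of cdseq_gep_sample_specific can be illustrated with a small base R array. This is only a sketch: the dimensions, dimnames, and the name toy_counts are hypothetical, and the counts are random rather than CDSeq output.

```r
# Hypothetical toy dimensions: 4 genes, 3 samples, 2 cell types.
n_genes <- 4
n_samples <- 3
n_types <- 2

set.seed(1)
toy_counts <- array(
  rpois(n_genes * n_samples * n_types, lambda = 20),
  dim = c(n_genes, n_samples, n_types),
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("sample", seq_len(n_samples)),
    paste0("celltype", seq_len(n_types))
  )
)

# Reads mapped to gene 2 from cell type 1 in sample 3 (index order: i, j, k).
toy_counts[2, 3, 1]

# Gene-by-sample read count matrix for cell type 2.
celltype2_counts <- toy_counts[, , 2]
dim(celltype2_counts)
```

Slicing on the third index yields one gene-by-sample count matrix per cell type.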

sc_gep

A G (genes) by N (cells) matrix or data frame that contains the gene expression profiles of N single cells.

sc_annotation

A data frame containing two columns, "cell_id" and "cell_type". The cell_id values need to match the cell ids in sc_gep, but the two are not required to have the same size. cell_type is the cell type annotation for the single cells.

sc_batch

A vector containing batch information for the single cell data, i.e. sc_gep, with length(sc_batch) equal to ncol(sc_gep).
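
The shape requirements on sc_gep, sc_annotation, and sc_batch can be checked before calling the function. The sketch below builds toy inputs in base R. It assumes that the cell ids of sc_gep are stored as its column names, which the description above does not state explicitly, and all values are made up.

```r
set.seed(2)
n_genes <- 5
n_cells <- 6
cell_ids <- paste0("cell", seq_len(n_cells))

# G-by-N count matrix; cell ids as column names (an assumption).
sc_gep <- matrix(
  rpois(n_genes * n_cells, lambda = 3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), cell_ids)
)

# Annotation for a subset of cells: it need not cover every column of sc_gep.
sc_annotation <- data.frame(
  cell_id = cell_ids[1:5],
  cell_type = c("A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# One batch label per cell, so the length equals ncol(sc_gep).
sc_batch <- rep(c("batch1", "batch2"), each = 3)

# Sanity checks mirroring the documented requirements.
stopifnot(
  all(c("cell_id", "cell_type") %in% colnames(sc_annotation)),
  all(sc_annotation$cell_id %in% colnames(sc_gep)),
  length(sc_batch) == ncol(sc_gep)
)
```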

batch_correction

Performs Harmony batch correction if set to 1.

harmony_iter

Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.

harmony_cluster

Maximum number of rounds to run clustering at each round of Harmony.

nb_size

size parameter for the negative binomial distribution; see rnbinom for details.

nb_mu

mu parameter for the negative binomial distribution; see rnbinom for details.
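
As a reminder of what the size and mu parameters mean in rnbinom, the sketch below draws from a negative binomial with hypothetical values size = 5 and mu = 10. It only illustrates the parameterization; how cellTypeAssignSCRNA uses nb_size and nb_mu internally is not shown here.

```r
set.seed(3)

# In the mu/size parameterization the mean is mu and the
# variance is mu + mu^2 / size (here 10 + 100 / 5 = 30).
draws <- rnbinom(n = 10000, size = 5, mu = 10)

mean(draws)  # close to mu
var(draws)   # close to mu + mu^2 / size
```

A smaller size gives more overdispersion relative to a Poisson distribution with the same mean.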

corr_threshold

If the correlation between a CDSeq-estimated GEP and a scRNAseq GEP is below this value, the two cell types are considered not to match.
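
The idea behind corr_threshold can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it uses Pearson correlation, assigns each estimated profile to its best-correlated reference, and treats values below the threshold as unmatched. The matrices est and sc_ref and the threshold 0.5 are hypothetical.

```r
set.seed(4)
n_genes <- 50

# Hypothetical reference GEPs (genes x annotated cell types).
sc_ref <- cbind(typeA = rgamma(n_genes, shape = 2),
                typeB = rgamma(n_genes, shape = 2))

# Hypothetical estimated GEPs: est1 resembles typeA, est2 is pure noise.
est <- cbind(est1 = sc_ref[, "typeA"] + rnorm(n_genes, sd = 0.1),
             est2 = rnorm(n_genes))

# Rows: estimated cell types; columns: reference cell types.
corr_mat <- cor(est, sc_ref)

best_match <- colnames(corr_mat)[apply(corr_mat, 1, which.max)]
best_corr <- apply(corr_mat, 1, max)

# Keep the best match only when its correlation reaches the threshold.
corr_threshold <- 0.5
assignment <- ifelse(best_corr >= corr_threshold, best_match, "unmatched")
assignment
```

With the default corr_threshold = 0, only non-negative correlations are required, so a stricter value may be worth considering when the references are similar to each other.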

breaksList

parameter for pheatmap controlling the color scale. See pheatmap function for details.

pseudo_cell_count

An integer indicating how many pseudo cells will be generated from the CDSeq-estimated cell-type-specific gene expression profiles. The default value is 1.

seurat_count_threshold

This parameter is passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count does not exceed this threshold.
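
The effect of this filter can be sketched without Seurat. In a Seurat object, nCount_RNA holds the total count per cell, so the base R analogue below uses column sums. The toy matrix and the name kept are hypothetical.

```r
# Toy genes x cells count matrix; cell1 has no counts at all.
counts <- matrix(
  c(0, 0, 0,
    5, 1, 2,
    10, 0, 3),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Total count per cell, the analogue of nCount_RNA.
total_counts <- colSums(counts)

# Keep cells whose total count is strictly greater than the threshold.
seurat_count_threshold <- 0
kept <- counts[, total_counts > seurat_count_threshold, drop = FALSE]
colnames(kept)
```

Because the comparison is strict, the default threshold of 0 removes only cells with no counts.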

seurat_scale_factor

this parameter will be passed to scale.factor in Seurat function NormalizeData.

seurat_norm_method

this parameter will be passed to normalization.method in Seurat function NormalizeData.
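
For the default seurat_norm_method = "LogNormalize", Seurat's documentation describes the transformation as dividing each cell's feature counts by that cell's total count, multiplying by scale.factor, and applying log1p. The base R sketch below reproduces that description on a hypothetical toy matrix; it is not Seurat's own code.

```r
# Toy genes x cells count matrix.
counts <- matrix(
  c(1, 3,
    0, 4),
  nrow = 2,
  dimnames = list(c("g1", "g2"), c("c1", "c2"))
)

scale_factor <- 10000  # same value as the seurat_scale_factor default

# Divide each column by its total, scale, then take the natural log of (1 + x).
normalized <- log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
normalized
```

Zero counts stay at zero after the transformation, which keeps sparse data sparse.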

seurat_select_method

this parameter will be passed to selection.method in Seurat function FindVariableFeatures.

seurat_nfeatures

this parameter will be passed to nfeatures in Seurat function FindVariableFeatures.

seurat_npcs

this parameter will be passed to npcs in Seurat function RunPCA.

seurat_dims

this parameter will be passed to dims in Seurat function FindNeighbors.

seurat_reduction

this parameter will be passed to reduction in Seurat function FindNeighbors.

seurat_resolution

this parameter will be passed to resolution in Seurat function FindClusters.

seurat_find_marker

Controls whether Seurat marker detection is run (see seurat_DE_test and seurat_DE_logfc, which are passed to FindAllMarkers); the default is FALSE.

seurat_DE_test

this parameter will be passed to test.use in Seurat function FindAllMarkers.

seurat_DE_logfc

this parameter will be passed to logfc.threshold in Seurat function FindAllMarkers.

seurat_top_n_markers

the number of top DE markers saved from Seurat output.

sc_pt_size

point size of single cell data in umap and tsne plots

cdseq_pt_size

point size of CDSeq-estimated cell types in umap and tsne plots

plot_umap

set 1 to plot umap figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_tsne

set 1 to plot tsne figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise.

plot_per_sample

currently disabled for debugging

fig_save

1 or 0. Set to 1 to save figures to local disk, 0 otherwise.

fig_path

the location where the heatmap figure is saved.

fig_name

The name of the umap and tsne figures. The umap figure is named fig_name_umap_date and the tsne figure is named fig_name_tsne_date.

fig_format

"pdf", "jpeg", or "png".

fig_dpi

figure dpi

corr_heatmap_fontsize

font size of the correlation heatmap between scRNAseq GEP and CDSeq-estimated GEPs.

verbose

If TRUE, some calculation information will be printed.

Value

cellTypeAssignSCRNA returns a list containing the following fields:

fig_path: same as the input fig_path

fig_name: same as the input fig_name

cdseq_synth_scRNA: synthetic scRNAseq data generated using CDSeq-estimated GEPs

cdseq_scRNA_umap: ggplot figure of the umap outcome

cdseq_scRNA_tsne: ggplot figure of the tsne outcome

cdseq_synth_scRNA_seurat: Seurat object containing the scRNAseq combined with CDSeq-estimated cell types. Cell id for CDSeq-estimated cell types start with "CDSeq".

seurat_cluster_purity: for the cells in Seurat cluster i, the ith value in seurat_cluster_purity is the proportion of cells carrying the most frequent annotation from sc_annotation. For example, suppose that after Seurat clustering there are 100 cells in cluster 1 and 90 of them are annotated as cell type A in sc_annotation; then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.

seurat_unique_clusters: Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.

seurat_cluster_gold_label: The cell type annotations for each unique Seurat cluster based on sc_annotation.
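
The relationship between seurat_cluster_purity, seurat_unique_clusters, and seurat_cluster_gold_label can be sketched in base R. This follows the definition given above (proportion of the most frequent annotation per cluster) and is not taken from the package source. The vectors clusters and annotation are hypothetical.

```r
# Hypothetical cluster ids and annotations for 15 cells.
clusters <- c(rep(1, 10), rep(2, 5))
annotation <- c(rep("A", 9), "B", rep("B", 4), "A")

unique_clusters <- sort(unique(clusters))

# Purity: share of the most frequent annotation within each cluster.
purity <- sapply(unique_clusters, function(cl) {
  tab <- table(annotation[clusters == cl])
  max(tab) / sum(tab)
})

# Gold label: the most frequent annotation within each cluster.
gold_label <- sapply(unique_clusters, function(cl) {
  tab <- table(annotation[clusters == cl])
  names(tab)[which.max(tab)]
})

data.frame(cluster = unique_clusters, gold_label = gold_label, purity = purity)
```

Here cluster 1 has 9 of 10 cells annotated "A" (purity 0.9) and cluster 2 has 4 of 5 cells annotated "B" (purity 0.8).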

seurat_markers: DE genes for each Seurat cluster.

seurat_top_markers: Top seurat_top_n_markers DE genes for each Seurat cluster.

CDSeq_cell_type_assignment_df: cell type assignment for CDSeq-estimated cell types.

CDSeq_cell_type_assignment_confidence: cell type assignment confidence matrix, only available when pseudo_cell_count > 1.

CDSeq_cell_type_assignment_df_all: cell type assignment for CDSeq-estimated cell types, only available when pseudo_cell_count > 1.

cdseq_prop_merged: CDSeq-estimated cell type proportions with cell type annotations (annotated using clustering with scRNAseq).

cdseq_gep_sample_specific_merged: sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.

input_list: values for input parameters

cdseq_sc_comb_umap_df: dataframe for umap plot

cdseq_sc_comb_tsne_df: dataframe for tsne plot

cdseq_prop_merged_byCorr: CDSeq-estimated cell type proportions with cell type annotations (annotated using correlation with scRNAseq).

cdseq_gep_merged_byCorr: CDSeq-estimated cell-type-specific GEPs with cell type annotations (annotated using correlation with scRNAseq).

cdseq_annotation_byCorr: CDSeq-estimated cell type annotations (annotated using correlation with scRNAseq)


kkang7/CDSeq_R_Package documentation built on May 4, 2021, 8:12 p.m.