View source: R/cellTypeAssignSCRNA.R
cellTypeAssignSCRNA
Assigns cell type labels to CDSeq-identified cell types using single-cell RNA-seq (scRNA-seq) data.
cellTypeAssignSCRNA(
cdseq_gep = NULL,
cdseq_prop = NULL,
cdseq_gep_sample_specific = NULL,
sc_gep = NULL,
sc_annotation = NULL,
sc_batch = NULL,
batch_correction = 1,
harmony_iter = 10,
harmony_cluster = 20,
nb_size = NULL,
nb_mu = NULL,
corr_threshold = 0,
breaksList = seq(0, 1, 0.01),
pseudo_cell_count = 1,
seurat_count_threshold = 0,
seurat_scale_factor = 10000,
seurat_norm_method = "LogNormalize",
seurat_select_method = "vst",
seurat_nfeatures = 1000,
seurat_npcs = 30,
seurat_dims = 1:30,
seurat_reduction = "pca",
seurat_resolution = 0.8,
seurat_find_marker = FALSE,
seurat_DE_test = "wilcox",
seurat_DE_logfc = 0.25,
seurat_top_n_markers = 10,
sc_pt_size = 1,
cdseq_pt_size = 3,
plot_umap = 1,
plot_tsne = 1,
plot_per_sample = 0,
fig_save = 0,
fig_path = getwd(),
fig_name = "CDSeqCellTypeAssignSCRNA",
fig_format = "jpeg",
fig_dpi = 100,
corr_heatmap_fontsize = 10,
verbose = TRUE
)
cdseq_gep |
CDSeq-estimated gene expression profile matrix with G rows (genes) and T columns (cell types). |
cdseq_prop |
CDSeq-estimated sample-specific cell-type proportions, a matrix with T rows (cell types) and M columns (samples). |
cdseq_gep_sample_specific |
CDSeq-estimated sample-specific cell-type gene expression, in the form of read counts. It is a three-dimensional array (gene by sample by cell type). The element cdseq_gep_sample_specific[i,j,k] is the number of reads mapped to gene i from cell type k in sample j. |
sc_gep |
a G (genes) by N (cells) matrix or data frame that contains the gene expression profiles of N single cells. |
sc_annotation |
a data frame with two columns, "cell_id" and "cell_type". The cell_id values must match the cell IDs in sc_gep, but the two are not required to have the same size. cell_type is the cell type annotation for the single cells. |
sc_batch |
a vector containing batch information for the single-cell data (sc_gep), with length(sc_batch) equal to ncol(sc_gep). |
batch_correction |
set to 1 to perform Harmony batch correction. |
harmony_iter |
Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step. |
harmony_cluster |
Maximum number of rounds to run clustering at each round of Harmony. |
nb_size |
size parameter of the negative binomial distribution; see rnbinom for details. |
nb_mu |
mu parameter of the negative binomial distribution; see rnbinom for details. |
corr_threshold |
if the correlation between a CDSeq-estimated GEP and a scRNAseq GEP is below this value, the two cell types are considered not to match. |
breaksList |
parameter for pheatmap controlling the color scale. See the pheatmap function for details. |
pseudo_cell_count |
an integer indicating how many pseudo cells will be generated from each CDSeq-estimated cell-type-specific gene expression profile. Default value is 1. |
seurat_count_threshold |
this parameter will be passed to the Seurat subset function (subset = nCount_RNA > seurat_count_threshold) to filter out single cells whose total count does not exceed this threshold. |
seurat_scale_factor |
this parameter will be passed to scale.factor in Seurat function NormalizeData. |
seurat_norm_method |
this parameter will be passed to normalization.method in Seurat function NormalizeData. |
seurat_select_method |
this parameter will be passed to selection.method in Seurat function FindVariableFeatures. |
seurat_nfeatures |
this parameter will be passed to nfeatures in Seurat function FindVariableFeatures. |
seurat_npcs |
this parameter will be passed to npcs in Seurat function RunPCA. |
seurat_dims |
this parameter will be passed to dims in Seurat function FindNeighbors. |
seurat_reduction |
this parameter will be passed to reduction in Seurat function FindNeighbors. |
seurat_resolution |
this parameter will be passed to resolution in Seurat function FindClusters. |
seurat_find_marker |
this parameter controls whether the Seurat marker-finding step is run (see seurat_DE_test and seurat_DE_logfc); default is FALSE. |
seurat_DE_test |
this parameter will be passed to test.use in Seurat function FindAllMarkers. |
seurat_DE_logfc |
this parameter will be passed to logfc.threshold in Seurat function FindAllMarkers. |
seurat_top_n_markers |
the number of top DE markers saved from Seurat output. |
sc_pt_size |
point size of single-cell data in the UMAP and t-SNE plots. |
cdseq_pt_size |
point size of CDSeq-estimated cell types in the UMAP and t-SNE plots. |
plot_umap |
set 1 to plot umap figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise. |
plot_tsne |
set 1 to plot tsne figure of scRNAseq and CDSeq-estimated cell types, 0 otherwise. |
plot_per_sample |
currently disabled for debugging |
fig_save |
1 or 0. Set to 1 to save figures to disk, 0 otherwise. |
fig_path |
the location where the heatmap figure is saved. |
fig_name |
the base name of the UMAP and t-SNE figures. The UMAP figure will be named fig_name_umap_date and the t-SNE figure will be named fig_name_tsne_date. |
fig_format |
"pdf", "jpeg", or "png". |
fig_dpi |
figure dpi |
corr_heatmap_fontsize |
font size of the correlation heatmap between scRNAseq GEP and CDSeq-estimated GEPs. |
verbose |
if TRUE, some calculation information will be printed. |
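The interplay of cdseq_gep, sc_gep, sc_annotation and corr_threshold can be illustrated with a small base R sketch. It is not the package's implementation: it assumes that the scRNAseq GEP of a cell type is the average of its annotated cells and that each CDSeq-estimated cell type takes the best-correlated annotation, left unassigned when that correlation is below the threshold. All data values and the names sc_type_gep, corr_mat and assignment are made up for illustration.

```r
# Toy inputs following the documented shapes (values are invented).
G <- 50
genes <- paste0("gene", seq_len(G))
profile_A <- seq_len(G)        # expression rises along the gene index
profile_B <- rev(seq_len(G))   # expression falls along the gene index

# sc_gep: G genes by N cells; column names act as cell ids.
sc_gep <- cbind(profile_A, 2 * profile_A, 3 * profile_A,
                profile_B, 2 * profile_B)
dimnames(sc_gep) <- list(genes, paste0("cell", 1:5))

# sc_annotation: the two documented columns, cell_id and cell_type.
sc_annotation <- data.frame(
  cell_id   = colnames(sc_gep),
  cell_type = c("A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# cdseq_gep: G genes by T estimated cell types. The third profile is
# constructed to be uncorrelated with both annotated types.
cdseq_gep <- cbind(profile_B + 1,
                   0.5 * profile_A,
                   (seq_len(G) - mean(seq_len(G)))^2)
dimnames(cdseq_gep) <- list(genes, paste0("CDSeq_type", 1:3))

# Assumption: one scRNAseq GEP per annotated type, taken as the mean of its cells.
cell_types <- sc_annotation$cell_type[match(colnames(sc_gep),
                                            sc_annotation$cell_id)]
sc_type_gep <- sapply(split(seq_len(ncol(sc_gep)), cell_types),
                      function(idx) rowMeans(sc_gep[, idx, drop = FALSE]))

# Correlate every CDSeq profile with every annotated profile (T by K matrix).
corr_mat <- cor(cdseq_gep, sc_type_gep)

# Keep the best match only when it reaches the threshold.
corr_threshold <- 0.5
best <- apply(corr_mat, 1, which.max)
best_corr <- corr_mat[cbind(seq_len(nrow(corr_mat)), best)]
assignment <- ifelse(best_corr >= corr_threshold,
                     colnames(corr_mat)[best], NA_character_)
names(assignment) <- rownames(corr_mat)
print(assignment)
```

In this toy setup the first two CDSeq-estimated cell types are labeled and the third stays unassigned, which is the effect a positive corr_threshold is meant to have. With the default of 0, any non-negative best correlation would be accepted under the same assumption.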
cellTypeAssignSCRNA returns a list containing the following fields:
fig_path: same as the input fig_path
fig_name: same as the input fig_name
cdseq_synth_scRNA: synthetic scRNAseq data generated using CDSeq-estimated GEPs
cdseq_scRNA_umap: ggplot figure of the umap outcome
cdseq_scRNA_tsne: ggplot figure of the tsne outcome
cdseq_synth_scRNA_seurat: Seurat object containing the scRNAseq data combined with CDSeq-estimated cell types. Cell ids for CDSeq-estimated cell types start with "CDSeq".
seurat_cluster_purity: for all cells in Seurat cluster i, the ith value in seurat_cluster_purity is the proportion of the most frequent cell annotation from sc_annotation. For example, if after Seurat clustering there are 100 cells in cluster 1 and 90 of them are annotated as cell type A in sc_annotation, then the first value in seurat_cluster_purity is 0.9. This output can be used to assess the agreement between Seurat clustering and the given sc_annotation.
seurat_unique_clusters: Unique Seurat cluster numbering. This can be used together with seurat_cluster_gold_label to match the Seurat clusters with given annotations.
seurat_cluster_gold_label: The cell type annotations for each unique Seurat cluster based on sc_annotation.
seurat_markers: DE genes for each Seurat cluster.
seurat_top_markers: Top seurat_top_n_markers DE genes for each Seurat cluster.
CDSeq_cell_type_assignment_df: cell type assignment for CDSeq-estimated cell types.
CDSeq_cell_type_assignment_confidence: cell type assignment confidence matrix, only available when pseudo_cell_count > 1.
CDSeq_cell_type_assignment_df_all: cell type assignment for CDSeq-estimated cell types, only available when pseudo_cell_count > 1.
cdseq_prop_merged: CDSeq-estimated cell type proportions with cell type annotations (annotated using clustering with scRNAseq).
cdseq_gep_sample_specific_merged: sample-specific cell-type read counts. It is a 3d array with dimensions: gene, sample, cell type.
input_list: values for input parameters
cdseq_sc_comb_umap_df: dataframe for umap plot
cdseq_sc_comb_tsne_df: dataframe for tsne plot
cdseq_prop_merged_byCorr: CDSeq-estimated cell type proportions with cell type annotations (annotated using correlation with scRNAseq).
cdseq_gep_merged_byCorr: CDSeq-estimated cell-type-specific GEPs with cell type annotations (annotated using correlation with scRNAseq).
cdseq_annotation_byCorr: CDSeq-estimated cell type annotations (annotated using correlation with scRNAseq)
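The definitions of seurat_cluster_purity, seurat_unique_clusters and seurat_cluster_gold_label given above can be reproduced on toy data with base R. This is a minimal sketch of the described quantities, not the package's code; the vectors seurat_cluster and annotation are hypothetical stand-ins for the Seurat cluster ids and the matching cell_type entries of sc_annotation.

```r
# Hypothetical per-cell cluster ids and annotations (15 cells).
seurat_cluster <- c(rep(1, 10), rep(2, 5))
annotation     <- c(rep("A", 9), "B",      # cluster 1: 9 of 10 cells are A
                    rep("B", 4), "A")      # cluster 2: 4 of 5 cells are B

# Unique cluster numbering, in ascending order.
unique_clusters <- sort(unique(seurat_cluster))

# Purity: share of the most frequent annotation within each cluster.
purity <- sapply(unique_clusters, function(cl) {
  tab <- table(annotation[seurat_cluster == cl])
  max(tab) / sum(tab)
})

# Gold label: the most frequent annotation within each cluster.
gold_label <- sapply(unique_clusters, function(cl) {
  tab <- table(annotation[seurat_cluster == cl])
  names(tab)[which.max(tab)]
})

print(data.frame(cluster = unique_clusters,
                 gold_label = gold_label,
                 purity = purity))
```

Low purity values in such a table would suggest that the Seurat clustering (for example the seurat_resolution setting) and the supplied annotation disagree for that cluster.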