{.tabset}

Overview

This stage build upon the 01_input_qc stage and performs perhaps the most important steps in scRNA-seq data analysis: normalization, highly variable gene selection, dimensionality reduction, clustering, cell type annotation, and visualization.

r emoji::emoji("gear") Config file: config/single_sample/02_norm_clustering.yaml

r emoji::emoji("clipboard") HTML report target (in config/pipeline.yaml): DRAKE_TARGETS: ["report_norm_clustering", "report_norm_clustering_simple"]

r emoji::emoji("scroll") Example report (used config)

r emoji::emoji("scroll") Example simplified report (used config)

r emoji::emoji("ladder") Structure

Config parameters

Config for this stage is stored in the config/single_sample/02_norm_clustering.yaml file. Directory with this file is read from SCDRAKE_SINGLE_SAMPLE_CONFIG_DIR environment variable upon {scdrake} load or attach, and saved as scdrake_single_sample_config_dir option. This option is used as the default argument value in several {scdrake} functions.


Spatial extension

SPATIAL: False

Type: logical scalar

If True, pipeline enables spatial extension.

This option enables spatial extension as selection of spatially variable genes (SVGS), pseudo tissue visualization and other visualization options for marker-based annotation.


Normalization parameters

NORMALIZATION_TYPE: "scran"

Type: character scalar ("scran" | "sctransform" | "none")

A normalization type:

See vignette("pipeline_overview") for more details about normalization methods.

scran normalization parameters

These parameters are used when NORMALIZATION_TYPE is "scran".


SCRAN_USE_QUICKCLUSTER: True

Type: logical scalar

Whether to cluster cells prior to normalization. See ?scran::quickCluster for more details.


SCRAN_QUICKCLUSTER_METHOD: "igraph"

Type: character scalar ("igraph" | "hclust")

A clustering method: "igraph" uses graph-based clustering, while "hclust" uses hierarchical clustering. See ?scran::quickCluster for more details.

sctransform normalization parameters

These parameters are used when NORMALIZATION_TYPE is "sctransform".


SCT_VARS_TO_REGRESS: null

Type: list of character scalar or null

Batch variables to regress out during the SCTransform normalization. You can regress out variables such as ["percent.mt", "nFeature_RNA", "nCount_RNA"] or cell cycle using ["S.Score", "G2M.Score"] or ["CC.Difference"].

However, for correction of cell cycle effect, a better method is the removal of cell cycle-related genes during selection of highly variable genes (HVGs). This can be set by HVG selection parameters, and the approach is explained in more details in vignette("pipeline_overview").


SCT_N_HVG: 3000

Type: integer scalar

A number of top HVGs to return from the SCTransform normalization.

Highly variable genes (HVGs) selection

HVG_METRIC: "gene_var"

Type: character scalar ("gene_var" | "gene_cv2" | "sctransform")

A metric used to find HVGs. See https://bioconductor.org/books/3.15/OSCA.basic/feature-selection.html for more details.


HVG_SELECTION: "top"

Type: character scalar ("top" | "significance" | "threshold")

A HVG selection strategy. Only relevant when the HVG_METRIC parameter is "gene_var" or "gene_cv2".

For "top" and "threshold", bio and ratio columns are used for HVG_METRIC of "gene_var" and "gene_cv2", respectively. These columns are present in the DataFrame returned from the underlying method (scran::modelGeneVar() or scran::modelGeneCV2()).


HVG_SELECTION_VALUE: 1000

Type: depends on value of the HVG_SELECTION parameter:

A threshold value for HVG selection. The following parameter combinations are recommended defaults:

HVG_SELECTION: "top"
HVG_SELECTION_VALUE: 1000
HVG_SELECTION: "significance"
HVG_SELECTION_VALUE: 0.05
HVG_METRIC: "gene_var"
HVG_SELECTION: "threshold"
HVG_SELECTION_VALUE: 0
HVG_METRIC: "gene_cv2"
HVG_SELECTION: "threshold"
HVG_SELECTION_VALUE: 1

Removal of cell cycle-related genes prior to HVG selection

HVG_RM_CC_GENES: False

Type: logical scalar

Whether to apply a correction for cell cycle prior to HVG selection. We implement a cell cycle-related genes removal strategy described here. It is based on the percentage of variance explained by the cell cycle phase in the expression profile for each gene.

Note that when SCTransform is used for normalization including cell cycle variables to regress out, than the results of this strategy remain unclear.


HVG_CC_GENES_VAR_EXPL_THRESHOLD: 5

Type: positive numeric scalar

A threshold on gene cell cycle variance explained, i.e. genes with var. expl. greater than this threshold will be marked as cell cycle-related and removed prior to selection of HVGs.

Doublet detection and filtering

Doublets are droplets in which multiple cells were captured.


MAX_DOUBLET_SCORE: 3.5

Type: positive numeric scalar or null

A threshold for doublet score computed by scDblFinder::computeDoubletDensity(). Cells with doublet score greater than the threshold will be removed. If null, doublets are not removed, but doublet score is still computed.

Dimensionality reduction

PCA
PCA_SELECTION_METHOD: "forced"

Type: character scalar ("forced" | "elbow" | "technical_noise")

A method to select first N principal components (PCs):


PCA_FORCED_PCS: 15

Type: positive integer scalar

If the PCA_SELECTION_METHOD parameter is "forced", use this number of first principal components.

t-SNE

t-stochastic neighbor embedding

See ?scater::runTSNE for the underlying method.


TSNE_PERP: 20

Type: numeric scalar

t-SNE perplexity.


TSNE_MAX_ITER: 1000

Type: positive integer scalar

A maximum number of t-SNE iterations.

Clustering

Graph-based clustering

Parameters for graph-based clustering.


CLUSTER_GRAPH_SNN_K: 10

Type: positive integer scalar

A number of shared nearest neighbors in graph used for community detection algorithms (graph-based clustering). Passed as the k parameter to bluster::makeSNNGraph(). From the function's help page:

The choice of k controls the connectivity of the graph and the resolution of community detection algorithms. Smaller values of k will generally yield smaller, finer clusters, while increasing k will increase the connectivity of the graph and make it more difficult to resolve different communities. The value of k can be roughly interpreted as the anticipated size of the smallest subpopulation.


CLUSTER_GRAPH_SNN_TYPE: "rank"

Type: character scalar ("rank" | "number" | "jaccard")

A type of weighting scheme to use for shared neighbors. Passed as type parameter to bluster::makeSNNGraph() (see the function's help page for more information).


CLUSTER_GRAPH_LEIDEN_ENABLED: True

Type: logical scalar

If True, enable Leiden community detection algorithm.


CLUSTER_GRAPH_LEIDEN_RESOLUTIONS: [0.2, 0.4, 0.6, 0.8, 1.0]

Type: numeric vector

Resolutions for Leiden clustering. Lower values result in more coarse-grained clustering, while higher ones to more fine-grained structures. Cell-cluster memberships will be named as cluster_graph_leiden_r{r} (e.g. in colData() of a SCE object).


CLUSTER_GRAPH_LOUVAIN_ENABLED: False

Type: logical scalar

If True, enable Louvain community detection algorithm.


CLUSTER_GRAPH_LOUVAIN_RESOLUTIONS: [0.2, 0.4, 0.6, 0.8, 1.0]

Type: numeric vector

Resolutions for Louvain clustering. Lower values result in more coarse-grained clustering, while higher ones to more fine-grained structures. Cell-cluster memberships will be named as cluster_graph_louvain_r{r} (e.g. in colData() of a SCE object).


CLUSTER_GRAPH_WALKTRAP_ENABLED: False

Type: logical scalar

If True, enable Walktrap community detection algorithm.

k-means clustering
CLUSTER_KMEANS_K_ENABLED: False

Type: logical scalar

If True, enable k-means clustering with selected numbers of clusters (ks).


CLUSTER_KMEANS_K: !code 3:6

Type: integer vector

A vector of number of clusters for which to compute k-means.

Note: !code 3:6 will evaluate to c(3, 4, 5, 6).


CLUSTER_KMEANS_KBEST_ENABLED: False

Type: logical scalar

If True, enable k-means clustering with best k based on gap statistics.

SC3 clustering
CLUSTER_SC3_ENABLED: False

Type: logical scalar

If True, enable SC3 with selected numbers of clusters (ks).


CLUSTER_SC3_K: !code 5:6

Type: integer vector

A vector of number of clusters for which to compute SC3 clustering.

Note: !code 3:6 will evaluate to c(3, 4, 5, 6).


CLUSTER_SC3_N_CORES: 1

Type: integer scalar

A number of CPU cores to use for SC3 computation. This may lead to out-of-memory errors for a number of cores greater than 1.

Note that drake may have problems when the pipeline is run in parallel mode and at the same time a target is using a parallel code (within target parallelism). For the SC3 computation is recommended to run the pipeline in sequential mode (DRAKE_PARALLELISM: "loop") and set DRAKE_TARGETS: ["cluster_sce_sc3"]. Once the cluster_sce_sc3 target is finished, you can switch the parallelism back to clustermq or future.


CLUSTER_SC3_ENABLED: False

Type: logical scalar

If False, skip SC3 clustering.


Cell type annotation

CELL_ANNOTATION_SOURCES_DEFAULTS:
  TRAIN_PARAMS:
    GENES: "de"
    SD_THRESH: 1
    DE_METHOD: "classic"
    DE_N: null
    ASSAY_TYPE: "logcounts"
  CLASSIFY_PARAMS:
    QUANTILE: 0.8
    TUNE_THRESH: 0.05
    ASSAY_TYPE: "logcounts"
  PRUNE_SCORE_PARAMS:
    N_MADS: 3
    MIN_DIFF_MED: -.inf
    MIN_DIFF_NEXT: 0
  DIAGNOSTICS_PARAMS:
    HEATMAP_N_TOP_MARKERS: 20

Type: named list (dictionary) of named lists

Default parameters for computation and reporting of cell type annotation. These can be overriden for each of the cell annotation reference source in CELL_ANNOTATION_SOURCES (see below).


CELL_ANNOTATION_SOURCES:
  - human_primary_cell_atlas_main:
      reference_type: "celldex"
      reference: "HumanPrimaryCellAtlasData"
      description: >
        Microarray datasets derived from human primary cells (Mabbott et al. 2013).
        Most of the labels refer to blood subpopulations but cell types from other tissues are also available.
      label_column: "label.main"
      label_subsets: []
    monaco_immune_fine:
      reference_type: "celldex"
      reference: "MonacoImmuneData"
      description: "This is the human immune reference that best covers all of the bases for a typical PBMC sample."
      label_column: "label.fine"
      label_subsets: []

Type: list of named lists or null

Specifies reference dataset(s) to be used for cell type annotation. The method used in {SingleR} for training and classification is described here.

The names of named lists are used to name the reference datasets, e.g. as the prefixes in names of colData() columns containing per-cell labels. More info about the structure of related targets is given in the Targets section. To skip the cell annotation step, define CELL_ANNOTATION_SOURCES as null.

Each reference dataset must have the following parameters:

In each entry (e.g. human_primary_cell_atlas_main above) you can override default parameters in CELL_ANNOTATION_SOURCES_DEFAULTS (by using lowercase names), e.g.

CELL_ANNOTATION_SOURCES:
  - human_primary_cell_atlas_main:
      ...
      train_params:
        de_method: "wilcox"
        de_n: 30

will override corresponding parameters in TRAIN_PARAMS in CELL_ANNOTATION_SOURCES_DEFAULTS.



Marker-based cell annotation signatures

MANUAL_ANNOTATION: False

Type: logical scalar

Whether marker-based annotation is enabled. Marker-based annotation is based on expression profiles, markers are taken from user-defined file. Resulted annotation is stored in annotation_metadata target. Marker-based cell annotation implemented from Giotto package enrichment function, which is based on PAGE method from Kim SY et al.


ANNOTATION_MARKERS: Null

Type: logical scalar

Path to user-define file contacting markers for each expected cell type. Each column name is one cell type, in rows are gene markers. Example:

Astrocytes,Endothelial,Neurons
GFAP,PECAM1,SLC6A1
SLC1A3,A2M,CHAT
GLUL,,GAD2

SCALE_ANNOTATION: False

Type: logical scalar

Whether to scale gene profiles.


OVERLAP: 5

Type: numeric scalar

Minimal overlap of genes in cell/cell in selected clusters, to assign cell type.


ANNOTATION_CLUSTERING: "cluster_kmeans_k4"

Type: character scalar

On which clustering (column of cell metadata) to perform annotation.


SHOW_VALUE: "value" (`"value"` | `"zscores"` | `"zscores_rescaled"`)

Type: character scalar

Which value to show in resulting heatmap.


MAKE_CELL_PLOT: False

Type: character scalar

Only toghether with spatial option. Whether to show pseudo tissue cell plots with enrichment of annotation.


HEATMAP_DIMRED: "umap"

Type: character scalar

Which dimension reduction use to show result of marker-based annotation.


Cell grouping assignment

Using the two parameters below, it is possible to add custom cell metadata or rearrange the current one into new groupings (saved as colData() columns of the SingleCellExperiment object). These groupings can be then referenced in other parameters, e.g. NORM_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER or in the cluster_markers and contrasts stages.

ADDITIONAL_CELL_DATA_FILE: null

Type: character scalar or null

A path to file with additional cell data which will be added to colData() of the final SCE object of this stage. The additional data will overwrite existing columns (e.g. cluster_graph_louvain), but are appended before the CELL_GROUPINGS below is applied, so you can refer to additional data columns inside CELL_GROUPINGS. In the additional data, there can be only cell barcodes that are present after cell filtering in the 01_input_qc stage - the missing ones will have NA set in the resulting data. Rownames are not mandatory.

The supported file formats are CSV (comma separated, .csv extension) and Rds (dataframe-like objects, .Rds extension). The additional data must contain Barcode column by which they will be joined with the original cell data (colData()).


CELL_GROUPINGS: null

Type: list of named lists or null

This parameter specifies new cell groupings based on existing columns (primarily the clustering ones). Each key is a name of new cell grouping (column in colData() of the sce_dimred target) and contains a named list in the form:

For possible column names for source_column, see the sce_final_norm_clustering target in the section below.

Let's look at example cell grouping:

CELL_GROUPINGS:
  - cluster_graph_leiden_r0.4_annotated:
      source_column: "cluster_graph_leiden_r0.4"
      description: "Graph-based clustering (Leiden alg.), annotated clusters"
      assignments:
        1: "memory_CD4+"
        6: "B"
        7: "memory_CD4+"

Dimensionality reduction plots

NORM_CLUSTERING_REPORT_DIMRED_NAMES: ["umap", "pca", "tsne"]

Type: character vector ("umap" | "pca" | "tsne")

A vector of names of dimensionality reduction methods for which plots in the report will be made.


NORM_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER:
  - "phase": "Cell cycle phases"
    "doublet_score": "Doublet score"
    "total": "Total number of UMI"
    "detected": "Detected number of genes"

Type: list of named lists with character scalars

Names of other variables in colData() to plot dimreds and color by, displayed in the report.

The format is "variable_name": "description", where description will appear in plot title.

Variables defined in the CELL_GROUPINGS parameter can have description set to null - their description from CELL_GROUPINGS will be used.

Input files

SELECTED_MARKERS_FILE: null

Type: character scalar or null

A path to CSV file with groups of markers to make expression dimred plots for. Set to null to skip. The plots will be created for each of NORM_CLUSTERING_REPORT_DIMRED_NAMES.

The format is following: each row defines gene symbols for a group. First column is a name of group and second column are gene symbols delimited by ":". Example:

Naive_CD4+_T,IL7R:CCR7

Do not use a header!

An example CSV file for PBMC 1k dataset is automatically copied to a newly initialized project (as system.file("extdata", "selected_markers.csv", package = "scdrake", mustWork = TRUE)). You can also view this file on GitHub here.


NORM_CLUSTERING_REPORT_RMD_FILE: "Rmd/single_sample/02_norm_clustering.Rmd"
NORM_CLUSTERING_REPORT_SIMPLE_RMD_FILE: "Rmd/single_sample/02_norm_clustering_simple.Rmd"

Type: character scalar

Paths to RMarkdown files used for HTML reports of this pipeline stage.

NORM_CLUSTERING_REPORT_RMD_FILE is technically more detailed, while NORM_CLUSTERING_REPORT_SIMPLE_RMD_FILE outputs only the dimensionality reduction plots along with cell annotation.

Output files

NORM_CLUSTERING_BASE_OUT_DIR: "02_norm_clustering"

Type: character scalar

A path to base output directory for this stage. It will be created under BASE_OUT_DIR specified in 00_main.yaml config.


NORM_CLUSTERING_SELECTED_MARKERS_OUT_DIR: "selected_markers"
NORM_CLUSTERING_DIMRED_PLOTS_OUT_DIR: "dimred_plots"
NORM_CLUSTERING_OTHER_PLOTS_OUT_DIR: "other_plots"
NORM_CLUSTERING_CELL_ANNOTATION_OUT_DIR: "cell_annotation"
NORM_CLUSTERING_REPORT_HTML_FILE: "02_norm_clustering.html"
NORM_CLUSTERING_REPORT_SIMPLE_HTML_FILE: "02_norm_clustering_simple.html"

Type: character scalar

Names of files and directories created under NORM_CLUSTERING_BASE_OUT_DIR. Subdirectories are not allowed.

HTML output parameters

NORM_CLUSTERING_KNITR_MESSAGE: False
NORM_CLUSTERING_KNITR_WARNING: False
NORM_CLUSTERING_KNITR_ECHO: False

Type: logical scalar

These are passed to knitr::opts_chunk() and used for rendering of stage's HTML report.

Outputs

Here you can find description of the most important targets for this stage. However, for a full overview, you have to inspect the source code of the get_input_norm_clustering_subplan() function.

HTML report target names: report_norm_clustering, report_norm_clustering_simple (only dimred plots)

SingleCellExperiment objects

sce_cc: a SCE object (the sce_final_input_qc target from the input_qc stage) with added cell cycle information to colData(): phase ("G1", "G2m" or "S"), s_score, g2m_score, cc_difference (s_score - g2m_score).


sce_norm: sce_cc with normalized counts in SingleCellExperiment::logcounts() assay and normalization_type item added to metadata().


sce_norm_hvg: sce_norm with added HVG information:

If the parameter HVG_RM_CC_GENES is True, sce_norm_hvg will also contain a PCA dimensionality reduction (named pca_with_cc) for whose computation all HVGs were used.


sce_rm_doublets: sce_norm_hvg with removed doublets and added information:


sce_pca: sce_rm_doublets with calculated PCA (50 PCs). Reduced dimensions can be retrieved by SingleCellExperiment::reducedDim(sce_pca, "pca")


sce_pca_selected_pcs: sce_pca with selected number of PCs. Reduced dimensions in pca slot are subsetted to the selected number of PCs, and pca_all slot contains the full matrix of 50 PCs. Information about selection of PCs is added to metadata(): pca_selection_method, pca_selected_pcs.


sce_dimred: sce_pca_selected_pcs with calculated t-SNE and UMAP, both using the selected number of PCs for calculation


sce_final_norm_clustering: a final SCE object of this stage which will be used in the cluster_markers and contrasts stages. This object is derived from sce_dimred, but cell clusterings along with CELL_GROUPINGS are added to colData().

The names of columns containing cell clusterings are:

The names of reduces dimensionality matrices (which can be retrieved by SingleCellExperiment::reducedDim()):

Or you can view their names with SingleCellExperiment::reducedDimNames()

Selection of PCs

pca_elbow_pcs, pca_gene_var_pcs: a number of selected PCs for elbow point and technical variance methods, respectively

pca_selected_pcs_plot: a {ggplot2} object showing cummulative variance explained and the selected number of first PCs for each of the methods

Cell clustering

clusters_all: a named list of integer vectors which wraps all clusterings below

Graph-based clustering

graph_snn: an {igraph} object of shared nearest-neighbor graph, returned from scran::buildSNNGraph()

cluster_graph_leiden, cluster_graph_walktrap, cluster_graph_louvain: lists with cell clusters returned from igraph::cluster_walktrap() and igraph::cluster_louvain(), respectively. Both functions are using graph_snn as the input

cluster_graph_louvain_df, cluster_graph_walktrap_df, cluster_graph_leiden_df: dataframes with graph-based clustering results

cluster_graph_leiden_clustree, cluster_graph_louvain_clustree: {ggplot2} objects of clustree::clustree() plots

k-means

cluster_kmeans_kbest_gaps: a clusGap object used to calculate the best K. Returned from cluster::clusGap(), to which matrix of selected PCs is passed

cluster_kmeans_kbest_k: the best K for k-means, returned from cluster::maxSE()

cluster_kmeans_kbest, cluster_kmeans_k: lists with cell clusters returned from stats::kmeans(). In the former, cluster_kmeans_kbest_k is used as the number of clusters. In the latter, Ks from the CLUSTER_KMEANS_K parameter are used.

cluster_kmeans_k_df, cluster_kmeans_kbest_df: dataframes with k-means clustering results

cluster_kmeans_k_clustree: {ggplot2} object of clustree::clustree() plot. If CLUSTER_KMEANS_KBEST_ENABLED is True, it will also contain k-means result with the best k.

SC3

sce_sc3: a SingleCellExperiment object returned from SC3::sc3()

cluster_sc3: a list with cell clusters

cluster_sc3_df: a dataframe with SC3 clustering results

cluster_sc3_stability_plots: a list of {ggplot2} objects, each returned from make_sc3_stability_plots() (where SC3::sc3_plot_cluster_stability() is used internally)

cluster_sc3_clustree: {ggplot2} object of clustree::clustree() plot


Cell type annotation

cell_annotation_params: a tibble with parameters parsed from CELL_ANNOTATION_SOURCES

cell_annotation: cell_annotation_params with added cell_annotation list column holding outputs (DataFrames) from SingleR::SingleR(). The structure of those DataFrames is described in ?SingleR::classifySingleR().


cell_annotation_labels: a named list of character vectors with per-cell label assignments. Later added to the cell_data target and in turn, to colData() of the sce_final_norm_clustering and sce_int_clustering_final (in integration pipeline) targets/objects.

By default, there are three vectors with labels for each reference dataset named as:

Note that you can use names of cell_annotation_labels's items in the CELL_GROUPINGS and NORM_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER parameters, and also use them in cluster_markers (CLUSTER_MARKERS_SOURCES) and contrasts (CONTRASTS_SOURCES) stages


cell_annotation_diagnostic_plots: cell_annotation with added list columns holding diagnostic plots, i.e. for each reference dataset:

cell_annotation_diagnostic_plots_files: paths to PDF files of diagnostic plots in cell_annotation_diagnostic_plots (score_heatmaps_out_file, delta_distribution_plot_out_file, marker_heatmaps_out_file). That means when you make this target, those files will be created.

Plots

hvg_plot: a {patchwork} object with HVG metric statistics (average expression vs. variance)


pca_phase_plots: a list of one or two {ggplot2} objects. The first one is a plot of the first two PCs colored by cell cycle phase, where PCA was computed using HVGs. If the parameter HVG_RM_CC_GENES is True, the second plot is the same as the first one, but with PCA computed on all genes. In that case it also means that in the first plot, PCA was computed using HVGs with removed cell cycle-related genes.


pca_doublet_plot, pca_total_plot: a {ggplot2} plots of the first two PCs colored by doublet score and total number of UMIs, respectively


dimred_plots_clustering: a tibble holding plots of reduced dimensions colored by a clustering

dimred_plots_clustering_files_out, dimred_plots_clustering_united_files: make this target to export the plots (the latter produces a single multipage PDF)


dimred_plots_other_vars: a tibble holding plots of reduced dimensions colored by a specified variable, as defined by the NORM_CLUSTERING_REPORT_DIMRED_NAMES and NORM_CLUSTERING_REPORT_DIMRED_PLOTS_OTHER parameters.

dimred_plots_other_vars_files_out: make this target to export the plots

Selected markers

selected_markers_plots: a tibble holding selected marker plots ({patchwork} objects) for each dimensional reduction specified in the NORM_CLUSTERING_REPORT_DIMRED_NAMES parameter. Selected markers are read from CSV file defined in the SELECTED_MARKERS_FILE parameter.

selected_markers_plots_files_out: make this target to export the plots

You can also use the underlying function selected_markers_dimplot():

selected_markers_dimplot(
  sce = sce_final_norm_clustering,
  dimred = "umap",
  selected_markers_df = selected_markers_df,
  assay = "logcounts"
)

The selected_markers_df tibble can be read from the CSV file with

selected_markers_df <- readr::read_csv("markers.csv", col_names = c("group", "markers"), col_types = "cc")

Other targets

config_norm_clustering: a list holding parameters for this stage


cc_genes: a dataframe of cell cycle genes, based on Seurat::cc.genes.updated.2019 data. If the ORGANISM parameter in the 00_main.yaml config is "mouse", then gene symbols are first converted to sentence-case ("MKI67" -> "Mki67").


cell_data: a DataFrame holding colData() of sce_rm_doublets, clusterings from the clusters_all target, cell annotation labels from the cell_annotation_labels target, and the new cell groupings defined in the CELL_GROUPINGS parameter.



bioinfocz/scdrake documentation built on Sept. 19, 2024, 4:43 p.m.