preprocess_cds: Preprocess a cds to prepare for trajectory inference

View source: R/preprocess_cds.R

preprocess_cds    R Documentation

Preprocess a cds to prepare for trajectory inference

Description

Most analyses in Monocle3, including trajectory inference and clustering, require various normalization and preprocessing steps. preprocess_cds executes and stores these preprocessing steps.

Specifically, depending on the options selected, preprocess_cds first normalizes the data to address differences in sequencing depth between cells, either by size factor followed by a log transform or by size factor only. Next, preprocess_cds computes a lower-dimensional space (by PCA or LSI) that serves as the input for further dimensionality reduction such as tSNE and UMAP.
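As a rough illustration of size-factor normalization, the base-R sketch below computes one size factor per cell and divides each cell's counts by it. The rule used here (each cell's total count divided by the geometric mean of all cell totals) is an assumption for illustration and may not match Monocle3's internal estimator exactly; the toy matrix is hypothetical.

```r
# Hypothetical toy count matrix: 3 genes (rows) x 4 cells (columns).
counts <- matrix(c(10, 0,  5,
                   20, 2,  8,
                    5, 1,  4,
                   40, 6, 14),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Assumed size-factor rule: cell total / geometric mean of all cell totals,
# so the factors multiply to 1.
cell_totals  <- colSums(counts)
size_factors <- cell_totals / exp(mean(log(cell_totals)))

# Divide every cell (column) by its size factor. Afterwards every cell has
# the same total, which removes the depth differences between cells.
normalized <- sweep(counts, 2, size_factors, "/")
print(colSums(normalized))
```

The "log" option then applies a log transform with a pseudo-count on top of this, as described under norm_method and pseudo_count.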

Usage

preprocess_cds(
  cds,
  method = c("PCA", "LSI"),
  num_dim = 50,
  norm_method = c("log", "size_only", "none"),
  use_genes = NULL,
  pseudo_count = NULL,
  scaling = TRUE,
  verbose = FALSE,
  build_nn_index = FALSE,
  nn_control = list()
)

Arguments

cds

the cell_data_set upon which to perform this operation

method

a string specifying the initial dimensionality reduction method to use, currently either "PCA" or "LSI". For "LSI" (latent semantic indexing), preprocess_cds converts the (sparse) expression matrix into a tf-idf matrix and then performs SVD to decompose the gene-by-cell matrix into modules (topics). Default is "PCA".
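The tf-idf and SVD steps behind "LSI" can be sketched in base R as follows. This uses one common tf-idf weighting (per-cell term frequency times log(1 + number of cells / number of cells expressing the gene)); Monocle3's exact weighting, any extra scaling of the term frequencies, and its SVD solver may differ, and the count matrix is hypothetical.

```r
# Hypothetical counts: 5 genes (rows) x 6 cells (columns).
counts <- matrix(c(3, 0, 1, 0, 2,
                   4, 1, 0, 0, 1,
                   0, 5, 2, 1, 0,
                   0, 4, 3, 2, 0,
                   1, 0, 0, 6, 3,
                   0, 1, 0, 5, 4),
                 nrow = 5)

# Term frequency: each cell's counts as a fraction of that cell's total.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: genes detected in fewer cells get more weight.
idf <- log(1 + ncol(counts) / rowSums(counts > 0))

# Weight each gene (row) by its idf; the length-5 vector recycles down rows.
tfidf <- tf * idf

# SVD keeping k components; cells are embedded as V scaled by the singular values.
k <- 2
s <- svd(tfidf, nu = k, nv = k)
lsi_embedding <- sweep(s$v, 2, s$d[seq_len(k)], "*")  # cells x k
```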

num_dim

the dimensionality of the reduced space. Default is 50.

norm_method

Determines how to transform expression values prior to reducing dimensionality. Options are "log", "size_only", and "none". Default is "log". Users should only use "none" if they are confident that their data is already normalized.

use_genes

NULL or a list of gene IDs. If a list of gene IDs, only this subset of genes is used for dimensionality reduction. Default is NULL.
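Restricting the analysis to use_genes amounts to keeping only the matching rows of the gene-by-cell matrix before dimensionality reduction. A minimal base-R sketch, with a hypothetical matrix and hypothetical gene IDs:

```r
# Hypothetical expression matrix: 4 genes (rows) x 3 cells (columns).
expr <- matrix(1:12, nrow = 4,
               dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                               paste0("cell", 1:3)))

# Hypothetical gene IDs to keep, playing the role of use_genes.
use_genes <- c("geneD", "geneB")

# Keep matching rows in the matrix's own order; drop = FALSE keeps a
# matrix even if only one gene matches.
expr_subset <- expr[rownames(expr) %in% use_genes, , drop = FALSE]
```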

pseudo_count

NULL or the amount to add to expression values before normalization and dimensionality reduction. If NULL (default), a pseudo_count of 1 is added for log normalization and 0 is added for size factor only normalization.
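How norm_method and the pseudo_count default interact can be sketched with the hypothetical helper below. It is not Monocle3's implementation: the order of operations (divide by size factor, then add the pseudo-count, then take the log) and the base-2 log are assumptions.

```r
# Hypothetical helper illustrating the documented norm_method options.
normalize_counts <- function(counts, size_factors,
                             norm_method = c("log", "size_only", "none"),
                             pseudo_count = NULL) {
  norm_method <- match.arg(norm_method)
  if (norm_method == "none") {
    return(counts)  # data assumed to be normalized already
  }
  if (is.null(pseudo_count)) {
    # Documented defaults: 1 for log normalization, 0 for size factor only.
    pseudo_count <- if (norm_method == "log") 1 else 0
  }
  scaled <- sweep(counts, 2, size_factors, "/") + pseudo_count
  if (norm_method == "log") log2(scaled) else scaled
}

counts <- matrix(c(0, 3, 7, 1), nrow = 2)  # 2 genes x 2 cells
size_factors <- c(1, 2)

log_out  <- normalize_counts(counts, size_factors, "log")
size_out <- normalize_counts(counts, size_factors, "size_only")
none_out <- normalize_counts(counts, size_factors, "none")
```

With a log transform, the pseudo-count of 1 keeps zero counts finite (log2(0 + 1) = 0), whereas size factor only normalization needs no offset.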

scaling

When this argument is set to TRUE (default), each gene is scaled before PCA is run. Relevant for method = "PCA" only.
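A rough base-R analogue of the PCA path with scaling = TRUE standardizes each gene and keeps the first num_dim components. prcomp is used here as a stand-in; Monocle3 may use a different (for example truncated) PCA solver, and the toy matrix is hypothetical.

```r
# Hypothetical normalized expression: 6 genes (rows) x 20 cells (columns).
log_expr <- outer(1:6, 1:20, function(g, cell) log2((g * cell) %% 7 + 1))

num_dim <- 3  # plays the role of num_dim (50 by default in preprocess_cds)

# prcomp expects observations (cells) in rows, so transpose first.
# center = TRUE and scale. = TRUE standardize each gene, mirroring scaling = TRUE.
pca <- prcomp(t(log_expr), center = TRUE, scale. = TRUE, rank. = num_dim)
cell_embedding <- pca$x  # cells x num_dim
```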

verbose

logical. Whether to emit verbose output during dimensionality reduction. Default is FALSE.

build_nn_index

logical. When this argument is set to TRUE, preprocess_cds builds and stores a nearest neighbor index from the reduced dimension matrix for later use. Default is FALSE.

nn_control

An optional list of parameters used to make the nearest neighbor index. See the set_nn_control help for detailed information.
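A nearest neighbor index answers queries of the form "which cells are closest to this one in the reduced space?". The sketch below computes the same answer by exact brute-force search in base R. It is a stand-in only: the index preprocess_cds builds is configured through set_nn_control and may use an approximate method, and the embedding here is hypothetical.

```r
# Hypothetical reduced-dimension matrix: 5 cells (rows) x 2 dimensions.
emb <- matrix(c(0, 0,
                1, 0,
                0, 1,
                5, 5,
                6, 5), ncol = 2, byrow = TRUE)

k <- 2
d <- as.matrix(dist(emb))  # Euclidean distances between all pairs of cells
diag(d) <- Inf             # exclude each cell from its own neighbor list

# For each cell, the row indices of its k closest cells (cells x k).
nn_idx <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
```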

Value

an updated cell_data_set object

Examples

  
    library(monocle3)

    # Load the example worm embryo data shipped with monocle3.
    cell_metadata <- readRDS(system.file('extdata',
                                         'worm_embryo/worm_embryo_coldata.rds',
                                         package='monocle3'))
    gene_metadata <- readRDS(system.file('extdata',
                                         'worm_embryo/worm_embryo_rowdata.rds',
                                         package='monocle3'))
    expression_matrix <- readRDS(system.file('extdata',
                                             'worm_embryo/worm_embryo_expression_matrix.rds',
                                             package='monocle3'))

    # Build the cell_data_set, then preprocess with the defaults
    # (PCA, 50 dimensions, log normalization).
    cds <- new_cell_data_set(expression_data=expression_matrix,
                             cell_metadata=cell_metadata,
                             gene_metadata=gene_metadata)
    cds <- preprocess_cds(cds)
  


cole-trapnell-lab/monocle3 documentation built on April 7, 2024, 9:24 p.m.