```
library(knitr)
```

`tradeSeq`

is an `R`

package that allows analysis of gene expression along trajectories. While it has been developed and applied to single-cell RNA-sequencing (scRNA-seq) data, its applicability extends beyond that, and also allows the analysis of, e.g., single-cell ATAC-seq data along trajectories or bulk RNA-seq time series datasets. For every gene in the dataset, `tradeSeq`

fits a generalized additive model (GAM) by building on the `mgcv`

R package. It then allows statistical inference on the GAM by assessing contrasts of the parameters of the fitted GAM model, aiding in interpreting complex datasets. All details about the `tradeSeq`

model and statistical tests are described in our preprint [@VandenBerge2019a].

In this vignette, we analyze a subset of the data from [@Paul2015].
A `SingleCellExperiment`

object of the data has been provided with the `tradeSeq`

package and can be retrieved as shown below. The data and UMAP reduced dimensions were derived from following the Monocle 3 vignette.

The main vignette focuses on using `tradeSeq`

downsteam of `slingshot`

. Here, we present how to use `tradeSeq`

downstream of `monocle`

[@Qiu2017].

library(tradeSeq) library(RColorBrewer) library(SingleCellExperiment) library(monocle) # For reproducibility palette(brewer.pal(8, "Dark2")) data(countMatrix, package = "tradeSeq") counts <- as.matrix(countMatrix) rm(countMatrix) data(celltype, package = "tradeSeq")

We will fit developmental trajectories using the `monocle`

package.

set.seed(200) pd <- data.frame(cells = colnames(counts), cellType = celltype) rownames(pd) <- pd$cells fd <- data.frame(gene_short_name = rownames(counts)) rownames(fd) <- fd$gene_short_name cds <- newCellDataSet(counts, phenoData = new("AnnotatedDataFrame", data = pd), featureData = new("AnnotatedDataFrame", data = fd)) cds <- estimateSizeFactors(cds) cds <- reduceDimension(cds, max_components = 2) cds <- orderCells(cds) cds <- orderCells(cds, root_state = 5) plot_cell_trajectory(cds)

Only possible for DDRTree. There are two options to do this.
Either use the `extract_monocle_info`

function and fit using the default `fitGAM`

mode.

info <- extract_monocle_info(cds) sce <- fitGAM(counts = Biobase::exprs(cds), cellWeights = info$cellWeights, pseudotime = info$pseudotime)

or use the specific `CellDataSet`

method.

sce <- fitGAM(cds, verbose = TRUE)

You can then analyse the `sce`

object following the main vignette.

As of now (06/2020), monocle3[@Cao2019], is still in its beta version. Therefore, we have no plan yet to include a S4 method for monocle3 while it is not on CRAN or Bioconductor and the format is still moving. However, we present below a way to use `tradeSeq`

downstream of `monocle3`

as of version '0.2', for a fully connected graph. We follow the tutorial from the monocle3 website.

You will need to install monocle3 from here before running the code below.

set.seed(22) library(monocle3) # Create a cell_data_set object cds <- new_cell_data_set(counts, cell_metadata = pd, gene_metadata = data.frame(gene_short_name = rownames(counts), row.names = rownames(counts))) # Run PCA then UMAP on the data cds <- preprocess_cds(cds, method = "PCA") cds <- reduce_dimension(cds, preprocess_method = "PCA", reduction_method = "UMAP") # First display, coloring by the cell types from Paul et al plot_cells(cds, label_groups_by_cluster = FALSE, cell_size = 1, color_cells_by = "cellType") # Running the clustering method. This is necessary to the construct the graph cds <- cluster_cells(cds, reduction_method = "UMAP") # Visualize the clusters plot_cells(cds, color_cells_by = "cluster", cell_size = 1) # Construct the graph # Note that, for the rest of the code to run, the graph should be fully connected cds <- learn_graph(cds, use_partition = FALSE) # We find all the cells that are close to the starting point cell_ids <- colnames(cds)[pd$cellType == "Multipotent progenitors"] closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex closest_vertex <- as.matrix(closest_vertex[colnames(cds), ]) closest_vertex <- closest_vertex[cell_ids, ] closest_vertex <- as.numeric(names(which.max(table(closest_vertex)))) mst <- principal_graph(cds)$UMAP root_pr_nodes <- igraph::V(mst)$name[closest_vertex] # We compute the trajectory cds <- order_cells(cds, root_pr_nodes = root_pr_nodes) plot_cells(cds, color_cells_by = "pseudotime")

library(magrittr) # Get the closest vertice for every cell y_to_cells <- principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>% as.data.frame() y_to_cells$cells <- rownames(y_to_cells) y_to_cells$Y <- y_to_cells$V1 # Get the root vertices # It is the same node as above root <- cds@principal_graph_aux$UMAP$root_pr_nodes # Get the other endpoints endpoints <- names(which(igraph::degree(mst) == 1)) endpoints <- endpoints[!endpoints %in% root] # For each endpoint cellWeights <- lapply(endpoints, function(endpoint) { # We find the path between the endpoint and the root path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]] path <- as.character(path) # We find the cells that map along that path df <- y_to_cells[y_to_cells$Y %in% path, ] df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells)) colnames(df) <- endpoint return(df) }) %>% do.call(what = 'cbind', args = .) %>% as.matrix() rownames(cellWeights) <- colnames(cds) pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights), nrow = ncol(cds), byrow = FALSE)

sce <- fitGAM(counts = counts, pseudotime = pseudotime, cellWeights = cellWeights)

Then, the `sce`

object can be analyzed following the main vignette.

```
sessionInfo()
```

statOmics/tradeSeq documentation built on Feb. 19, 2021, 11:40 a.m.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.