smooth_and_cluster_genes: Characterize genes by behavior over pseudotime, returning...
In maehrlab/thymusatlastools: Tools for analysis of single-cell transcriptomic data

Characterize genes by behavior over pseudotime, returning cluster assignments and p values.


smooth_and_cluster_genes(dge, results_path, genes.use = get_mouse_tfs(),
  num_periods_initial_screen = 20, pval_cutoff = 0.05, do_adjust = T,
  abcissae_kmeans = 20, num_clusters = NULL)

`dge`	A Seurat object with a field "pseudotime". The field `dge@data` is accessed for expression levels; for Eric's objects, the units will be log2(1+CP10K).
`results_path`	A character vector giving the location where output is written.
`num_periods_initial_screen`	Cells are partitioned into this many pseudotime periods (equal number of cells in each). Initial screening is based on a linear model in which expression is constant within each of these bins (a piecewise-constant fit).
`pval_cutoff`	Genes are screened by p-value to avoid too much computationally expensive smoothing.
`do_adjust`	Logical – if TRUE, then apply BH correction.
`abcissae_kmeans`	Gene expression is fed into k-means as a series of predictions at successive time points. This argument specifies how many time points to predict and feed in (if length one) or which time points to use (if longer).
`num_clusters`	Genes are partitioned into this many modules. If NULL (default), the value is selected via the gap statistics and their SEs using the method in the original gap statistic paper: Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. Journal of the Royal Statistical Society B, 63, 411–423. There is one adjustment: this function will never use just one cluster. If the method selects one cluster, the function issues a warning and uses 2.
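
The screening arguments (`num_periods_initial_screen`, `pval_cutoff`, `do_adjust`) can be illustrated with a small base-R sketch. This is not the package's internal code: the toy data, the helper `screen_one`, the F test against an intercept-only model, and the strict `<` comparison are all assumptions made for illustration.

```r
set.seed(1)
n_cells <- 200
num_periods <- 20          # plays the role of num_periods_initial_screen
pval_cutoff <- 0.05
pseudotime <- runif(n_cells)

# Toy expression matrix (genes x cells): one responsive gene, two flat ones.
expr <- rbind(
  responsive = 2 * pseudotime + rnorm(n_cells, sd = 0.3),
  flat1      = rnorm(n_cells),
  flat2      = rnorm(n_cells)
)

# Equal-count pseudotime periods, assigned from the rank of each cell.
cell_rank <- rank(pseudotime, ties.method = "first")
period <- factor(ceiling(cell_rank * num_periods / n_cells))

# Hypothetical helper: piecewise-constant fit (one mean per period),
# compared with an intercept-only model by an F test.
screen_one <- function(y, period) {
  anova(lm(y ~ 1), lm(y ~ period))[["Pr(>F)"]][2]
}
raw_pvals <- apply(expr, 1, screen_one, period = period)

# Benjamini-Hochberg correction, as applied when do_adjust is TRUE.
adj_pvals <- p.adjust(raw_pvals, method = "BH")
genes_kept <- names(adj_pvals)[adj_pvals < pval_cutoff]
```

Only the genes in `genes_kept` would go on to the more expensive smoothing step. BH-adjusted p-values are never smaller than the raw ones, so adjustment can only shrink this set.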

A list with elements:

dge: the Seurat object
gene_stats: genes with effect sizes and cluster labels.
smoothers: fitted regression models, one for each gene.
cluster_mod: output from stats::kmeans
gap_stats: output from cluster::clusGap
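
Since `gap_stats` and `cluster_mod` come from `cluster::clusGap` and `stats::kmeans`, the selection of `num_clusters` when it is NULL might look roughly like the sketch below. The toy matrix, `K.max`, `B`, and `nstart` are assumptions, and the choice of `method = "Tibs2001SEmax"` in `cluster::maxSE` is inferred from the reference to the original gap statistic paper rather than confirmed from the package source.

```r
library(cluster)
set.seed(1)

# Toy input: 60 "genes" x 20 time points drawn from three temporal patterns.
time_points <- seq(0, 1, length.out = 20)
patterns <- rbind(
  time_points,                          # rising
  1 - time_points,                      # falling
  4 * time_points * (1 - time_points)   # transient peak
)
smoothed <- patterns[rep(1:3, each = 20), ] + rnorm(60 * 20, sd = 0.05)

# Gap statistic for k = 1..6; extra arguments are passed on to kmeans.
gap_stats <- clusGap(smoothed, FUNcluster = kmeans, K.max = 6, B = 50,
                     verbose = FALSE, nstart = 10)

# Rule from Tibshirani et al. (2001): the smallest k whose gap is within
# one SE of the gap at k + 1.
num_clusters <- maxSE(gap_stats$Tab[, "gap"], gap_stats$Tab[, "SE.sim"],
                      method = "Tibs2001SEmax")

# Never settle for a single cluster: warn and fall back to 2.
if (num_clusters < 2) {
  warning("Gap statistic selected one cluster; using 2 instead.")
  num_clusters <- 2
}

cluster_mod <- kmeans(smoothed, centers = num_clusters, nstart = 10)
```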

This function helps explore gene dynamics over pseudotime. It goes through three main steps:

find genes that respond strongly to pseudotime.
smooth those genes' expression to form an overall pseudotime trend.
cluster genes based on smoothed expression patterns that have been shifted/scaled to the unit interval.
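
The smoothing and clustering steps above can be sketched in base R as follows. The smoother actually used by the package is not stated here, so `stats::loess` is a stand-in. The toy data, the helpers `smooth_one` and `rescale01`, and the reading of a length-one `abcissae_kmeans` as equally spaced points across the observed pseudotime range are likewise assumptions.

```r
set.seed(1)
n_cells <- 300
pseudotime <- sort(runif(n_cells))

# Toy expression (genes x cells): two rising and two falling genes
# with different baselines and magnitudes.
expr <- rbind(
  up1   = 1 + 2 * pseudotime,
  up2   = 5 * pseudotime,
  down1 = 3 - 3 * pseudotime,
  down2 = 1 - pseudotime
) + rnorm(4 * n_cells, sd = 0.1)

# A length-one abcissae_kmeans is read here as a number of time points.
abcissae_kmeans <- 20
abcissae <- if (length(abcissae_kmeans) == 1) {
  seq(min(pseudotime), max(pseudotime), length.out = abcissae_kmeans)
} else {
  abcissae_kmeans
}

# Smooth one gene and predict its trend at the chosen time points.
smooth_one <- function(y) {
  fit <- loess(y ~ t, data = data.frame(y = y, t = pseudotime))
  predict(fit, newdata = data.frame(t = abcissae))
}
smoothed <- t(apply(expr, 1, smooth_one))

# Shift and scale each gene's trend to the unit interval, so clustering
# reflects the shape of the trend rather than its magnitude.
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
scaled <- t(apply(smoothed, 1, rescale01))

clusters <- kmeans(scaled, centers = 2, nstart = 10)$cluster
```

After rescaling, `up1` and `up2` have nearly identical profiles despite their different magnitudes, which is why clustering is done on the shifted and scaled trends rather than on raw smoothed expression.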

maehrlab/thymusatlastools documentation built on May 28, 2019, 2:32 a.m.