iterative_LSI: Iterative LSI


View source: R/reduce_dimensions.R

Description

This function aims to both minimize batch effects and accentuate cell type differences in a single-cell experiment. It is implemented using monocle3 but takes inspiration from the Granja et al. reference cited below, which in turn took inspiration from the fly ATAC paper (Cusanovich et al., cited below).

At its heart, this function iterates through three main steps: 1) normalizing the data using TFIDF transformation and SVD; 2) clustering the normalized data in high-dimensional space using Leiden clustering; and 3) identifying the features that are over-represented in the resulting clusters using a simple counting method. The features identified in step 3 are then used to subset the matrix that is normalized in step 1, and the process repeats.

The TFIDF transformation is supplied in this package. SVD is performed using the irlba package, Leiden clustering uses the monocle3 implementation, and the per-cluster counting uses the edgeR cpm function.

The function takes a cell_data_set as its input. The number of iterations is set by the number of resolution values specified. The output is suitable as input to dimensionality reduction methods such as UMAP or tSNE.
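As a rough illustration of step 1, the sketch below applies one common TFIDF variant followed by SVD to a toy count matrix. It is a minimal sketch under stated assumptions, not the package's implementation: the toy data, the variable names, and the exact TFIDF formula are assumptions, and base R svd() stands in for irlba so the example has no dependencies.

```r
# Toy count matrix (hypothetical data): features in rows, cells in columns.
set.seed(2020)
counts <- matrix(rpois(200, lambda = 2), nrow = 20, ncol = 10,
                 dimnames = list(paste0("feature", 1:20), paste0("cell", 1:10)))

# Term frequency: each cell's counts divided by that cell's total count.
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: number of cells over each feature's total count.
idf <- ncol(counts) / rowSums(counts)

# One common TFIDF variant (an assumption here): log(1 + tf * idf * scale).
scale_to <- 10000
tfidf <- log1p(sweep(tf, 1, idf, "*") * scale_to)

# SVD of the normalized matrix; keep num_dim components per cell.
# Base svd() is used for illustration; the package is described as using irlba.
num_dim <- 5
s <- svd(tfidf, nu = num_dim, nv = num_dim)
lsi <- s$v %*% diag(s$d[seq_len(num_dim)])
rownames(lsi) <- colnames(counts)

dim(lsi)  # 10 cells by 5 components
```

The rows of lsi are the per-cell coordinates that a clustering step (step 2) would operate on.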

Usage

iterative_LSI(
  cds,
  num_dim = 25,
  starting_features = NULL,
  resolution = c(1e-04, 3e-04, 5e-04),
  do_tf_idf = T,
  num_features = c(3000, 3000, 3000),
  exclude_features = NULL,
  binarize = FALSE,
  scale = T,
  log_transform = T,
  LSI_method = 1,
  partition_qval = 0.05,
  seed = 2020,
  scale_to = 10000,
  leiden_k = 20,
  leiden_weight = FALSE,
  leiden_iter = 1,
  verbose = F,
  return_iterations = F,
  ...
)
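A possible call pattern is sketched below, using only arguments that appear in the usage above. It assumes that m3addon exports iterative_LSI, that a cell_data_set named cds already exists in the session, and that the result is stored under the "LSI" reduced dimension so that monocle3::reduce_dimension can use it; none of these is confirmed by this page, so the calls are guarded and skipped when the packages or the object are missing.

```r
# Arguments taken from the usage above; one resolution value per iteration.
lsi_args <- list(
  num_dim      = 25,
  resolution   = c(1e-04, 3e-04, 5e-04),
  num_features = c(3000, 3000, 3000)
)

# When num_features is a vector, it is documented as matching the
# length of the resolution vector.
stopifnot(length(lsi_args$num_features) == length(lsi_args$resolution))

# Run only when both packages and a cell_data_set named `cds` are available.
if (requireNamespace("m3addon", quietly = TRUE) &&
    requireNamespace("monocle3", quietly = TRUE) &&
    exists("cds")) {
  cds <- do.call(m3addon::iterative_LSI, c(list(cds), lsi_args))
  # Assumption: the LSI result is stored where preprocess_method = "LSI" finds it.
  cds <- monocle3::reduce_dimension(cds, reduction_method = "UMAP",
                                    preprocess_method = "LSI")
}
```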

Arguments

cds

the cell_data_set upon which to perform this operation.

num_dim

numeric indicating the number of principal components to use in downstream ordering. The usage above sets the default to 25; a value of NULL is described as using all PCs.

resolution

vector of resolution values for leiden clustering

num_features

number of features to use for dimensionality reduction (default 3000). To use different numbers of features for different iterations, supply a vector that is the same length as the resolution vector.
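To make the role of num_features concrete, the sketch below shows one way a per-cluster counting step could pick the top features for the next iteration. It is an assumption-laden illustration: the toy data and cluster labels are hypothetical, counts per million are computed by hand rather than with edgeR's cpm function, and ranking by variance across clusters is a choice made for this example, not necessarily what the package does.

```r
# Toy count matrix and hypothetical cluster labels for 12 cells.
set.seed(2020)
counts <- matrix(rpois(720, lambda = 2), nrow = 60, ncol = 12,
                 dimnames = list(paste0("feature", 1:60), paste0("cell", 1:12)))
clusters <- factor(rep(c("A", "B", "C"), each = 4))

# Sum counts over the cells of each cluster, giving features by clusters.
cluster_counts <- t(rowsum(t(counts), group = clusters))

# Counts per million within each cluster, then log2 with a pseudocount.
cpm <- sweep(cluster_counts, 2, colSums(cluster_counts), "/") * 1e6
log_cpm <- log2(cpm + 1)

# Rank features by their variance across clusters and keep the top num_features.
num_features <- 20
feature_var <- apply(log_cpm, 1, var)
top_features <- names(sort(feature_var, decreasing = TRUE))[seq_len(num_features)]

# Subset the matrix for the next round of normalization.
next_counts <- counts[top_features, , drop = FALSE]
```

With a vector such as c(3000, 3000, 3000), each iteration would use its own cutoff in place of the single num_features value above.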

exclude_features

character vector of features (rownames of assay(cds)) to exclude

binarize

boolean whether to binarize data prior to TFIDF transformation

seed

numeric seed

scale_to

numeric value to scale data

return_iterations

boolean whether to return iterations; the function will then output a list containing the final cds and all SVD matrices, clusters and features used in each iteration

Value

an updated cell_data_set object with a reduced dimension LSI object and clusters object

References

Granja, J. M., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature Biotechnology, 37(12), 1458–1465.

UMAP: McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018

tSNE: Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. J. Mach. Learn. Res., 9(Nov):2579–2605, 2008.

Cusanovich, D. A., Reddington, J. P., Garfield, D. A., Daza, R. M., Aghamirzaie, D., Marco-Ferreres, R., et al. (2018). The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature, 555(7697), 538–542.


scfurl/m3addon documentation built on Aug. 9, 2021, 5:30 p.m.