runSSN: Perform gene program discovery using SNN analysis
In NMikolajewicz/scMiko: scRNAseq analysis functions. Developed using Seurat framework.

Description

Runs scale-free shared nearest neighbor network (SSN) analysis on a subset of features in a Seurat object.
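
How runSSN enforces scale-free topology is not described on this page. As a rough illustration only, the sketch below follows an approach common in weighted co-expression network analysis: raise the edge weights to a power and score how well the resulting connectivity distribution fits a power law. The toy matrix `adj`, the helper `scale_free_fit` and the bin count are all hypothetical and are not part of scMiko.

```r
# Toy illustration (not the scMiko implementation): score how closely a
# weighted network follows a scale-free (power-law) connectivity distribution.
set.seed(1)

# Hypothetical symmetric similarity matrix with weights in [0, 1]
n <- 200
m <- matrix(runif(n * n), n, n)
adj <- (m + t(m)) / 2
diag(adj) <- 0

# R^2 of the log-log fit between binned connectivity and its frequency
scale_free_fit <- function(adj, power, n_bins = 10) {
  k <- rowSums(adj^power)                      # connectivity per node
  bins <- cut(k, breaks = n_bins)              # equal-width connectivity bins
  p_k <- as.numeric(table(bins)) / length(k)   # fraction of nodes per bin
  k_mid <- as.numeric(tapply(k, bins, mean))   # mean connectivity per bin
  keep <- p_k > 0 & !is.na(k_mid)              # drop empty bins
  fit <- lm(log10(p_k[keep]) ~ log10(k_mid[keep]))
  summary(fit)$r.squared
}

# Compare candidate powers; a higher R^2 indicates a closer power-law fit
powers <- 1:6
r2 <- sapply(powers, function(p) scale_free_fit(adj, p))
print(data.frame(power = powers, r2 = round(r2, 3)))
```

In a scheme like this, the power with the best fit would be used to transform the network before clustering. Whether runSSN selects a power this way is an assumption.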

Usage

runSSN(
  object,
  features,
  scale_free = T,
  robust_pca = F,
  data_type = c("pearson", "deviance"),
  reprocess_sct = F,
  slot = c("scale", "data"),
  batch_feature = NULL,
  do_scale = F,
  do_center = F,
  pca_var_explained = 0.9,
  weight_by_var = F,
  umap_knn = 10,
  optimize_resolution = T,
  target_purity = 0.8,
  step_size = 0.05,
  n_workers = 1,
  verbose = T
)

Arguments

`object`	Seurat object
`features`	Features on which to compute the SNN. If any features are missing from the scaled data, the scaled data is recomputed.
`scale_free`	Logical specifying whether to enforce scale-free topology. Default is T.
`robust_pca`	Logical to run robust PCA (WARNING: computationally intensive, not recommended for large data). Default is F.
`data_type`	Data type to compute SNN on. "pearson" - Pearson residuals for count data based on a regularized negative binomial model. "deviance" - deviance for count data based on a multinomial null model (assumes each feature has a constant rate).
`reprocess_sct`	If 'data_type' is "pearson", specifies whether SCTransform is run regardless of whether features are missing from the existing scaled data. Default is F.
`slot`	Slot to use. "scale" - RECOMMENDED (default). "data" - not recommended and not tested extensively; available for exploration, and if specified, 'data_type' is ignored.
`batch_feature`	Variables to regress out. Default is NULL.
`do_scale`	Whether to scale data (only if 'slot' = "data"). Default is F.
`do_center`	Whether to center data (only if 'slot' = "data"). Default is F.
`pca_var_explained`	Proportion of variance explained by PCA. Uses the top N principal components that together explain 'pca_var_explained' of the variance. Default is 0.9.
`weight_by_var`	Whether to weight the feature embedding by the variance of each PC. Default is F.
`umap_knn`	Number of neighboring points used in local approximations of the UMAP manifold structure. Larger values preserve more global structure at the cost of detailed local structure. Values typically fall in the range 5 to 50. Default is 10.
`optimize_resolution`	Logical specifying whether to identify the optimal clustering resolution. The optimal resolution is identified using the target purity criterion. Default is T.
`target_purity`	Target purity for identifying optimal cluster resolution. Default is 0.8.
`step_size`	Step size between consecutive resolutions to test. Default is 0.05.
`n_workers`	Number of workers for parallel implementation. Default is 1.
`verbose`	Print progress. Default is T.
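
As an illustration of the 'pca_var_explained' cutoff, the sketch below keeps the smallest number of principal components whose cumulative variance reaches the threshold. It uses base R's `prcomp` on hypothetical random data. Whether runSSN picks N in exactly this way is an assumption.

```r
# Toy illustration of a cumulative-variance cutoff (hypothetical data,
# not the scMiko implementation).
set.seed(1)
x <- matrix(rnorm(100 * 20), nrow = 100, ncol = 20)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each PC, and its running total
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest N whose cumulative variance reaches the threshold
pca_var_explained <- 0.9
n_pcs <- which(cum_var >= pca_var_explained)[1]

# Keep only the first N components of the embedding
embedding <- pca$x[, seq_len(n_pcs), drop = FALSE]
print(dim(embedding))
```

Raising the threshold retains more components, and therefore more minor sources of variation, in the embedding.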

Value

Cell x Gene Seurat object, with gene-centric UMAP embedding and associated gene programs

Author(s)

Nicholas Mikolajewicz

References

https://nmikolajewicz.github.io/scMiko/articles/Module_Detection.html

Examples


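# load packages used in this example
library(scMiko)
library(ggplot2)  # ggplot(), labs()
library(dplyr)    # %>%

# load human gastrulation data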
so.query <- readRDS("../data/demo/so_tyser2021_220621.rds")

# Expression-based feature selection
features_expr <- findNetworkFeatures(object = so.query, method = "expr",
                                     min_pct = 0.5)

# Highly-variable genes
features_hvg <- findNetworkFeatures(object = so.query, method = "hvg",
                                    n_features =  2000)

# run SSN
so.gene <- runSSN(object = so.query,
     features = unique(c(features_hvg, features_expr)),
     scale_free = T,
     robust_pca = F,
     data_type = "pearson",
     reprocess_sct = T,
     slot = c("scale"),
     batch_feature = NULL,
     pca_var_explained = 0.9,
     optimize_resolution = T,
     target_purity = 0.8,
     step_size =  0.05,
     n_workers = parallel::detectCores(),
     verbose = F)

# get network connectivity plot
plt_connectivity <- SSNConnectivity(so.gene, quantile_threshold = 0.85, raster_dpi = 500)

# visualize
plt_connectivity$plot_edge + labs(title = "Network Connectivity")


# specify pruning threshold [0,1] (low values = less pruning, high values = more pruning)
prune.threshold <- 0.1

# get feature-specific connectivities (wi)
df.wi   <- pruneSSN(object = so.gene,
                    graph = "RNA_snn_power",
                    prune.threshold = prune.threshold,
                    return.df = T)

# visualize
plt.prune <- df.wi %>%
  ggplot(aes(x = wi_l2)) +
  geom_histogram(bins = 30) +
  geom_vline(xintercept = prune.threshold, linetype = "dashed", color = "tomato") +
  labs(x = "Degree (L2 norm)", y = "Count",
       title = "Network Pruning",
       subtitle = paste0(signif(100*sum(df.wi$wi_l2 <=  prune.threshold)/nrow(df.wi), 3),
                         "% (", sum(df.wi$wi_l2 <=  prune.threshold), "/", nrow(df.wi), ") genes pruned")) +
  theme_miko(grid = T)

print(plt.prune)

# get (pruned) gene module list
mod.list   <- pruneSSN(object = so.gene, graph = "RNA_snn_power", prune.threshold = prune.threshold)
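
The pruning step above can be pictured with a small toy network. The sketch below computes an L2-norm connectivity per gene from a hypothetical weighted adjacency matrix and drops genes at or below the threshold. The matrix `w`, the gene names and the rescaling of connectivities by their maximum are assumptions for illustration and do not describe the internals of pruneSSN.

```r
# Toy illustration of connectivity-based pruning (not the pruneSSN implementation).
genes <- paste0("gene", 1:8)

# Hypothetical symmetric edge-weight matrix: every pair connected with weight 0.5
w <- matrix(0.5, nrow = 8, ncol = 8, dimnames = list(genes, genes))
diag(w) <- 0

# Make gene7 and gene8 weakly connected to the rest of the network
weak <- c("gene7", "gene8")
w[weak, ] <- w[weak, ] * 0.01
w[, weak] <- w[, weak] * 0.01

# Per-gene connectivity as the L2 norm of its edge weights
wi_l2 <- sqrt(rowSums(w^2))
wi_l2 <- wi_l2 / max(wi_l2)  # assumption: rescale to [0, 1] to match the threshold range

# Genes at or below the threshold are pruned; here gene7 and gene8
prune.threshold <- 0.1
pruned <- names(wi_l2)[wi_l2 <= prune.threshold]
kept <- setdiff(genes, pruned)
print(pruned)
```

A higher threshold removes more weakly connected genes, consistent with the note above that high values mean more pruning.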

NMikolajewicz/scMiko documentation built on June 28, 2023, 1:41 p.m.