RWRFGSEA: Random walk with restart and FGSEA

View source: R/rwrfgsea.R

RWRFGSEAR Documentation

Random walk with restart and FGSEA

Description

Runs random walk in a given network starting from seed genes selected from most over/under expressed in the data intersected with known disease genes. The gene affinities are then processed with FGSEA to yield pathway features.

Usage

RWRFGSEA(
  expr,
  gene_network,
  gene_set_list,
  disease_genes = NULL,
  rwr_gene_lists = NULL,
  rwr_seed_size = NULL,
  min_size = 5,
  max_size = 200,
  parallel = 1,
  verbose = FALSE,
  rwr_dnet = FALSE,
  rwr_restart_probability = 0.75,
  rwr_adjacency_normalization = "laplacian",
  rwr_affinity_normalization = "none",
  fgsea_input_cutoff = 0,
  rwr_ecdf = FALSE,
  second_seed_list_reverse_order = FALSE,
  rwr_return_seeds = FALSE,
  fgsea_nperm = 10000,
  logp_scaling = TRUE,
  ...
)

rwr_wrapper(
  expr,
  gene_network,
  rwr_seed_size = nrow(expr)%/%6,
  parallel = 1,
  rwr_restart_probability = 0.75,
  rwr_adjacency_normalization = "laplacian",
  rwr_affinity_normalization = "none",
  rwr_dnet = FALSE,
  verbose = verbose,
  ...
)

fgsea_wrapper(
  data_matrix,
  gene_set_list,
  fgsea_input_cutoff = 0,
  parallel = 1,
  fgsea_nperm = 10000,
  logp_scaling = TRUE,
  ...
)

Arguments

expr

a gene expression matrix with samples on columns

gene_network

an igraph.object with nodes matching to expr rows

gene_set_list

list of character vectors containing gene sets for FGSEA

disease_genes

a character vector containing gene IDs associated with the target disease

rwr_gene_lists

list containing exactly two character vectors corresponding to gene ids. Used to separate genes for double RWR (e.g. up and down regulated seeds) RWR affinities of second seed list will get subtracted from the RWR affinities of the first seed list.

rwr_seed_size

integer, controls the number of gene seed candidates to intersect with the disease genes, defaults to one sixth of rows.

min_size

integer, minimum size of gene sets

max_size

integer, maximum size of gene sets

parallel

integer, number of threads

verbose

TRUE/FALSE

rwr_dnet

if TRUE, uses dRWR. Otherwise uses an internal function with similar options and solve.

rwr_restart_probability

restart probability used for RWR.

rwr_adjacency_normalization

method used to normalise the adjacency matrix of the input graph. See seeded_rwr or dRWR for more details.

rwr_affinity_normalization

method used to normalise the rwr affinity matrix. See dRWR for more details.

fgsea_input_cutoff

a cutoff value used to select most visited genes for FGSEA.

rwr_ecdf

if TRUE, uses ecdf_transform to transform expression values into empirical cumulative probability density.

second_seed_list_reverse_order

if TRUE, seeds for second RWR are selected from based on lowest expression or ecdf.

rwr_return_seeds

if TRUE, returns seeds in output list (not implemented)

fgsea_nperm

a numeric value determining the number of permutations used in fgseaSimple

logp_scaling

if TRUE, multiplies FGSEA normalized enrichment scores (NESs) with the log p.value to give more weight to the most significantly enriched pathways.

...

extra arguments are ignored

data_matrix

numeric values used for ranking, assumes samples on columns

Value

matrix of enrichment scores

Functions

  • rwr_wrapper(): Transforms gene-expression values into gene-network node-affinity values

  • fgsea_wrapper(): Wrapper for fast pre-ranked gene set enrichment analysis (fgseaSimple)

Examples

library(COPS)

ad_data <- ad_ge_micro_zscore
ad_wgcna_net <- coexpression_network_unweighted(ad_data)
pw_db <- msigdbr::msigdbr(
    species = "Homo sapiens", 
    category = "C2", 
    subcategory = "CP:KEGG")
pw_list <- lapply(split(pw_db, pw_db$gs_name), function(x) x$ensembl_gene) 

ad_rwrfgsea_res <- RWRFGSEA(
    ad_data, 
    ad_wgcna_net, 
    pw_list[1:3], 
    parallel = 1, 
    fgsea_nperm = 1e2)

# Separate genes (e.g. based on DEA)
set.seed(0)
# toy example random fold-change
ad_fc <- sample(c(1,-1), nrow(ad_data), replace = TRUE)

ad_gene_lists <- list(
    rownames(ad_data)[ad_fc > 0], 
    rownames(ad_data)[ad_fc < 0])
ad_rwrfgsea_res <- RWRFGSEA(
    ad_data, 
    ad_wgcna_net, 
    pw_list[1:3], 
    rwr_genelists = ad_gene_lists, 
    rwr_ecdf = TRUE, 
    second_seed_list_reverse_order = TRUE, 
    parallel = 1, 
    fgsea_nperm = 1e2)

## Not run: 
# OTP genes for dermatitis, moderate download size
assoc_score_fields <- paste(
    paste0(
        "&fields=", 
        c("target.gene_info.symbol", "association_score.datatypes")
    ), 
    collapse = ""
)
disease_otp <- COPS::retrieveDiseaseGenesOT(
    "MONDO_0002406", 
    assoc_score_fields)[[1]]
otp_genes <- disease_otp$target.gene_info.symbol[
    disease_otp$association_score.datatypes.genetic_association > 0]

ad_rwrfgsea_res <- RWRFGSEA(
    ad_data, 
    ad_wgcna_net, 
    pw_list[1:3], 
    disease_genes = otp_genes, 
    rwr_genelists = ad_gene_lists, 
    rwr_ecdf = TRUE, 
    second_seed_list_reverse_order = TRUE, 
    parallel = 2)

## End(Not run)


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.