get_multi_omic_kernels: Multi-omic kernels

View source: R/kernels.R

get_multi_omic_kernels    R Documentation

Multi-omic kernels

Description

Process a set of omics observations into kernels.

Usage

get_multi_omic_kernels(
  dat_list,
  data_is_kernels = FALSE,
  kernels = rep_len("linear", length(dat_list)),
  kernels_center = TRUE,
  kernels_normalize = TRUE,
  kernels_scale_norm = FALSE,
  kernel_gammas = rep_len(0.5, length(dat_list)),
  pathway_networks = NULL,
  pathway_node_betweenness_endpoints = TRUE,
  pathway_first_shortest_path = FALSE,
  kernel_rwr_restart = 0.7,
  kernel_rwr_seeds = "discrete",
  kernel_rwr_seed_under_threshold = qnorm(0.025),
  kernel_rwr_seed_over_threshold = qnorm(0.975),
  kernel_rwr_dnet = TRUE,
  kernel_rwr_verbose = FALSE,
  gene_id_list = NULL,
  zero_var_removal = TRUE,
  mvc_threads = 1,
  preprocess_data = TRUE,
  pathway_rwr_parallelization = FALSE,
  ...
)

Arguments

dat_list

List of input data.frames, one per omic (view).

data_is_kernels

If TRUE, input data is assumed to be kernel matrices. Otherwise kernels are computed based on input data and the kernels parameter.

kernels

Character vector of kernel names to use for different views. See details.

kernels_center

Logical vector specifying which kernels should be centered. Repeated for each view if length 1.

kernels_normalize

Logical vector specifying which kernels should be normalized. Repeated for each view if length 1.

kernels_scale_norm

Logical vector specifying which kernels should be scaled to unit Frobenius norm. Repeated for each view if length 1.

kernel_gammas

Numeric vector specifying gamma for the Gaussian kernel, one value per view.

pathway_networks

List of igraph objects containing pathway networks. Required for pathway kernels.

pathway_node_betweenness_endpoints

See node_betweenness_parallel.

pathway_first_shortest_path

See node_betweenness_parallel.

kernel_rwr_restart

Restart probability for RWR, applies to both RWR-BWK and PAMOGK.

kernel_rwr_seeds

Seed selection strategy for RWR, one of: "discrete", "continuous", or "threshold". Applies to both RWR-BWK and PAMOGK. See details below.

kernel_rwr_seed_under_threshold

z-score threshold for under-expressed, applies to both RWR-BWK and PAMOGK.

kernel_rwr_seed_over_threshold

z-score threshold for over-expressed, applies to both RWR-BWK and PAMOGK.

kernel_rwr_dnet

If TRUE, uses dRWR to compute the RWR.

kernel_rwr_verbose

See dRWR, applies to both RWR-BWK and PAMOGK.

gene_id_list

If data has been pre-processed by the COPS pipeline, the genes of each omic need to be provided as a list.

zero_var_removal

If TRUE, removes all zero-variance features from each omic.

mvc_threads

Number of threads to use for supported operations.

preprocess_data

If TRUE, applies data_preprocess.

pathway_rwr_parallelization

If TRUE, parallelizes pathway network RWR (experimental).

...

Extra arguments are ignored.
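
The kernels_center, kernels_normalize and kernels_scale_norm options can be illustrated with a minimal base R sketch. The helper names below are hypothetical and not part of COPS, and the definitions (double centering, cosine-style normalization, division by the Frobenius norm) are the common textbook forms, assumed here rather than taken from the package source. The order in which the package applies these steps is likewise an assumption.

```r
# Hypothetical helpers for illustration only; not the COPS implementation.

# Center a kernel in feature space: K_c = H K H with H = I - (1/n) 11^T.
center_kernel <- function(K) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)
  H %*% K %*% H
}

# Normalize so that every diagonal entry equals 1 (assumed definition).
normalize_kernel <- function(K) {
  d <- sqrt(diag(K))
  K / outer(d, d)
}

# Scale to unit Frobenius norm.
scale_kernel_norm <- function(K) {
  K / norm(K, type = "F")
}

set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5)  # assumed layout: 5 samples x 3 features
K <- tcrossprod(X)                   # linear kernel between samples

Kc <- center_kernel(K)      # rows and columns now sum to zero
Kn <- normalize_kernel(Kc)  # unit diagonal
Ks <- scale_kernel_norm(K)  # unit Frobenius norm
```

Scaling to unit Frobenius norm puts kernels from different views on a comparable scale before they are combined, which is presumably why the option exists.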

Value

List of kernels.

Supported kernels:

  • "linear" - Linear kernel based on standard dot product.

  • "gaussian", "rbf" - Gaussian kernel, a.k.a. radial basis function.

  • "jaccard" - Kernel based on Jaccard index. Used for binary features.

  • "tanimoto" - For now, this is identical to "jaccard".

  • "BWK" - Betweenness Weighted Kernel. Uses pathway networks to compute betweenness centrality, which is used to weight features in a linear pathway kernel.

  • "RWR-BWK" - BWK with RWR and z-score based seeding similar to PAMOGK.

  • "PAMOGK" - PAthway Multi-Omics Graph Kernel (Tepeli et al. 2021). Uses z-scores, RWR and shortest paths in pathway networks to create pathway kernels.

  • "PIK" - Pathway Induced Kernel (Manica et al. 2019). Uses pathway network adjacency matrices (specifically normalized Laplacians) to define pathway kernels.
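
As a rough illustration of the non-pathway kernels listed above, the following base R sketch computes linear, Gaussian and Jaccard kernels between samples. The function names are hypothetical, rows are assumed to be samples, and the Gaussian form exp(-gamma * ||x - y||^2) is one common parameterization assumed here; the package may define gamma differently.

```r
# Hypothetical helpers for illustration only; not the COPS implementation.

# Linear kernel: dot products between rows (samples).
linear_kernel <- function(X) tcrossprod(X)

# Gaussian (RBF) kernel, assuming k(x, y) = exp(-gamma * ||x - y||^2).
gaussian_kernel <- function(X, gamma = 0.5) {
  D2 <- as.matrix(dist(X))^2  # squared Euclidean distances between rows
  exp(-gamma * D2)
}

# Jaccard kernel for binary features: |A and B| / |A or B|.
jaccard_kernel <- function(B) {
  inter <- tcrossprod(B)               # shared 1s per pair of rows
  n1 <- rowSums(B)                     # number of 1s per row
  uni <- outer(n1, n1, "+") - inter    # size of the union
  J <- inter / uni
  J[uni == 0] <- 0                     # assumed convention for all-zero rows
  J
}

X <- rbind(c(0, 0), c(3, 4))           # two samples at Euclidean distance 5
K_lin <- linear_kernel(X)
K_rbf <- gaussian_kernel(X, gamma = 0.5)

B <- rbind(c(1, 1, 0), c(1, 0, 1), c(1, 1, 0))
K_jac <- jaccard_kernel(B)
```

In this toy data, samples 1 and 3 of B are identical, so their Jaccard similarity is 1, while samples 1 and 2 share one of three active features.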

Please note that for pathway kernels, the input data must always be mapped to genes and the names must match the gene names in the pathways. The default set of pathways is KEGG molecular pathways with gene symbols.

PAMOGK and RWR-BWK seed weight options:

  • "discrete" - 1 if |z| > t, 0 otherwise.

  • "continuous" - z.

  • "threshold" - z if |z| > t, 0 otherwise.

Regardless of the option, the seeds are divided into two sets based on the sign of the z-score. Each set has a separate smoothing step, so the end result is two different kernels per pathway per omic. The chosen option affects the RWR label smoothing by changing the initial distribution.
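
The seed options and the sign-based split can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package code: seed_weights and rwr are hypothetical names, the thresholds are applied as z < under or z > over (equivalent to |z| > t for the symmetric defaults), seed magnitudes are used as weights, and the walk uses a column-normalized adjacency matrix on a graph without isolated nodes.

```r
# Hypothetical helpers for illustration only; not the COPS implementation.

# Seed weights per gene, split into under- and over-expressed sets.
seed_weights <- function(z, mode = c("discrete", "continuous", "threshold"),
                         under = qnorm(0.025), over = qnorm(0.975)) {
  mode <- match.arg(mode)
  extreme <- z < under | z > over
  w <- switch(mode,
    discrete = as.numeric(extreme),   # 1 if beyond a threshold, else 0
    continuous = z,                   # raw z-score
    threshold = ifelse(extreme, z, 0) # z-score if beyond a threshold, else 0
  )
  # Split by the sign of z; magnitudes serve as non-negative weights.
  list(under = ifelse(z < 0, abs(w), 0),
       over = ifelse(z > 0, abs(w), 0))
}

# Random walk with restart: p <- (1 - r) * W p + r * p0, iterated to a fixed point.
rwr <- function(A, p0, restart = 0.7, tol = 1e-10, max_iter = 1000) {
  if (sum(p0) == 0) return(rep(0, length(p0)))  # no seeds, nothing to smooth
  W <- sweep(A, 2, colSums(A), "/")             # column-normalized transitions
  p0 <- p0 / sum(p0)                            # initial distribution
  p <- p0
  for (i in seq_len(max_iter)) {
    p_new <- (1 - restart) * as.vector(W %*% p) + restart * p0
    converged <- sum(abs(p_new - p)) < tol
    p <- p_new
    if (converged) break
  }
  p
}

z <- c(g1 = -2.5, g2 = -0.5, g3 = 0.3, g4 = 2.2)  # toy z-scores for four genes
s_disc <- seed_weights(z, "discrete")
s_cont <- seed_weights(z, "continuous")
s_thr <- seed_weights(z, "threshold")

# Toy pathway: a path graph g1 - g2 - g3 - g4.
A <- matrix(0, 4, 4)
A[cbind(1:3, 2:4)] <- 1
A <- A + t(A)

p_under <- rwr(A, s_disc$under, restart = 0.7)  # smoothed under-expression
p_over <- rwr(A, s_disc$over, restart = 0.7)    # smoothed over-expression
```

Each sample therefore yields two smoothed vectors per pathway, one for each sign, which matches the description of two kernels per pathway per omic.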


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.