multi_omic_clustering: Multi-omic clustering via multi-view clustering or...

View source: R/multi_omics.R

multi_omic_clustering    R Documentation

Multi-omic clustering via multi-view clustering or integration

Description

Multi-omic clustering via multi-view clustering or integration

Usage

multi_omic_clustering(
  dat_list,
  meta_data = NULL,
  multi_omic_methods = "ANF",
  n_clusters = 2,
  distance_metric = "euclidean",
  correlation_method = "spearman",
  standardize_data = FALSE,
  non_negativity_transform = rep_len("none", length(dat_list)),
  view_distributions = rep_len("gaussian", length(dat_list)),
  icp_lambda = rep(0.03, length(dat_list)),
  icp_burnin = 100,
  icp_draw = 200,
  icp_maxiter = 20,
  icp_sdev = 0.05,
  icp_eps = 1e-04,
  icb_burnin = 1000,
  icb_draw = 1200,
  icb_sdev = 0.5,
  icb_thin = 1,
  nmf_maxiter = 200,
  nmf_st.count = 20,
  nmf_n.ini = 30,
  nmf_ini.nndsvd = TRUE,
  nmf_scaling = "F-ratio",
  mofa_convergence_mode = "medium",
  mofa_maxiter = 1000,
  mofa_environment = NULL,
  mofa_lib_path = NULL,
  anf_neighbors = 20,
  kkmeans_algorithm = "spectral",
  kkmeans_refine = FALSE,
  kkmeans_maxiter = 100,
  kkmeans_n_init = 100,
  kkmeans_tol = 1e-08,
  mkkm_mr_lambda = 1,
  mkkm_mr_tolerance = 1e-08,
  mkkm_mr_mosek = FALSE,
  mkkm_mr_mosek_verbosity = 1L,
  ecmc_a = 1,
  ecmc_b = 1,
  ecmc_eps = 1e-06,
  ecmc_maxiter = 100,
  ecmc_mkkm_mr = TRUE,
  data_is_kernels = FALSE,
  zero_var_removal = TRUE,
  mvc_threads = 1,
  gene_id_list = NULL,
  preprocess_data = TRUE,
  ...
)
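Several defaults in the signature above are per-view vectors whose length follows length(dat_list). The following is a minimal sketch of how such vectors line up with the views. The toy data and the view names (expression, mutation, counts) are hypothetical, and it is an assumption here that entries are matched to views by position. The sketch only builds the arguments and does not call multi_omic_clustering.

```r
# Toy views with hypothetical names and sizes, used only to illustrate
# per-view arguments. The orientation of real input data is not implied.
set.seed(1)
dat_list <- list(
  expression = as.data.frame(matrix(rnorm(20), nrow = 4)),
  mutation   = as.data.frame(matrix(rbinom(20, 1, 0.3), nrow = 4)),
  counts     = as.data.frame(matrix(rpois(20, 5), nrow = 4))
)

# Defaults as written in the signature: one entry per view.
non_negativity_transform <- rep_len("none", length(dat_list))
view_distributions <- rep_len("gaussian", length(dat_list))
icp_lambda <- rep(0.03, length(dat_list))

# Overriding the default with one documented option per view
# (assumed to be matched to the views by position).
view_distributions <- c("gaussian", "bernoulli", "poisson")
stopifnot(length(view_distributions) == length(dat_list))
```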

Arguments

dat_list

List of input data.frames, one per view.

meta_data

A single data.frame or a list that includes meta data for each view. If a list is provided, at the moment only the first element is used (by appending to clustering output).

multi_omic_methods

Vector of algorithm names to be applied. See details.

n_clusters

Integer vector of number of clusters to output.

distance_metric

Distance metric for clustering factorized data (only for MOFA).

correlation_method

Correlation method for distance_metric, if applicable.

standardize_data

If set, standardizes data before clustering.

non_negativity_transform

Vector of transformation names for IntNMF. See details below.

view_distributions

A vector specifying the distribution to use for each view. Used by iCluster+, iClusterBayes and MOFA2. Options are "gaussian", "bernoulli" and "poisson".

icp_lambda

iCluster+ L1 penalty for each view. See iClusterPlus.

icp_burnin

iCluster+ number of MCMC burn in samples for approximating joint distribution of latent variables. See iClusterPlus.

icp_draw

iCluster+ number of MCMC samples to draw after burn in for approximating joint distribution of latent variables. See iClusterPlus.

icp_maxiter

iCluster+ maximum number of Newton-Raphson (EM) iterations. See iClusterPlus.

icp_sdev

iCluster+ MCMC random walk standard deviation. See iClusterPlus.

icp_eps

iCluster+ algorithm convergence threshold. See iClusterPlus.

icb_burnin

iClusterBayes number of samples for MCMC burn in. See iClusterBayes.

icb_draw

iClusterBayes number of MCMC samples to draw after burn in. See iClusterBayes.

icb_sdev

iClusterBayes MCMC random walk standard deviation. See iClusterBayes.

icb_thin

iClusterBayes MCMC thinning. Only one sample in every icb_thin samples will be used. See iClusterBayes.

nmf_maxiter

Maxiter for IntNMF. See nmf.mnnals.

nmf_st.count

Count stability for IntNMF. See nmf.mnnals.

nmf_n.ini

Number of initializations for IntNMF. See nmf.mnnals.

nmf_ini.nndsvd

If set, IntNMF uses NNDSVD for initialization. See nmf.mnnals.

nmf_scaling

Omic weights that are used for scaling. Defaults to the Frobenius norm ratio similarly to Chalise et al. 2017.

mofa_convergence_mode

MOFA convergence threshold. See get_default_training_options.

mofa_maxiter

MOFA maximum iterations. See get_default_training_options.

mofa_environment

If set, uses the specified Python environment (with mofapy). Defaults to basilisk.

mofa_lib_path

Path to libpython. May be required if using non-default mofa_environment.

anf_neighbors

Number of neighbours to use in knn-graph.

kkmeans_algorithm

See kernel_kmeans.

kkmeans_refine

See kernel_kmeans.

kkmeans_maxiter

See kernel_kmeans.

kkmeans_n_init

See kernel_kmeans.

kkmeans_tol

See kernel_kmeans.

mkkm_mr_lambda

Regularization parameter for mkkm_mr.

mkkm_mr_tolerance

Convergence threshold for mkkm_mr.

mkkm_mr_mosek

If set, uses Rmosek for convex optimization instead of CVXR for mkkm_mr.

mkkm_mr_mosek_verbosity

MOSEK verbosity parameter for mkkm_mr.

ecmc_a

Regularization parameter for ECMC.

ecmc_b

Regularization parameter for ECMC.

ecmc_eps

Convergence threshold for ECMC.

ecmc_maxiter

Maximum number of iterations for ECMC.

ecmc_mkkm_mr

If set, uses mkkm_mr on consensus kernels obtained from ECMC. Otherwise uses the average kernel and kernel k-means.

data_is_kernels

If TRUE, input data is assumed to be kernel matrices. Otherwise kernels are computed based on input data and the kernels parameter of get_multi_omic_kernels.

zero_var_removal

If set, removes all zero variance features from the data. Removal is applied fold-wise, because this function is assumed to be run inside cross-validation (CV).

mvc_threads

Number of threads to use for supported operations.

gene_id_list

List of gene/feature names for each view. If set, matches pipeline standardized feature names ("dim1", "dim2", ...) to names on the list. Required for pathway kernels.

preprocess_data

If the input data has already been processed by the COPS-pipeline, this should be disabled.

...

Additional arguments are passed to clustering_analysis when using MOFA and to get_multi_omic_kernels when using kernel methods.

Details

Supported methods:

  • "ANF" - Affinity Network Fusion ANF

  • "iClusterPlus" or "iCluster+" - iClusterPlus. Supports only up to 4 views.

  • "iClusterBayes" - iClusterBayes. Supports only up to 6 views.

  • "IntNMF" - Integrative Non-negative Matrix Factorization nmf.mnnals.

  • "average_kernel" - kernel k-means with average kernel.

  • "mkkm_mr" - Multiple Kernel K-Means with Matrix-induced Regularization mkkm_mr.

  • "ECMC" - Enhanced Consensus Multi-view Clustering ECMC.

  • "MOFA2" - Multi-Omics Factor Analysis. See vignette("getting_started_R", "MOFA2"). Resulting factorization is clustered with single-view algorithms by using clustering_analysis.
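The "average_kernel" method in the list above combines per-view kernels and then applies kernel k-means. The following base R sketch illustrates the general idea and is not the package's implementation. The centered linear kernel, the trace normalisation, the samples-in-rows layout and the helper name linear_kernel are all assumptions made for illustration. The clustering step uses the spectral relaxation of kernel k-means, that is, ordinary k-means on the leading eigenvectors of the kernel.

```r
set.seed(42)
n <- 20
group <- rep(1:2, each = n / 2)

# Two toy views with samples in rows (an assumption for this sketch).
# Samples in the second group are shifted so that structure exists.
view1 <- matrix(rnorm(n * 5), nrow = n) + 3 * (group == 2)
view2 <- matrix(rnorm(n * 8), nrow = n) + 3 * (group == 2)

# Hypothetical helper: centered linear kernel between samples.
linear_kernel <- function(x) {
  x <- scale(x, center = TRUE, scale = FALSE)
  tcrossprod(x)
}

# Normalise each kernel by its trace so that no view dominates
# (an illustrative choice), then take the unweighted average.
kernels <- lapply(list(view1, view2), linear_kernel)
kernels <- lapply(kernels, function(k) k / sum(diag(k)))
avg_kernel <- Reduce(`+`, kernels) / length(kernels)

# Spectral relaxation of kernel k-means: run k-means on the
# k leading eigenvectors of the averaged kernel.
k <- 2
eig <- eigen(avg_kernel, symmetric = TRUE)
embedding <- eig$vectors[, seq_len(k), drop = FALSE]
clusters <- kmeans(embedding, centers = k, nstart = 10)$cluster

# Cross-tabulate the clustering against the simulated groups.
print(table(clusters, group))
```

Averaging gives every view the same weight. The "mkkm_mr" and "ECMC" methods listed above instead learn how the views are combined.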

For supported kernels see get_multi_omic_kernels.

NMF non-negativity transform may be necessary if non-negativity was not considered while pre-processing the data. There are a few convenience functions included to transform the data as needed:

  • "logistic" - 1/(1 + exp(-x)), maps input from (-Inf,Inf) to [0,1]. Used for e.g. microarray data or methylation M-values.

  • "rank" - ranks values and divides by length, maps input from (-Inf,Inf) to [0,1].

  • "offset2" - adds 2 to input. Useful for e.g. copy number alterations (assuming no alterations lower than -2).
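The three transforms listed above can be written as short base R functions. This is a sketch based only on the descriptions given here. The function names are hypothetical, and whether the package ranks values per feature or over a whole view is not stated, so the rank version below simply operates on a single numeric vector.

```r
# Hypothetical helper names. Each function follows the description
# of the corresponding option in the list above.

# "logistic": 1 / (1 + exp(-x))
logistic_transform <- function(x) 1 / (1 + exp(-x))

# "rank": rank of each value divided by the number of values
rank_transform <- function(x) rank(x) / length(x)

# "offset2": add 2 to every value
offset2_transform <- function(x) x + 2

# Toy input containing negative values
x <- c(-2, -0.5, 0, 1.5)

transformed <- list(
  logistic = logistic_transform(x),
  rank     = rank_transform(x),
  offset2  = offset2_transform(x)
)

# All three outputs are non-negative for this input.
stopifnot(all(vapply(transformed, function(v) all(v >= 0), logical(1))))
```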

Value

data.frame of clustering results


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.