multi_omic_clustering | R Documentation |
Multi-omic clustering via multi-view clustering or integration
multi_omic_clustering(
dat_list,
meta_data = NULL,
multi_omic_methods = "ANF",
n_clusters = 2,
distance_metric = "euclidean",
correlation_method = "spearman",
standardize_data = FALSE,
non_negativity_transform = rep_len("none", length(dat_list)),
view_distributions = rep_len("gaussian", length(dat_list)),
icp_lambda = rep(0.03, length(dat_list)),
icp_burnin = 100,
icp_draw = 200,
icp_maxiter = 20,
icp_sdev = 0.05,
icp_eps = 1e-04,
icb_burnin = 1000,
icb_draw = 1200,
icb_sdev = 0.5,
icb_thin = 1,
nmf_maxiter = 200,
nmf_st.count = 20,
nmf_n.ini = 30,
nmf_ini.nndsvd = TRUE,
nmf_scaling = "F-ratio",
mofa_convergence_mode = "medium",
mofa_maxiter = 1000,
mofa_environment = NULL,
mofa_lib_path = NULL,
anf_neighbors = 20,
kkmeans_algorithm = "spectral",
kkmeans_refine = FALSE,
kkmeans_maxiter = 100,
kkmeans_n_init = 100,
kkmeans_tol = 1e-08,
mkkm_mr_lambda = 1,
mkkm_mr_tolerance = 1e-08,
mkkm_mr_mosek = FALSE,
mkkm_mr_mosek_verbosity = 1L,
ecmc_a = 1,
ecmc_b = 1,
ecmc_eps = 1e-06,
ecmc_maxiter = 100,
ecmc_mkkm_mr = TRUE,
data_is_kernels = FALSE,
zero_var_removal = TRUE,
mvc_threads = 1,
gene_id_list = NULL,
preprocess_data = TRUE,
...
)
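Several defaults in the signature above are built per view from length(dat_list). The sketch below shows what those expressions evaluate to for a hypothetical three-view list. The view names and matrix contents are illustrative assumptions, not requirements of the function.

```r
# Hypothetical three-view input; names, sizes and values are illustrative only
set.seed(1)
dat_list <- list(
  expression  = matrix(rnorm(20), nrow = 4),
  methylation = matrix(rnorm(20), nrow = 4),
  cna         = matrix(rnorm(20), nrow = 4)
)

# The same expressions used as defaults in the signature
non_negativity_transform <- rep_len("none", length(dat_list))
view_distributions <- rep_len("gaussian", length(dat_list))
icp_lambda <- rep(0.03, length(dat_list))

# Each vector has one entry per view
print(non_negativity_transform)
print(view_distributions)
print(icp_lambda)
```

To use a different setting for a single view, replace the corresponding element, for example view_distributions[3] <- "poisson".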
dat_list |
List of input |
meta_data |
A single |
multi_omic_methods |
Vector of algorithm names to be applied. See details. |
n_clusters |
Integer vector of number of clusters to output. |
distance_metric |
Distance metric for clustering factorized data (only for MOFA). |
correlation_method |
Correlation method for |
standardize_data |
If set, standardizes data before clustering. |
non_negativity_transform |
Vector of transformation names for IntNMF. See details below. |
view_distributions |
A vector specifying the distribution to use for each view. Used by iCluster+, iClusterBayes and MOFA2. Options are "gaussian", "bernoulli" and "poisson". |
icp_lambda |
iCluster+ L1 penalty for each view.
See |
icp_burnin |
iCluster+ number of MCMC burn in samples for approximating
joint distribution of latent variables.
See |
icp_draw |
iCluster+ number of MCMC samples to draw after burn in for
approximating joint distribution of latent variables.
See |
icp_maxiter |
iCluster+ maximum number of Newton-Raphson (EM) iterations.
See |
icp_sdev |
iCluster+ MCMC random walk standard deviation.
See |
icp_eps |
iCluster+ algorithm convergence threshold.
See |
icb_burnin |
iClusterBayes number of samples for MCMC burn in.
See |
icb_draw |
iClusterBayes number of MCMC samples to draw after burn in.
See |
icb_sdev |
iClusterBayes MCMC random walk standard deviation.
See |
icb_thin |
iClusterBayes MCMC thinning: only one sample in every icb_thin
samples will be used.
See |
nmf_maxiter |
Maximum number of iterations for IntNMF. See
|
nmf_st.count |
Count stability for IntNMF.
See |
nmf_n.ini |
Number of initializations for IntNMF.
See |
nmf_ini.nndsvd |
If set, IntNMF uses NNDSVD for initialization.
See |
nmf_scaling |
Omic weights that are used for scaling. Defaults to the Frobenius norm ratio, similar to Chalise et al. 2017. |
mofa_convergence_mode |
MOFA convergence threshold.
See |
mofa_maxiter |
MOFA maximum iterations.
See |
mofa_environment |
If set, uses the specified Python environment (with mofapy). Defaults to basilisk. |
mofa_lib_path |
Path to libpython. May be required if using non-default
|
anf_neighbors |
Number of neighbours to use in knn-graph. |
kkmeans_algorithm |
See |
kkmeans_refine |
See |
kkmeans_maxiter |
See |
kkmeans_n_init |
See |
kkmeans_tol |
See |
mkkm_mr_lambda |
Regularization parameter for |
mkkm_mr_tolerance |
Convergence threshold for |
mkkm_mr_mosek |
If set, uses |
mkkm_mr_mosek_verbosity |
MOSEK verbosity parameter for |
ecmc_a |
Regularization parameter for |
ecmc_b |
Regularization parameter for |
ecmc_eps |
Convergence threshold for |
ecmc_maxiter |
Maximum number of iterations for |
ecmc_mkkm_mr |
If set, uses |
data_is_kernels |
If |
zero_var_removal |
If set, removes all zero-variance features from the data. Removal is applied fold-wise, because the function is assumed to be run inside cross-validation (CV). |
mvc_threads |
Number of threads to use for supported operations. |
gene_id_list |
List of gene/feature names for each view. If set, matches pipeline standardized feature names ("dim1", "dim2", ...) to names on the list. Required for pathway kernels. |
preprocess_data |
If the input data has already been processed by the
|
... |
Arguments are passed to |
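As a rough illustration of what fold-wise zero-variance removal means for zero_var_removal, the base R sketch below drops features that are constant within a training fold. It assumes samples in rows and features in columns, and the fold indices are hypothetical. The package's own implementation may differ.

```r
# Toy data: 4 samples (rows) x 3 features (columns); this layout is an assumption
x <- cbind(
  f1 = c(1, 2, 3, 4),
  f2 = c(5, 5, 5, 5),   # constant across all samples
  f3 = c(1, 1, 1, 2)    # constant only within the training fold
)
train_idx <- 1:3        # hypothetical training fold

# Variance of each feature, computed on the training fold only
fold_var <- apply(x[train_idx, , drop = FALSE], 2, var)
keep <- fold_var > 0

# Keep only features that vary within the fold
x_train <- x[train_idx, keep, drop = FALSE]
print(colnames(x_train))
```

Note that f3 varies over the full data set but not within the fold, so a filter applied once to the full data would not have removed it.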
Supported methods:
"ANF" - Affinity Network Fusion ANF
"iClusterPlus" or "iCluster+" - iClusterPlus. Supports only up to 4 views.
"iClusterBayes" - iClusterBayes. Supports only up to 6 views.
"IntNMF" - Integrative Non-negative Matrix Factorization nmf.mnnals.
"average_kernel" - kernel k-means with average kernel.
"mkkm_mr" - Multiple Kernel K-Means with Matrix-induced Regularization mkkm_mr.
"ECMC" - Enhanced Consensus Multi-view Clustering ECMC.
"MOFA2" - Multi-Omics Factor Analysis. See vignette("getting_started_R", "MOFA2"). Resulting factorization is clustered with single-view algorithms by using clustering_analysis.
For supported kernels see get_multi_omic_kernels.
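To make the "average_kernel" entry more concrete, here is a minimal base R sketch of kernel k-means on an averaged kernel, using the spectral relaxation (top eigenvectors followed by k-means). It is not the package's implementation: the simulated views, the linear kernel, the trace normalisation and the helper name linear_kernel are all assumptions made for illustration.

```r
set.seed(1)
n <- 40
group <- rep(1:2, each = n / 2)

# Two hypothetical views (samples in rows) sharing the same group structure
view1 <- matrix(rnorm(n * 5), n, 5) + 3 * (group - 1)
view2 <- matrix(rnorm(n * 8), n, 8) + 3 * (group - 1)

# Hypothetical helper: centred linear kernel, scaled to unit trace
# so that the views contribute on a comparable scale
linear_kernel <- function(x) {
  x <- scale(x, center = TRUE, scale = FALSE)
  k <- tcrossprod(x)
  k / sum(diag(k))
}

# Average kernel: element-wise mean of the per-view kernel matrices
kernels <- list(linear_kernel(view1), linear_kernel(view2))
k_avg <- Reduce(`+`, kernels) / length(kernels)

# Spectral relaxation of kernel k-means:
# embed samples with the top eigenvectors, then run ordinary k-means
n_clusters <- 2
eig <- eigen(k_avg, symmetric = TRUE)
embedding <- eig$vectors[, seq_len(n_clusters), drop = FALSE]
fit <- kmeans(embedding, centers = n_clusters, nstart = 20)

# Cross-tabulate the recovered clusters against the simulated groups
print(table(cluster = fit$cluster, group = group))
```

Averaging gives every view equal weight, which is the simplest way to combine kernels. The "mkkm_mr" and "ECMC" methods listed above instead optimise how the views are combined.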
NMF non-negativity transform may be necessary if non-negativity was not considered while pre-processing the data. There are a few convenience functions included to transform the data as needed:
"logistic" - 1/(1 + exp(-x)), maps input from (-Inf,Inf) to [0,1]. Used for e.g. microarray data or methylation M-values.
"rank" - ranks values and divides by length, maps input from (-Inf,Inf) to [0,1].
"offset2" - adds 2 to input. Useful for e.g. copy number alterations (assuming no alterations lower than -2).
data.frame of clustering results
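The three non-negativity transforms described for IntNMF ("logistic", "rank" and "offset2") can be sketched in base R as follows. The function names rank_transform and offset2 and the input vector are hypothetical, and the package may apply the transforms differently, for example per feature or with a different tie-handling rule.

```r
# "logistic": squashes any real value into the unit interval
logistic <- function(x) 1 / (1 + exp(-x))

# "rank": rank of each value divided by the number of values
rank_transform <- function(x) rank(x) / length(x)

# "offset2": shifts values up by 2 (non-negative if no value is below -2)
offset2 <- function(x) x + 2

# Hypothetical input containing negative values
x <- c(-2, -0.5, 0, 1.5, 3)

print(logistic(x))
print(rank_transform(x))
print(offset2(x))
```

All three outputs are non-negative for this input, which is the property IntNMF requires.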