fastAutoSmCCNet: Automated SmCCNet to Streamline the SmCCNet Pipeline

View source: R/AutoSmCCNet.R

fastAutoSmCCNet R Documentation

Automated SmCCNet to Streamline the SmCCNet Pipeline

Description

Automated SmCCNet automatically identifies the project problem (single-omics vs multi-omics), and type of analysis (CCA for quantitative phenotype vs. PLS for binary phenotype) based on the input data that is provided. This method automatically preprocesses data, chooses scaling factors, subsampling percentage, and optimal penalty terms, then runs through the complete SmCCNet pipeline without the requirement for users to provide additional information. This function will store all the subnetwork information to a user-defined directory, as well as return all the global network and evaluation information. Refer to the automated SmCCNet vignette for more information.
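The detection step described above can be pictured with a small base R sketch. The helper describe_analysis() is hypothetical and is not part of SmCCNet; it assumes that one element in X means single-omics and that a phenotype with exactly two distinct values is treated as binary. The package's internal checks may differ.

```r
# Hypothetical helper (not a SmCCNet function): guess the analysis setting
# from the number of data sets in X and the distinct values of Y.
describe_analysis <- function(X, Y) {
  stopifnot(is.list(X), length(X) >= 1)
  # One matrix in the list -> single-omics, otherwise multi-omics.
  omics <- if (length(X) == 1) "single-omics" else "multi-omics"
  # Exactly two distinct phenotype values -> binary -> PLS, otherwise CCA.
  y_vals <- unique(as.character(Y[!is.na(Y)]))
  method <- if (length(y_vals) == 2) "PLS" else "CCA"
  paste(omics, method)
}

set.seed(1)
X1 <- matrix(rnorm(20 * 5), nrow = 20)   # toy omics data, 20 subjects
X2 <- matrix(rnorm(20 * 3), nrow = 20)
Y_cont <- rnorm(20)                      # quantitative phenotype
Y_bin  <- rep(c(0, 1), each = 10)        # binary phenotype coded 0/1

label_single_cca <- describe_analysis(list(X1), Y_cont)
label_multi_pls  <- describe_analysis(list(X1, X2), Y_bin)
```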

Usage

fastAutoSmCCNet(
  X,
  Y,
  AdjustedCovar = NULL,
  preprocess = FALSE,
  Kfold = 5,
  EvalMethod = "accuracy",
  subSampNum = 100,
  DataType,
  BetweenShrinkage = 2,
  ScalingPen = c(0.1, 0.1),
  CutHeight = 1 - 0.1^10,
  min_size = 10,
  max_size = 100,
  summarization = "NetSHy",
  saving_dir = getwd(),
  ncomp_pls = 3,
  tuneLength = 5,
  tuneRangeCCA = c(0.1, 0.5),
  tuneRangePLS = c(0.5, 0.9),
  seed = 123
)

Arguments

X

A list of matrices with the same set and order of subjects (n).
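A minimal sketch of preparing such a list, assuming subjects are stored in rows and identified by row names (the toy matrices and names below are made up for illustration):

```r
set.seed(1)
subjects <- paste0("S", 1:10)

# Two toy omics matrices; the second one arrives with subjects in reverse order.
X1 <- matrix(rnorm(10 * 4), nrow = 10,
             dimnames = list(subjects, paste0("gene", 1:4)))
X2 <- matrix(rnorm(10 * 3), nrow = 10,
             dimnames = list(rev(subjects), paste0("mir", 1:3)))

# Reorder X2 so that its rows follow the subject order of X1.
X2 <- X2[rownames(X1), , drop = FALSE]

X <- list(X1, X2)

# Confirm that every matrix shares the same subjects in the same order.
same_subjects <- all(vapply(
  X,
  function(m) identical(rownames(m), rownames(X[[1]])),
  logical(1)
))
```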

Y

Phenotype variable, either numeric or binary. A binary Y should be coded as 0 and 1 before running this function.

AdjustedCovar

A data frame of covariates to be adjusted for through the regressing-out approach. The argument preprocess needs to be set to TRUE if adjusting covariates are supplied.
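For readers unfamiliar with the regressing-out idea, the base R sketch below fits a linear model of each feature on the covariates and keeps the residuals. It only illustrates the concept on simulated data; the covariate names are made up, and the exact steps performed inside the package are not shown here.

```r
set.seed(1)
n <- 50

# Toy covariates and a toy omics matrix (subjects in rows).
covar <- data.frame(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
X1 <- matrix(rnorm(n * 4), nrow = n,
             dimnames = list(NULL, paste0("gene", 1:4)))
X1[, 1] <- X1[, 1] + 0.05 * covar$age   # make one feature depend on age

# A matrix response fits one linear model per column in a single call.
fit <- lm(X1 ~ age + sex, data = covar)

# The residuals are the covariate-adjusted features.
X1_adj <- residuals(fit)

# After adjustment, features are uncorrelated with the covariates
# (up to numerical error).
max_abs_cor <- max(abs(cor(X1_adj, as.matrix(covar))))
```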

preprocess

Whether the data preprocessing step should be conducted, default is set to FALSE. If regressing out covariates is needed, provide the corresponding covariates to the AdjustedCovar argument.

Kfold

Number of folds for cross-validation, default is set to 5.

EvalMethod

The evaluation method used to select the optimal penalty parameter(s) when a binary phenotype is given. The options are 'accuracy', 'auc', 'precision', 'recall', and 'f1', default is set to 'accuracy'.
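As a reminder of what four of these metrics measure, here is a small base R sketch computed from 0/1 labels. The helper binary_metrics() is hypothetical and not part of SmCCNet; 'auc' is left out because it needs predicted scores rather than hard labels.

```r
# Hypothetical helper: classification metrics from true and predicted 0/1 labels.
binary_metrics <- function(truth, pred) {
  tp <- sum(pred == 1 & truth == 1)   # true positives
  fp <- sum(pred == 1 & truth == 0)   # false positives
  fn <- sum(pred == 0 & truth == 1)   # false negatives
  tn <- sum(pred == 0 & truth == 0)   # true negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(accuracy  = (tp + tn) / length(truth),
    precision = precision,
    recall    = recall,
    f1        = 2 * precision * recall / (precision + recall))
}

truth <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred  <- c(1, 0, 1, 0, 1, 0, 1, 0)
m <- binary_metrics(truth, pred)
```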

subSampNum

Number of subsamples to run. A higher number improves accuracy at the cost of computation time. We generally recommend 500-1000 to increase robustness for larger data, default is set to 100.

DataType

A vector giving the annotation of each dataset in X, for example c('gene', 'miRNA').

BetweenShrinkage

A real number > 0 that shrinks the importance of the omics-omics correlation component. The larger this number, the greater the shrinkage, default is set to 2.

ScalingPen

A numeric vector of length 2 used as the penalty terms for the scaling factor determination method. Each value should be between 0 and 1, default is set to 0.1 for both datasets.

CutHeight

A numeric value specifying the cut height for hierarchical clustering, should be between 0 and 1, default is set to 1 - 0.1^10.
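To show how a cut height splits a hierarchical clustering tree, the base R sketch below clusters a toy similarity matrix and cuts it at two heights. Converting similarity to distance as 1 - A and using complete linkage are assumptions made for this illustration and may not match the package internals.

```r
set.seed(1)

# Toy symmetric similarity matrix with values in [0, 1] for 6 features.
A <- matrix(runif(36), 6, 6)
A <- (A + t(A)) / 2
diag(A) <- 1
dimnames(A) <- list(paste0("f", 1:6), paste0("f", 1:6))

# Hierarchical clustering on the dissimilarity 1 - A.
tree <- hclust(as.dist(1 - A), method = "complete")

# A cut height just below 1 lies above every merge in this tree,
# so all features fall into one module.
modules_loose <- cutree(tree, h = 1 - 0.1^10)

# A lower cut height keeps only tightly connected features together.
modules_tight <- cutree(tree, h = 0.3)

n_loose <- length(unique(modules_loose))
n_tight <- length(unique(modules_tight))
```

Lowering the cut height can only keep or increase the number of modules, which is why a value very close to 1 gives the coarsest partition.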

min_size

Minimum possible subnetwork size after network pruning, default is set to 10.

max_size

Maximum possible subnetwork size after network pruning, default is set to 100.

summarization

Summarization method used for network pruning and summarization, should be either 'NetSHy' or 'PCA'.

saving_dir

Directory where the subnetwork results are stored, default is set to the current working directory.

ncomp_pls

Number of components for PLS algorithm, only used when binary phenotype is given, default is set to 3.

tuneLength

The total number of candidate penalty term values for each omics data, default is set to 5.

tuneRangeCCA

A vector of length 2 that represents the range of candidate penalty term values for each omics data based on canonical correlation analysis, default is set to c(0.1,0.5).

tuneRangePLS

A vector of length 2 that represents the range of candidate penalty term values for each omics data based on partial least squares discriminant analysis, default is set to c(0.5,0.9).
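One way to picture how tuneLength and a tuning range combine is the sketch below. It assumes candidate values are evenly spaced over the range and that every combination across two omics data sets is evaluated; the column names l1 and l2 are made up for illustration, and the grid actually built by the package may differ.

```r
tuneLength   <- 5
tuneRangeCCA <- c(0.1, 0.5)

# Assumed: evenly spaced candidate penalties over the tuning range.
candidates <- seq(tuneRangeCCA[1], tuneRangeCCA[2], length.out = tuneLength)

# Assumed: for two omics data sets, every pair of candidates is a grid point.
pen_grid <- expand.grid(l1 = candidates, l2 = candidates)

n_combinations <- nrow(pen_grid)
```

Under these assumptions the number of grid points grows as tuneLength raised to the number of omics data sets, so larger values of tuneLength increase cross-validation time quickly.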

seed

Random seed for result reproducibility, default is set to 123.

Value

This function returns the global adjacency matrix, omics data details, network clustering outcomes, and cross-validation results. Pruned subnetwork modules are saved in the directory specified by the user.

Examples



# library(SmCCNet)
# set.seed(123)
# data("ExampleData")
# Y_binary <- ifelse(Y > quantile(Y, 0.5), 1, 0)
## single-omics PLS
# result <- fastAutoSmCCNet(X = list(X1), Y = as.factor(Y_binary), Kfold = 3, 
#                          subSampNum = 100, DataType = c('Gene'),
#                          saving_dir = getwd(), EvalMethod = 'auc', 
#                          summarization = 'NetSHy', 
#                          CutHeight = 1 - 0.1^10, ncomp_pls = 5)
## single-omics CCA
# result <- fastAutoSmCCNet(X = list(X1), Y = Y, Kfold = 3, preprocess = FALSE,
#                           subSampNum = 50, DataType = c('Gene'),
#                           saving_dir = getwd(), summarization = 'NetSHy',
#                           CutHeight = 1 - 0.1^10)
## multi-omics PLS
# result <- fastAutoSmCCNet(X = list(X1,X2), Y = as.factor(Y_binary), 
#                           Kfold = 3, subSampNum = 50, 
#                           DataType = c('Gene', 'miRNA'), 
#                           CutHeight = 1 - 0.1^10,
#                           saving_dir = getwd(), EvalMethod = 'auc', 
#                           summarization = 'NetSHy',
#                           BetweenShrinkage = 5, ncomp_pls = 3)
## multi-omics CCA
# result <- fastAutoSmCCNet(X = list(X1,X2), Y = Y, 
#                           Kfold = 3, subSampNum = 50, DataType = c('Gene', 'miRNA'), 
#                           CutHeight = 1 - 0.1^10,
#                           saving_dir = getwd(),  
#                           summarization = 'NetSHy',
#                           BetweenShrinkage = 5)


KechrisLab/SmCCNet documentation built on April 18, 2024, 9:46 p.m.