more: Multi-Omics Regulation

View source: R/more.R

more R Documentation

Description

more fits, for every gene in the dataset, a GLM regression model (when method is glm or isgl) or a PLS model (when method is pls1 or pls2) to identify the potential regulators that have a significant impact on gene expression under specific experimental conditions.

Usage

more(
  GeneExpression,
  data.omics,
  associations = NULL,
  omic.type = NULL,
  edesign = NULL,
  clinic = NULL,
  clinic.type = NULL,
  center = TRUE,
  scale = TRUE,
  scaletype = "auto",
  epsilon = 1e-05,
  min.variation = 0,
  interactions.reg = TRUE,
  family.glm = gaussian(),
  elasticnet.glm = NULL,
  col.filter.glm = "cor",
  correlation.glm = 0.7,
  thres.isgl = 0.7,
  gr.method.isgl = "cor",
  alfa.pls = 0.05,
  p.method.pls = "jack",
  vip.pls = 0.8,
  method = "glm"
)

Arguments

GeneExpression

Data frame containing gene expression data with genes in rows and experimental samples in columns. Row names must be the gene IDs.

data.omics

List where each element corresponds to a different omic data type to be considered (miRNAs, transcription factors, methylation, etc.). The names of the list will represent the omics, and each element in the list should be a data matrix with omic regulators in rows and samples in columns.

associations

List where each element corresponds to a different omic data type (miRNAs, transcription factors, methylation, etc.). The names of the list will represent the omics. Each element in the list should be a data frame with 2 columns (optionally 3) describing the potential interactions between genes and regulators for that omic. The first column must contain the genes (or features in the GeneExpression object), the second column must contain the regulators, and an optional third column can describe the type of interaction (e.g., for methylation, whether a CpG site is located in the promoter region of the gene, in the first exon, etc.). If the user has no prior knowledge of the potential regulators, associations can be set to NULL. In that case, all regulators in data.omics are treated as potential regulators of all genes and, for computational efficiency, the pls2 method is recommended. Users who have prior knowledge for only some omics can set the elements for the remaining omics to NULL.

edesign

Data frame describing the experimental design. Rows must be the samples (columns in GeneExpression) and columns must be the experimental variables to be included in the model (e.g. treatment, etc.).
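As an illustration of the input layouts described above, the following sketch builds toy objects in base R. All gene, regulator, sample and variable names are made up for the example; only the shapes (what goes in rows, what goes in columns, how the lists are named) follow the argument descriptions.

```r
# Toy inputs; every name below is hypothetical.
samples <- paste0("S", 1:6)

# Genes in rows, samples in columns; row names are the gene IDs.
GeneExpression <- as.data.frame(
  matrix(rnorm(3 * 6), nrow = 3,
         dimnames = list(c("geneA", "geneB", "geneC"), samples))
)

# One matrix per omic, with regulators in rows and samples in columns.
data.omics <- list(
  miRNA = matrix(rnorm(2 * 6), nrow = 2,
                 dimnames = list(c("mir1", "mir2"), samples)),
  TF    = matrix(rnorm(2 * 6), nrow = 2,
                 dimnames = list(c("tf1", "tf2"), samples))
)

# One data frame per omic: genes in the first column, regulators in the second.
# The TF element is NULL, meaning no prior knowledge is supplied for that omic.
associations <- list(
  miRNA = data.frame(gene = c("geneA", "geneB"),
                     regulator = c("mir1", "mir2")),
  TF    = NULL
)

# Samples in rows (matching the columns of GeneExpression),
# experimental variables in columns.
edesign <- data.frame(treatment = rep(c("control", "treated"), each = 3),
                      row.names = samples)
```

Keeping the sample names identical across GeneExpression, every element of data.omics and edesign makes mismatches easy to detect before fitting.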

clinic

Data frame with all clinical variables to consider, with samples in rows and variables in columns.

clinic.type

Vector indicating the data type of each variable in clinic. Numeric variables should be coded as 0 and categorical or binary variables as 1. By default, NULL. In this case, the data type is predicted automatically; however, the user must verify the prediction and supply the vector manually if it is incorrect.
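A minimal sketch of a clinic table and its matching clinic.type vector follows. The variable names and values are invented, and the final cross-check is only an illustrative heuristic, not the automatic prediction performed by the package.

```r
# Hypothetical clinical table: samples in rows, variables in columns.
clinic <- data.frame(
  age = c(54, 61, 47, 58),
  sex = c("F", "M", "F", "M"),
  row.names = paste0("S", 1:4)
)

# 0 = numeric, 1 = categorical or binary; one entry per column of clinic.
clinic.type <- c(0, 1)

# Illustrative cross-check of the coding (an assumption, not the package's
# own predictor): treat non-numeric columns as categorical.
guess <- ifelse(vapply(clinic, is.numeric, logical(1)), 0, 1)
```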

center

Whether centering is applied to data.omics. By default, TRUE.

scale

Whether scaling is applied to data.omics. By default, TRUE.

scaletype

Type of scaling to be applied. Three options:

  • auto : Applies the autoscaling.

  • pareto : Applies the pareto scaling.

    \frac{X_k}{s_k \sqrt[4]{m_b}}

  • block : Applies the block scaling.

    \frac{X_k}{s_k \sqrt{m_b}}

where m_b is the number of variables in the block. By default, auto.
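The formulas above can be sketched in base R as follows. This is not the package's code: it assumes that X_k is a centered variable of the block, that s_k is its standard deviation, and that autoscaling means dividing by s_k alone. For convenience the toy block is stored with samples in rows and variables in columns, i.e. the transpose of a data.omics element.

```r
set.seed(1)
block <- matrix(rnorm(10 * 4, sd = 3), nrow = 10)  # 10 samples, 4 variables
m_b <- ncol(block)                                 # number of variables in the block

centered <- scale(block, center = TRUE, scale = FALSE)
s <- apply(block, 2, sd)                           # s_k for each variable

# auto:   X_k / s_k                 (assumed meaning of autoscaling)
auto_scaled   <- sweep(centered, 2, s, "/")
# pareto: X_k / (s_k * m_b^(1/4))   (as written in the formula above)
pareto_scaled <- sweep(centered, 2, s * m_b^(1 / 4), "/")
# block:  X_k / (s_k * sqrt(m_b))
block_scaled  <- sweep(centered, 2, s * sqrt(m_b), "/")
```

Under these assumptions each block-scaled variable has standard deviation 1 / sqrt(m_b), so the variances of a block sum to 1 regardless of how many variables the block contains.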

epsilon

Convergence threshold for the coordinate descent algorithm in ElasticNet. By default, 1e-5.

min.variation

For numerical regulators, it specifies the minimum change required across conditions to retain the regulator in the regression models. For binary regulators, if the proportion of the most common value is equal to or lower than this value, the regulator is considered to have low variation and is excluded from the regression models. The user can set a single value to apply the same filter to all omics, provide a vector with one value per omic to specify a different threshold for each omic, or use 'NA' to apply a minimum variation filter when uncertain about the threshold. By default, 0.

interactions.reg

If TRUE, the model includes interactions between regulators and experimental variables. By default, TRUE.

family.glm

Error distribution and link function to be used in the model when method is glm. By default, gaussian().

elasticnet.glm

ElasticNet mixing parameter. There are three options:

  • NULL : The parameter is selected from a grid of values ranging from 0 to 1 with 0.1 increments. The chosen value optimizes the mean cross-validated error when optimizing the lambda values.

  • A number between 0 and 1 : ElasticNet is applied with this value as the mixing weight between the Ridge and Lasso penalties (elasticnet.glm = 0 is the ridge penalty, elasticnet.glm = 1 is the lasso penalty).

  • A vector with the mixing parameters to try. The one that optimizes the mean cross-validated error when optimizing the lambda values will be used.

By default, NULL.

col.filter.glm

Type of correlation coefficients to use when applying the multicollinearity filter when glm method is used.

  • cor : Computes the correlation between omics: Pearson correlation between numeric variables, phi coefficient between binary variables and biserial correlation between numeric and binary variables.

  • pcor : Computes the partial correlation.

By default, cor.

correlation.glm

Value to determine the presence of collinearity between two regulators when using the glm method. By default, 0.7.
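To illustrate what a correlation threshold of 0.7 means in practice, the sketch below flags pairs of numeric regulators whose absolute Pearson correlation reaches the threshold. It is only an illustration in base R with simulated data: the regulator names are hypothetical, and neither the use of `>=` nor what the package does with the flagged pairs is taken from the package itself.

```r
set.seed(42)
n <- 50
reg1 <- rnorm(n)
reg2 <- reg1 + rnorm(n, sd = 0.1)   # nearly a copy of reg1
reg3 <- rnorm(n)                    # unrelated regulator
regulators <- cbind(reg1, reg2, reg3)

threshold <- 0.7                    # same value as the correlation.glm default
cors <- cor(regulators)             # Pearson correlation, numeric regulators only

# Keep each pair once (upper triangle) when |correlation| reaches the threshold.
idx <- which(abs(cors) >= threshold & upper.tri(cors), arr.ind = TRUE)
collinear_pairs <- data.frame(
  reg_a = rownames(cors)[idx[, "row"]],
  reg_b = colnames(cors)[idx[, "col"]],
  correlation = cors[idx]
)
```

Here only the reg1/reg2 pair is flagged, since reg2 was simulated as a noisy copy of reg1.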

thres.isgl

Threshold for the correlation when gr.method.isgl is 'cor', or threshold for the percentage of variability to explain when gr.method.isgl is 'pca'. By default, 0.7.

gr.method.isgl

Grouping approach to create groups of variables in ISGL penalization. There are two options: 'cor' to cluster variables using correlations and 'pca' to use Principal Component Analysis approach. By default, 'cor'.

alfa.pls

Significance level for variable selection in the pls1 and pls2 methods. By default, 0.05.

p.method.pls

Type of resampling method used to calculate the p-values when method is pls1 or pls2. Two options:

  • jack : Applies Jack-Knife resampling technique.

  • perm : Applies a resampling technique in which the response variable is permuted 100 times to obtain the distribution of the coefficients and then compute their associated p-values.

By default, jack.
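The idea behind the perm option can be sketched with a much simpler model than PLS. The example below permutes the response 100 times and compares the observed coefficient with the permuted ones. It uses a least-squares slope as a stand-in for a PLS coefficient, and the two-sided p-value with a +1 correction is a common convention assumed here, not necessarily the formula used by the package.

```r
set.seed(123)
n <- 30
x <- rnorm(n)
y <- 2 * x + rnorm(n, sd = 0.5)     # simulated response with a real effect of x

# Least-squares slope of y on x (stand-in for a PLS coefficient).
slope <- function(x, y) cov(x, y) / var(x)

observed <- slope(x, y)
n_perm <- 100                       # number of permutations of the response
permuted <- replicate(n_perm, slope(x, sample(y)))

# Two-sided empirical p-value; the +1 counts the observed statistic itself.
p_value <- (1 + sum(abs(permuted) >= abs(observed))) / (n_perm + 1)
```

With 100 permutations the smallest p-value this scheme can return is 1/101, which limits how finely it can resolve very strong effects.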

vip.pls

VIP (Variable Importance in Projection) value above which a variable can be considered significant, in addition to the p-value computed with p.method.pls. By default, 0.8.

method

Model to be fitted. Four options:

  • glm : Applies a Generalized Linear Model (GLM) with ElasticNet regularization.

  • pls1 : Applies a Partial Least Squares (PLS) model, one for each of the genes at GeneExpression.

  • pls2 : Applies a PLS model to all genes at the same time; only possible when associations = NULL.

  • isgl : Applies a Generalized Linear Model (GLM) with Iterative Sparse Group Lasso (ISGL) regularization.

By default, glm.

Value

List containing the following elements:

  • ResultsPerGene : List with as many elements as genes in GeneExpression. For each gene, it includes information about gene values, considered variables, estimated coefficients, detailed information about all regulators, and regulators identified as relevant (in glm scenario) or significant (in pls scenarios).

  • GlobalSummary : List with information about the fitted models, including model metrics, information about regulators, genes without models, master regulators and hub genes.

  • Arguments : List containing all the arguments used to generate the models.

Examples


data(TestData)

# Omic type: one entry per omic, named after the elements of data.omics
omic.type = c(1, 0, 0)
names(omic.type) = names(TestData$data.omics)

# Fit one GLM with ElasticNet regularization per gene.
# Argument names are written in full (scaletype, elasticnet.glm) to match Usage.
SimGLM = more(GeneExpression = TestData$GeneExpressionDE,
              associations = TestData$associations,
              data.omics = TestData$data.omics,
              omic.type = omic.type,
              edesign = TestData$edesign,
              center = TRUE, scale = TRUE,
              scaletype = 'auto',
              epsilon = 0.00001, family.glm = gaussian(), elasticnet.glm = NULL,
              interactions.reg = TRUE, min.variation = 0, col.filter.glm = 'cor',
              correlation.glm = 0.7, method = 'glm')


ConesaLab/MORE documentation built on March 7, 2024, 6:44 p.m.