GetISGL: 'GetISGL' fits a GLM model with Iterative Sparse Group Lasso...

View source: R/MORE_ISGL.R

GetISGL R Documentation

GetISGL fits a GLM model with Iterative Sparse Group Lasso (ISGL) penalization for all the genes in the data set to identify the experimental variables and potential regulators that show a relevant effect on the expression of each gene.

Description

GetISGL fits a GLM model with Iterative Sparse Group Lasso (ISGL) penalization for all the genes in the data set to identify the experimental variables and potential regulators that show a relevant effect on the expression of each gene.

Usage

GetISGL(
  GeneExpression,
  data.omics,
  associations = NULL,
  omic.type = 0,
  edesign = NULL,
  clinic = NULL,
  clinic.type = NULL,
  center = TRUE,
  scale = TRUE,
  interactions.reg = TRUE,
  min.variation = 0,
  gr.method = "cor",
  thres = 0.7
)

Arguments

GeneExpression

Data frame containing gene expression data with genes in rows and experimental samples in columns. Row names must be the gene IDs.

data.omics

List where each element corresponds to a different omic data type to be considered (miRNAs, transcription factors, methylation, etc.). The names of the list will represent the omics, and each element in the list should be a data matrix with omic regulators in rows and samples in columns.
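As a rough illustration of the layout described for GeneExpression and data.omics, the sketch below builds toy inputs in base R. All gene, regulator, omic, and sample names are made up, and the final check assumes (it is not stated above) that sample columns should line up between the expression data and every omic.

```r
# Toy inputs; every name here is hypothetical.
set.seed(1)
samples <- paste0("S", 1:6)

# Genes in rows, samples in columns; row names are the gene IDs.
GeneExpression <- as.data.frame(
  matrix(rnorm(3 * 6), nrow = 3,
         dimnames = list(c("geneA", "geneB", "geneC"), samples))
)

# One named matrix per omic, regulators in rows and samples in columns.
data.omics <- list(
  miRNA = matrix(rnorm(2 * 6), nrow = 2,
                 dimnames = list(c("mir1", "mir2"), samples)),
  TF = matrix(rnorm(2 * 6), nrow = 2,
              dimnames = list(c("tf1", "tf2"), samples))
)

# Assumed sanity check: same sample columns, in the same order, everywhere.
same_samples <- vapply(
  data.omics,
  function(m) identical(colnames(m), colnames(GeneExpression)),
  logical(1)
)
```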

associations

List where each element corresponds to a different omic data type (miRNAs, transcription factors, methylation, etc.). The names of the list represent the omics. Each element should be a data frame with 2 columns (optionally 3) describing the potential interactions between genes and regulators for that omic. The first column must contain the genes (or features in the GeneExpression object) and the second column must contain the regulators; an optional third column can describe the type of interaction (e.g., for methylation, whether a CpG site is located in the promoter region of the gene, in the first exon, etc.). If the user has no prior knowledge of the potential regulators, associations can be set to NULL, in which case all regulators in data.omics are treated as potential regulators for all genes; for computational efficiency, the pls2 method is recommended in that situation. Users who have prior knowledge for only some omics can set the elements for the remaining omics to NULL. By default, NULL.
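A minimal sketch of an associations list follows. The column names (gene, regulator, type) and all IDs are hypothetical; the description above only fixes the column order, not the names. One omic is left as NULL to show the case with no prior knowledge for that omic.

```r
# Hypothetical gene-regulator associations; only the column order matters.
associations <- list(
  # Two columns: genes first, regulators second.
  miRNA = data.frame(
    gene = c("geneA", "geneB"),
    regulator = c("mir1", "mir2")
  ),
  # Optional third column describing the type of interaction.
  methylation = data.frame(
    gene = c("geneA", "geneA"),
    regulator = c("cpg1", "cpg2"),
    type = c("promoter", "first_exon")
  ),
  # No prior knowledge for this omic.
  TF = NULL
)
```

Note that list() keeps a NULL element when it is given by name at construction time, so the TF entry is still present in names(associations).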

edesign

Data frame describing the experimental design. Rows must be the samples (columns in GeneExpression) and columns must be the experimental variables to be included in the model (e.g. treatment, etc.).
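The sketch below shows one way to build an edesign data frame whose row names match the sample columns of GeneExpression, as the description requires. The sample names, the treatment variable, and its levels are invented for illustration.

```r
# Hypothetical samples and a toy expression table (genes x samples).
samples <- paste0("S", 1:6)
GeneExpression <- as.data.frame(
  matrix(seq_len(12), nrow = 2,
         dimnames = list(c("geneA", "geneB"), samples))
)

# One row per sample, one column per experimental variable.
edesign <- data.frame(
  treatment = rep(c("control", "treated"), each = 3),
  row.names = samples
)

# Rows of edesign should be the columns of GeneExpression.
rows_match <- identical(rownames(edesign), colnames(GeneExpression))
```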

clinic

Data frame with all clinical variables to consider, with samples in rows and variables in columns.

clinic.type

Vector indicating the data type of each variable in clinic. Numeric variables should be coded as 0 and categorical or binary variables as 1. By default, NULL, in which case the data type is predicted automatically; the user should verify the prediction and supply the vector manually if it is incorrect.
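To make the 0/1 coding concrete, here is a small sketch with a made-up clinic table and the matching clinic.type vector. The guess helper at the end is a hypothetical heuristic written only for this example; it is not the automatic prediction used by the package and is shown merely as a way to cross-check a hand-written vector.

```r
# Hypothetical clinical variables, samples in rows.
clinic <- data.frame(
  age = c(54, 61, 47, 70),
  sex = factor(c("F", "M", "F", "M")),
  smoker = c(0, 1, 0, 0),
  row.names = paste0("S", 1:4)
)

# 0 = numeric, 1 = categorical or binary; one entry per column of clinic.
clinic.type <- c(0, 1, 1)

# Illustrative heuristic (an assumption, not the package's predictor):
# treat non-numeric columns and columns with at most two values as type 1.
guess <- as.integer(vapply(
  clinic,
  function(x) !is.numeric(x) || length(unique(x)) <= 2,
  logical(1)
))
```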

center

By default TRUE. It determines whether centering is applied to data.omics.

scale

By default TRUE. It determines whether scaling is applied to data.omics.

interactions.reg

If TRUE, the model includes interactions between regulators and experimental variables. By default, TRUE.

min.variation

For numerical regulators, the minimum change required across conditions to retain the regulator in the regression models. For binary regulators, if the proportion of the most common value is equal to or lower than this value, the regulator is considered to have low variation and is excluded from the regression models. The user can set a single value to apply the same filter to all omics, provide a vector with one value per omic to set a different level for each, or use NA to apply a minimum variation filter when uncertain about the threshold. By default, 0.
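The three ways of specifying min.variation can be sketched as below. The omic names and threshold values are invented, and the assumption that a per-omic vector follows the order of the omics in data.omics is not confirmed by the description. The last lines only compute the quantity mentioned for binary regulators (the proportion of the most common value); they do not reproduce the package's filter.

```r
# Hypothetical omic names, assumed to follow the order of data.omics.
omics <- c("miRNA", "TF", "methylation")

# 1. A single value applied to every omic.
min.variation.single <- 0.1

# 2. One value per omic (same length as the number of omics).
min.variation.per.omic <- c(0.1, 0.2, 0.05)

# 3. NA when a filter is wanted but the threshold is uncertain.
min.variation.auto <- NA

# Proportion of the most common value for a toy binary regulator.
binary_reg <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 1)
top_prop <- max(table(binary_reg)) / length(binary_reg)
```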

gr.method

Method used to create the groups. By default, "cor".

thres

Threshold for the correlation when gr.method = "cor", or for the proportion of variability to explain when gr.method = "pca". By default, 0.7.
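To give an intuition for what thres controls under each gr.method, the base R sketch below applies a 0.7 threshold to simulated regulators in two ways: flagging pairs whose absolute correlation reaches the threshold, and counting the principal components needed to explain that share of variance. This is only an illustration under assumed definitions; the regulator names and data are simulated, and the package's actual grouping procedure may differ.

```r
# Simulated regulators (samples in rows here, purely for convenience).
set.seed(42)
n <- 50
base <- rnorm(n)
regs <- cbind(
  r1 = base,
  r2 = base + rnorm(n, sd = 0.1),  # nearly a copy of r1
  r3 = rnorm(n)                    # unrelated to r1 and r2
)
thres <- 0.7

# Correlation view: which regulator pairs reach the threshold?
high_cor <- abs(cor(regs)) >= thres

# PCA view: smallest number of components whose cumulative
# explained variance reaches the threshold.
pca <- prcomp(regs, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_comp <- which(cum_var >= thres)[1]
```

In this toy setting r1 and r2 would fall together under a correlation criterion, whereas r3 would not.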

Value

List containing the following elements:

  • ResultsPerGene : List with as many elements as genes in GeneExpression. For each gene, it includes information about gene values, considered variables, estimated coefficients, detailed information about all regulators, and regulators identified as relevant (in the GLM scenario) or significant (in the PLS scenarios).

  • GlobalSummary : List with information about the fitted models, including model metrics, information about regulators, genes without models, regulators, master regulators and hub genes.

  • Arguments : List containing all the arguments used to generate the models.


ConesaLab/MORE documentation built on March 7, 2024, 6:44 p.m.