interaction_model: Fits linear models with interaction to triplet data (Target,...

Description Usage Arguments Details Value Examples

View source: R/interaction_model.R

Description

Evaluates regulatory potential of DNA methylation (DNAm) on gene expression, by fitting robust linear model or zero inflated negative binomial model to triplet data. These models consist of terms to model direct effect of DNAm on target gene expression, direct effect of TF on gene expression, as well as an interaction term that evaluates the synergistic effect of DNAm and TF on gene expression.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
interaction_model(
  triplet,
  dnam,
  exp,
  cores = 1,
  tf.activity.es = NULL,
  sig.threshold = 0.05,
  fdr = TRUE,
  filter.correlated.tf.exp.dnam = TRUE,
  filter.triplet.by.sig.term = TRUE,
  stage.wise.analysis = FALSE,
  verbose = FALSE
)

Arguments

triplet

Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target)

dnam

DNA methylation matrix or SummarizedExperiment object (columns: samples in the same order as exp matrix, rows: regions/probes)

exp

A matrix or SummarizedExperiment object object (columns: samples in the same order as dnam, rows: genes represented by ensembl IDs (e.g. ENSG00000239415))

cores

Number of CPU cores to be used. Default 1.

tf.activity.es

A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression. See get_tf_ES.

sig.threshold

Threshold to filter significant triplets. Select if interaction.pval < 0.05 or pval.dnam <0.05 or pval.tf < 0.05 in binary model

fdr

Uses fdr when using sig.threshold. Select if interaction.fdr < 0.05 or fdr.dnam <0.05 or fdr.tf < 0.05 in binary model

filter.correlated.tf.exp.dnam

If wilcoxon test of TF expression Q1 and Q4 is significant (pvalue < 0.05), triplet will be removed.

filter.triplet.by.sig.term

Filter significant triplets ? Select if interaction.pval < 0.05 or pval.dnam <0.05 or pval.tf < 0.05 in binary model

stage.wise.analysis

A boolean indicating if stagewise analysis should be performed to correct for multiple comparisons. If set to FALSE FDR analysis is performed.

verbose

A logical argument indicating if messages output should be provided.

Details

This function fits the linear model

log2(RNA target) ~ log2(TF) + DNAm + log2(TF) * DNAm

to triplet data as follow:

Model by considering DNAm as a binary variable - we defined a binary group for DNA methylation values (high = 1, low = 0). That is, samples with the highest DNAm levels (top 25 percent) has high = 1, samples with lowest DNAm levels (bottom 25 percent) has high = 0. Note that in this implementation, only samples with DNAm values in the first and last quartiles are considered.

In these models, the term log2(TF) evaluates direct effect of TF on target gene expression, DNAm evaluates direct effect of DNAm on target gene expression, and log2(TF)*DNAm evaluates synergistic effect of DNAm and TF, that is, if TF regulatory activity is modified by DNAm.

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

To account for confounding effects from covariate variables, first use the get_residuals function to obtain RNA or DNAm residual values which have covariate effects removed, then fit interaction model. Note that no log2 transformation is needed when interaction_model is applied to residuals data.

Note that only triplets with TF expression not significantly different in high vs. low methylation groups will be evaluated (Wilcoxon test, p > 0.05).

Value

A dataframe with Region, TF, target, TF_symbo, target_symbol, estimates and P-values, after fitting robust linear models or zero-inflated negative binomial models (see Details above).

Model considering DNAm values as a binary variable generates quant_pval_metGrp, quant_pval_rna.tf, quant_pval_metGrp.rna.tf, quant_estimates_metGrp, quant_estimates_rna.tf, quant_estimates_metGrp.rna.tf.

Model.interaction indicates which model (robust linear model or zero inflated model) was used to fit Model 1, and Model.quantile indicates which model(robust linear model or zero inflated model) was used to fit Model 2.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000232886")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif(20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000232888")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886",
   "TF" = "ENSG00000232888"
)
results <- interaction_model(triplet, dnam, exp)

MethReg documentation built on Nov. 8, 2020, 8:01 p.m.