stratified_model: Fits linear models to triplet data (Target, TF, DNAm) for...

View source: R/stratified_model.R

stratified_modelR Documentation

Fits linear models to triplet data (Target, TF, DNAm) for samples with high DNAm or low DNAm separately, and annotates TF (activator/repressor) and DNam effect over TF activity (attenuate, enhance).

Description

Should be used after fitting interaction_model, and only for triplet data with significant TF*DNAm interaction. This analysis examines in more details on how TF activities differ in samples with high DNAm or low DNAm values.

Usage

stratified_model(
  triplet,
  dnam,
  exp,
  cores = 1,
  tf.activity.es = NULL,
  tf.dnam.classifier.pval.thld = 0.001,
  dnam.group.threshold = 0.25
)

Arguments

triplet

Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target)

dnam

DNA methylation matrix or SummarizedExperiment (columns: samples in the same order as exp matrix, rows: regions/probes)

exp

A matrix or SummarizedExperiment (columns: samples in the same order as dnam matrix, rows: genes represented by ensembl IDs (e.g. ENSG00000239415))

cores

Number of CPU cores to be used. Default 1.

tf.activity.es

A matrix with normalized enrichment scores for each TF across all samples to be used in linear models instead of TF gene expression.

tf.dnam.classifier.pval.thld

P-value threshold to consider a linear model significant of not. Default 0.001. This will be used to classify the TF role and DNAm effect.

dnam.group.threshold

DNA methylation threshold percentage to define samples in the low methylated group and high methylated group. For example, setting the threshold to 0.3 (30%) will assign samples with the lowest 30% methylation in the low group and the highest 30% methylation in the high group. Default is 0.25 (25%), accepted threshold range (0.0,0.5].

Details

This function fits linear model log2(RNA target) = log2(TF)

to samples with highest DNAm values (top 25 percent) or lowest DNAm values (bottom 25 percent), separately.

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

  • When percent of zeros in RNAseq data is less than 25 percent, robust linear models are implemented using rlm function from MASS package. This gives outlier gene expression values reduced weight. We used "psi.bisqure" option in function rlm (bisquare weighting, https://stats.idre.ucla.edu/r/dae/robust-regression/).

  • When percent of zeros in RNAseq data is more than 25 percent, zero inflated negative binomial models are implemented using zeroinfl function from pscl package. This assumes there are two processes that generated zeros (1) one where the counts are always zero (2) another where the count follows a negative binomial distribution.

To account for confounding effects from covariate variables, first use the get_residuals function to obtain RNA residual values which have covariate effects removed, then fit interaction model. Note that no log2 transformation is needed when interaction_model is applied to residuals data.

This function also provides annotations for TFs. A TF is annotated as activator if increasing amount of TF (higher TF gene expression) corresponds to increased target gene expression. A TF is annotated as repressor if increasing amount of TF (higher TF gene expression) corresponds to decrease in target gene expression. A TF is annotated as dual if in the Q1 methylation group increasing amount of TF (higher TF gene expression) corresponds to increase in target gene expression, while in Q4 methylation group increasing amount of TF (higher TF gene expression) corresponds to decrease in target gene expression (or the same but changing Q1 and Q4 in the previous sentence).

In addition, a region/CpG is annotated as enhancing if more TF regulation on gene transcription is observed in samples with high DNAm. That is, DNA methylation enhances TF regulation on target gene expression. On the other hand, a region/CpG is annotated as attenuating if more TF regulation on gene transcription is observed in samples with low DNAm. That is, DNA methylation reduces TF regulation on target gene expression.

Value

A data frame with Region, TF, target, TF_symbol target_symbol, results for fitting linear models to samples with low methylation (DNAmlow_pval_rna.tf, DNAmlow_estimate_rna.tf), or samples with high methylation (DNAmhigh_pval_rna.tf, DNAmhigh_pval_rna.tf.1), annotations for TF (class.TF) and (class.TF.DNAm).

Examples

library(dplyr)
dnam <- runif (20,min = 0,max = 1) %>%
  matrix(ncol = 1) %>%  t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)

exp.target <-  runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.target) <- c("ENSG00000232886")
colnames(exp.target) <- paste0("Samples",1:20)

exp.tf <- runif (20,min = 0,max = 10) %>%
  matrix(ncol = 1) %>%  t
rownames(exp.tf) <- c("ENSG00000232888")
colnames(exp.tf) <- paste0("Samples",1:20)

exp <- rbind(exp.tf, exp.target)

triplet <- data.frame(
   "regionID" =  c("chr3:203727581-203728580"),
   "target" = "ENSG00000232886",
   "TF" = "ENSG00000232888"
)

results <- stratified_model(
  triplet = triplet,
  dnam = dnam,
  exp = exp
)

TransBioInfoLab/MethReg documentation built on July 28, 2023, 9:17 p.m.