Description Usage Arguments Details Value Examples

View source: R/interaction_model.R

Evaluates regulatory potential of DNA methylation (DNAm) on gene expression, by fitting robust linear model or zero inflated negative binomial model to triplet data. These models consist of terms to model direct effect of DNAm on target gene expression, direct effect of TF on gene expression, as well as an interaction term that evaluates the synergistic effect of DNAm and TF on gene expression.

1 2 3 4 5 6 7 8 9 10 11 12 13 |

`triplet` |
Data frame with columns for DNA methylation region (regionID), TF (TF), and target gene (target) |

`dnam` |
DNA methylation matrix or SummarizedExperiment object
(columns: samples in the same order as |

`exp` |
A matrix or SummarizedExperiment object object
(columns: samples in the same order as |

`cores` |
Number of CPU cores to be used. Default 1. |

`tf.activity.es` |
A matrix with normalized enrichment scores for each TF across all samples
to be used in linear models instead of TF gene expression. See |

`sig.threshold` |
Threshold to filter significant triplets. Select if interaction.pval < 0.05 or pval.dnam <0.05 or pval.tf < 0.05 in binary model |

`fdr` |
Uses fdr when using sig.threshold. Select if interaction.fdr < 0.05 or fdr.dnam <0.05 or fdr.tf < 0.05 in binary model |

`filter.correlated.tf.exp.dnam` |
If wilcoxon test of TF expression Q1 and Q4 is significant (pvalue < 0.05), triplet will be removed. |

`filter.triplet.by.sig.term` |
Filter significant triplets ? Select if interaction.pval < 0.05 or pval.dnam <0.05 or pval.tf < 0.05 in binary model |

`stage.wise.analysis` |
A boolean indicating if stagewise analysis should be performed to correct for multiple comparisons. If set to FALSE FDR analysis is performed. |

`verbose` |
A logical argument indicating if messages output should be provided. |

This function fits the linear model

`log2(RNA target) ~ log2(TF) + DNAm + log2(TF) * DNAm`

to triplet data as follow:

Model by considering `DNAm`

as a binary variable - we defined a binary group for
DNA methylation values (high = 1, low = 0). That is, samples with the highest
DNAm levels (top 25 percent) has high = 1, samples with lowest
DNAm levels (bottom 25 percent) has high = 0. Note that in this
implementation, only samples with DNAm values in the first and last quartiles
are considered.

In these models, the term `log2(TF)`

evaluates direct effect of TF on
target gene expression, `DNAm`

evaluates direct effect of DNAm on target
gene expression, and `log2(TF)*DNAm`

evaluates synergistic effect of DNAm
and TF, that is, if TF regulatory activity is modified by DNAm.

There are two implementations of these models, depending on whether there are an excessive amount (i.e. more than 25 percent) of samples with zero counts in RNAseq data:

When percent of zeros in RNAseq data is less than 25 percent, robust linear models are implemented using

`rlm`

function from`MASS`

package. This gives outlier gene expression values reduced weight. We used`"psi.bisqure"`

option in function`rlm`

(bisquare weighting, https://stats.idre.ucla.edu/r/dae/robust-regression/).When percent of zeros in RNAseq data is more than 25 percent, zero inflated negative binomial models are implemented using

`zeroinfl`

function from`pscl`

package. This assumes there are two processes that generated zeros (1) one where the counts are always zero (2) another where the count follows a negative binomial distribution.

To account for confounding effects from covariate variables, first use the `get_residuals`

function to obtain
RNA or DNAm residual values which have covariate effects removed, then fit interaction model. Note that no
log2 transformation is needed when `interaction_model`

is applied to residuals data.

Note that only triplets with TF expression not significantly different in high vs. low methylation groups will be evaluated (Wilcoxon test, p > 0.05).

A dataframe with `Region, TF, target, TF_symbo, target_symbol, estimates and P-values`

,
after fitting robust linear models or zero-inflated negative binomial models (see Details above).

Model considering DNAm values as a binary variable generates `quant_pval_metGrp`

,
`quant_pval_rna.tf`

, `quant_pval_metGrp.rna.tf`

,
`quant_estimates_metGrp`

, `quant_estimates_rna.tf`

, `quant_estimates_metGrp.rna.tf`

.

`Model.interaction`

indicates which model (robust linear model or zero inflated model)
was used to fit Model 1, and `Model.quantile`

indicates which model(robust linear model or zero
inflated model) was used to fit Model 2.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | ```
library(dplyr)
dnam <- runif(20,min = 0,max = 1) %>%
matrix(ncol = 1) %>% t
rownames(dnam) <- c("chr3:203727581-203728580")
colnames(dnam) <- paste0("Samples",1:20)
exp.target <- runif(20,min = 0,max = 10) %>%
matrix(ncol = 1) %>% t
rownames(exp.target) <- c("ENSG00000232886")
colnames(exp.target) <- paste0("Samples",1:20)
exp.tf <- runif(20,min = 0,max = 10) %>%
matrix(ncol = 1) %>% t
rownames(exp.tf) <- c("ENSG00000232888")
colnames(exp.tf) <- paste0("Samples",1:20)
exp <- rbind(exp.tf, exp.target)
triplet <- data.frame(
"regionID" = c("chr3:203727581-203728580"),
"target" = "ENSG00000232886",
"TF" = "ENSG00000232888"
)
results <- interaction_model(triplet, dnam, exp)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.