knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(Hitman)

Simulate data

To demonstrate the package on a dataset, I'll simulate log2 expression data with 100 genes and 6 samples in two groups with a continuous covariate and a binary outcome. The covariate and the first gene will be associated with group. This is similar to the example data from limma contrasts.fit.

We treat this data as being already processed. For many datasets, there are zeroes or missing values that might need to be imputed; samples need to be normalized to make them comparable and amenable to statistical tests; absent genes need to be removed; sample outliers need to be assessed to examine whether some experimental variables or batch effects need to be accounted for, or the samples need to be removed or down-weighted; and trends between a gene's mean expression and its variance should be accounted for, especially in RNA-seq data, with limma's voom function.

set.seed(42)
M <- matrix(rnorm(100*6, sd=0.3), nrow=100, ncol=6)
dimnames(M) <- list(paste0("gene", 1:nrow(M)), paste0("sample", 1:ncol(M)))
M[1,1:3] <- M[1,1:3] + 2

pheno <- data.frame(grp=rep(c("First3", "Last3"), each=3), yy=rnorm(ncol(M)), stringsAsFactors = FALSE)
pheno$yy[1:3] <- pheno$yy[1:3]-2

Hitman

We can test what genes mediate the effect of group on the continuous phenotype. This is called high-throughput mediation analysis (hitman). We set the exposure (E) to group, the outcome (Y) to pheno.v, and test each gene in M as a mediator. For hitman we transform the exposure to numeric, and set the names of the phenoype to those of M.

exposure <- as.numeric(pheno$grp == "Last3")
res.hit <- hitman(E = exposure, M = M, Y = setNames(pheno$yy, nm=colnames(M)))

hitman returns a table with a combined ("EMY") p-value and FDR for each gene. This combined statistic is based on the association of E to the gene in M, and the conditional association of the gene in M to Y given E.

knitr::kable(signif(res.hit[1:10,], 3))


jdreyf/Hitman documentation built on April 12, 2025, 1:35 p.m.